How can I find values between XML tags in Python?


Question

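I am retrieving weather information from Google's site, and I want to find the values between XML tags. The code below gives me the weather condition for a city, but I cannot get other parameters such as temperature. If possible, please also explain how the split function used in the code works: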

import urllib

def getWeather(city):

    #create google weather api url
    url = "http://www.google.com/ig/api?weather=" + urllib.quote(city)

    try:
        # open google weather api url
        f = urllib.urlopen(url)
    except:
        # if there was an error opening the url, return
        return "Error opening url"

    # read contents to a string
    s = f.read()

    # extract weather condition data from xml string
    weather = s.split("<current_conditions><condition data=\"")[-1].split("\"")[0]

    # if there was an error getting the condition, the city is invalid


    if weather == "<?xml version=":
        return "Invalid city"

    #return the weather condition
    return weather

def main():
    while True:
        city = raw_input("Give me a city: ")
        weather = getWeather(city)
        print(weather)

if __name__ == "__main__":
    main()

Thanks


Answer:

Well, here is a solution for your particular case that does not use a full parser:

import urllib

def getWeather(city):
    ''' given city name or postal code,
        return dictionary with current weather conditions
    '''
    url = 'http://www.google.com/ig/api?weather='
    try:
        f = urllib.urlopen(url + urllib.quote(city))
    except:
        return "Error opening url"
    s = f.read().replace('\r','').replace('\n','')
    if '<problem' in s:
        return "Problem retrieving weather (invalid city?)"

    weather = s.split('</current_conditions>')[0]  \
               .split('<current_conditions>')[-1]  \
               .strip('</>')                       
    wdict = dict(i.split(' data="') for i in weather.split('"/><'))
    return wdict

And a usage example:

>>> weather = getWeather('94043')
>>> weather
{'temp_f': '67', 'temp_c': '19', 'humidity': 'Humidity: 61%', 'wind_condition': 'Wind: N at 21 mph', 'condition': 'Sunny', 'icon': '/ig/images/weather/sunny.gif'}
>>> weather['humidity']
'Humidity: 61%'
>>> print '%(condition)s\nTemperature %(temp_c)s C (%(temp_f)s F)\n%(humidity)s\n%(wind_condition)s' % weather
Sunny
Temperature 19 C (67 F)
Humidity: 61%
Wind: N at 21 mph
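
For comparison, here is a minimal sketch of the same lookup done with a real XML parser instead of string splitting. It is written for Python 3 and uses the standard library's xml.etree.ElementTree. The SAMPLE string is hand-written from the fields shown in the output above, the wrapper element name `reply` is a placeholder, and `current_conditions_dict` is a hypothetical helper name; none of these come from the service itself.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (an assumption, not a captured response).
# The outer <reply> element is a placeholder name.
SAMPLE = (
    '<reply>'
    '<current_conditions>'
    '<condition data="Sunny"/>'
    '<temp_f data="67"/>'
    '<temp_c data="19"/>'
    '<humidity data="Humidity: 61%"/>'
    '</current_conditions>'
    '</reply>'
)

def current_conditions_dict(xml_text):
    """Map each child tag of <current_conditions> to its data attribute."""
    root = ET.fromstring(xml_text)
    # './/' searches at any depth, so the wrapper element names do not matter.
    block = root.find('.//current_conditions')
    if block is None:
        return {}
    return {child.tag: child.get('data') for child in block}

weather = current_conditions_dict(SAMPLE)
print(weather['temp_c'])
```

Because the parser works on elements and attributes rather than on exact character sequences, extra whitespace between tags or attributes should not affect the result.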

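P.S. Note that a fairly trivial change in Google's output format would break this, for example if they added extra spaces or tabs between tags or attributes. They avoid doing that to keep the HTTP response small. But if they did, we would have to get acquainted with regular expressions and re.split().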
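
A rough sketch of what a whitespace-tolerant version could look like, in Python 3. It uses re.search and re.findall rather than re.split(), which seemed simpler for this shape of data. The SAMPLE string is hand-written with deliberately irregular spacing, and `parse_current_conditions` is a hypothetical name; treat the patterns as an assumption about the format, not a tested match for real responses.

```python
import re

# Hand-written sample with extra spaces and newlines (an assumption,
# built from the fields in the usage example, not a captured response).
SAMPLE = '''<current_conditions>
    <condition   data="Sunny" />
    <temp_f data = "67"/>
    <temp_c data="19"/>
    <humidity data="Humidity: 61%"/>
</current_conditions>'''

# Isolate the block first so tags elsewhere in the reply are ignored.
BLOCK_RE = re.compile(r'<current_conditions>(.*?)</current_conditions>', re.S)
# <name data="value"/> with optional whitespace around '=' and before '/>'.
TAG_RE = re.compile(r'<(\w+)\s+data\s*=\s*"([^"]*)"\s*/>')

def parse_current_conditions(xml_text):
    """Return {tag name: data value} for the current_conditions block."""
    block = BLOCK_RE.search(xml_text)
    if block is None:
        return {}
    # findall returns (name, value) pairs, which dict() accepts directly.
    return dict(TAG_RE.findall(block.group(1)))

weather = parse_current_conditions(SAMPLE)
print(weather['humidity'])
```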

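P.P.S. How str.split(sep) works is explained in the documentation; here is an excerpt: Return a list of the words in the string, using sep as the delimiter string. … The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']).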

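So 'text1<tag>text2</tag>text3'.split('</tag>') gives us ['text1<tag>text2', 'text3']. Then [0] picks the first element, 'text1<tag>text2'. We then split that at '<tag>' and pick 'text2', which contains the data we are interested in. Quite trivial, really.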
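
The same steps, written out one at a time in Python 3, followed by the one-liner from the question applied to a short hand-written string (an assumption; it only imitates the start of the block). The last part shows why the question's code compares the result with '<?xml version=': when the separator is not found, split returns the whole string as a single element.

```python
s = 'text1<tag>text2</tag>text3'

# Step 1: cut at the closing tag; [0] is everything before it.
before_close = s.split('</tag>')[0]
print(before_close)   # text1<tag>text2

# Step 2: cut at the opening tag; [-1] is everything after it.
inner = before_close.split('<tag>')[-1]
print(inner)          # text2

# The one-liner from the question, on a hand-written fragment.
sep = '<current_conditions><condition data="'
xml = '<current_conditions><condition data="Sunny"/><temp_f data="67"/>'
condition = xml.split(sep)[-1].split('"')[0]
print(condition)      # Sunny

# If sep is absent, split() returns [whole string], so [-1] is the whole
# reply and the text before its first double quote is '<?xml version='.
miss = '<?xml version="1.0"?><problem/>'.split(sep)[-1].split('"')[0]
print(miss)           # <?xml version=
```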