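How can I find values between XML tags in Python?
Question:
I am using the Google site to retrieve weather information, and I want to find values between XML tags. The code below gives me the weather condition of a city, but I cannot get the other parameters, such as temperature. If possible, please also explain how the split function used in the code works: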
import urllib

def getWeather(city):
    #create google weather api url
    url = "http://www.google.com/ig/api?weather=" + urllib.quote(city)
    try:
        # open google weather api url
        f = urllib.urlopen(url)
    except:
        # if there was an error opening the url, return
        return "Error opening url"
    # read contents to a string
    s = f.read()
    # extract weather condition data from xml string
    weather = s.split("<current_conditions><condition data=\"")[-1].split("\"")[0]
    # if there was an error getting the condition, the city is invalid
    if weather == "<?xml version=":
        return "Invalid city"
    #return the weather condition
    return weather

def main():
    while True:
        city = raw_input("Give me a city: ")
        weather = getWeather(city)
        print(weather)

if __name__ == "__main__":
    main()
Thanks
Answer:
Well, here is a solution for your specific case that does not use a full parser:
import urllib

def getWeather(city):
    ''' given city name or postal code,
        return dictionary with current weather conditions
    '''
    url = 'http://www.google.com/ig/api?weather='
    try:
        f = urllib.urlopen(url + urllib.quote(city))
    except:
        return "Error opening url"
    # drop line breaks so the tags sit back to back
    s = f.read().replace('\r','').replace('\n','')
    if '<problem' in s:
        return "Problem retrieving weather (invalid city?)"
    # keep only what lies between <current_conditions> and </current_conditions>
    weather = s.split('</current_conditions>')[0] \
               .split('<current_conditions>')[-1] \
               .strip('</>')
    # each piece looks like: name data="value
    wdict = dict(i.split(' data="') for i in weather.split('"/><'))
    return wdict
And a usage example:
>>> weather = getWeather('94043')
>>> weather
{'temp_f': '67', 'temp_c': '19', 'humidity': 'Humidity: 61%', 'wind_condition': 'Wind: N at 21 mph', 'condition': 'Sunny', 'icon': '/ig/images/weather/sunny.gif'}
>>> weather['humidity']
'Humidity: 61%'
>>> print '%(condition)s\nTemperature %(temp_c)s C (%(temp_f)s F)\n%(humidity)s\n%(wind_condition)s' % weather
Sunny
Temperature 19 C (67 F)
Humidity: 61%
Wind: N at 21 mph
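PS. Note that a fairly trivial change to Google's output format would break this, for example if they added extra spaces or tabs between tags or attributes. They avoid doing that in order to keep the HTTP response small. But if they did, we would have to get acquainted with regular expressions and re.split().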
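To illustrate the point about whitespace, here is a minimal sketch of a regular-expression variant that tolerates extra spaces, tabs and line breaks. It uses re.search() and re.findall() rather than re.split(), and the SAMPLE string, TAG_RE and parse_conditions are hypothetical names made up for this example; the sample only imitates the shape of the current_conditions fragment shown above and is not real Google output.

```python
import re

# Hypothetical sample imitating the <current_conditions> fragment,
# with extra whitespace sprinkled between tags and attributes.
SAMPLE = (
    '<current_conditions>\n'
    '  <condition  data="Sunny" />\n'
    '\t<temp_f data="67"/>\n'
    '  <temp_c data = "19"/>\n'
    '  <humidity data="Humidity: 61%"/>\n'
    '</current_conditions>'
)

# Matches <name data="value"/> and allows whitespace around the pieces.
TAG_RE = re.compile(r'<(\w+)\s+data\s*=\s*"([^"]*)"\s*/>')

def parse_conditions(xml_text):
    """Return {tag_name: data_value} for the tags inside <current_conditions>."""
    block = re.search(r'<current_conditions>(.*?)</current_conditions>',
                      xml_text, re.DOTALL)
    if block is None:
        return {}
    # findall yields (name, value) pairs, which dict() accepts directly
    return dict(TAG_RE.findall(block.group(1)))

conditions = parse_conditions(SAMPLE)
print(conditions['temp_c'])
```

This is still pattern matching rather than real XML parsing, so it only holds as long as every value is carried in a data="..." attribute of a self-closing tag.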
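PPS. How str.split(sep) works is explained in the documentation; here is an excerpt: Return a list of the words in the string, using sep as the delimiter string. … The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). So 'text1<tag>text2</tag>text3'.split('</tag>') gives us ['text1<tag>text2', 'text3'], then [0] picks the first element 'text1<tag>text2', then we split at '<tag>' and pick 'text2', which contains the data we are interested in. Quite trite really.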
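The chain of splits can be followed step by step in a short sketch. The strings s and bad are hypothetical inputs invented for the example; they are only shaped like the responses discussed in the question.

```python
text = 'text1<tag>text2</tag>text3'

parts = text.split('</tag>')     # ['text1<tag>text2', 'text3']
head = parts[0]                  # 'text1<tag>text2'
pieces = head.split('<tag>')     # ['text1', 'text2']
value = pieces[-1]               # 'text2'

# The one-liner from the question works the same way
# (hypothetical input, shaped like a successful response):
s = '<current_conditions><condition data="Sunny"/><temp_f data="67"/>'
marker = '<current_conditions><condition data="'
condition = s.split(marker)[-1].split('"')[0]

# When the marker is absent, split() returns the whole string as the only
# element, so the result is everything up to the first double quote.
bad = '<?xml version="1.0"?><xml_api_reply/>'
fallback = bad.split(marker)[-1].split('"')[0]

print(value)
print(condition)
print(fallback)
```

The last case is why the question's code compares the result with '<?xml version=' to detect an invalid city.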