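Pandas: Unable to strip HTML tags from a DataFrame column
Question:
I have a Pandas DataFrame whose text column contains HTML.
I only want the plain text, i.e. the content with the tags stripped. I tried the following: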
from bs4 import BeautifulSoup
result_df['text'] = BeautifulSoup(result_df['text']).get_text()
However, I end up getting this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
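What am I doing wrong?
Thanks!
Answer:
BeautifulSoup parses one markup string (or file-like object) per call, not a whole Series. Passing result_df['text'] directly ends up evaluating the Series as a single true/false value, which pandas refuses to do, hence the "truth value of a Series is ambiguous" error. Parse each cell separately instead: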
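from bs4 import BeautifulSoup
# Parse each cell on its own; naming "html.parser" explicitly avoids bs4's "no parser was explicitly specified" warning
result_df['text'] = [BeautifulSoup(text, "html.parser").get_text() for text in result_df['text']]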
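The list comprehension above fails if the column contains missing values (NaN/None), since those are not strings. A minimal sketch of a more defensive variant, assuming such gaps may exist, uses Series.apply so that the function still runs once per cell. The helper name strip_html and the sample data are hypothetical stand-ins for the asker's result_df.

```python
import pandas as pd
from bs4 import BeautifulSoup


def strip_html(value):
    """Return the visible text of one HTML string (hypothetical helper)."""
    # Missing cells (NaN/None) are not strings, so pass them through unchanged
    if not isinstance(value, str):
        return value
    # Parse a single cell with Python's built-in "html.parser"
    return BeautifulSoup(value, "html.parser").get_text()


# Hypothetical sample data standing in for result_df
result_df = pd.DataFrame({"text": ["<p>Hello <b>world</b></p>", None, "plain text"]})

# apply() calls strip_html element by element, never on the whole Series
result_df["text"] = result_df["text"].apply(strip_html)
print(result_df["text"].tolist())
```

Passing missing values through unchanged keeps them detectable with isna() afterwards, rather than turning them into the literal string "nan".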