Pandas: Cannot strip HTML tags from a DataFrame column


Question

I have a Pandas DataFrame with a column named text that contains HTML. I only want the text, that is, I want to strip the tags. I tried the following:

from bs4 import BeautifulSoup
result_df['text'] = BeautifulSoup(result_df['text']).get_text()

However, I end up getting this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What am I doing wrong?

Thanks!


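Answer:

BeautifulSoup expects a single markup string, not a whole Series, so it has to be applied to each value in the column separately. Try this: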

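from bs4 import BeautifulSoup
# Parse each cell on its own; naming the parser avoids bs4's "no parser was explicitly specified" warning
result_df['text'] = [BeautifulSoup(text, 'html.parser').get_text() for text in result_df['text']]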
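If the column may contain missing values, the same idea can be written with Series.apply and a small helper. This is only a sketch: strip_html is a hypothetical helper name, and the sample rows are made up to stand in for your result_df.

```python
import pandas as pd
from bs4 import BeautifulSoup


def strip_html(value):
    """Return the text of an HTML string; pass non-strings (e.g. None/NaN) through unchanged."""
    if not isinstance(value, str):
        return value
    return BeautifulSoup(value, "html.parser").get_text()


# Hypothetical sample data standing in for the real result_df
result_df = pd.DataFrame(
    {"text": ["<p>Hello <b>world</b></p>", "<div>plain</div>", None]}
)

# apply() calls strip_html once per cell, so BeautifulSoup only ever sees one string
result_df["text"] = result_df["text"].apply(strip_html)
print(result_df)
```

The isinstance check is there because BeautifulSoup would otherwise be handed None or NaN for empty cells; whether you want to keep those values or replace them with an empty string depends on your data.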