Web scraping with Selenium
Question:
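I am trying to scrape the list of company names, code, industry, sector, mkt cap, etc. from the table on this website using Selenium. I am new to this and have written the following code: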
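import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
path_to_chromedriver = r'C:\Documents\chromedriver'
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service(executable_path=path_to_chromedriver))
url = r'http://sgx.com/wps/portal/sgxweb/home/company_disclosure/stockfacts'
browser.get(url)
time.sleep(15)  # fixed wait for the page to render
output = browser.page_source
print(output)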
However, I only get the following markup. The table's header row and body are empty, with none of the data inside them:
<div class="table-wrapper results-display">
<table>
<thead>
<tr></tr>
</thead>
<tbody></tbody>
</table>
</div>
<div class="pager results-display"></div>
I have also tried scraping it with BS4 before, but that failed too. Any help is much appreciated.
Answer:
The results are inside an iframe. Switch to it, and then get the .page_source:
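from selenium.webdriver.common.by import By
# driver is the WebDriver instance ("browser" in the question's code)
iframe = driver.find_element(By.CSS_SELECTOR, "#mainContent iframe")
driver.switch_to.frame(iframe)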
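Once page_source comes from inside the iframe, the table cells can be pulled out without a browser. The following is a minimal sketch using the standard library's html.parser rather than BS4; the names TableRowParser and parse_table_rows are hypothetical, and it assumes every <td>/<th> has an explicit closing tag. The actual column layout of the SGX table is not confirmed here.

```python
from html.parser import HTMLParser
from typing import List


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) still belongs to the open cell
        if self._in_cell:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._row.append("".join(self._cell).strip())
            self._in_cell = False
        elif tag == "tr":
            # Skip empty rows such as the placeholder <tr></tr> in the outer page
            if self._row:
                self.rows.append(self._row)
            self._row = []


def parse_table_rows(html: str) -> List[List[str]]:
    """Return a list of rows, each a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

After switching frames, something like rows = parse_table_rows(driver.page_source) would give header and data rows. Run against the empty markup from the question, it returns an empty list, which is exactly the symptom of reading the outer page instead of the iframe.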
I would also add an explicit wait for the table to load:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)  # wait up to 10 seconds
# locate and switch to the iframe that holds the results table
iframe = driver.find_element(By.CSS_SELECTOR, "#mainContent iframe")
driver.switch_to.frame(iframe)
# wait for the table to load (company name cells become visible)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.companyName')))
# page_source now returns the iframe's document, not the outer page
print(driver.page_source)
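The explicit wait replaces the question's fixed time.sleep(15): WebDriverWait.until keeps re-evaluating the condition and returns as soon as it yields a truthy value, raising TimeoutException if the timeout expires first (its default poll interval is 0.5 s). The sketch below is a simplified, hypothetical re-implementation of that polling loop, not Selenium's actual source; unlike Selenium, the condition takes no driver argument and the function raises the built-in TimeoutError.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # ready: return immediately, no leftover sleep
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)  # back off briefly before checking again
```

With a fixed sleep, the script always pays the full 15 seconds and can still fail on a slow load; a polling wait returns as soon as the element appears and fails loudly otherwise.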