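pandas.read_html only returns the table data that is present on the HTML page before it is scrolled. The rows that would only load when you scroll are therefore missing from the returned list of DataFrames. How can I make it return the list of DataFrames only after following the given steps: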
My code:
import pandas as pd
url = 'https://finance.yahoo.com/quote/GOOG/history?period1=1566844200&period2=1598466600&interval=1d&filter=history&frequency=1d'
dfs = pd.read_html(url)
print(dfs[0])
Actual result:
Date Open High Low Close* Adj Close** Volume
0 Aug 26, 2020 1608.00 1659.22 1603.60 1652.38 1652.38 3993400
1 Aug 25, 2020 1582.07 1611.62 1582.07 1608.22 1608.22 2247100
2 Aug 24, 2020 1593.98 1614.17 1580.57 1588.20 1588.20 1409900
3 Aug 21, 2020 1577.03 1597.72 1568.01 1580.42 1580.42 1446500
4 Aug 20, 2020 1543.45 1585.87 1538.20 1581.75 1581.75 1706900
... ... ... ... ... ... ... ...
96 Apr 09, 2020 1224.08 1225.57 1196.73 1211.45 1211.45 2175400
97 Apr 08, 2020 1206.50 1219.07 1188.16 1210.28 1210.28 1975100
98 Apr 07, 2020 1221.00 1225.00 1182.23 1186.51 1186.51 2387300
99 Apr 06, 2020 1138.00 1194.66 1130.94 1186.92 1186.92 2664700
100 *CPA *CPA *CPA *CPA *CPA *CPA *CPA
[101 rows × 7 columns]
Expected result:
Date Open High Low Close* Adj Close** Volume
0 Aug 26, 2020 1608.00 1659.22 1603.60 1652.38 1652.38 3993400
1 Aug 25, 2020 1582.07 1611.62 1582.07 1608.22 1608.22 2247100
2 Aug 24, 2020 1593.98 1614.17 1580.57 1588.20 1588.20 1409900
3 Aug 21, 2020 1577.03 1597.72 1568.01 1580.42 1580.42 1446500
4 Aug 20, 2020 1543.45 1585.87 1538.20 1581.75 1581.75 1706900
... ... ... ... ... ... ... ...
249 Apr 30, 2019 1224.08 1225.57 1196.73 1211.45 1211.45 2175400
250 Apr 29, 2019 1206.50 1219.07 1188.16 1210.28 1210.28 1975100
251 Apr 27, 2019 1221.00 1225.00 1182.23 1186.51 1186.51 2387300
252 Aug 26, 2019 1138.00 1194.66 1130.94 1186.92 1186.92 2664700
253 *CPA *CPA *CPA *CPA *CPA *CPA *CPA
[253 rows × 7 columns]

Posted on 2020-10-26 20:10:40
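The pandas read_html method only parses the HTML that is present when the page first loads. You have to use something like Selenium, which actually opens the page in a real browser; you can then make the browser instance scroll down until all the data has loaded.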
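A single scroll usually triggers only the next batch of rows, so "scroll down until you get all the data" is commonly written as a loop that stops once the page height stops growing. A minimal sketch of that heuristic, assuming a Selenium WebDriver (only its `execute_script` method is used); the function name `scroll_until_stable` and the `pause`/`max_rounds` defaults are hypothetical choices, not values taken from Yahoo Finance:

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom repeatedly until document.body.scrollHeight stops growing.

    `driver` is assumed to be a Selenium WebDriver; any object with an
    `execute_script(script)` method works. `pause` and `max_rounds` are
    illustrative defaults (hypothetical). Returns the number of scrolls made.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom of the page to trigger lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the newly requested rows time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended; assume all rows are loaded
        last_height = new_height
    return rounds
```

Once the loop returns, the rendered HTML can be handed back to pandas, for example `dfs = pd.read_html(io.StringIO(browser.page_source))`. Wrapping the string in `io.StringIO` avoids the deprecation warning newer pandas versions emit for literal HTML strings. The `max_rounds` cap is there so a page that keeps growing indefinitely cannot loop forever.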
Something like this should help:
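from selenium import webdriver
from selenium.webdriver.common.by import By
import time
browser = webdriver.Firefox()
browser.get("urlhere")
# Scroll to the bottom of the page to trigger loading of more rows
browser.execute_script("window.scrollTo(0, document.body.scrollHeight)")
time.sleep(2)  # give the next batch of rows time to load
This scrolls to the bottom of the page, which makes it load more rows (repeat the scroll until no new rows appear). You can then use basic Selenium code to grab the entries from the page, for example:
elems = browser.find_elements(By.CLASS_NAME, "thelementsyouwant")

Posted on 2020-10-26 20:02:56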
If all you want is the price data, I'd suggest using yfinance.
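The question's URL selects its date window with the Unix timestamps period1=1566844200 and period2=1598466600. To fetch that same window with yfinance rather than the whole history, those timestamps can be turned into the "YYYY-MM-DD" start/end strings that `Ticker.history()` accepts. A minimal sketch, assuming the timestamps are interpreted as UTC; `yahoo_period_to_dates` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone


def yahoo_period_to_dates(period1, period2):
    """Convert Yahoo Finance period1/period2 Unix timestamps into
    ("YYYY-MM-DD", "YYYY-MM-DD") start/end strings for yfinance.

    Assumes the timestamps are UTC. yfinance treats `end` as exclusive,
    so one day is added to keep the last requested date in the result.
    """
    start = datetime.fromtimestamp(period1, tz=timezone.utc).date()
    last = datetime.fromtimestamp(period2, tz=timezone.utc).date()
    end_exclusive = last + timedelta(days=1)
    return start.isoformat(), end_exclusive.isoformat()


start, end = yahoo_period_to_dates(1566844200, 1598466600)
print(start, end)  # 2019-08-26 2020-08-27
```

The resulting strings could then be passed as `yf.Ticker("GOOG").history(start=start, end=end)` instead of `period="max"`.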
import yfinance as yf
goog = yf.Ticker("GOOG")
hist = goog.history(period="max")

https://stackoverflow.com/questions/64543948