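I ran into a problem while trying to import the popular UCI bank marketing dataset from a GitHub endpoint. The read statement does not pick up the dataset's 17 columns correctly. I have checked the separator and the header, but I am not sure how to correct the index.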
import pandas as pd

# URL endpoint
url = 'https://raw.githubusercontent.com/ThamuMnyulwa/bankMarketing/main/bank-additional-full.csv'
column_names = ["age", "job", "marital", "education", "default", "balance",
                "housing", "loan", "contact", "day", "month",
                "duration", "campaign", "pdays", "previous", "poutcome", "y"]
raw_dataset = pd.read_csv(url, names=column_names,
                          na_values='?', sep=';',
                          skipinitialspace=False, index_col=None)
Instead, it gave me something like this:
How do I use pandas read_csv to import the dataset from the URL correctly?
Posted on 2021-11-20 10:53:26
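You don't need to set the header. The CSV already comes with a header row. Your output looks odd because your list of column names is missing 3 values, which is why everything is offset by 3.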
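To illustrate the offset, here is a minimal sketch on a tiny in-memory CSV rather than the real file. The sample rows and the shortened name list are made up for the demonstration. It assumes pandas' default behaviour: when `names` has fewer entries than the file has columns (and `index_col` is not `False`), the leading unnamed columns are treated as the index, and the file's own header row is read as data.

```python
import io

import pandas as pd

# Hypothetical five-column sample; the real file has more columns.
sample = (
    "age;job;marital;education;y\n"
    "56;housemaid;married;basic.4y;no\n"
    "57;services;married;high.school;yes\n"
)

# Only three names for five columns: the first two columns become an
# implicit index, and the three names land on the LAST three columns.
shifted = pd.read_csv(io.StringIO(sample), sep=";",
                      names=["age", "job", "marital"])

print(shifted)
print(shifted.index.nlevels)  # number of columns absorbed into the index
```

In this sketch the column labelled `age` actually holds the marital values, which is the same kind of shift described above, just with 2 missing names instead of 3.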
Posted on 2021-11-20 11:02:22
The following syntax gives a consistent result:
raw_dataset = pd.read_csv(url, sep=";")
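A small sketch of the same idea on an in-memory sample, so it runs without network access. The sample data and the replacement names are hypothetical. It also shows one option if custom column names are still wanted: pass `header=0` together with a `names` list that has exactly one entry per column, so the file's header row is replaced instead of being read as data.

```python
import io

import pandas as pd

# Hypothetical three-column sample with a header row, ';'-separated.
sample = (
    "age;job;y\n"
    "56;housemaid;no\n"
    "57;services;yes\n"
)

# Let pandas take the column names from the file's own header row.
df = pd.read_csv(io.StringIO(sample), sep=";")
print(df.columns.tolist())

# Optional: override the header with custom names of matching length.
custom_names = ["age", "occupation", "subscribed"]  # illustrative names
renamed = pd.read_csv(io.StringIO(sample), sep=";",
                      header=0, names=custom_names)
print(renamed.shape)
```

Checking `df.shape` and `df.columns` right after loading is a quick way to confirm that the number of columns matches what you expect before doing anything else.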
https://stackoverflow.com/questions/70048799