我希望将URL "https://www.treasury.gov/ofac/downloads/sdn.csv“中提供的数据直接加载到一个名为sdn的表中。我唯一想做的更改是将所有具有此值的列的所有'-0-‘替换为'’。
我试着用熊猫来做这件事,但我的做法看上去并不干净。
import requests
import pandas as pd
sdnURL = "https://www.treasury.gov/ofac/downloads/sdn.csv"
altURL = "https://www.treasury.gov/ofac/downloads/alt.csv"
addURL = "https://www.treasury.gov/ofac/downloads/add.csv"
sdnCommentsURL = "https://www.treasury.gov/ofac/downloads/sdn_comments.csv"
sdnHeader = ["sdn_id", "sdn_name", "sdn_type", "program", "title", "call_sign", "vessel_type", "tonnage", "gross_tonnage", "vessel_flag", "vessel_owner", "remarks"]
altHeader = ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"]
addHeader = ["sdn_id", "address_id", "address", "city_state_post", "country", "remarks"]
sdnCommentsHeader = ["sdn_id", "remarks"]
sdn = pd.read_csv(sdnURL, names = sdnHeader, header = None)
alt = pd.read_csv(altURL, names = altHeader, header = None)
add = pd.read_csv(addURL, names = addHeader, header = None)
sdnComments = pd.read_csv(sdnCommentsURL, names = sdnCommentsHeader, header = None)
sdn.to_csv('sdn.csv', index = False)
alt.to_csv('alt.csv', index = False)
add.to_csv('add.csv', index = False)
sdnComments.to_csv('sdnComments.csv', index = False)
此外,我计划将csv加载到mysql表中。在我的方法中我有两个顾虑-
最终编辑:感谢@Jimmar给出的答案,我最终写了这样的代码-
import requests
import pandas as pd
files = {
"sdn" : ["sdn_id", "sdn_name", "sdn_type", "program", "title", "call_sign", "vessel_type", "tonnage", "gross_tonnage", "vessel_flag", "vessel_owner", "remarks"],
"alt" : ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"],
"add" : ["sdn_id", "address_id", "address", "city_state_post", "country", "remarks"],
"sdn_comments" : ["sdn_id", "remarks"]
}
def fetch_csv(file, headers):
df = pd.read_csv("https://www.treasury.gov/ofac/downloads/"+file+".csv", names=headers, header=None)
df = df.replace('-0- ', '')
df.to_csv(file+'.csv', index=False)
for file, headers in files.items():
fetch_csv(file, headers)
发布于 2020-08-18 21:53:28
你可以用这种方式组织你的代码(我只做2)
import requests
import pandas as pd
def fetch_csv(url, headers, file_name):
df = pd.read_csv(url, names=headers, header=None)
df = df.replace('-0- ', '')
df.to_csv(file_name, index=False)
sources = [
{
"url": "https://www.treasury.gov/ofac/downloads/sdn.csv",
"headers": ["sdn_id", "sdn_name", "sdn_type", "program", "title", "call_sign", "vessel_type", "tonnage", "gross_tonnage", "vessel_flag", "vessel_owner", "remarks"],
"file_name": "sdn.csv"
},
{
"url": "https://www.treasury.gov/ofac/downloads/alt.csv",
"headers": ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"],
"file_name": "alt.csv"
} # add the rest in the same pattern
]
for source in sources:
fetch_csv(source['url'], source['headers'], source['file_name'])
如果需要将其写入数据库,则应将df.to_csv
行替换为sql。
https://stackoverflow.com/questions/63476929
复制相似问题