首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >清理后直接将csv数据从URL加载到mysql表。

清理后直接将csv数据从URL加载到mysql表。
EN

Stack Overflow用户
提问于 2020-08-18 21:33:30
回答 1查看 58关注 0票数 2

我希望将URL "https://www.treasury.gov/ofac/downloads/sdn.csv“中提供的数据直接加载到一个名为sdn的表中。我唯一想做的更改是将所有具有此值的列的所有'-0-‘替换为'’。

我试着用熊猫来做这件事,但我的做法看上去并不干净。

代码语言:javascript
运行
复制
import requests
import pandas as pd


sdnURL = "https://www.treasury.gov/ofac/downloads/sdn.csv"
altURL = "https://www.treasury.gov/ofac/downloads/alt.csv"
addURL = "https://www.treasury.gov/ofac/downloads/add.csv"
sdnCommentsURL = "https://www.treasury.gov/ofac/downloads/sdn_comments.csv"

sdnHeader = ["sdn_id", "sdn_name", "sdn_type", "program", "title", "call_sign", "vessel_type", "tonnage", "gross_tonnage", "vessel_flag", "vessel_owner", "remarks"]
altHeader = ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"]
addHeader = ["sdn_id", "address_id", "address", "city_state_post", "country", "remarks"]
sdnCommentsHeader = ["sdn_id", "remarks"]


sdn = pd.read_csv(sdnURL, names = sdnHeader, header = None)
alt = pd.read_csv(altURL, names = altHeader, header = None)
add = pd.read_csv(addURL, names = addHeader, header = None)
sdnComments = pd.read_csv(sdnCommentsURL, names = sdnCommentsHeader, header = None)

sdn.to_csv('sdn.csv', index = False)
alt.to_csv('alt.csv', index = False)
add.to_csv('add.csv', index = False)
sdnComments.to_csv('sdnComments.csv', index = False)

此外,我计划将csv加载到mysql表中。在我的方法中我有两个顾虑-

  1. 我不想为每个文件编写命令。
  2. 一次替换所有列中的“-0-”

最终编辑:感谢@Jimmar给出的答案,我最终写了这样的代码-

代码语言:javascript
运行
复制
import requests
import pandas as pd

files = {
         "sdn" : ["sdn_id", "sdn_name", "sdn_type", "program", "title", "call_sign", "vessel_type", "tonnage", "gross_tonnage", "vessel_flag", "vessel_owner", "remarks"],
         "alt" : ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"],
         "add" : ["sdn_id", "address_id", "address", "city_state_post", "country", "remarks"],
         "sdn_comments" : ["sdn_id", "remarks"]
        }

def fetch_csv(file, headers):
    df = pd.read_csv("https://www.treasury.gov/ofac/downloads/"+file+".csv", names=headers, header=None)
    df = df.replace('-0- ', '')
    df.to_csv(file+'.csv', index=False)

for file, headers in files.items():
    fetch_csv(file, headers)
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2020-08-18 21:53:28

你可以用这种方式组织你的代码(我只做2)

代码语言:javascript
运行
复制
import requests
import pandas as pd

def fetch_csv(url, headers, file_name):
    df = pd.read_csv(url, names=headers, header=None)
    df = df.replace('-0- ', '')
    df.to_csv(file_name, index=False)

sources = [
    {
       "url": "https://www.treasury.gov/ofac/downloads/sdn.csv",
       "headers": ["sdn_id", "sdn_name", "sdn_type", "program", "title", "call_sign", "vessel_type", "tonnage", "gross_tonnage", "vessel_flag", "vessel_owner", "remarks"],
       "file_name": "sdn.csv"
    },
    {
       "url": "https://www.treasury.gov/ofac/downloads/alt.csv",
       "headers":  ["sdn_id", "alt_id", "alt_type", "alt_name", "remarks"],
       "file_name": "alt.csv"
    } # add the rest in the same pattern
]

for source in sources:
    fetch_csv(source['url'], source['headers'], source['file_name'])

如果需要将其写入数据库,则应将df.to_csv行替换为sql

票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/63476929

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档