Here is my XML data:
<?xml version="1.0" ?>
<ALLRECORDS>
  <RECORD>
    <RecordType>Forward</RecordType>
    <Env SINGLE="Y" TYPE="EntList">
      <ENV>
        <InputDate>20201118</InputDate>
        <Company>CompanyName1</Company>
        <Book>Book1</Book>
        <Status>DONE</Status>
      </ENV>
    </Env>
    <Back SINGLE="Y" TYPE="EntList">
      <BACK>
        <PendingUser>User1</PendingUser>
        <FairValueLevel>LEVEL2</FairValueLevel>
      </BACK>
    </Back>
    <Financial SINGLE="Y" TYPE="EntList">
      <FINANCIAL>
        <PorS>S</PorS>
        <UserSelectedFlag>SELL</UserSelectedFlag>
        <BoughtAmt TYPE="Numeric">860000000</BoughtAmt>
        <Valdate>20220513</Valdate>
        <Rate TYPE="Numeric">103.0575</Rate>
      </FINANCIAL>
    </Financial>
  </RECORD>
  <RECORD>
    <RecordType>Forward</RecordType>
    <Env SINGLE="Y" TYPE="EntList">
      <ENV>
        <InputDate>20201119</InputDate>
        <Company>CompanyName2</Company>
        <Book>Book2</Book>
        <Status>DONE</Status>
      </ENV>
    </Env>
    <Back SINGLE="Y" TYPE="EntList">
      <BACK>
        <PendingUser>User2</PendingUser>
        <FairValueLevel>LEVEL3</FairValueLevel>
      </BACK>
    </Back>
    <Financial SINGLE="Y" TYPE="EntList">
      <FINANCIAL>
        <PorS>S</PorS>
        <UserSelectedFlag>SELL</UserSelectedFlag>
        <BoughtAmt TYPE="Numeric">860000000</BoughtAmt>
        <Valdate>20220512</Valdate>
        <Rate TYPE="Numeric">103.09</Rate>
      </FINANCIAL>
    </Financial>
  </RECORD>
</ALLRECORDS>
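So basically there is a single root tag, ALLRECORDS, and each record is represented by a RECORD element. The interesting part is that, besides RecordType, every record has three child tags: Env, Back, and Financial.
The question is: how do I create one row of data per record? (Don't worry about the column names, because the tags under FINANCIAL will change.)
Here is what I have done so far: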
import xml.dom.minidom
import xml.etree.ElementTree as ET
import pandas as pd
from os import listdir
from os.path import isfile, join
import datetime

def get_row(record):
    envrecords = record.find('Env')
    env = envrecords.find('ENV')
    InputDate = env.find('InputDate').text
    Company = env.find('Company').text
    Book = env.find('Book').text
    Status = env.find('Status').text
    backrecords = record.find('Back')
    back = backrecords.find('BACK')
    PendingUser = back.find('PendingUser').text
    FairValueLevel = '' if back.find('FairValueLevel') is None else back.find('FairValueLevel').text
    finrecords = record.find('Financial')
    fin = finrecords.find('FINANCIAL')
    PorS = fin.find('PorS').text
    UserSelectedFlag = fin.find('UserSelectedFlag').text
    BoughtAmt = fin.find('BoughtAmt').text
    Valdate = fin.find('Valdate').text
    Rate = fin.find('Rate').text
    row = [InputDate, Company, Book, Status, PendingUser, FairValueLevel,
           PorS, UserSelectedFlag, BoughtAmt, Valdate, Rate]
    return row

def get_xml_data(xmlpath):
    cols = ['InputDate', 'Company', 'Book', 'Status', 'PendingUser', 'FairValueLevel',
            'PorS', 'UserSelectedFlag', 'BoughtAmt', 'Valdate', 'Rate']
    rows = []
    allfiles = [f for f in listdir(xmlpath) if isfile(join(xmlpath, f)) and f.endswith('xml')]
    for filename in allfiles:
        fullfilename = join(xmlpath, filename)
        root = ET.parse(fullfilename).getroot()
        tradetypes = ['RECORD']
        for tt in tradetypes:
            for rec in root.findall(tt):
                eachrow = get_row(rec)
                eachrowdict = dict(zip(cols, eachrow))
                rows.append(eachrowdict)
    return rows

filename = 'C:/Temp/FX/xmldata/stupidsample'
rows = get_xml_data(filename)
for a in rows:
    print(a)
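As you can see, I added a special case for the tag "FairValueLevel". Now, in the Financial section, I don't know which new column names will show up. If a new tag called "BusinessDate" appears under FINANCIAL, how should I change my code?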
Answered 2020-12-05 02:59:58
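If the nodes can change, consider a dynamic solution that does not hard-code any node names. The code below merges the per-section dictionaries with the dictionary unpacking syntax ({**a, **b}) available in Python 3.5+: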
import xml.etree.ElementTree as ET
import pandas as pd
doc = ET.parse("FinancialXMLSingleRow.xml")
df = pd.DataFrame([{**{i.tag: i.text.strip() for i in rec.findall("Env/ENV/*")},
                    **{i.tag: i.text.strip() for i in rec.findall("Back/BACK/*")},
                    **{i.tag: i.text.strip() for i in rec.findall("Financial/FINANCIAL/*")}}
                   for rec in doc.findall(".//RECORD")])
print(df)
#    InputDate       Company   Book  Status  PendingUser  FairValueLevel  PorS  UserSelectedFlag  BoughtAmt   Valdate      Rate
# 0   20201118  CompanyName1  Book1    DONE        User1          LEVEL2     S              SELL  860000000  20220513  103.0575
# 1   20201119  CompanyName2  Book2    DONE        User2          LEVEL3     S              SELL  860000000  20220512    103.09
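To see how this copes with a new tag such as BusinessDate, here is a minimal, standard-library-only sketch of the same idea. It rests on a few assumptions that are not in the original post: the inline SAMPLE_XML is a trimmed, hypothetical version of the data with BusinessDate added to the second record only, the helper name record_to_row is made up for illustration, and the path "*/*/*" is used so that the section names (Env, Back, Financial) do not have to be listed either. Empty elements are treated as empty strings, which the one-liner above would not tolerate because their text is None.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: same layout as the question, trimmed to a few
# tags, with an extra BusinessDate tag in the second record only.
SAMPLE_XML = """<ALLRECORDS>
  <RECORD>
    <RecordType>Forward</RecordType>
    <Env SINGLE="Y" TYPE="EntList">
      <ENV><InputDate>20201118</InputDate><Company>CompanyName1</Company></ENV>
    </Env>
    <Back SINGLE="Y" TYPE="EntList">
      <BACK><PendingUser>User1</PendingUser></BACK>
    </Back>
    <Financial SINGLE="Y" TYPE="EntList">
      <FINANCIAL><PorS>S</PorS><Rate TYPE="Numeric">103.0575</Rate></FINANCIAL>
    </Financial>
  </RECORD>
  <RECORD>
    <RecordType>Forward</RecordType>
    <Env SINGLE="Y" TYPE="EntList">
      <ENV><InputDate>20201119</InputDate><Company>CompanyName2</Company></ENV>
    </Env>
    <Back SINGLE="Y" TYPE="EntList">
      <BACK><PendingUser>User2</PendingUser></BACK>
    </Back>
    <Financial SINGLE="Y" TYPE="EntList">
      <FINANCIAL>
        <PorS>S</PorS>
        <Rate TYPE="Numeric">103.09</Rate>
        <BusinessDate>20201120</BusinessDate>
      </FINANCIAL>
    </Financial>
  </RECORD>
</ALLRECORDS>"""


def record_to_row(rec):
    """Flatten one RECORD into a {tag: text} dict without naming any tag."""
    row = {}
    # "*/*/*" walks section (Env) -> wrapper (ENV) -> leaf tag (InputDate)
    for leaf in rec.findall("*/*/*"):
        # Empty elements have text None, so fall back to an empty string
        row[leaf.tag] = (leaf.text or "").strip()
    return row


root = ET.fromstring(SAMPLE_XML)
rows = [record_to_row(rec) for rec in root.findall("RECORD")]

# Union of all tags seen, in first-seen order, to use as column names
columns = list(dict.fromkeys(tag for row in rows for tag in row))

print(columns)
for row in rows:
    # Records that lack a tag get an empty string in that column
    print([row.get(col, "") for col in columns])
```

Passing rows to pd.DataFrame(rows) should give one row per record, with NaN wherever a record lacks a tag. One caveat of this sketch: if two sections ever use the same leaf tag name, the later one overwrites the earlier one in the dict, so prefixing keys with the section name may be safer in that case.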
https://stackoverflow.com/questions/65110808