I have the following data frame:
   sentence                                            Entity
0  The 7250 IXR-e series uses the SR OS and is ma...  [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
1  The 7250 IXR is managed by the NSP, which prov...  [['NSP', 'Operating_System']]
2  Nokia’s feature-rich 64-bit SR OS addresses th...  [['SR OS', 'Operating_System'],['IP routing', 'Feature']]
3  The 7250 IXR-R6 uses the SR OS and is managed ...  [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
4  The 7250 IXR-R6 is managed by the NSP              [['NSP', 'Operating_System']]
5  The NSP provides end-to-end service-aware man...   [['NSP', 'Operating_System'], ['Cloud', 'Innovation'],['IP/MPLS', 'Feature']]
I want to split the elements of the Entity column into 4 other columns:
   sentence  Entity  e1  et1  e2  et2
0  The 7250 IXR-e series uses the SR OS and is ma..  SR OS  Operating_System  NSP  Operating_System
If the Entity column holds only one inner list, as in the second row, I want to drop that row. If it holds more than two inner lists, as in the last row, I want to drop that row as well.
I was able to store the first element of each row and append it to a list. The idea was to zip all the lists together and build a data frame from them, but I don't know how to extract the remaining elements...
e1 = []
for i in range(len(data)):
    a = eval(data['Entity'].values.tolist()[i])
    b = a[0]
    e1.append(b[0])
For clarity, as suggested, here is the data as a dict:
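{'sentence': {0: 'The 7250 IXR-e series uses the SR OS and is managed by the Nokia Network Services Platform (NSP).', 1: 'The 7250 IXR is managed by the NSP, which provides integrated network management across IP networks.', 2: 'Nokia’s feature-rich 64-bit SR OS addresses all IP routing requirements.', 3: 'The 7250 IXR-R6 uses the SR OS and is managed by the Nokia Network Services Platform (NSP).', 4: 'The 7250 IXR-R6 is managed by the NSP'}, 'Entity': {0: "['SR OS','Operating_System','NSP','Operating_System']", 1: "['NSP','Operating_System']", 2: "['SR OS','Operating_System','IP routing','Feature']", 3: "['SR OS','Operating_System','NSP','Operating_System']", 4: "['NSP','Operating_System']"}}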
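A possible way to work with this dict form, sketched under the assumption that each Entity value is a string holding a flat list (so a row with two entity/type pairs has exactly four items). The variable names and the shortened sentences `s0`, `s1`, `s2` are illustrative only:

```python
import ast

import pandas as pd

# Shortened stand-in for the dict above; the Entity strings are copied from it.
data = pd.DataFrame({
    "sentence": {0: "s0", 1: "s1", 2: "s2"},
    "Entity": {
        0: "['SR OS','Operating_System','NSP','Operating_System']",
        1: "['NSP','Operating_System']",
        2: "['SR OS','Operating_System','IP routing','Feature']",
    },
})

# Parse each string into a real list without resorting to eval.
parsed = data["Entity"].apply(ast.literal_eval)

# Keep rows with exactly two pairs, i.e. four flat items.
kept = parsed[parsed.apply(len) == 4]

# Build the four new columns on the same index as the kept rows.
cols = pd.DataFrame(kept.tolist(), index=kept.index,
                    columns=["e1", "et1", "e2", "et2"])
result = data.loc[kept.index].join(cols)
print(result[["e1", "et1", "e2", "et2"]])
```

Using `ast.literal_eval` instead of `eval` only accepts Python literals, which is usually the safer choice for strings read from a file.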
Posted on 2020-11-24 04:29:49
You can first select the lists you want by applying len to the column, then unpack the list column directly with a list comprehension.
Data
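As a rough illustration of those two steps on plain Python lists (the sample values are copied from the question; the names `rows`, `kept`, and `flat` are only illustrative):

```python
# Stand-in for the Entity column: each item is a list of [entity, type] pairs.
rows = [
    [["SR OS", "Operating_System"], ["NSP", "Operating_System"]],
    [["NSP", "Operating_System"]],
    [["NSP", "Operating_System"], ["Cloud", "Innovation"], ["IP/MPLS", "Feature"]],
]

# Step 1: keep only the entries that hold exactly two pairs.
kept = [pairs for pairs in rows if len(pairs) == 2]

# Step 2: flatten each kept entry into [e1, et1, e2, et2].
flat = [[item for pair in pairs for item in pair] for pairs in kept]

print(flat)
# [['SR OS', 'Operating_System', 'NSP', 'Operating_System']]
```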
import ast
import io
import pandas as pd

# Columns are separated by two or more spaces, matching sep below.
df = pd.read_csv(io.StringIO("""
sentence  Entity
0  The 7250 IXR-e series uses the SR OS and is ma...  [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
1  The 7250 IXR is managed by the NSP, which prov...  [['NSP', 'Operating_System']]
2  Nokia’s feature-rich 64-bit SR OS addresses th...  [['SR OS', 'Operating_System'],['IP routing', 'Feature']]
3  The 7250 IXR-R6 uses the SR OS and is managed ...  [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
4  The 7250 IXR-R6 is managed by the NSP  [['NSP', 'Operating_System']]
5  The NSP provides end-to-end service-aware man...  [['NSP', 'Operating_System'], ['Cloud', 'Innovation'], ['IP/MPLS', 'Feature']]
"""), sep=r"\s{2,}", engine="python")
# Convert the literal list strings to real lists.
# Only needed when the column holds strings rather than list objects.
df["Entity"] = df["Entity"].apply(ast.literal_eval)
Code
# 1. select wanted rows (Entity length = 2)
df_ans = df[df["Entity"].apply(len) == 2]
# 2. unpack Entity column, convert to a dataframe, and merge back horizontally.
#    index=df_ans.index keeps the new columns aligned with the filtered rows;
#    without it, concat pairs rows by label and shifts the values.
df_ans = pd.concat([
    df_ans[["sentence"]],
    pd.DataFrame(df_ans["Entity"].apply(lambda lsls: [item for ls in lsls for item in ls]).to_list(),
                 columns=["e1", "et1", "e2", "et2"], index=df_ans.index)
], axis=1)
Result
print(df_ans)
                                            sentence  ...               et2
0  The 7250 IXR-e series uses the SR OS and is ma...  ...  Operating_System
2  Nokia’s feature-rich 64-bit SR OS addresses th...  ...           Feature
3  The 7250 IXR-R6 uses the SR OS and is managed ...  ...  Operating_System
print(df_ans[["e1", "et1", "e2", "et2"]])
      e1               et1          e2               et2
0  SR OS  Operating_System         NSP  Operating_System
2  SR OS  Operating_System  IP routing           Feature
3  SR OS  Operating_System         NSP  Operating_System
https://stackoverflow.com/questions/64973806
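One detail worth checking when combining frames this way: `pd.concat(..., axis=1)` pairs rows by index label, not by position. After filtering, the kept rows retain their original labels, while a frame built from a plain list gets a fresh 0..n-1 index unless `index=` is passed. A minimal sketch with made-up stand-in data (the frames, values, and names here are illustrative, not the original data):

```python
import pandas as pd

# Stand-in for a filtered frame: labels 0, 2, 3 survive, label 1 was dropped.
left = pd.DataFrame({"sentence": ["a", "c", "d"]}, index=[0, 2, 3])
values = [["x0"], ["x2"], ["x3"]]

# Without index=, the new frame is labelled 0, 1, 2 and the rows misalign.
misaligned = pd.concat([left, pd.DataFrame(values, columns=["e1"])], axis=1)

# Reusing the filtered frame's index keeps each row next to its own values.
aligned = pd.concat(
    [left, pd.DataFrame(values, columns=["e1"], index=left.index)], axis=1
)

print(misaligned)
print(aligned)
```

In the misaligned result, label 2 picks up the value meant for label 3 and label 3 is left with NaN, which is the kind of shifted output to watch for.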