
Extracting elements from lists in a dataframe column

Stack Overflow user
Asked 2020-11-24 02:03:26
1 answer, 54 views, 0 followers, 0 votes

I have the following dataframe:

    sentence    Entity
0   The 7250 IXR-e series uses the SR OS and is ma...   [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
1   The 7250 IXR is managed by the NSP, which prov...   [['NSP', 'Operating_System']]
2   Nokia’s feature-rich 64-bit SR OS addresses th...   [['SR OS', 'Operating_System'],['IP routing', 'Feature']]
3   The 7250 IXR-R6 uses the SR OS and is managed ...   [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
4   The 7250 IXR-R6 is managed by the NSP   [['NSP', 'Operating_System']]
5   The NSP provides end-to-end service-aware man...    [['NSP', 'Operating_System'], ['Cloud', 'Innovation'],['IP/MPLS', 'Feature']]

I want to split the elements of the Entity column into 4 new columns:

   sentence    Entity                                    e1    et1               e2        et2     
0   The 7250 IXR-e series uses the SR OS and is ma..    SR OS  Operating_System   NSP    Operating_System

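If the Entity list contains only one pair, as in the second row, I want to drop that row. If it contains more than two pairs, as in the last row, I also want to drop that row.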

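I was able to extract the first element of each row and append it to a list. My plan was then to zip all the lists together and build a dataframe from them, but I don't know how to extract the other elements...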

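import ast

e1 = []
for value in data['Entity']:        # iterate over every row instead of a fixed range(10)
    pairs = ast.literal_eval(value)  # parse the "[['SR OS', ...], ...]" string into a list
    e1.append(pairs[0][0])           # first entity name of the row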
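The zip-based plan can be finished by collecting all four values per row in the same loop. A minimal sketch, assuming a hypothetical data frame that holds the question's Entity strings (sentences replaced by placeholders); it uses ast.literal_eval in place of eval and skips every row that does not hold exactly two pairs.

```python
import ast

import pandas as pd

# Hypothetical stand-in for the question's `data` frame: sentences are
# placeholders, Entity values are the question's strings.
data = pd.DataFrame({
    "sentence": [f"sentence {i}" for i in range(6)],
    "Entity": [
        "[['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]",
        "[['NSP', 'Operating_System']]",
        "[['SR OS', 'Operating_System'],['IP routing', 'Feature']]",
        "[['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]",
        "[['NSP', 'Operating_System']]",
        "[['NSP', 'Operating_System'], ['Cloud', 'Innovation'],['IP/MPLS', 'Feature']]",
    ],
})

e1, et1, e2, et2, kept = [], [], [], [], []
for idx, value in data["Entity"].items():
    pairs = ast.literal_eval(value)  # parses the literal without executing code
    if len(pairs) != 2:              # drop rows with one pair or more than two
        continue
    (name1, type1), (name2, type2) = pairs
    e1.append(name1)
    et1.append(type1)
    e2.append(name2)
    et2.append(type2)
    kept.append(idx)

# Zip the four lists into rows and reuse the kept index labels so join lines up.
extracted = pd.DataFrame(list(zip(e1, et1, e2, et2)),
                         columns=["e1", "et1", "e2", "et2"], index=kept)
result = data.loc[kept].join(extracted)
print(result)
```

Keeping the original index labels in the kept list is what lets join attach each extracted row to its own sentence.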

To make the data easier to reproduce, here it is as a dict:

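{'sentence': {0: 'The 7250 IXR-e series uses the SR OS and is managed by the Nokia Network Services Platform (NSP).',
              1: 'The 7250 IXR is managed by the NSP, which provides integrated network management across IP networks.',
              2: 'Nokia’s feature-rich 64-bit SR OS addresses all IP routing requirements.',
              3: 'The 7250 IXR-R6 uses the SR OS and is managed by the Nokia Network Services Platform (NSP).',
              4: 'The 7250 IXR-R6 is managed by the NSP'},
 'Entity': {0: "['SR OS','Operating_System','NSP','Operating_System']",
            1: "['NSP','Operating_System']",
            2: "['SR OS','Operating_System','IP routing','Feature']",
            3: "['SR OS','Operating_System','NSP','Operating_System']",
            4: "['NSP','Operating_System']"}}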
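In this dict the Entity strings are flat lists ([entity, type, entity, type, ...]) rather than lists of pairs, so "exactly two pairs" means "exactly four items". A minimal sketch for that shape, with the sentences replaced by hypothetical placeholders:

```python
import ast

import pandas as pd

# The question's dict, sentences replaced by placeholders (hypothetical).
raw = {
    "sentence": {i: f"sentence {i}" for i in range(5)},
    "Entity": {
        0: "['SR OS','Operating_System','NSP','Operating_System']",
        1: "['NSP','Operating_System']",
        2: "['SR OS','Operating_System','IP routing','Feature']",
        3: "['SR OS','Operating_System','NSP','Operating_System']",
        4: "['NSP','Operating_System']",
    },
}
df = pd.DataFrame(raw)
df["Entity"] = df["Entity"].apply(ast.literal_eval)  # strings -> Python lists

# Two pairs in a flat list means four items; every other row is dropped.
df = df[df["Entity"].apply(len) == 4]

# Expand the four items into columns, reusing df's index so rows stay aligned.
parts = pd.DataFrame(df["Entity"].tolist(), index=df.index,
                     columns=["e1", "et1", "e2", "et2"])
df = df.join(parts)
print(df)
```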


1 Answer

Stack Overflow user

Accepted answer

Answered 2020-11-24 04:29:49

You can first keep the rows you want by applying len to the Entity column, and then unpack the list column directly with a list comprehension.

Data

import ast
import io

import pandas as pd

df = pd.read_csv(io.StringIO("""
    sentence                                            Entity
0   The 7250 IXR-e series uses the SR OS and is ma...   [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
1   The 7250 IXR is managed by the NSP, which prov...   [['NSP', 'Operating_System']]
2   Nokia’s feature-rich 64-bit SR OS addresses th...   [['SR OS', 'Operating_System'],['IP routing', 'Feature']]
3   The 7250 IXR-R6 uses the SR OS and is managed ...   [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
4   The 7250 IXR-R6 is managed by the NSP               [['NSP', 'Operating_System']]
5   The NSP provides end-to-end service-aware man...    [['NSP', 'Operating_System'], ['Cloud', 'Innovation'], ['IP/MPLS', 'Feature']]
"""), sep=r"\s{2,}", engine="python")

# Convert the literal list strings read from the text into real Python lists.
# Skip this step if Entity already holds lists rather than strings.
df["Entity"] = df["Entity"].apply(ast.literal_eval)

Code

# 1. keep only the rows whose Entity list holds exactly two [entity, type] pairs
df_ans = df[df["Entity"].apply(len) == 2]

# 2. flatten each [[e1, et1], [e2, et2]] into [e1, et1, e2, et2], convert to a dataframe
#    that reuses df_ans's index, and merge it back horizontally
df_ans = pd.concat([
    df_ans[["sentence"]],
    pd.DataFrame(df_ans["Entity"].apply(lambda lsls: [item for ls in lsls for item in ls]).to_list(),
                 columns=["e1", "et1", "e2", "et2"],
                 index=df_ans.index)  # without this, concat pairs rows by label 0..n-1 and mismatches them
], axis=1)

Result

print(df_ans)
                                            sentence  ...               et2
0  The 7250 IXR-e series uses the SR OS and is ma...  ...  Operating_System
2  Nokia’s feature-rich 64-bit SR OS addresses th...  ...           Feature
3  The 7250 IXR-R6 uses the SR OS and is managed ...  ...  Operating_System

print(df_ans[["e1", "et1", "e2", "et2"]])
      e1               et1          e2               et2
0  SR OS  Operating_System         NSP  Operating_System
2  SR OS  Operating_System  IP routing           Feature
3  SR OS  Operating_System         NSP  Operating_System
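Why the index matters in step 2: pd.concat(..., axis=1) lines rows up by index label, not by position. A filtered frame keeps labels such as 0, 2, 3, while a DataFrame built from .to_list() is numbered 0, 1, 2, so the two pair up correctly only if the new frame reuses the filtered index. A minimal sketch with hypothetical placeholder values:

```python
import pandas as pd

# A filtered selection keeps its original row labels (here 0, 2, 3) ...
left = pd.DataFrame({"sentence": ["row 0", "row 2", "row 3"]}, index=[0, 2, 3])
# ... while a frame built from a plain list of rows gets a fresh 0..n-1 index.
right = pd.DataFrame([["e1 of row 0"], ["e1 of row 2"], ["e1 of row 3"]],
                     columns=["e1"])

# Outer alignment on labels: 1 exists only in right, 3 only in left -> NaN gaps,
# and label 2 pairs "row 2" with the values that belong to row 3.
misaligned = pd.concat([left, right], axis=1)

# Relabelling right with left's index pairs every row with its own sentence.
aligned = pd.concat([left, right.set_axis(left.index)], axis=1)

print(misaligned)
print(aligned)
```

Passing index=... when building the frame and calling set_axis afterwards are two ways of doing the same relabelling.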
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/64973806
