Q: Extracting elements from a list in a DataFrame

Stack Overflow user

Asked 2020-11-24 02:03:26

1 answer, 54 views, 0 followers, 0 votes

I have the following DataFrame:

    sentence    Entity
0   The 7250 IXR-e series uses the SR OS and is ma...   [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
1   The 7250 IXR is managed by the NSP, which prov...   [['NSP', 'Operating_System']]
2   Nokia’s feature-rich 64-bit SR OS addresses th...   [['SR OS', 'Operating_System'],['IP routing', 'Feature']]
3   The 7250 IXR-R6 uses the SR OS and is managed ...   [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
4   The 7250 IXR-R6 is managed by the NSP   [['NSP', 'Operating_System']]
5   The NSP provides end-to-end service-aware man...    [['NSP', 'Operating_System'], ['Cloud', 'Innovation'],['IP/MPLS', 'Feature']]

I want to split the elements of the Entity column into 4 additional columns:

   sentence    Entity                                    e1    et1               e2        et2     
0   The 7250 IXR-e series uses the SR OS and is ma..    SR OS  Operating_System   NSP    Operating_System

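If a row has only one pair in its list of lists, like the second row, I want to drop that row. If the Entity column has more than two pairs, like the last row, I want to drop that row as well.

I was able to store the first element and append it to a list. The idea was to zip all the lists and build a DataFrame from them, but I don't know how to extract the remaining elements: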

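import ast

e1 = []
for i in range(10):
    # parse the string representation into a real list (safer than eval)
    a = ast.literal_eval(data['Entity'].values.tolist()[i])
    b = a[0]
    # append to the list defined above (the original appended to an undefined name)
    e1.append(b[0])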

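For clarity, as suggested:

{'sentence': {0: 'The 7250 IXR-e series uses the SR OS and is managed by the Nokia Network Services Platform (NSP).',
              1: 'The 7250 IXR is managed by the NSP, which provides integrated network management across IP networks.',
              2: 'Nokia’s feature-rich 64-bit SR OS addresses all IP routing requirements.',
              3: 'The 7250 IXR-R6 uses the SR OS and is managed by the Nokia Network Services Platform (NSP).',
              4: 'The 7250 IXR-R6 is managed by the NSP'},
 'Entity': {0: "['SR OS', 'Operating_System', 'NSP', 'Operating_System']",
            1: "['NSP', 'Operating_System']",
            2: "['SR OS', 'Operating_System', 'IP routing', 'Feature']",
            3: "['SR OS', 'Operating_System', 'NSP', 'Operating_System']",
            4: "['NSP', 'Operating_System']"}}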

Tags: python, pandas, list, dataframe

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-11-24 04:29:49

You can first select the lists you want by applying len, then unpack the list column directly with a list comprehension.
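As a minimal sketch of those two steps on plain Python lists, before pandas is involved (the names `rows`, `kept`, and `flat` are illustrative and not part of the original answer):

```python
# Illustrative data: each entry is a list of [entity, entity_type] pairs.
rows = [
    [["SR OS", "Operating_System"], ["NSP", "Operating_System"]],
    [["NSP", "Operating_System"]],
    [["NSP", "Operating_System"], ["Cloud", "Innovation"], ["IP/MPLS", "Feature"]],
]

# Step 1: keep only entries with exactly two pairs.
kept = [pairs for pairs in rows if len(pairs) == 2]

# Step 2: flatten each kept entry into [e1, et1, e2, et2].
flat = [[item for pair in pairs for item in pair] for pairs in kept]

print(flat)
```

Only the first entry survives the length check, and its two pairs are flattened into a single four-item list.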

Data

import io
import pandas as pd

df = pd.read_csv(io.StringIO("""
    sentence                                            Entity
0   The 7250 IXR-e series uses the SR OS and is ma...   [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
1   The 7250 IXR is managed by the NSP, which prov...   [['NSP', 'Operating_System']]
2   Nokia’s feature-rich 64-bit SR OS addresses th...   [['SR OS', 'Operating_System'],['IP routing', 'Feature']]
3   The 7250 IXR-R6 uses the SR OS and is managed ...   [['SR OS', 'Operating_System'], ['NSP', 'Operating_System']]
4   The 7250 IXR-R6 is managed by the NSP               [['NSP', 'Operating_System']]
5   The NSP provides end-to-end service-aware man...    [['NSP', 'Operating_System'], ['Cloud', 'Innovation'], ['IP/MPLS', 'Feature']]
"""), sep=r"\s{2,}", engine="python")

# Convert literal list expression to list. Not needed in real use.
import ast
df["Entity"] = df["Entity"].apply(ast.literal_eval)

Code

# 1. select wanted rows (Entity length == 2)
df_ans = df[df["Entity"].apply(len) == 2]

# 2. unpack Entity column, convert to a dataframe, and merge back horizontally.
#    Passing index=df_ans.index keeps the new columns aligned with the filtered
#    rows; without it the new frame gets a fresh 0..n-1 index and concat misaligns.
df_ans = pd.concat([
    df_ans[["sentence"]],
    pd.DataFrame(df_ans["Entity"].apply(lambda lsls: [item for ls in lsls for item in ls]).to_list(),
                 columns=["e1", "et1", "e2", "et2"],
                 index=df_ans.index)
], axis=1)

Result

print(df_ans)
                                            sentence  ...               et2
0  The 7250 IXR-e series uses the SR OS and is ma...  ...  Operating_System
2  Nokia’s feature-rich 64-bit SR OS addresses th...  ...           Feature
3  The 7250 IXR-R6 uses the SR OS and is managed ...  ...  Operating_System

print(df_ans[["e1", "et1", "e2", "et2"]])
      e1               et1          e2               et2
0  SR OS  Operating_System         NSP  Operating_System
2  SR OS  Operating_System  IP routing           Feature
3  SR OS  Operating_System         NSP  Operating_System
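The dict posted in the question stores each Entity value as a string holding a flat list rather than a list of pairs. A hedged sketch for that shape follows, assuming that "two entities" then means exactly four items. The placeholder sentences and the names `parsed`, `mask`, `cols`, and `result` are illustrative, not from the original post.

```python
import ast
import pandas as pd

# Assumed input: Entity holds strings of flat lists, as in the question's dict.
data = pd.DataFrame({
    "sentence": ["s0", "s1", "s2"],  # placeholder sentences
    "Entity": [
        "['SR OS', 'Operating_System', 'NSP', 'Operating_System']",
        "['NSP', 'Operating_System']",
        "['SR OS', 'Operating_System', 'IP routing', 'Feature']",
    ],
})

# Parse each string into a real list without executing arbitrary code.
parsed = data["Entity"].apply(ast.literal_eval)

# Two entities in flat form means exactly four items.
mask = parsed.apply(len) == 4

# Build the new columns on the filtered index so concat aligns row by row.
cols = pd.DataFrame(parsed[mask].to_list(),
                    columns=["e1", "et1", "e2", "et2"],
                    index=parsed[mask].index)
result = pd.concat([data.loc[mask, ["sentence"]], cols], axis=1)
print(result)
```

Reusing the filtered index when building `cols` is the design choice that matters here: it lets `pd.concat` line the new columns up with the surviving rows without a separate NaN cleanup step.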

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/64973806
