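I have updated my question to provide a clearer example.
Is it possible to use the drop_duplicates method in Pandas to remove duplicate rows based on a column whose values contain a list? Consider the column 'three', where some rows hold a list of two items. Is there a way to drop the duplicate rows without iterating over them (which is my current workaround)?
The following example outlines my problem: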
import pandas as pd
data = [
{'one': 50, 'two': '5:00', 'three': 'february'},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 25, 'two': '6:00', 'three': ['february', 'january']},
{'one': 90, 'two': '9:00', 'three': 'january'}
]
df = pd.DataFrame(data)
print(df)
   one                three   two
0   50             february  5:00
1   25  [february, january]  6:00
2   25  [february, january]  6:00
3   25  [february, january]  6:00
4   90              january  9:00
df.drop_duplicates(['three'])
This results in the following error:
TypeError: type object argument after * must be a sequence, not map
Answered 2016-06-15 10:51:47
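I think this is because lists are not hashable, which breaks the duplicate-detection logic. As a workaround, you can convert the lists to tuples like so: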
df['four'] = df['three'].apply(lambda x : tuple(x) if type(x) is list else x)
df.drop_duplicates('four')
   one                three   two                 four
0   50             february  5:00             february
1   25  [february, january]  6:00  (february, january)
4   90              january  9:00              january
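If you would rather not add a helper column to the frame, the same idea can be sketched with a temporary key Series and a boolean mask. This is only a minimal sketch: the names `key` and `deduped` are mine and not part of the original question, and it assumes every unhashable value in 'three' is a plain list of hashable items.

```python
import pandas as pd

data = [
    {'one': 50, 'two': '5:00', 'three': 'february'},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 25, 'two': '6:00', 'three': ['february', 'january']},
    {'one': 90, 'two': '9:00', 'three': 'january'},
]
df = pd.DataFrame(data)

# Hypothetical helper Series: lists become tuples (hashable), other values pass through.
key = df['three'].apply(lambda x: tuple(x) if isinstance(x, list) else x)

# duplicated() marks every repeat after the first occurrence; invert it to keep the firsts.
deduped = df[~key.duplicated()]
print(deduped)
```

The original 'three' column keeps its lists, and `df` itself is left unchanged because the tuple key lives only in the separate `key` Series.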
https://stackoverflow.com/questions/37792999