I have a Series s with duplicate index entries:
>>> s
STK_ID RPT_Date
600809 20061231 demo_str
20070331 demo_str
20070630 demo_str
20070930 demo_str
20071231 demo_str
20060331 demo_str
20060630 demo_str
20060930 demo_str
20061231 demo_str
20070331 demo_str
20070630 demo_str
Name: STK_Name, Length: 11
I want to keep the unique rows plus a single copy of each duplicated row, using:
s[s.index.unique()]
With pandas 0.10.1.dev-f7f7e13, this gives the following error message:
>>> s[s.index.unique()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "d:\Python27\lib\site-packages\pandas\core\series.py", line 515, in __getitem__
return self._get_with(key)
File "d:\Python27\lib\site-packages\pandas\core\series.py", line 558, in _get_with
return self.reindex(key)
File "d:\Python27\lib\site-packages\pandas\core\series.py", line 2361, in reindex
level=level, limit=limit)
File "d:\Python27\lib\site-packages\pandas\core\index.py", line 2063, in reindex
limit=limit)
File "d:\Python27\lib\site-packages\pandas\core\index.py", line 2021, in get_indexer
raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects
>>>
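So how can I efficiently drop the redundant duplicate rows from the Series, keeping the unique rows and exactly one copy of each duplicated row? (A one-liner would be ideal.)
Answered 2013-01-18 22:10:30
You can group by the index and apply a function that returns one value per index group. Here, I take the first value: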
In [1]: s = Series(range(10), index=[1,2,2,2,5,6,7,7,7,8])
In [2]: s
Out[2]:
1 0
2 1
2 2
2 3
5 4
6 5
7 6
7 7
7 8
8 9
In [3]: s.groupby(s.index).first()
Out[3]:
1 0
2 1
5 4
6 5
7 6
8 9
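For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same groupby approach. The variable names deduped and deduped_by_level are only illustrative, and the level=0 spelling is an alternative I am adding, not part of the answer above.

```python
import pandas as pd

# Same data as in the answer above: the labels 2 and 7 are repeated.
s = pd.Series(range(10), index=[1, 2, 2, 2, 5, 6, 7, 7, 7, 8])

# Group the rows by index label and keep the first value of each group.
deduped = s.groupby(s.index).first()

# Alternative spelling (my addition): group on index level 0 directly.
deduped_by_level = s.groupby(level=0).first()

print(deduped)
```

Two things to keep in mind: groupby sorts the result by label by default, and first() returns the first non-null value of each group, so a leading NaN in a group would be skipped.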
Update
To address BigBug's report of a crash when passing a MultiIndex to Series.groupby():
In [1]: s
Out[1]:
STK_ID RPT_Date
600809 20061231 demo
20070331 demo
20070630 demo
20070331 demo
In [2]: s.reset_index().groupby(s.index.names).first()
Out[2]:
0
STK_ID RPT_Date
600809 20061231 demo
20070331 demo
20070630 demo
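A runnable sketch of the MultiIndex case, in case it helps. The Series name STK_Name is borrowed from the question and is an assumption here (the session above shows an unnamed Series), and the groupby(level=...) variant is my own addition that should work on current pandas versions without the reset_index() detour.

```python
import pandas as pd

# Rebuild a small MultiIndex Series in which (600809, 20070331) appears twice.
index = pd.MultiIndex.from_tuples(
    [
        (600809, 20061231),
        (600809, 20070331),
        (600809, 20070630),
        (600809, 20070331),
    ],
    names=["STK_ID", "RPT_Date"],
)
# The name "STK_Name" is assumed, taken from the question's output.
s = pd.Series(["demo"] * 4, index=index, name="STK_Name")

names = list(s.index.names)

# Workaround from the answer: move the index into columns, then group on them.
via_reset = s.reset_index().groupby(names).first()["STK_Name"]

# Alternative (my addition): group directly on the index levels.
via_level = s.groupby(level=names).first()

print(via_level)
```

Both results are Series indexed by the unique (STK_ID, RPT_Date) pairs.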
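Answered 2015-12-10 18:43:18
You can use the index's duplicated method (which by default treats the first occurrence as the one to keep) to subset your data. Using @Zelazny7's example: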
s = pd.Series(range(10), index=[1,2,2,2,5,6,7,7,7,8])
In [130]: s[~s.index.duplicated()]
Out[130]:
1 0
2 1
5 4
6 5
7 6
8 9
dtype: int64
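The same idea as a standalone script, with the keep argument spelled out. This is only a sketch; first_kept and last_kept are names I made up for illustration.

```python
import pandas as pd

s = pd.Series(range(10), index=[1, 2, 2, 2, 5, 6, 7, 7, 7, 8])

# duplicated() flags every occurrence of a label after the first one
# (keep="first" is the default), so negating the mask keeps the first copy.
mask = s.index.duplicated()
first_kept = s[~mask]

# keep="last" flags every occurrence except the last one instead.
last_kept = s[~s.index.duplicated(keep="last")]

print(first_kept)
print(last_kept)
```

Unlike the groupby approach, boolean masking leaves the surviving rows in their original order and does not touch the values, so NaN entries are kept as they are.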
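Answered 2013-01-18 20:56:25
One way is to use drop together with index.get_duplicates. As the output below shows, this removes every copy of a duplicated row rather than keeping one: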
In [43]: df
Out[43]:
String
STK_ID RPT_Date
600809 20061231 demo_string
20070331 demo_string
20070630 demo_string
20070930 demo_string
20071231 demo_string
20060331 demo_string
20060630 demo_string
20060930 demo_string
20061231 demo_string
20070331 demo_string
20070630 demo_string
In [44]: df.drop(df.index.get_duplicates())
Out[44]:
String
STK_ID RPT_Date
600809 20070930 demo_string
20071231 demo_string
20060331 demo_string
20060630 demo_string
20060930 demo_string
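As far as I know, Index.get_duplicates is no longer available in recent pandas releases, so here is a hedged sketch of an equivalent using index.duplicated, rebuilt from the frame shown above. The names all_copies_dropped and one_copy_kept are illustrative only.

```python
import pandas as pd

dates = [
    20061231, 20070331, 20070630, 20070930, 20071231, 20060331,
    20060630, 20060930, 20061231, 20070331, 20070630,
]
index = pd.MultiIndex.from_arrays(
    [[600809] * len(dates), dates], names=["STK_ID", "RPT_Date"]
)
df = pd.DataFrame({"String": ["demo_string"] * len(dates)}, index=index)

# keep=False flags every occurrence of a repeated label, so negating the mask
# drops all copies, which matches the drop/get_duplicates output above.
all_copies_dropped = df[~df.index.duplicated(keep=False)]

# The default (keep="first") retains one copy of each repeated label,
# which is what the question actually asks for.
one_copy_kept = df[~df.index.duplicated()]

print(all_copies_dropped)
print(one_copy_kept)
```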
https://stackoverflow.com/questions/14395678