I have the following dataframe, which has 4 columns. Let's call it df.
ID Start transfer Finish transfer Ward
0 7685933 04/11/2015 12:07 05/11/2015 12:49 General surgery
1 7685933 05/11/2015 12:49 11/11/2015 14:42 Anestesiology
2 7685933 11/11/2015 14:42 11/11/2015 16:12 Anestesiology
3 7685933 11/11/2015 16:12 18/11/2015 21:24 General surgery
4 7685933 18/11/2015 21:24 02/01/2016 06:45 ICU
5 7690142 06/11/2015 17:24 30/11/2015 18:11 Internal Medicine
6 7690142 30/11/2015 18:11 02/12/2015 17:04 Internal Medicine
7 7690142 02/12/2015 17:04 03/12/2015 20:40 Internal Medicine
8 7690142 03/12/2015 20:40 11/01/2016 18:00 Internal Medicine
9 7691888 08/11/2015 16:28 16/11/2015 17:11 Internal Medicine
10 7691888 16/11/2015 17:11 20/11/2015 18:13 Internal Medicine
11 7691888 20/11/2015 18:13 04/01/2016 18:02 Internal Medicine
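12 7691888 04/01/2016 18:02 04/01/2016 21:13 Internal Medicine
Now I want to group the data by the column "ID" and then find consecutive rows with the same Ward, where the "Finish transfer" of one row equals the "Start transfer" of the next row with the same Ward name. Once these are identified, I need to copy the Finish transfer entry from the last row of the consecutive run and use it to update the first row for that ward. For example, the rows at index 1 and index 2 have the same ward, and the Finish transfer of the row at index 1 equals the Start transfer of the row at index 2. What I want is a single row for each such run, where Start transfer comes from the first row and Finish transfer comes from the last row.
I would like the output to look like this (possibly in a new dataframe):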
ID Start transfer Finish transfer Ward
0 7685933 04/11/2015 12:07 05/11/2015 12:49 General surgery
1 7685933 05/11/2015 12:49 11/11/2015 16:12 Anestesiology
2 7685933 11/11/2015 16:12 18/11/2015 21:24 General surgery
3 7685933 18/11/2015 21:24 02/01/2016 06:45 ICU
4 7690142 06/11/2015 17:24 11/01/2016 18:00 Internal Medicine
5 7691888 08/11/2015 16:28 04/01/2016 21:13 Internal Medicine
Thanks in advance for your help.
Posted on 2018-03-12 14:27:22
IIUC
df.groupby(['ID','Ward']).agg({'Start transfer':'first','Finish transfer':'last'}).reset_index()
Out[151]:
ID Ward Start transfer Finish transfer
0 7685933 Anestesiology 05/11/2015 12:49 11/11/2015 16:12
1 7685933 General surgery 04/11/2015 12:07 18/11/2015 21:24
2 7685933 ICU 18/11/2015 21:24 02/01/2016 06:45
3 7690142 Internal Medicine 06/11/2015 17:24 11/01/2016 18:00
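4 7691888 Internal Medicine 08/11/2015 16:28 04/01/2016 21:13
Update: the version above also merges the two separate General surgery stays of ID 7685933. Adding a Key that increments whenever Ward changes restricts the merge to consecutive rows: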
df.assign(Key=(df.Ward.shift()!=df.Ward).cumsum()).groupby(['ID','Ward','Key']).agg({'Start transfer':'first','Finish transfer':'last'}).reset_index().sort_values('Key')
Out[181]:
ID Ward Key Start transfer Finish transfer
1 7685933 General surgery 1 04/11/2015 12:07 05/11/2015 12:49
0 7685933 Anestesiology 2 05/11/2015 12:49 11/11/2015 16:12
2 7685933 General surgery 3 11/11/2015 16:12 18/11/2015 21:24
3 7685933 ICU 4 18/11/2015 21:24 02/01/2016 06:45
4 7690142 Internal Medicine 5 06/11/2015 17:24 11/01/2016 18:00
5 7691888 Internal Medicine 5 08/11/2015 16:28 04/01/2016 21:13
https://stackoverflow.com/questions/49237433
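The Key in the update changes only when Ward changes, so it does not test the condition from the question that a row's Start transfer must equal the previous row's Finish transfer. A minimal sketch of one way to include that check, assuming the rows are already in chronological order within each ID and that the timestamps can be compared as plain strings. The names merge_consecutive_stays, new_run and run_id are made up for this illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    (7685933, "04/11/2015 12:07", "05/11/2015 12:49", "General surgery"),
    (7685933, "05/11/2015 12:49", "11/11/2015 14:42", "Anestesiology"),
    (7685933, "11/11/2015 14:42", "11/11/2015 16:12", "Anestesiology"),
    (7685933, "11/11/2015 16:12", "18/11/2015 21:24", "General surgery"),
    (7685933, "18/11/2015 21:24", "02/01/2016 06:45", "ICU"),
    (7690142, "06/11/2015 17:24", "30/11/2015 18:11", "Internal Medicine"),
    (7690142, "30/11/2015 18:11", "02/12/2015 17:04", "Internal Medicine"),
    (7690142, "02/12/2015 17:04", "03/12/2015 20:40", "Internal Medicine"),
    (7690142, "03/12/2015 20:40", "11/01/2016 18:00", "Internal Medicine"),
    (7691888, "08/11/2015 16:28", "16/11/2015 17:11", "Internal Medicine"),
    (7691888, "16/11/2015 17:11", "20/11/2015 18:13", "Internal Medicine"),
    (7691888, "20/11/2015 18:13", "04/01/2016 18:02", "Internal Medicine"),
    (7691888, "04/01/2016 18:02", "04/01/2016 21:13", "Internal Medicine"),
]
df = pd.DataFrame(rows, columns=["ID", "Start transfer", "Finish transfer", "Ward"])


def merge_consecutive_stays(frame):
    """Collapse runs of rows that share ID and Ward and are back to back in time.

    Hypothetical helper: assumes rows are sorted chronologically within each ID.
    """
    # A row starts a new run if the ID changes, the Ward changes, or its
    # Start transfer differs from the previous row's Finish transfer.
    # The first row compares against NaN, so it always starts a run.
    new_run = (
        frame["ID"].ne(frame["ID"].shift())
        | frame["Ward"].ne(frame["Ward"].shift())
        | frame["Start transfer"].ne(frame["Finish transfer"].shift())
    )
    run_id = new_run.cumsum()  # same number for every row of one run
    return (
        frame.groupby(run_id, sort=False)
        .agg({
            "ID": "first",
            "Start transfer": "first",
            "Finish transfer": "last",
            "Ward": "first",
        })
        .reset_index(drop=True)
    )


result = merge_consecutive_stays(df)
print(result)
```

On the sample data this yields six rows in the original order, with the columns laid out as in the question. If the timestamps might differ in formatting, converting both columns first with pd.to_datetime(..., dayfirst=True) would make the equality check more reliable.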