I have the following DataFrame in pandas:
df = pd.DataFrame({
    "Name": ["N1", "N2", "N3", "N1", "N1", "N2", "N3", "N2"],
    "Date": ["31-10-2021", "31-10-2021", "31-10-2021", "15-10-2021", "14-10-2021", "13-10-2021", "12-10-2021", "11-10-2021"],
    "Feature": [4, 5, 6, 3, 1, 6, 3, 3]
})

  Name        Date  Feature
0   N1  31-10-2021        4
1   N2  31-10-2021        5
2   N3  31-10-2021        6
3   N1  15-10-2021        3
4   N1  14-10-2021        1
5   N2  13-10-2021        6
6   N3  12-10-2021        3
7   N2  11-10-2021        3

I want to create a new column that holds, for each row, the difference between the current value of Feature for that Name and the value of Feature at the most recent earlier occurrence of the same Name in the DataFrame, or zero if the Name has no earlier occurrence.
So, for the table above, the result should be:

  Name        Date  Feature  New_column
0   N1  31-10-2021        4           1
1   N2  31-10-2021        5          -1
2   N3  31-10-2021        6           3
3   N1  15-10-2021        3           2
4   N1  14-10-2021        1           0
5   N2  13-10-2021        6           3
6   N3  12-10-2021        3           0
7   N2  11-10-2021        3           0

Is there a vectorized/efficient way to do this? Thanks in advance.
Posted 2021-11-01 12:24:21
You can use shift inside a groupby:
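import pandas as pd

df = pd.DataFrame({
    "Name": ["N1", "N2", "N3", "N1", "N1", "N2", "N3", "N2"],
    "Date": ["31-10-2021", "31-10-2021", "31-10-2021", "15-10-2021", "14-10-2021", "13-10-2021", "12-10-2021", "11-10-2021"],
    "Feature": [4, 5, 6, 3, 1, 6, 3, 3]
})
# Put each name's rows in date order. Date is a string here, so this order is
# chronological only because every sample date is in the same month and year.
df.sort_values(by=["Name", "Date"], inplace=True)
# Current Feature minus the previous Feature of the same Name
df["New_column"] = df["Feature"] - df.groupby("Name")["Feature"].shift()
# The first row of each Name has no previous value (NaN); replace it with 0
df["New_column"] = df["New_column"].fillna(0)

The last line is needed because the first row of each name gets NaN, whereas your example expects 0.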
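If the dates can span several months or years, sorting the `Date` strings no longer gives chronological order. Below is a minimal sketch of a variant that parses the dates first. It assumes every row uses the `%d-%m-%Y` format and that the index is unique. The helper name `add_feature_diff` and the sample rows (including the November date) are illustrative and not part of the original question.

```python
import pandas as pd


def add_feature_diff(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with New_column = Feature minus the previous Feature of the same Name."""
    # Parse the day-first strings so ordering is by calendar date, not by text.
    parsed = pd.to_datetime(df["Date"], format="%d-%m-%Y")
    # Reorder rows chronologically (assumes a unique index); mergesort is stable.
    ordered = df.loc[parsed.sort_values(kind="mergesort").index]
    # Within each Name, subtract the earlier Feature; the first row of each Name gets NaN, then 0.
    diff = ordered.groupby("Name")["Feature"].diff().fillna(0).astype(int)
    # assign aligns on the index, so the original row order is preserved.
    return df.assign(New_column=diff)


# Hypothetical sample in which text order and date order disagree ("02-11" sorts before "31-10").
df = pd.DataFrame({
    "Name": ["N1", "N2", "N1", "N2"],
    "Date": ["02-11-2021", "31-10-2021", "31-10-2021", "13-10-2021"],
    "Feature": [7, 5, 4, 6],
})
result = add_feature_diff(df)
print(result)  # New_column is [3, -1, 0, 0]
```

The `Date` column stays as strings and the rows keep their original order, because the parsed dates are only used to decide which row counts as "previous" within each name.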
Posted 2021-11-01 11:27:33
We can do:
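# Sort by date ascending so that diff() subtracts the earlier row within each Name;
# fillna(0) covers the first occurrence of each Name.
result_df = df.assign(New_column=df.sort_values('Date')
                      .groupby('Name')['Feature'].diff().fillna(0))

https://stackoverflow.com/questions/69796254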