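I have a DataFrame of event logs that includes a column holding the previous event's ID, but the rows are not in order, and I would like to sort them. If we take the following DataFrame of event names, IDs, and previous event IDs and shuffle it, we get: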
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
'Event_name': ['First', 'Second', 'Third', 'Fourth', 'Fifth', 'Sixth', 'Seventh', 'Eigth', 'Ninth', 'Tenth'],
'Event_Ids': ['QXT364', 'YKD306', 'GJJ60', 'RSK547', 'GNN259', 'DKW368', 'OAN385', 'PGF213', 'NGJ285', 'OLG594'],
'Previous_Event_Ids': [np.nan,'QXT364', 'YKD306', 'GJJ60', 'RSK547', 'GNN259', 'DKW368', 'OAN385', 'PGF213', 'NGJ285']
}
)
df = df.sample(frac=1).reset_index(drop=True)
print(df)
This outputs:
Event_name Event_Ids Previous_Event_Ids
0 Fourth RSK547 GJJ60
1 Eigth PGF213 OAN385
2 First QXT364 NaN
3 Third GJJ60 YKD306
4 Fifth GNN259 RSK547
5 Sixth DKW368 GNN259
6 Seventh OAN385 DKW368
7 Ninth NGJ285 PGF213
8 Second YKD306 QXT364
9 Tenth OLG594 NGJ285
What code can be used to sort this so that the resulting DataFrame looks like the following?
Event_name Event_Ids Previous_Event_Ids
0 First QXT364 NaN
1 Second YKD306 QXT364
2 Third GJJ60 YKD306
3 Fourth RSK547 GJJ60
4 Fifth GNN259 RSK547
5 Sixth DKW368 GNN259
6 Seventh OAN385 DKW368
7 Eigth PGF213 OAN385
8 Ninth NGJ285 PGF213
9 Tenth OLG594 NGJ285
Posted on 2020-12-10 12:27:09
You need a dict that maps the string values to ints, and then sort on the integer values:
In [301]: vars_map = {'First': 1, 'Second': 2, 'Third': 3, 'Fourth':4, 'Fifth':5, 'Sixth':6, 'Seventh': 7, 'Eigth':8, 'Ninth':9, 'Tenth':10}
In [305]: df1 = df.assign(vals=df.Event_name.map(vars_map)).sort_values('vals').drop(columns='vals')
In [306]: df1
Out[306]:
Event_name Event_Ids Previous_Event_Ids
1 First QXT364 NaN
3 Second YKD306 QXT364
5 Third GJJ60 YKD306
7 Fourth RSK547 GJJ60
9 Fifth GNN259 RSK547
2 Sixth DKW368 GNN259
8 Seventh OAN385 DKW368
0 Eigth PGF213 OAN385
6 Ninth NGJ285 PGF213
4 Tenth OLG594 NGJ285
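The desired output in the question has a fresh 0..9 index, whereas the result above keeps the shuffled row labels. A minimal sketch of one way to renumber the rows, using a hypothetical three-row frame in place of the full data:

```python
import pandas as pd

# Hypothetical miniature of the shuffled frame, for illustration only.
df = pd.DataFrame({'Event_name': ['Third', 'First', 'Second']})
vars_map = {'First': 1, 'Second': 2, 'Third': 3}

df1 = (
    df.assign(vals=df['Event_name'].map(vars_map))  # temporary integer sort key
      .sort_values('vals')
      .drop(columns='vals')                         # remove the helper column
      .reset_index(drop=True)                       # renumber rows from 0
)
print(df1)
```

Note that this approach only works when the event names themselves encode the order; it does not use the Previous_Event_Ids column at all.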
Posted on 2020-12-10 15:51:58
I was able to solve this with the following code:
# Step 1: Initialize the dictionary with one slot per row (assumes a default 0..n-1 index)
var_map = dict.fromkeys(df.index.values)
# Step 2: Find the row whose previous event ID is NaN; this is the first event
nanLoc, _ = np.where(df.isna())
# Step 3: Put that first row in slot 0 of the dictionary
var_map[0] = df.loc[nanLoc].values.tolist()[0]
# Step 4: Walk the chain: the next row is the one whose Previous_Event_Ids equals the current Event_Ids
for x in df.index.values[:-1]:
    key = var_map[x][1]
    var_map[x+1] = df.loc[df['Previous_Event_Ids'] == key].values.tolist()[0]
# Step 5: Turn the dictionary into a DataFrame
df2 = pd.DataFrame.from_dict(var_map, orient='index', columns=['Event_name', 'Event_Ids', 'Previous_Event_Ids'])
print(df2)
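The loop above scans the whole frame once per row to find the next event. A possible variant, sketched here under the assumption that exactly one row has a missing previous ID and that the IDs are unique, builds a lookup dict first and then follows the chain. The function name `sort_by_chain` and its parameters are hypothetical, not part of the original answer:

```python
import numpy as np
import pandas as pd


def sort_by_chain(df, id_col="Event_Ids", prev_col="Previous_Event_Ids"):
    """Reorder rows by following previous-ID links from the row with no predecessor."""
    # Map each previous-event ID to the position of the row that comes after it.
    next_pos = {prev: pos for pos, prev in enumerate(df[prev_col]) if pd.notna(prev)}
    ids = df[id_col].tolist()

    # Assumption: exactly one row has a missing previous ID (the start of the chain).
    starts = np.flatnonzero(df[prev_col].isna().to_numpy())
    if len(starts) != 1:
        raise ValueError("expected exactly one row with a missing previous ID")

    order = [int(starts[0])]
    # Follow the links until the current event has no successor.
    while ids[order[-1]] in next_pos:
        order.append(next_pos[ids[order[-1]]])
        # Defensive guard against malformed data such as duplicated IDs.
        if len(order) > len(df):
            raise ValueError("chain is longer than the frame; IDs may be duplicated")

    # Select rows by position in chain order and renumber them from 0.
    return df.iloc[order].reset_index(drop=True)
```

With the shuffled frame from the question, `sort_by_chain(df)` would be called in place of steps 1 to 5. Because it selects rows by position, it does not depend on the frame having a default integer index.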
https://stackoverflow.com/questions/65228463