How can I find non-matching values by date across two separate DataFrames?
My DataFrames look like this:
import pandas as pd
data_1 = {'date':['2019-05-01','2019-05-02'],
'id': ['1122','1133']}
data_2 = {'id': ['1122','1133','1144']}
df1 = pd.DataFrame(data_1, columns=['date','id'])
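df2 = pd.DataFrame(data_2, columns=['id'])
The result I need is a DataFrame that lists, for each date in df1, every id from df2 that is not paired with that date in df1.
So the result should look like this: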
date id
2019-05-01 1133
2019-05-01 1144
2019-05-02 1122
2019-05-02 1144

Answered on 2019-06-06 04:22:46
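concat and drop_duplicates
Create a starting DataFrame holding the product of the dates from df1 and the ids from df2, then add the rows of df1 and drop every row that appears more than once.
d = pd.DataFrame([
    [dt, i] for dt in df1.date
    for i in df2.id
], columns=df1.columns)
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job here.
# keep=False drops all copies of a duplicated row, i.e. the pairs already in df1.
pd.concat([d, df1]).drop_duplicates(keep=False)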
date id
1 2019-05-01 1133
2 2019-05-01 1144
3 2019-05-02 1122
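5 2019-05-02 1144
Or we can skip the pairs that already exist during the construction step:
# set of (date, id) tuples already present in df1
tups = {*zip(*map(df1.get, df1))}
pd.DataFrame([
    [d, i] for d in df1.date
    for i in df2.id if (d, i) not in tups
], columns=df1.columns)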
date id
0 2019-05-01 1133
1 2019-05-01 1144
2 2019-05-02 1122
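3 2019-05-02 1144
Or use itertools.product and some set logic.
My favorite, by the way!
from itertools import product
# all (date, id) pairs minus the pairs already in df1; sets are unordered,
# so the row order of the result is arbitrary
pd.DataFrame(
    {*product(df1.date, df2.id)} - {*zip(*map(df1.get, df1))},
    columns=df1.columns
)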
date id
0 2019-05-01 1144
1 2019-05-01 1133
2 2019-05-02 1144
3 2019-05-02 1122

Answered on 2019-06-06 04:24:34
This is more of a cross join (CJ) problem with a filter.
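As a hedged variation on the same idea, the cross join can be requested directly with `how='cross'` (assumed available, pandas 1.2 or later), and the filter can be swapped for an anti-join on `(date, id)` using `indicator=True`. This is a sketch, not the answer's own method; the names `pairs`, `flagged`, and `result` are illustrative. The anti-join may be the safer filter if a date can appear in df1 with more than one id, since a row-wise id comparison only checks one df1 row at a time.

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2019-05-01', '2019-05-02'],
                    'id': ['1122', '1133']})
df2 = pd.DataFrame({'id': ['1122', '1133', '1144']})

# Cross join every distinct date in df1 with every id in df2.
# how='cross' is an assumption about the pandas version (1.2 or later).
pairs = df1[['date']].drop_duplicates().merge(df2, how='cross')

# Anti-join: flag the pairs that already exist in df1, then keep the rest.
flagged = pairs.merge(df1, on=['date', 'id'], how='left', indicator=True)
result = (
    flagged[flagged['_merge'] == 'left_only']
    .drop(columns='_merge')
    .reset_index(drop=True)
)
print(result)
```

The output columns are already named `date` and `id`, so no `id_x`/`id_y` cleanup is needed.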
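# Cross join on a dummy key, then keep the rows where the two ids differ.
# drop(..., 1) with a positional axis no longer works in pandas 2.0; use columns=.
(df1.assign(key=1)
    .merge(df2.assign(key=1), on='key')
    .loc[lambda x: x['id_x'] != x['id_y']]
    .drop(columns=['key', 'id_x']))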
Out[262]:
date id_y
1 2019-05-01 1133
2 2019-05-01 1144
3 2019-05-02 1122
5 2019-05-02 1144

Answered on 2019-06-06 04:23:53
Try:
# first we change `df1.id` to columns by crosstab:
u = pd.crosstab(df1.date, df1.id)
# extend the id with df2.id
u = u.reindex(df2['id'], axis=1, fill_value=0).stack()
Then what you are looking for is:
u[u.eq(0)].index.to_frame().reset_index(drop=True)
Output:
date id
0 2019-05-01 1133
1 2019-05-01 1144
2 2019-05-02 1122
3 2019-05-02 1144

https://stackoverflow.com/questions/56467291
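For comparison with the set-based answers above, the same "all pairs minus existing pairs" logic can probably be expressed with pandas index objects alone. The following is a minimal sketch, not taken from any of the answers; `all_pairs`, `seen`, and `missing` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2019-05-01', '2019-05-02'],
                    'id': ['1122', '1133']})
df2 = pd.DataFrame({'id': ['1122', '1133', '1144']})

# Every (date, id) combination, held as a MultiIndex.
all_pairs = pd.MultiIndex.from_product(
    [df1['date'].unique(), df2['id'].unique()], names=['date', 'id']
)
# The (date, id) pairs that already occur in df1.
seen = pd.MultiIndex.from_frame(df1[['date', 'id']])

# Set difference on the index; by default the result is sorted when possible.
missing = all_pairs.difference(seen).to_frame(index=False)
print(missing)
```

Unlike a plain Python set, the index difference gives a stable, sorted row order.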