
Python Pandas sum with multiple conditions

Stack Overflow user
Asked 2019-01-09 05:14:14
Answers: 1  Views: 1.8K  Followers: 0  Votes: 0

Here is my sample data:

        Customer   Document Date   Clearing Date   Invoice_Amount
0       A          09/13/2016      11/04/2016      2,007,324
1       A          04/18/2016      07/11/2016      631,714
2       A          09/13/2016      09/16/2016      4,000,000
3       A          07/11/2017      09/23/2017      5,000,000
4       A          05/03/2016      06/17/2016      2,000,000
---     ---        ---             ---             ---
1158    H          04/21/2017      06/28/2017      3,000,000
1159    H          04/25/2017      05/19/2017      1,000,000
1160    H          11/03/2017      12/11/2017      4,500,000
1161    H          03/15/2018      05/27/2018      3,500,000
1162    H          02/21/2018      05/03/2018      1,500,000

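I want to create a new variable, No_Paid (a new column added after Invoice_Amount), that counts the number of paid invoices the customer had before the Document Date of each new invoice.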

The expected output is as follows:

        Customer   Document Date   Clearing Date   Invoice_Amount No_Paid*
0       A          09/13/2016      11/04/2016      2,007,324          8 
1       A          04/18/2016      07/11/2016      631,714            1
2       A          09/13/2016      09/16/2016      4,000,000          8
3       A          07/11/2017      09/23/2017      5,000,000          6
4       A          05/03/2016      06/17/2016      2,000,000          1
---     ---        ---             ---             ---              ---
1158    H          04/21/2017      06/28/2017      3,000,000          5 
1159    H          04/25/2017      05/19/2017      1,000,000          3
1160    H          11/03/2017      12/11/2017      4,500,000          7
1161    H          03/15/2018      05/27/2018      3,500,000         37
1162    H          02/21/2018      05/03/2018      1,500,000         37

Currently, I use a for loop to get the expected output.

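import pandas as pd

# Raw string so the backslash in the Windows path is not read as an escape
df = pd.read_csv(r'E:\data.csv')
df['Document Date'] = pd.to_datetime(df['Document Date'], format="%m/%d/%Y")
df['Clearing Date'] = pd.to_datetime(df['Clearing Date'], format="%m/%d/%Y")
df["No_Paid"] = 0
for i in df.index:
    # The sample data names this column "Customer" (the original loop used "Vendor")
    customer = df.loc[i, "Customer"]
    doc_date = df.loc[i, "Document Date"]
    six_month = doc_date - pd.Timedelta(days=180)
    # Count this customer's invoices cleared before doc_date and issued within the prior 180 days
    df.loc[i, "No_Paid"] = df.loc[(df["Customer"] == customer) & (df["Clearing Date"] < doc_date) & (df["Document Date"] >= six_month), "Invoice_Amount"].count()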

In practice I have more than 100,000 invoices, so this loop takes a long time. I tried using df.apply but could not get the same output.


1 Answer

Stack Overflow user

Answered 2019-01-09 20:31:55

Following your example:

import pandas as pd
# read in csv (save as csv or read in using pd.read_excel)
df = pd.read_csv('file.csv')
# to datetime just in case
df['Doc_Date'] = pd.to_datetime(df['Doc_Date'])
df['Exp_Date'] = pd.to_datetime(df['Exp_Date'])
df['Overdue'] = df['Doc_Date'] - df['Exp_Date']
# 180 days for 6 months
df['6M_Age'] = df['Doc_Date'] - pd.Timedelta(days=180)
# Hard to tell what the line in the middle of the data means
# you can group by two columns if you need to
# cumsum needs a single numeric column selected to yield one column;
# 'Invoice_Amount' is an assumed choice (the original selected none)
df['Sum_of_paid'] = df.groupby('ID')['Invoice_Amount'].cumsum()
Votes: 0
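
For comparison, the row-by-row loop in the question can be replaced by one array comparison per customer using NumPy broadcasting. This is only a minimal sketch, not the answerer's method: the helper name `count_paid_before` and the tiny sample frame are made up for illustration, the column names follow the question's sample data, and the 180-day window is taken from the question's loop.

```python
import numpy as np
import pandas as pd


def count_paid_before(group: pd.DataFrame, window_days: int = 180) -> np.ndarray:
    """For each invoice in one customer's group, count that customer's invoices
    cleared before its Document Date and issued within the prior window."""
    doc = group["Document Date"].to_numpy()
    clr = group["Clearing Date"].to_numpy()
    start = doc - np.timedelta64(window_days, "D")
    # mask[i, j] is True when invoice j counts toward invoice i
    mask = (clr[None, :] < doc[:, None]) & (doc[None, :] >= start[:, None])
    return mask.sum(axis=1)


# Hypothetical sample data, shaped like the question's table
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "A", "B", "B"],
    "Document Date": ["01/10/2016", "03/01/2016", "04/01/2016",
                      "01/01/2017", "03/01/2016", "03/10/2016"],
    "Clearing Date": ["02/01/2016", "03/20/2016", "05/01/2016",
                      "02/01/2017", "03/05/2016", "04/01/2016"],
})
for col in ["Document Date", "Clearing Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# One vectorized comparison per customer instead of one filter per row
no_paid = pd.Series(0, index=df.index, dtype="int64")
for _, group in df.groupby("Customer"):
    no_paid.loc[group.index] = count_paid_before(group)
df["No_Paid"] = no_paid

print(df)
```

The comparison builds an n x n boolean matrix for each customer, so memory grows quadratically with the number of invoices per customer. For customers with very many invoices the rows would likely need to be processed in chunks.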
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/54103605
