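I'm analyzing my company's raw material requirements by combining the sales records of finished goods with each finished good's bill of materials. The problem is that every finished good is made of several components, and many finished goods share common components. I want to keep every individual sales record for each finished good and multiply FG_UnitsSold by the per-unit quantity of each component to get the raw material requirement. Here is code for a sample data set: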
library(dplyr)

fg_Sales <- data_frame(FG_PartNumber = rep(c("A","B","C"), 2),
                       Order_Date = seq.Date(as.Date("2011-1-1"), as.Date("2012-1-10"), length.out = 6),
                       FG_UnitsSold = c(100,200,300,400,500,600))
bill_materials <- data_frame(FG_PartNumber = rep(c("A","B","C"), 4),
                             Components = c("C1","C2","C3","C4","C5","C6","C7","C7","C7","C8","C8","C9"),
                             Qty = rnorm(n = 12, mean = 3, sd = 1)) %>%
  arrange(FG_PartNumber)

I'm familiar with left_join in dplyr, but it doesn't seem to work here, because it always gives me only the first component of each finished good.
Can anyone help? Thanks.
Answered 2017-02-04 22:43:57
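Maybe I haven't understood the question, but if you group each of your two data frames by FG_PartNumber and build a summary (pivot) table of the quantity you're interested in, you can get the totals you're looking for: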
library(dplyr)

# Create data
# (data_frame() is deprecated in current tibble releases; tibble() is the drop-in replacement)
set.seed(1)
fg_Sales <- data_frame(FG_PartNumber = rep(c("A","B","C"), 2),
                       Order_Date = seq.Date(as.Date("2011-1-1"), as.Date("2012-1-10"), length.out = 6),
                       FG_UnitsSold = c(100,200,300,400,500,600))
bill_materials <- data_frame(FG_PartNumber = rep(c("A","B","C"), 4),
                             Components = c("C1","C2","C3","C4","C5","C6","C7","C7","C7","C8","C8","C9"),
                             Qty = rnorm(n = 12, mean = 3, sd = 1)) %>%
  arrange(FG_PartNumber)

# Make pivot tables for sales and quantity
tot_sales <- fg_Sales %>%
  group_by(FG_PartNumber) %>%
  summarise(tot_sales = sum(FG_UnitsSold))
tot_materials <- bill_materials %>%
  group_by(FG_PartNumber) %>%
  summarise(tot_qty = sum(Qty))

# Join the pivot tables together on the shared key
df <- left_join(tot_sales, tot_materials, by = "FG_PartNumber")
> df
# A tibble: 3 × 3
  FG_PartNumber tot_sales  tot_qty
          <chr>     <dbl>    <dbl>
1             A       500 13.15087
2             B       700 14.76326
3             C       900 11.30953

Answered 2017-02-05 04:28:08
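I think inner_join from dplyr is the best choice:

library(dplyr)
fg_Sales_ext <- inner_join(x = fg_Sales,
                           y = bill_materials,
                           by = "FG_PartNumber")

From the inner_join documentation: "If there are multiple matches between x and y, all combinations of the matches are returned."
With fg_Sales_ext you can then run any kind of analysis using group_by and summarise.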
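To make that last step concrete, here is a minimal sketch of one way to go from the joined table to raw material requirements. It is an illustration under a few assumptions, not the only approach: the Qty values are made-up integers (the question uses rnorm()) so the totals can be checked by hand, and the names RM_Required, rm_by_order and rm_totals are mine, not from the question. The join here is many-to-many (each part number appears twice in the sales and four times in the bill of materials); as far as I know, dplyr 1.1.0 and later print a warning about that, but the rows returned are the same.

```r
library(dplyr)

# Toy data with the same shape as the question; Qty is hypothetical
fg_Sales <- data.frame(
  FG_PartNumber = rep(c("A", "B", "C"), 2),
  Order_Date    = seq.Date(as.Date("2011-01-01"), as.Date("2012-01-10"), length.out = 6),
  FG_UnitsSold  = c(100, 200, 300, 400, 500, 600),
  stringsAsFactors = FALSE
)
bill_materials <- data.frame(
  FG_PartNumber = rep(c("A", "B", "C"), 4),
  Components    = c("C1","C2","C3","C4","C5","C6","C7","C7","C7","C8","C8","C9"),
  Qty           = c(2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 2),
  stringsAsFactors = FALSE
)

# Every sales record is paired with every component of its finished good:
# 6 sales rows x 4 components each = 24 rows
fg_Sales_ext <- inner_join(fg_Sales, bill_materials, by = "FG_PartNumber")

# Requirement per sales record and component
rm_by_order <- fg_Sales_ext %>%
  mutate(RM_Required = FG_UnitsSold * Qty)

# Total requirement per component, summed across all finished goods that use it
rm_totals <- rm_by_order %>%
  group_by(Components) %>%
  summarise(RM_Required = sum(RM_Required))

print(rm_totals)
```

With these toy numbers the shared component C7 comes out at 500 * 4 + 700 * 1 + 900 * 2 = 4500. Because rm_by_order keeps Order_Date, you could also group by a date period plus Components if you need requirements over time.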
https://stackoverflow.com/questions/42041492