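This question is similar to selecting the top N values by group, asked here.
However, I want to select the last N values by group, where N depends on the value of the corresponding count column. The count represents the number of times a particular name appears. If the count is 3 or more, I want only the last three entries; if it is less than 3, I want only the last entry.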
# Sample data
df <- data.frame(Name = c("x","x","x","x","y","y","y","z","z"), Value = c(1,2,3,4,5,6,7,8,9))
# Obtain count for each name (dplyr provides %>%, group_by and summarise)
library(dplyr)
count <- df %>%
  group_by(Name) %>%
  summarise(Count = n_distinct(Value))
# Merge dataframe with count
merge(df, count, by=c("Name"))
# Delete the first entry for x and the first entry for z
# Desired output
data.frame(Name = c("x","x","x","y","y","y","z"), Value = c(2,3,4,5,6,7,9))
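Answered 2017-07-31 17:40:35
In base R, first split df by df$Name. Then, for each subgroup, check the number of rows and conditionally extract either the last 3 rows or the last row.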
do.call(rbind, lapply(split(df, df$Name), function(a)
a[tail(sequence(NROW(a)), c(3,1)[(NROW(a) < 3) + 1]),]))
Or
do.call(rbind, lapply(split(df, df$Name), function(a)
a[tail(sequence(NROW(a)), ifelse(NROW(a) < 3, 1, 3)),]))
# Name Value
#x.2 x 2
#x.3 x 3
#x.4 x 4
#y.5 y 5
#y.6 y 6
#y.7 y 7
#z z 9
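The selector `c(3,1)[(NROW(a) < 3) + 1]` is compact, so here is a small sketch of how it resolves for the sample group sizes. The vector `n` and the names `idx` and `keep` are illustrative only and are not part of the answer above.

```r
# Group sizes from the sample data (x has 4 rows, y has 3, z has 2)
n <- c(x = 4, y = 3, z = 2)

# A logical plus 1 gives an index: FALSE -> 1, TRUE -> 2
idx <- (n < 3) + 1

# Position 1 of c(3, 1) is 3 (keep three rows), position 2 is 1 (keep one row)
keep <- c(3, 1)[idx]
names(keep) <- names(n)
print(keep)

# tail() on the row indices then yields the rows to extract, e.g. for x:
print(tail(sequence(4), 3))
```

The `ifelse(NROW(a) < 3, 1, 3)` form picks the same number and may be easier to read.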
For three conditional values:
do.call(rbind, lapply(split(df, df$Name), function(a)
a[tail(sequence(NROW(a)), ifelse(NROW(a) >= 6, 6, ifelse(NROW(a) >= 3, 3, 1))),]))
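If more thresholds are added, the nested `ifelse` calls get hard to read. One possible alternative, sketched here as an assumption rather than as part of the answer, is a lookup with base R's `findInterval`. The names `breaks`, `keep` and `last_n` are hypothetical helpers introduced for illustration.

```r
df <- data.frame(Name = c("x","x","x","x","y","y","y","z","z"),
                 Value = c(1,2,3,4,5,6,7,8,9))

breaks <- c(1, 3, 6)  # lower bound of each group-size band
keep   <- c(1, 3, 6)  # number of trailing rows to keep in that band

# Return the last k rows of one group, with k looked up from its size.
# Assumes every group has at least one row.
last_n <- function(a) {
  k <- keep[findInterval(NROW(a), breaks)]
  tail(a, k)
}

res <- do.call(rbind, lapply(split(df, df$Name), last_n))
print(res)
```

With the sample data no group reaches 6 rows, so this should return the same rows as the two-condition version.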
Answered 2017-07-31 18:21:20
Another way, using dplyr:
df %>% group_by(Name) %>% slice(tail(row_number(),
if (n_distinct(Value) < 3) 1 else 3
))
# A tibble: 7 x 2
# Groups: Name [3]
#   Name   Value
#   <fctr> <dbl>
# 1 x      2
# 2 x      3
# 3 x      4
# 4 y      5
# 5 y      6
# 6 y      7
# 7 z      9
The analogue in data.table is...
library(data.table)
setDT(df)
df[, tail(.SD, if (uniqueN(Value) < 3) 1 else 3), by=Name]
The closest thing in base R is...
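with(df, {
  len = tapply(Value, Name, FUN = length)                        # rows per group
  nv = tapply(Value, Name, FUN = function(x) length(unique(x)))  # distinct values per group
  # Keep rows whose position within the group exceeds (group size - rows to keep).
  # Offsetting from len rather than nv stays correct when a group has duplicate values.
  # Assumes df is already ordered by Name.
  df[ sequence(len) > rep(len - ifelse(nv < 3, 1, 3), len), ]
})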
...which is way more difficult than it should be.
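For comparison, a base R sketch using `ave` to get each row's position within its group may be somewhat easier to follow. This is an illustrative variant, not the answerer's code: it decides on the plain row count per group rather than the number of distinct values, and `pos`, `size` and `k` are names introduced here.

```r
df <- data.frame(Name = c("x","x","x","x","y","y","y","z","z"),
                 Value = c(1,2,3,4,5,6,7,8,9))

# Position of each row within its group, and the size of that group
pos  <- ave(seq_along(df$Name), df$Name, FUN = seq_along)
size <- ave(seq_along(df$Name), df$Name, FUN = length)

# Rows to keep per group: 1 for groups smaller than 3, otherwise 3
k <- ifelse(size < 3, 1, 3)

# A row is among the last k of its group when pos > size - k
res <- df[pos > size - k, ]
print(res)
```

Because `ave` returns values aligned with the original rows, this variant does not require `df` to be sorted by `Name`.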
Answered 2017-07-31 18:38:27
Another possibility:
library(tidyverse)
df %>%
split(.$Name) %>%
map_df(~ if (n_distinct(.x) >= 3) tail(.x, 3) else tail(.x, 1))
This gives:
# Name Value
#1 x 2
#2 x 3
#3 x 4
#4 y 5
#5 y 6
#6 y 7
#7 z 9
https://stackoverflow.com/questions/45422003