I have a huge table whose rows contain data that is out of order relative to the column names. The table looks like this:
A B C
A1 B1 C1
A2 C2 NA
A3 C3 B3
NA B4 C4
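Is there a way to rearrange this table so that all values starting with "A" are in the first column, all values starting with "B" are in the second column, and so on?
The expected result is: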
A B C
A1 B1 C1
A2 NA C2
A3 B3 C3
NA B4 C4
Posted on 2018-06-13 08:09:33
Here is an efficient way to do it with data.table:
library(data.table)
setDT(df1)        # convert df1 to a data.table by reference
df1[, id := .I]   # add a unique id for each row
# melt to long format: one row per (id, original column, value)
df1 <- melt(data = df1, id.vars = "id")
# set variable to the value's letter prefix; values that are NA get variable = NA
df1[, variable := gsub("[0-9]", "", value)]
# cast back to one column per prefix, then drop the helper NA and id columns
dcast(df1, id ~ variable, value.var = "value")[, `:=`(`NA` = NULL, id = NULL)][]
Output:
# A B C
# 1: A1 B1 C1
# 2: A2 NA C2
# 3: A3 B3 C3
# 4: NA B4 C4
Data:
df1 <- read.table(text = "
A B C
A1 B1 C1
A2 C2 NA
A3 C3 B3
NA B4 C4",
header = TRUE, stringsAsFactors = FALSE)
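To see where the helper NA column comes from, it can help to look at the long table before it is cast. This is a minimal sketch that rebuilds df1 from the data above; the column name target is a hypothetical choice made here instead of overwriting variable:

```r
library(data.table)

df1 <- read.table(text = "
A B C
A1 B1 C1
A2 C2 NA
A3 C3 B3
NA B4 C4",
header = TRUE, stringsAsFactors = FALSE)

setDT(df1)
df1[, id := .I]                               # unique id for each row
long <- melt(df1, id.vars = "id")             # one row per (id, original column, value)
long[, target := gsub("[0-9]", "", value)]    # letter prefix; NA values stay NA
print(long[id == 2])                          # row 2: A2, C2 and one NA
```

For row 2 the prefixes are "A", "C" and NA, so casting on the prefix sends A2 and C2 to their own columns and the missing value to a column named NA, which the answer then drops.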
Edit:
Using the data above, with an extra all-NA column D added:
library(data.table)
setDT(df1)[, D := NA_character_]   # add an all-NA column D
df1[, id := .I]                    # add a unique id for each row
nm <- setdiff(names(df1), "id")    # target column names, without id
df1 <- dcast(data = melt(data = df1, measure.vars = nm)[, variable := gsub("[0-9]", "", value)],
             formula = id ~ variable,
             # x[1] is used as the aggregate because some (id, NA) cells receive two NA values:
             # columns C and D in row 2, and columns A and D in row 4.
             # You can inspect them with fun.aggregate = function(x) print(x)
             fun.aggregate = function(x) x[1],
             value.var = "value",
             fill = NA_character_)[, id := NULL][]
# check the number of columns
if (ncol(df1) > length(nm)) {
  df1[, `NA` := NULL]              # remove the extra NA column
  nm <- names(df1)                 # column names without NA
}
# rename columns efficiently by reference
setnames(df1, nm)
Output:
print(df1)
# A B C D
# 1: A1 B1 C1 NA
# 2: A2 NA C2 NA
# 3: A3 B3 C3 NA
# 4: NA B4 C4 NA
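A possible variant of the edit, sketched here as an assumption rather than the answerer's code: dropping NA values during the melt (melt's na.rm argument) means no helper NA column is created and each (id, prefix) cell receives at most one value, so no aggregate function is needed. Target columns that receive no values, such as D, are then added back explicitly. The names targets, long, target, wide and missing_cols are hypothetical.

```r
library(data.table)

df1 <- read.table(text = "
A B C
A1 B1 C1
A2 C2 NA
A3 C3 B3
NA B4 C4",
header = TRUE, stringsAsFactors = FALSE)

setDT(df1)[, D := NA_character_]   # all-NA column D, as in the edit
targets <- names(df1)              # desired output columns: A, B, C, D
df1[, id := .I]                    # unique id for each row

# long format, dropping NA values so no helper NA column is ever created
long <- melt(df1, id.vars = "id", na.rm = TRUE)
long[, target := gsub("[0-9]", "", value)]

# each (id, target) pair now occurs at most once, so no aggregate function is needed
wide <- dcast(long, id ~ target, value.var = "value")

# add target columns that received no values (here D), then reorder and drop id
missing_cols <- setdiff(targets, names(wide))
if (length(missing_cols) > 0) wide[, (missing_cols) := NA_character_]
setcolorder(wide, c("id", targets))
wide[, id := NULL]
print(wide)
```

This assumes every row has at least one non-NA value; a row that is entirely NA would disappear in the melt and not come back.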
Posted on 2018-06-13 05:05:53
df <- read.table(text = "
A B C
A1 B1 C1
A2 C2 NA
A3 C3 B3
NA B4 C4",
header = TRUE, stringsAsFactors = FALSE)
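# every combination of column letter and row number, filled column by column
dfnew <- matrix(paste0(rep(LETTERS[1:3], each = 4), 1:4), nrow = 4, ncol = 3)
# set the combinations that do not occur anywhere in the original table to NA
dfnew[!dfnew %in% as.character(unlist(df))] <- NA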
dfnew
[,1] [,2] [,3]
[1,] "A1" "B1" "C1"
[2,] "A2" NA "C2"
[3,] "A3" "B3" "C3"
[4,] NA "B4" "C4"
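In short: create a new matrix containing every possible combination of letter and number, then set the entries that do not appear in the original table to NA.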
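The same idea can be sketched without hard-coding the three letters and four rows, by deriving them from names(df) and nrow(df). Like the answer, this assumes every value is its column's name followed by its row number; the use of outer here is this sketch's choice, not the answerer's.

```r
df <- read.table(text = "
A B C
A1 B1 C1
A2 C2 NA
A3 C3 B3
NA B4 C4",
header = TRUE, stringsAsFactors = FALSE)

# every candidate value: column name followed by row number
dfnew <- outer(seq_len(nrow(df)), names(df), function(i, col) paste0(col, i))
# keep only the candidates that actually occur somewhere in df
dfnew[!dfnew %in% unlist(df, use.names = FALSE)] <- NA
colnames(dfnew) <- names(df)
dfnew
```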
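Posted on 2018-06-13 05:17:33
For each row, look up the first letter of each element in df[1,] and move the element into the same column as its match in df[1,]. This relies on the first row containing one value for every column, as it does here.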
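library(tidyverse)
df %>%
  pmap(~{
    x <- c(...)   # the current row as a vector
    # column index of the first-row value that shares each element's first letter
    ind <- map_dbl(substr(x, 1, 1), ~grep(.x, df[1, ])[1]) %>%
      .[!is.na(.)]
    # place the non-NA elements at those indices; the remaining cells stay NA
    rep(NA, ncol(df)) %>%
      `[<-`(ind, x[!is.na(x)])
  }) %>%
  do.call(what = rbind) %>%
  as.data.frame()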
# V1 V2 V3
# 1 A1 B1 C1
# 2 A2 <NA> C2
# 3 A3 B3 C3
# 4 <NA> B4 C4
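A base-R sketch of the same row-wise idea, swapping in a different lookup: instead of matching against df[1,], it matches each value's first letter against names(df), so the first row does not need to be complete. It assumes every first letter is also a column name; the helper name rearrange_row is hypothetical.

```r
df <- read.table(text = "
A B C
A1 B1 C1
A2 C2 NA
A3 C3 B3
NA B4 C4",
header = TRUE, stringsAsFactors = FALSE)

# place each non-NA value of one row in the column named by its first letter
rearrange_row <- function(x, targets) {
  out <- rep(NA_character_, length(targets))
  x <- x[!is.na(x)]
  out[match(substr(x, 1, 1), targets)] <- x
  out
}

# apply() returns one column per input row, so transpose back
res <- t(apply(df, 1, rearrange_row, targets = names(df)))
res <- as.data.frame(res, stringsAsFactors = FALSE)
names(res) <- names(df)
res
```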
https://stackoverflow.com/questions/50837564