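For the past couple of days I have been analyzing a single-cell dataset and ran into a strange problem: when creating the Seurat objects I explicitly set the project argument of CreateSeuratObject, yet after merging the samples, the orig.ident column did not contain the sample names I had set! Let's look at the details.
The dataset consists of tissue samples from acute kidney injury, 13 samples in total. Download link: https://ngdc.cncb.ac.cn/omix/release/OMIX004421
I also keep a downloaded copy on Baidu Netdisk: https://pan.baidu.com/s/1DyWqfA5G3Q7cPDLNICXRxA?pwd=5s2q (extraction code: 5s2q)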
###
### Create: Jianming Zeng
### Date: 2023-12-31
### Email: jmzeng1314@163.com
### Blog: http://www.bio-info-trainee.com/
### Forum: http://www.biotrainee.com/thread-1376-1-1.html
### CAFS/SUSTC/Eli Lilly/University of Macau
### Update Log: 2023-12-31 First version
### Update Log: 2024-12-09 by juan zhang (492482942@qq.com)
###
rm(list=ls())
library(dplyr)
library(future)
library(Seurat)
library(clustree)
library(cowplot)
library(data.table)
library(qs)
library(Matrix)
getwd()
## read data
samples <- list.files("data/", recursive = F, full.names = F, pattern = "txt")
samples
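scRNA_list <- lapply(samples, function(pro){
  # pro <- samples[1]  # uncomment to step through a single sample
  print(pro)
  counts <- fread(file = file.path("data/", pro), data.table = F)
  counts[1:4,1:4]
  # use the first column as row names, then drop it from the table
  rownames(counts) <- counts[, 1]
  counts <- counts[, -1]
  dim(counts)
  # Create the Seurat object, naming the project after the file (without .txt)
  sce <- CreateSeuratObject(counts = counts, min.cells=3, project = gsub(".txt","", pro) )
  return(sce)
})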
names(scRNA_list) <- gsub(".txt","", samples)
scRNA_list
## merge
sce.all <- merge(scRNA_list[[1]], y=scRNA_list[-1], add.cell.ids=gsub(".txt","", samples))
sce.all <- JoinLayers(sce.all) # seurat v5
sce.all
# Inspect the merged object
as.data.frame(sce.all@assays$RNA$counts[1:10, 1:2])
head(sce.all@meta.data, 10)
table(sce.all$orig.ident)
At this point, table(sce.all$orig.ident) shows that the sample names look like this:
table(sce.all$orig.ident)
   A1    A2    A3    A4    C1    C2    C3    C4    M1    M2    M3    S1    S4
 6394  7621  8561  4780  6464  6432  7627 10443  7599  5756  8556 12940 10058
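Yet my samples vector looks like the output below. When creating each object I set the project argument, sce <- CreateSeuratObject(counts = counts, min.cells=3, project = gsub(".txt","", pro) ), so in theory orig.ident should be the file name with the txt suffix removed. Very strange!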
> samples
[1] "IRI1d_1.txt" "IRI1d_2.txt" "IRI1d_3.txt" "IRI1d_4.txt" "IRI3d_1.txt" "IRI3d_2.txt" "IRI3d_3.txt" "IRI3d_4.txt" "MSC1.txt" "MSC2.txt" "MSC3.txt"
[12] "sham1.txt" "sham2.txt"
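I checked my data over and over to find out what had gone wrong. Did these sample names appear out of thin air, like something supernatural?
Let's read in a single file and take a look: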
pro <- samples[1]
print(pro)
counts <- fread(file = file.path("data/", pro), data.table = F)
counts[1:4,1:4]
rownames(counts) <- counts[, 1]
counts <- counts[, -1]
dim(counts)
# Create the Seurat object
sce <- CreateSeuratObject(counts = counts, min.cells=3, project = gsub(".txt","", pro) )
head(sce@meta.data)
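The output of counts[1:4,1:4] shows a suspicious prefix in the cell names: the column names look like A1_AAACCCAAGAGTTGTA-1.
Could that prefix really end up overriding the value of the project argument?! It really did, even though the value passed to project is exactly what I expected: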
> gsub(".txt","", pro)
[1] "IRI1d_1"
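A quick way to see which prefixes the cell names carry is to tabulate everything before the first underscore in the column names. The sketch below uses a toy table in place of a real sample; the object name counts_toy, the gene names, and the second and third barcodes are invented for illustration.

```r
# Toy count table standing in for one sample (hypothetical data).
# Only the first barcode comes from the real dataset; the others are invented.
counts_toy <- data.frame(
  "A1_AAACCCAAGAGTTGTA-1" = c(0, 2),
  "A1_CCCCCCCCCCCCCCCC-1" = c(1, 0),
  "A1_GGGGGGGGGGGGGGGG-1" = c(3, 1),
  check.names = FALSE,          # keep the column names exactly as given
  row.names = c("GeneA", "GeneB")
)

# Drop everything from the first underscore onward, leaving only the prefix
prefixes <- sub("_.*$", "", colnames(counts_toy))

# Count how many cells carry each prefix
prefix_table <- table(prefixes)
print(prefix_table)
```

On a real sample, the same two lines applied to colnames(counts) would show whether the cell names already encode a label that differs from the file name.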
Time to check the help page for CreateSeuratObject:
Create a Seurat object
Description
Create a Seurat object from raw data
Usage
CreateSeuratObject(
counts,
assay = "RNA",
names.field = 1,
names.delim = "_",
meta.data = NULL,
project = "CreateSeuratObject",
...
)
Arguments
counts
Either a matrix-like object with unnormalized data with cells as columns and features as rows or an Assay-derived object
assay
Name of the initial assay
names.field
For the initial identity class for each cell, choose this field from the cell's name. E.g. If your cells are named as BARCODE_CLUSTER_CELLTYPE in the input matrix, set names.field to 3 to set the initial identities to CELLTYPE.
names.delim
For the initial identity class for each cell, choose this delimiter from the cell's column name. E.g. If your cells are named as BARCODE-CLUSTER-CELLTYPE, set this to “-” to separate the cell name into its component parts for picking the relevant field.
meta.data
Additional cell-level metadata to add to the Seurat object. Should be a data.frame where the rows are cell names and the columns are additional metadata fields. Row names in the metadata need to match the column names of the counts matrix.
project
Project name for the Seurat object
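I had always overlooked these two arguments, names.field = 1 and names.delim = "_".
If the cell names follow a pattern such as BARCODE_CLUSTER_CELLTYPE, each name is split on the names.delim delimiter, and names.field selects which of the resulting fields becomes the initial identity class of the cell, which is what ends up in orig.ident.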
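With the defaults, my cell names happen to fit this pattern exactly: A1_AAACCCAAGAGTTGTA-1 is split on _, and the first field, A1, is taken and written into the metadata!
Surprising, isn't it?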
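To make the splitting concrete, here is a minimal base-R sketch of the behavior the help page describes. The helper extract_field is hypothetical and written only for illustration; it is not Seurat's own implementation, and the second barcode is made up.

```r
# Hypothetical helper mimicking the names.field / names.delim logic
# described in the help page (not Seurat's actual code).
extract_field <- function(cell_names, field = 1, delim = "_") {
  parts_list <- strsplit(cell_names, split = delim, fixed = TRUE)
  # pick the requested field from each split cell name
  vapply(parts_list, function(parts) parts[field], character(1))
}

# First name is from the dataset; the second one is invented
cells <- c("A1_AAACCCAAGAGTTGTA-1", "C2_TTTTTTTTTTTTTTTT-1")

# Defaults (field = 1, delim = "_") pick the prefix before the first underscore
print(extract_field(cells))                                  # "A1" "C2"

# The example from the help page: the third field is the cell type
print(extract_field("BARCODE_CLUSTER_CELLTYPE", field = 3))  # "CELLTYPE"
```

This is why a prefix embedded in the cell names can end up in orig.ident instead of the project name.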
So what if I want the project value that I set myself? I set names.field=0, and that did the trick!
# Create the Seurat object
sce <- CreateSeuratObject(counts = counts, min.cells=3, project = gsub(".txt","", pro), names.field=0 )
head(sce@meta.data)
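Since this kind of mismatch is easy to miss, a small sanity check on the metadata can catch it early. The sketch below is one possible approach, not part of Seurat: check_orig_ident is a hypothetical helper, and the two toy data frames stand in for sce@meta.data.

```r
# Hypothetical helper: stop if orig.ident contains values other than
# the expected sample names.
check_orig_ident <- function(meta, expected) {
  found <- unique(as.character(meta$orig.ident))
  unexpected <- setdiff(found, expected)
  if (length(unexpected) > 0) {
    stop("Unexpected orig.ident values: ", paste(unexpected, collapse = ", "))
  }
  invisible(TRUE)
}

expected <- c("IRI1d_1", "IRI1d_2")

# Toy metadata tables (invented) standing in for sce@meta.data
meta_ok  <- data.frame(orig.ident = c("IRI1d_1", "IRI1d_1", "IRI1d_2"))
meta_bad <- data.frame(orig.ident = c("A1", "A1", "A2"))

check_orig_ident(meta_ok, expected)   # passes silently

# The mismatching case raises an error; catch it here to show the message
msg <- tryCatch(
  check_orig_ident(meta_bad, expected),
  error = function(e) conditionMessage(e)
)
print(msg)
```

In the workflow above, the equivalent call would be something like check_orig_ident(sce.all@meta.data, gsub(".txt","", samples)) right after the merge.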