The two parameters I always ignored when creating a Seurat object actually do this?

生信技能树

Published 2025-03-14 16:06:48


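While analyzing a single-cell dataset over the past couple of days I ran into something odd. I had set the project parameter of CreateSeuratObject for every sample, yet after merging the samples the orig.ident column did not contain the sample names I had set! Let's look at the details.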

The data

The single-cell data come from acute kidney injury tissue, 13 samples in total. Download link: https://ngdc.cncb.ac.cn/omix/release/OMIX004421

I also keep a downloaded copy on Baidu Netdisk: https://pan.baidu.com/s/1DyWqfA5G3Q7cPDLNICXRxA?pwd=5s2q (extraction code: 5s2q)

Reading the data and creating the Seurat objects

###
### Create: Jianming Zeng
### Date:  2023-12-31  
### Email: jmzeng1314@163.com
### Blog: http://www.bio-info-trainee.com/
### Forum:  http://www.biotrainee.com/thread-1376-1-1.html
### CAFS/SUSTC/Eli Lilly/University of Macau
### Update Log: 2023-12-31   First version 
### Update Log: 2024-12-09   by juan zhang (492482942@qq.com)
### 

rm(list=ls())
library(dplyr) 
library(future)
library(Seurat)
library(clustree)
library(cowplot)
library(data.table)
library(qs)
library(Matrix)
getwd()

## read data
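# match only files that end in .txt (a bare "txt" pattern matches it anywhere in the name)
samples <- list.files("data/", recursive = FALSE, full.names = FALSE, pattern = "\\.txt$")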
samples

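scRNA_list <- lapply(samples, function(pro) {
  # pro <- samples[1]   # uncomment to step through a single file
  print(pro)
  counts <- fread(file = file.path("data", pro), data.table = FALSE)
  counts[1:4, 1:4]                  # peek at the raw table (only useful interactively)
  rownames(counts) <- counts[, 1]   # first column holds the gene names
  counts <- counts[, -1]
  dim(counts)

  # Create the Seurat object; project is the file name without its .txt suffix
  sce <- CreateSeuratObject(counts = counts, min.cells = 3, project = sub("\\.txt$", "", pro))
  return(sce)
})
names(scRNA_list) <- sub("\\.txt$", "", samples)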
scRNA_list

## merge
sce.all <- merge(scRNA_list[[1]], y = scRNA_list[-1], add.cell.ids = names(scRNA_list))
sce.all <- JoinLayers(sce.all) # seurat v5
sce.all


# Inspect the merged object
as.data.frame(sce.all@assays$RNA$counts[1:10, 1:2])
head(sce.all@meta.data, 10)
table(sce.all$orig.ident)

At the table(sce.all$orig.ident) step, the sample names turn out to look like this:

table(sce.all$orig.ident) 

   A1    A2    A3    A4    C1    C2    C3    C4    M1    M2    M3    S1    S4 
 6394  7621  8561  4780  6464  6432  7627 10443  7599  5756  8556 12940 10058

Yet my samples vector looks like the one below. When creating each object I passed project to CreateSeuratObject as the file name with the .txt suffix removed, so in theory orig.ident should hold those file names. Spooky!

> samples
 [1] "IRI1d_1.txt" "IRI1d_2.txt" "IRI1d_3.txt" "IRI1d_4.txt" "IRI3d_1.txt" "IRI3d_2.txt" "IRI3d_3.txt" "IRI3d_4.txt" "MSC1.txt"    "MSC2.txt"    "MSC3.txt"   
[12] "sham1.txt"   "sham2.txt"

Checking the data

I went over my data again and again to find out what had gone wrong. Had these sample names simply appeared out of thin air?

Read in a single file and take a look:

pro <- samples[1]
print(pro)
counts <- fread(file = file.path("data/", pro), data.table = F)
counts[1:4,1:4]
rownames(counts) <- counts[, 1]
counts <- counts[, -1]
dim(counts)

# Create the Seurat object
sce <- CreateSeuratObject(counts = counts, min.cells = 3, project = sub("\\.txt$", "", pro))
head(sce@meta.data)

The column names printed by counts[1:4,1:4] carry a suspicious prefix: the cells are named like A1_AAACCCAAGAGTTGTA-1.

Could that prefix really end up where the project value was supposed to go? It really did!

> gsub(".txt","", pro)
[1] "IRI1d_1"
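
A quick way to confirm where the unexpected labels come from is to tabulate whatever precedes the first underscore in the cell names. The sketch below uses a made-up matrix: the gene names and barcodes are hypothetical, and only the A1_<barcode>-1 naming pattern follows the data described here.

```r
# Toy count matrix; gene names and barcodes are hypothetical,
# only the "A1_<barcode>-1" naming pattern follows the real data.
toy_counts <- matrix(
  0, nrow = 2, ncol = 3,
  dimnames = list(
    c("GeneA", "GeneB"),
    c("A1_AAACCCAAGAGTTGTA-1", "A1_AAACCCAAGCTGACAG-1", "A1_AAACCCACAGTCTCTC-1")
  )
)

# Keep only the text before the first underscore of each cell name
prefixes <- sub("_.*$", "", colnames(toy_counts))

# One row per distinct prefix, with the number of cells carrying it
table(prefixes)
```

Running the same two lines on the real counts table should show whether every cell in a file shares one prefix, and whether that prefix differs from the file name.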

What does CreateSeuratObject actually do?

Time to check the CreateSeuratObject help page:

Create a Seurat object
Description
Create a Seurat object from raw data

Usage
CreateSeuratObject(
  counts,
  assay = "RNA",
  names.field = 1,
  names.delim = "_",
  meta.data = NULL,
  project = "CreateSeuratObject",
  ...
)

Arguments
counts 
Either a matrix-like object with unnormalized data with cells as columns and features as rows or an Assay-derived object

assay 
Name of the initial assay

names.field 
For the initial identity class for each cell, choose this field from the cell's name. E.g. If your cells are named as BARCODE_CLUSTER_CELLTYPE in the input matrix, set names.field to 3 to set the initial identities to CELLTYPE.

names.delim 
For the initial identity class for each cell, choose this delimiter from the cell's column name. E.g. If your cells are named as BARCODE-CLUSTER-CELLTYPE, set this to “-” to separate the cell name into its component parts for picking the relevant field.

meta.data 
Additional cell-level metadata to add to the Seurat object. Should be a data.frame where the rows are cell names and the columns are additional metadata fields. Row names in the metadata need to match the column names of the counts matrix.

project 
Project name for the Seurat object

I had always ignored these two parameters, names.field = 1 and names.delim = "_".

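If the cell names follow a pattern such as BARCODE_CLUSTER_CELLTYPE, each name is split on names.delim and the field numbered names.field is used as the cell's initial identity class. That identity is what ends up in orig.ident, rather than the project value.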

The defaults happen to fit my cell names exactly: A1_AAACCCAAGAGTTGTA-1 is split on _, and the first field, A1, is what gets written to the metadata!
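
The splitting rule can be sketched in base R. The helper extract_field below is hypothetical and written only for illustration; it mimics the idea behind names.field and names.delim and is not Seurat's internal code.

```r
# Hypothetical helper: split each cell name on a delimiter and keep one field.
# It illustrates the idea behind names.field / names.delim, nothing more.
extract_field <- function(cell_names, field = 1, delim = "_") {
  parts <- strsplit(cell_names, split = delim, fixed = TRUE)
  vapply(parts, function(p) p[field], character(1))
}

# Defaults (field 1, delimiter "_") applied to a cell name from this dataset
default_ident <- extract_field("A1_AAACCCAAGAGTTGTA-1")

# The two examples from the help page
third_field   <- extract_field("BARCODE_CLUSTER_CELLTYPE", field = 3)
dash_delim    <- extract_field("BARCODE-CLUSTER-CELLTYPE", field = 3, delim = "-")

print(c(default_ident, third_field, dash_delim))
```

With the defaults, the first call keeps the A1 prefix, which matches the labels seen in orig.ident after merging.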

Surprising, isn't it?

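And what if I want orig.ident to hold the project value I set myself? Setting names.field=0 does the trick: apparently no field is then extracted from the cell names, so the identity falls back to the project name.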

# Create the Seurat object, this time without taking the identity from the cell names
sce <- CreateSeuratObject(counts = counts, min.cells = 3, project = sub("\\.txt$", "", pro), names.field = 0)
head(sce@meta.data)
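
If relying on names.field = 0 feels too implicit, another option is to overwrite orig.ident once the object exists. This is a minimal sketch rather than the approach used above, and it assumes the Seurat package is installed. The toy matrix, gene names, and barcodes are made up; only the A1_ prefix and the IRI1d_1 sample name are taken from this post.

```r
library(Seurat)

# Toy counts: 20 hypothetical genes x 6 hypothetical cells named "A1_CELL<n>"
set.seed(1)
toy <- matrix(
  as.numeric(rpois(20 * 6, lambda = 5)),
  nrow = 20, ncol = 6,
  dimnames = list(paste0("gene", 1:20), paste0("A1_CELL", 1:6))
)
toy <- Matrix::Matrix(toy, sparse = TRUE)

sample_name <- "IRI1d_1"
obj <- CreateSeuratObject(counts = toy, project = sample_name)

# With the default names.field = 1 the identity is taken from the cell names,
# so this is expected to show the "A1" prefix rather than the project name
print(table(obj$orig.ident))

# Overwrite orig.ident explicitly and make it the active identity
obj$orig.ident <- rep(sample_name, ncol(obj))
Idents(obj) <- "orig.ident"
print(table(obj$orig.ident))
```

The explicit assignment makes the sample label independent of how the cell barcodes happen to be named, at the cost of one extra line per sample.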

Another day, another unexpected bit of knowledge!


Originally published on 2025-03-13 on the 生信技能树 WeChat public account.
