前面我们分享了两篇空转反卷积的方法测评,其中RCTD位于推荐榜首,今天就用这篇文献《Integrating spatial transcriptomics and single-cell RNA-sequencing reveals the alterations in epithelial cells during nodular formation in benign prostatic hyperplasia》中的数据试试看性能怎么样!
文献背景:一个样本的空间转录组数据分析:挖掘好思路(GSE242249)
测评解读:
共84,835个细胞被注释为上皮细胞、间质细胞(内皮细胞、成纤维细胞和平滑肌细胞[SMC])以及免疫细胞(T细胞、髓系细胞、浆细胞、肥大细胞和B细胞)。
单细胞数据GEO编号为:https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE172357,样本为其中的部分,选取的12个样本如下:
Patients | Sample | Matrix ID | Group | Nodule | Detection | 5ARIs treatment | Cells (spots) after filtering |
---|---|---|---|---|---|---|---|
P1 | BPH283_GN | GSM5252126 | BPH | Yes(GN) | scRNA-seq | No | 4,483 |
BPH283_SN | GSM5252127 | BPH | Yes(SN) | scRNA-seq | No | 2,964 | |
P2 | BPH327_GN | GSM5252128 | BPH | Yes(GN) | scRNA-seq | No | 5,389 |
BPH327_SN | GSM5252129 | BPH | Yes(SN) | scRNA-seq | No | 8,744 | |
P3 | BPH340_GN | GSM5252130 | BPH | Yes(GN) | scRNA-seq | No | 7,364 |
BPH340_SN | GSM5252131 | BPH | Yes(SN) | scRNA-seq | No | 7,523 | |
P4 | BPH389_GN | GSM5252132 | BPH | Yes(GN) | scRNA-seq | No | 6,000 |
BPH389_SN | GSM5252133 | BPH | Yes(SN) | scRNA-seq | No | 13,596 | |
P5 | BPH511_GN | GSM5252134 | BPH | Yes(GN) | scRNA-seq | No | 9,993 |
D17 | D17 | GSM5252458 | Normal | - | scRNA-seq | - | 6,125 |
D27 | D27 | GSM5252460 | Normal | - | scRNA-seq | - | 7,612 |
D35 | D35 | GSM5252462 | Normal | - | scRNA-seq | - | 5,688 |
下载下来后进行数据读取,并创建seurat对象:
###
### Create: Jianming Zeng
### Date: 2023-12-31
### Email: jmzeng1314@163.com
### Blog: http://www.bio-info-trainee.com/
### Forum: http://www.biotrainee.com/thread-1376-1-1.html
### CAFS/SUSTC/Eli Lilly/University of Macau
### Update Log: 2023-12-31 First version
### Update Log: 2024-12-09 by juan zhang (492482942@qq.com)
###
rm(list=ls())
library(ggsci)
library(dplyr)
library(future)
library(Seurat)
library(clustree)
library(cowplot)
library(data.table)
library(ggplot2)
library(patchwork)
library(stringr)
library(qs)
library(Matrix)
# 创建目录
getwd()
gse <- "GSE172357"
dir.create(gse)
# 方式一:标准文件夹
###### step1: 导入数据 ######
samples <- list.files("GSE172357/", pattern = "h5$",recursive = F, full.names = F)
samples
# 挑选文献中的12个
files <- c(samples[1:9],samples[15],samples[17],samples[19])
files
scRNAlist <- lapply(files, function(pro){
#pro <- files[1]
print(pro)
counts <- Read10X_h5(paste0("GSE172357/",pro))
name <- str_split(pro, pattern = "_Via_|_Fcol_|_filtered",simplify = T,n = 2)[,1]
print(name)
sce <- CreateSeuratObject(counts, project=name, min.cells = 3)
return(sce)
})
names(scRNAlist) <- str_split(files, pattern = "_Via_|_Fcol_|_filtered",simplify = T,n = 2)[,1]
scRNAlist
# merge
sce.all <- merge(scRNAlist[[1]], y=scRNAlist[-1], add.cell.ids=names(scRNAlist))
sce.all <- JoinLayers(sce.all) # seurat v5
sce.all
# 查看特征
as.data.frame(sce.all@assays$RNA$counts[1:10, 1:2])
head(sce.all@meta.data, 10)
table(sce.all$orig.ident)
library(qs)
qsave(sce.all, file="GSE172357/sce.all.qs")
挑选的12个样本如下:
成功创建seurat对象:
然后经过简单的质控降维聚类分群,注释如下:
结合这篇推文:空间转录组数据注释分析:RCTD反卷积(Nature Biotechnology IF: 33.1),来看看。
读取上次保存的空转数据:一个样本的空间转录组数据分析:挖掘好思路(GSE242249)
rm(list=ls())
library(spacexr)
library(Matrix)
library(doParallel)
library(ggplot2)
# 输出文件夹,可以换成自己指定的
savedir <- 'RCTD_results_Run'
if(!dir.exists(savedir))
dir.create(savedir)
knitr::opts_chunk$set(collapse = TRUE,comment = "#>",cache = TRUE,out.width = "100%" )
load("../2024-GSE242249-空间单细胞-良性前列腺增生/spatial.Rdata")
counts <- as.matrix(spatial@assays$Spatial$counts)
counts[1:10,1:5]
读取坐标:
# 读取坐标
centroids <- spatial[["slice1"]]@boundaries$centroids
coords <- setNames(as.data.frame(centroids@coords), c("x", "y"))
rownames(coords) <- centroids@cells
head(coords)
计算umi并构建对象:
# 计算每个位置的umi
nUMI <- colSums(counts) # In this case, total counts per pixel is nUMI
head(nUMI)
# 构建一个对象
puck <- SpatialRNA(coords, counts, nUMI)
puck
barcodes <- colnames(puck@counts) # pixels to be used (a list of barcode names).
plot_puck_continuous(puck, barcodes, puck@nUMI,
ylimit = c(0,round(quantile(puck@nUMI,0.9))),
size = 2,
title ='plot of nUMI')
# 看一下构建好的SpatialRNA对象
str(puck)
# count值
sce <- readRDS("../../03-scRNA/2021-GSE172357-12-BPH/3-check-by-0.5/sce.all.int.rds")
counts <- as.matrix(sce@assays$RNA$counts)
counts[1:5,1:5]
# 细胞类型:
cell_types <- sce$celltype
levels(cell_types)[-9]
cell_types <- factor(cell_types, levels = levels(cell_types)[-9])
head(cell_types)
table(cell_types)
# umi
nUMI <- sce$nCount_RNA
head(nUMI)
# 构建对象
reference <- Reference(counts, cell_types, nUMI)
str(reference)
这里注意线程数 max_cores,我用的服务器,资源比较多:
# 运行
myRCTD <- create.RCTD(puck, reference, max_cores = 60)
myRCTD <- run.RCTD(myRCTD, doublet_mode = 'full')
saveRDS(myRCTD,'myRCTD_visium_full.rds')
这里细胞数比较多,实战不知道还会运行多久~
Begin: process_cell_type_info
process_cell_type_info: number of cells in reference: 64142
process_cell_type_info: number of genes in reference: 27508
做到这里突然还意识到一个问题:我采用的是文献中类似的注释水平,但是单细胞的注释应该要再详细一点的,其中髓系细胞都没有分开呢,那结果解读很多不是没有办法展开吗?后面再做一个详细版本的吧~
本次分享到这,明天发可视化结果,与文献比较看看,感受一下RCTD结果怎么样~