今天分享一个来自2022年发表在Cancer Discov杂志(IF=30+)上的空间转录组数据分析,看看多样本分析策略吧。文献标题为《Spatiotemporal Immune Landscape of Colorectal Cancer Liver Metastasis at Single-Cell Level》。
数据下载:http://www.cancerdiversity.asia/scCRLM
我上一次下载的时候还是提供的space ranger的输出结果文件夹格式,这次直接给的rds格式。
This generated data sets containing 3,826 (ST-P1, liver), 4,658 (ST-P2, liver), 3,695 (ST-P3, liver), 3,721 (ST-P4, liver), 3,313 (ST-P1, colon), 4,174 (ST-P2, colon), 4,007 (ST-P3, colon), and 3,902 (ST-P4, colon) spots.



文件夹格式的我放到了网盘:通过网盘分享的文件:scCRLM 链接: https://pan.baidu.com/s/1WTU_KdjEcfaGOP2jdokULw?pwd=fs35 提取码: fs35
如果链接失效,加新叶的微信:Biotree123

这里还有一个内置的脚本:里面有非常多好的函数,我放到这里 SeuratWrappers.R
通过网盘分享的文件:WrapperFunction 链接: https://pan.baidu.com/s/1B3qHgCLD00Zuars3ik0h6w?pwd=3n6p 提取码: 3n6p
## Analysis for ST_CRC_LiverMetastasis
rm(list=ls())
library(dplyr)
library(patchwork)
## We need to load a previous version of spatstat to make Seurat run. Related
## to the following issue https://github.com/satijalab/seurat/issues/4226
library(Seurat)
library(stringr)
library(purrr)
library(readr)
library(tidyr)
library(ggplot2)
#library(tidyverse)
library(kableExtra)
library(gridExtra)
library(cowplot)
library(msigdbr)
library(clusterProfiler)
# library(EnhancedVolcano)
source(file = "WrapperFunction/SeuratWrappers.R")
如果上面提示缺少啥就安装啥好了。
这里一次性将所有空转样本读取进来,得到一个list对象。
这里还可以学到 空转的表达矩阵与切片信息分开读取。
## 1) Generating Seurat Objects
# # a list of seurat objects, one object per slice (per dataset)
# seurat_objects <-
# get.seurat.objects(datasets,sample_names)
# 读取数据,有h5与tissue_hires_image.png
indir <- "../Cell2location/scCRLM/data/ST_matrix2/"# 加上最后的"/"
samples <- list.dirs(indir, recursive = F, full.names = F)
samples
seurat_objects <- lapply(samples, function(pro){
#pro <- samples[1]
print(pro)
# 先读取 h5
data <- Read10X_h5(filename = paste0(indir, pro,"/filtered_feature_bc_matrix.h5"))
object <- CreateSeuratObject(counts = data, assay = "Spatial", min.cells = 3, project = pro)
object
# 再读取 spatial 目录
image <- Read10X_Image(image.dir = paste0(indir,pro),
image.name = "tissue_lowres_image.png",
filter.matrix = TRUE)
image
image <- image[Cells(x = object)]
DefaultAssay(object = image) <- "Spatial"
# 添加图片到object中
object[[pro]] <- image
#object@images[[1]]@scale.factors$lowres <- object@images[[1]]@scale.factors$hires
return(object)
})
names(seurat_objects) <- samples
seurat_objects
结果如下:
$`ST-colon1`
An object of class Seurat
41023 features across 3313 samples within 2 assays
Active assay: SCT (19901 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: Spatial
2 dimensional reductions calculated: pca, umap
1 spatial field of view present: ST.colon1
$`ST-colon2`
An object of class Seurat
41100 features across 4174 samples within 2 assays
Active assay: SCT (19903 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: Spatial
2 dimensional reductions calculated: pca, umap
1 spatial field of view present: ST.colon2
$`ST-colon3`
An object of class Seurat
38937 features across 4007 samples within 2 assays
Active assay: SCT (18819 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: Spatial
2 dimensional reductions calculated: pca, umap
1 spatial field of view present: ST.colon3
$`ST-colon4`
An object of class Seurat
39968 features across 3902 samples within 2 assays
Active assay: SCT (19384 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: Spatial
2 dimensional reductions calculated: pca, umap
1 spatial field of view present: ST.colon4
$`ST-liver1`
An object of class Seurat
43012 features across 3826 samples within 2 assays
Active assay: SCT (20898 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: Spatial
2 dimensional reductions calculated: pca, umap
1 spatial field of view present: ST.liver1
$`ST-liver2`
An object of class Seurat
43055 features across 4658 samples within 2 assays
Active assay: SCT (20931 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: Spatial
2 dimensional reductions calculated: pca, umap
1 spatial field of view present: ST.liver2
$`ST-liver3`
An object of class Seurat
38681 features across 3695 samples within 2 assays
Active assay: SCT (18713 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: Spatial
2 dimensional reductions calculated: pca, umap
1 spatial field of view present: ST.liver3
$`ST-liver4`
An object of class Seurat
37322 features across 3721 samples within 2 assays
Active assay: SCT (18040 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: Spatial
2 dimensional reductions calculated: pca, umap
1 spatial field of view present: ST.liver4
对每个样本单独计算,使用lapply函数:
## 2) 计算线粒体比例
seurat_objects <- lapply(seurat_objects, function(x){
percentage_mt <- PercentageFeatureSet(x, pattern = "^MT-")
AddMetaData(x, percentage_mt, col.name = "percent.mt")
})
head(seurat_objects[[1]]@meta.data)
## 3) 这里不进行过滤
## We are not going to filter our samples since the QC results are very different
seurat_objects_filter <-seurat_objects # %>%
# map(function(x) {
# x %>%
# subset(subset = percent.mt < 50 & nCount_Spatial > 500 & nCount_Spatial < 45000)
#})

这里使用了前面source的脚本中的函数 : source(file = "WrapperFunction/SeuratWrappers.R")
## 4) 每个样本单独进行SCT标准化,降维聚类分群
seurat_objects <- normalize.cluster(seurat_objects_filter, min.dist = 0.1,
spread = 0.5, resolution = 0.5)
normalize.cluster()函数内容如下:
normalize.cluster <- function(list_seurat, k.param = 20, resolution = 0.8,
min.dist = 0.3, spread = 1){
seurat_objects <- list_seurat %>%
map(function(x) {
SCTransform(x, assay = "Spatial", verbose = FALSE) %>%
RunPCA(assay = "SCT", verbose = FALSE) %>%
FindNeighbors(reduction = "pca", dims = 1:30, k.param= k.param) %>%
FindClusters(verbose = FALSE, resolution = resolution) %>%
RunUMAP(reduction = "pca", dims = 1:30,
min.dist = min.dist, spread = spread)
})
return(seurat_objects)
}
这样跑起来超方便!
同样也是定义了一个函数:
## 5) 绘制umap 聚类结果图
umap_spatial_plots <- get.umap.spatialClusters(seurat_objects,
label.size = 3, pt.size.factor = 1.75)
# 高亮每个亚群
spatial_dim_plot_ident <- lapply(seurat_objects, function(x){
SpatialDimPlot(x, cells.highlight = CellsByIdentities(object = x),
facet.highlight = TRUE, ncol = 2, pt.size.factor = 2.5,
cols.highlight = c("#8b0023", "#ffffff00"), stroke = 0)
})
get.umap.spatialClusters的内容:
get.umap.spatialClusters <- function(list_seurat, label = TRUE, ...){
umap_spatial_plots <- lapply(list_seurat, function(x){
umap_plot <- Seurat::DimPlot(x, reduction = "umap", label = TRUE)
spatial_plot <- Seurat::SpatialDimPlot(x, label = TRUE, ...)
umap_plot + spatial_plot
})
return(umap_spatial_plots)
}
看看一个样本的聚类结果:
umap_spatial_plots[[1]] + spatial_dim_plot_ident[[1]]

每个样本的亚群差异高表达基因分析:
## 6) 亚群差异分析
all_markers <-
get.all.markers(seurat_objects, logfc.threshold = 0.5, return.thresh = 0.01,
only.pos = TRUE)
# 提取每个亚群top2的差异基因
all_markers_table <- get.markers.table(all_markers, nr_markers = 2)
dir.create("ST_CRC_LiverMetastasis/ClusteringResults/Resolution05/", recursive = T)
write.all.markers(all_markers,
path = "ST_CRC_LiverMetastasis/ClusteringResults/Resolution05/")
write.loupe.file(seurat_objects,
path = "ST_CRC_LiverMetastasis/ClusteringResults/Resolution05/")
# 看一个样本的
all_markers_table$`ST-colon1`

这里用了一个for循环,我就放一个样本的结果:
## 8) 不同样本结果展示
for ( sample in names(seurat_objects )){
# sample <- names(seurat_objects )[1]
cat( "## Sample : ", sample , "\n\n" )
cat( "Showing sample : ", sample , "\n\n")
cat( "### Clusters and their molecular characterization \n\n")
cat( "**Umap and Spatial Plot of clusters** \n\n")
print( umap_spatial_plots[[sample]])
cat( "\n\n")
print(spatial_dim_plot_ident[[sample]])
cat( "\n\n")
cat( "**Table containing some of the top markers of the clusters** \n\n")
print( all_markers_table[[sample]] %>%
kbl() %>% kable_styling())
cat( "\n\n")
top5 <- all_markers[[sample]] %>%
group_by(cluster) %>%
top_n(n = 5, wt = avg_log2FC)
cat( "**Heatmap showing the markers of the different clusters ** \n\n")
print(DoHeatmap(seurat_objects[[sample]], features = top5$gene) + NoLegend())
cat( "\n\n")
cat( "**Spatial and Violin plots showing the expression of the markers of different clusters ** \n\n")
for (cluster_nr in levels(Idents(seurat_objects[[sample]]))){
cat( "Cluster : ", cluster_nr , "\n\n")
genes_cluster <- all_markers_table[[sample]] %>%
dplyr::filter(cluster == as.numeric(cluster_nr)) %>%
dplyr::pull(gene)
print(SpatialFeaturePlot(seurat_objects[[sample]],
features = genes_cluster,
pt.size.factor = 2.25, alpha = c(0.1,1),ncol=2))
cat( "\n\n")
print(VlnPlot(seurat_objects[[sample]],
features = genes_cluster, ncol=2))
cat( "\n\n")
}
cat( "\n\n")
cat( "**Enrichment Results:** The dashed line is the statistical treshold for a significant enrichment \n\n")
print(get.enrichment.barplots.cp(enrichment_results[[sample]]))
cat( "\n\n")
}
聚类结果:

每个cluster单独显示:

亚群top2基因热图:

每个亚群top2的切片特征图和小提琴:

最后结果保存一下:
# 保存
saveRDS(object = seurat_objects, file = "ST_CRC_LiverMetastasis/ClusteringResults/Resolution05/SeuratList_Clusters_Res05.rds")
以上就是一整套空转多样本的标准分析流程了。
如果你想系统的学习空转内容,看看我们最新的课程:千呼万唤始出来:单细胞+空间转录组分析培训班