
Spatial transcriptomics combined with single-cell analysis (Part 2): [Doing a good deed] Building a meaningful 200,000-cell endometrium scRNA reference atlas

KS科研分享与服务-TS的美梦
Published 2025-12-20 17:30:51

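Deconvolving the spatial transcriptomics data later in this series requires an scRNA reference dataset, so we build one here. The spatial data demonstrated earlier is human endometrium (see "Learning spatial analysis from Nature (Part 1): revisiting basic multi-sample 10x Visium analysis, and unpacking the difficulties of childbearing"), so the scRNA atlas we build is also a human endometrium atlas.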

1. Data overview and download

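While preparing the example data, it occurred to me that turning a good dataset directly into a public reference atlas would be a 'good deed' of sorts. The data shown here is a human endometrium single-cell atlas, which also makes up for a regret from my student days, when I paid too little attention to this technology and funding was limited. The data come from a paper published in Nature Communications in 2025: Cao, D., Liu, Y., Cheng, Y. et al. Time-series single-cell transcriptomic profiling of luteal-phase endometrium uncovers dynamic characteristics and its dysregulation in recurrent implantation failures. Nat Commun 16, 137 (2025). https://doi.org/10.1038/s41467-024-55419-z. The work was done by a team at The University of Hong Kong-Shenzhen Hospital. The dataset is public in GEO under accession number GSE250130.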

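Why is this dataset so valuable? Because it covers a complete series of time points. Some background first: the human endometrium changes cyclically to prepare for embryo implantation, and implantation is possible during only one phase of the cycle, the "window of implantation" (WOI), which generally falls about 7 to 9 days after the LH surge in a natural ovulatory cycle (LH+7 to LH+9). Understanding the cellular and molecular changes in the endometrium before and after implantation helps dissect the molecular mechanisms of this period and the biology behind implantation. It may also help improve the success rate of assisted reproduction, or suggest treatment strategies for patients with infertility or recurrent miscarriage.

The paper includes 28 endometrial biopsy samples in total. The time series covers five time points: LH+3 (day 3 post-luteinizing hormone surge), LH+5, LH+7, LH+9 and LH+11. These span the stages before and after the window of implantation, so coverage is very complete. There are also 10 RIF (recurrent implantation failure) samples taken at the LH+7 window of implantation for comparison. The dataset built here leaves out the paper's 3 PGT samples and keeps all the normal time-series samples plus the 10 RIF samples, i.e. 25 samples: 3 per time point across the 5 time points, plus 10 RIF.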

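# Download the data
cd data_analysis/10X_space/scRNA_for_Sp/
# Run wget in the foreground so that tar only starts after the download has finished
wget https://ftp.ncbi.nlm.nih.gov/geo/series/GSE250nnn/GSE250130/suppl/GSE250130_RAW.tar
tar -xvf GSE250130_RAW.tar
rm GSE250130_RAW.tar
# Create the folders in batch, one folder per sample
mkdir RIF{1..10}
mkdir LH3_{1..3}
mkdir LH5_{1..3}
mkdir LH7_{1..3}
mkdir LH9_{1..3}
mkdir LH11_{1..3}
# Batch extraction. This assumes each sample's .tar.gz archive has already been
# placed in its sample folder (that step is not shown here).
for d in */; do
    for f in "$d"*.tar.gz; do
        [ -e "$f" ] || continue   # skip folders that contain no archive
        tar -xzvf "$f" -C "$d" && rm "$f"
    done
done
# Each sample folder now holds the three standard files used in the downstream analysis
cd RIF1/
ls -lh
# -rw-r--r-- 1 tq_ziv tq_ziv 112K Oct 11  2022 barcodes.tsv.gz
# -rw-r--r-- 1 tq_ziv tq_ziv 326K Oct 11  2022 features.tsv.gz
# -rw-r--r-- 1 tq_ziv tq_ziv 243M Oct 11  2022 matrix.mtx.gz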
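
Before moving on, it can help to confirm that every sample folder really ended up with the three files, since a single missing or empty file would make the later read step fail partway through the loop. The following is a minimal shell sketch and not part of the original workflow: the function name `check_10x_dirs` is hypothetical, and it assumes only the layout described above (one sub-folder per sample under a common root).

```shell
# Report sample folders that lack any of the three standard 10x files.
# Prints one line per missing (or empty) file and returns non-zero if any is missing.
# check_10x_dirs is a hypothetical helper name, not part of any tool.
check_10x_dirs() {
    root=${1:-.}
    status=0
    for d in "$root"/*/; do
        [ -d "$d" ] || continue          # no sub-folders: glob stayed literal
        for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
            if [ ! -s "$d$f" ]; then     # -s: file exists and is not empty
                echo "missing: $d$f"
                status=1
            fi
        done
    done
    return "$status"
}
```

Calling `check_10x_dirs ~/data_analysis/10X_space/scRNA_for_Sp` prints nothing and returns 0 when all folders are complete.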

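2. Reading each sample

1. Perform Seurat pipelines in each sample

First create a Seurat object for each sample and run rough quality control plus dimensionality reduction. These per-sample objects are used later for doublet removal, after which the samples are integrated.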

library(Seurat)
library(dplyr)
samplename=list.files('./scRNA_for_Sp/') # names of the sample folders
names(samplename) = samplename
proj <- list()
samplename
# 1. Perform Seurat pipelines in each sample
# Run the standard workflow on each sample separately
setwd("~/data_analysis/10X_space/scRNA_for_Sp/")

for(i in seq_along(samplename)){
  print(names(samplename[i]))
  counts <- Read10X(data.dir = samplename[i])
  newproj <- CreateSeuratObject(counts = counts, min.cells = 10, min.features = 200, project = names(samplename[i]))
  newproj$sample <- names(samplename[i])
  newproj[["percent.mt"]] <- PercentageFeatureSet(object = newproj, pattern = "^MT-")
  # QC: drop cells with very high UMI or gene counts, or with 20% or more mitochondrial reads
  newproj <- subset(x = newproj, subset = nCount_RNA < 60000 & nFeature_RNA < 8000 & percent.mt < 20)
  newproj <- NormalizeData(newproj)
  newproj <- FindVariableFeatures(newproj, nfeatures = 3000)
  newproj <- ScaleData(newproj)
  # newproj <- SCTransform(newproj, return.only.var.genes = FALSE, assay = "RNA", verbose = FALSE)
  newproj <- RunPCA(object = newproj, verbose = FALSE)
  newproj <- FindNeighbors(newproj, dims = 1:30)
  newproj <- FindClusters(newproj, resolution = 0.5)
  newproj <- RunUMAP(newproj, reduction = "pca", dims = 1:30)
  proj[[names(samplename[i])]] <- newproj
}
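
For every sample, the loop prints the sample name and, during clustering, a `Number of nodes` line giving the number of cells that passed QC in that sample. If the console output is saved to a text file, the per-sample cell counts and their total can be tallied with a few lines of awk. This is only a sketch under stated assumptions: the function name `summarise_cells` is hypothetical, and the saved log is assumed to use the same `## `-prefixed format as the output shown in this post.

```shell
# Tally cells per sample from a saved copy of the loop's console output.
# Assumption: lines look like   ## [1] "LH11_1"   and   ## Number of nodes: 7331
# summarise_cells is a hypothetical helper name.
summarise_cells() {
    awk '
        # Sample header printed by print(names(samplename[i]))
        /^## \[1\] "/ { gsub(/"/, "", $3); sample = $3 }
        # Cell count reported by the clustering step
        /^## Number of nodes:/ {
            printf "%s\t%d\n", sample, $5
            total += $5
        }
        END { printf "TOTAL\t%d\n", total }
    ' "$1"
}
```

Running `summarise_cells saved_log.txt` (the file name is a placeholder) prints one tab-separated line per sample followed by a `TOTAL` line, which gives a quick check of the cell count entering doublet removal.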
## [1] "LH11_1"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7331
## Number of edges: 252397
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9116
## Number of communities: 16
## Elapsed time: 1 seconds
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 22:15:49 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:15:49 Read 7331 rows and found 30 numeric columns
## 22:15:49 Using Annoy for neighbor search, n_neighbors = 30
## 22:15:49 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:15:51 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe6c358b2d
## 22:15:51 Searching Annoy index using 1 thread, search_k = 3000
## 22:15:53 Annoy recall = 100%
## 22:15:54 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:15:55 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:15:56 Commencing optimization for 500 epochs, with 296670 positive edges
## 22:16:09 Optimization finished
## [1] "LH11_2"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 10852
## Number of edges: 424944
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9221
## Number of communities: 14
## Elapsed time: 3 seconds
## 22:17:33 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:17:33 Read 10852 rows and found 30 numeric columns
## 22:17:33 Using Annoy for neighbor search, n_neighbors = 30
## 22:17:33 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:17:36 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe3b9ebccf
## 22:17:36 Searching Annoy index using 1 thread, search_k = 3000
## 22:17:41 Annoy recall = 100%
## 22:17:41 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:17:43 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:17:45 Commencing optimization for 200 epochs, with 478136 positive edges
## 22:17:53 Optimization finished
## [1] "LH11_3"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 10631
## Number of edges: 368712
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9374
## Number of communities: 22
## Elapsed time: 2 seconds
## 22:19:12 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:19:12 Read 10631 rows and found 30 numeric columns
## 22:19:12 Using Annoy for neighbor search, n_neighbors = 30
## 22:19:12 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:19:15 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe5b19f2c1
## 22:19:15 Searching Annoy index using 1 thread, search_k = 3000
## 22:19:19 Annoy recall = 100%
## 22:19:19 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:19:21 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:19:22 Commencing optimization for 200 epochs, with 441974 positive edges
## 22:19:30 Optimization finished
## [1] "LH3_1"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 10743
## Number of edges: 364306
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9057
## Number of communities: 18
## Elapsed time: 3 seconds
## 22:21:09 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:21:09 Read 10743 rows and found 30 numeric columns
## 22:21:09 Using Annoy for neighbor search, n_neighbors = 30
## 22:21:09 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:21:12 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe9cdf4f
## 22:21:12 Searching Annoy index using 1 thread, search_k = 3000
## 22:21:17 Annoy recall = 100%
## 22:21:17 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:21:19 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:21:20 Commencing optimization for 200 epochs, with 435826 positive edges
## 22:21:27 Optimization finished
## [1] "LH3_2"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8691
## Number of edges: 285079
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8465
## Number of communities: 11
## Elapsed time: 2 seconds
## 22:22:51 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:22:51 Read 8691 rows and found 30 numeric columns
## 22:22:51 Using Annoy for neighbor search, n_neighbors = 30
## 22:22:51 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:22:53 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe493da909
## 22:22:53 Searching Annoy index using 1 thread, search_k = 3000
## 22:22:57 Annoy recall = 100%
## 22:22:57 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:22:58 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:22:59 Commencing optimization for 500 epochs, with 360818 positive edges
## 22:23:13 Optimization finished
## [1] "LH3_3"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7020
## Number of edges: 241685
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9308
## Number of communities: 20
## Elapsed time: 1 seconds
## 22:24:12 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:24:12 Read 7020 rows and found 30 numeric columns
## 22:24:12 Using Annoy for neighbor search, n_neighbors = 30
## 22:24:12 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:24:14 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe5160167d
## 22:24:14 Searching Annoy index using 1 thread, search_k = 3000
## 22:24:16 Annoy recall = 100%
## 22:24:17 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:24:18 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:24:19 Commencing optimization for 500 epochs, with 285654 positive edges
## 22:24:31 Optimization finished
## [1] "LH5_1"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 6713
## Number of edges: 247428
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8882
## Number of communities: 16
## Elapsed time: 1 seconds
## 22:25:16 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:25:16 Read 6713 rows and found 30 numeric columns
## 22:25:16 Using Annoy for neighbor search, n_neighbors = 30
## 22:25:16 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:25:18 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe5dc9d450
## 22:25:18 Searching Annoy index using 1 thread, search_k = 3000
## 22:25:21 Annoy recall = 100%
## 22:25:21 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:25:22 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:25:23 Commencing optimization for 500 epochs, with 284998 positive edges
## 22:25:35 Optimization finished
## [1] "LH5_2"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7935
## Number of edges: 298181
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8195
## Number of communities: 13
## Elapsed time: 2 seconds
## 22:26:30 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:26:30 Read 7935 rows and found 30 numeric columns
## 22:26:30 Using Annoy for neighbor search, n_neighbors = 30
## 22:26:30 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:26:32 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe383c98bd
## 22:26:32 Searching Annoy index using 1 thread, search_k = 3000
## 22:26:35 Annoy recall = 100%
## 22:26:35 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:26:36 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:26:37 Commencing optimization for 500 epochs, with 334126 positive edges
## 22:26:50 Optimization finished
## [1] "LH5_3"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 6437
## Number of edges: 233939
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8992
## Number of communities: 16
## Elapsed time: 1 seconds
## 22:27:35 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:27:35 Read 6437 rows and found 30 numeric columns
## 22:27:35 Using Annoy for neighbor search, n_neighbors = 30
## 22:27:35 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:27:37 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe21d9082f
## 22:27:37 Searching Annoy index using 1 thread, search_k = 3000
## 22:27:39 Annoy recall = 100%
## 22:27:40 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:27:41 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:27:42 Commencing optimization for 500 epochs, with 268708 positive edges
## 22:27:53 Optimization finished
## [1] "LH7_1"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 10070
## Number of edges: 324999
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8503
## Number of communities: 12
## Elapsed time: 2 seconds
## 22:29:21 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:29:21 Read 10070 rows and found 30 numeric columns
## 22:29:21 Using Annoy for neighbor search, n_neighbors = 30
## 22:29:21 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:29:23 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe75675d9
## 22:29:23 Searching Annoy index using 1 thread, search_k = 3000
## 22:29:27 Annoy recall = 100%
## 22:29:28 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:29:29 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:29:30 Commencing optimization for 200 epochs, with 406614 positive edges
## 22:29:37 Optimization finished
## [1] "LH7_2"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 9730
## Number of edges: 344065
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9035
## Number of communities: 16
## Elapsed time: 3 seconds
## 22:30:49 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:30:49 Read 9730 rows and found 30 numeric columns
## 22:30:49 Using Annoy for neighbor search, n_neighbors = 30
## 22:30:49 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:30:52 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe5c21e64
## 22:30:52 Searching Annoy index using 1 thread, search_k = 3000
## 22:30:56 Annoy recall = 100%
## 22:30:56 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:30:58 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:30:59 Commencing optimization for 500 epochs, with 420648 positive edges
## 22:31:15 Optimization finished
## [1] "LH7_3"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8558
## Number of edges: 314480
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9244
## Number of communities: 21
## Elapsed time: 1 seconds
## 22:32:24 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:32:24 Read 8558 rows and found 30 numeric columns
## 22:32:24 Using Annoy for neighbor search, n_neighbors = 30
## 22:32:24 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:32:26 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe5183a39f
## 22:32:26 Searching Annoy index using 1 thread, search_k = 3000
## 22:32:29 Annoy recall = 100%
## 22:32:29 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:32:30 Found 2 connected components, falling back to 'spca' initialization with init_sdev = 1
## 22:32:30 Using 'irlba' for PCA
## 22:32:30 PCA: 2 components explained 42.86% variance
## 22:32:30 Scaling init to sdev = 1
## 22:32:31 Commencing optimization for 500 epochs, with 362708 positive edges
## 22:32:45 Optimization finished
## [1] "LH9_1"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7331
## Number of edges: 257585
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9130
## Number of communities: 17
## Elapsed time: 1 seconds
## 22:33:45 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:33:45 Read 7331 rows and found 30 numeric columns
## 22:33:45 Using Annoy for neighbor search, n_neighbors = 30
## 22:33:45 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:33:46 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe10730ae9
## 22:33:46 Searching Annoy index using 1 thread, search_k = 3000
## 22:33:49 Annoy recall = 100%
## 22:33:50 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:33:51 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:33:53 Commencing optimization for 500 epochs, with 299704 positive edges
## 22:34:05 Optimization finished
## [1] "LH9_2"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7727
## Number of edges: 256640
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9164
## Number of communities: 19
## Elapsed time: 1 seconds
## 22:35:00 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:35:00 Read 7727 rows and found 30 numeric columns
## 22:35:00 Using Annoy for neighbor search, n_neighbors = 30
## 22:35:00 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:35:02 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe73ed0e2a
## 22:35:02 Searching Annoy index using 1 thread, search_k = 3000
## 22:35:04 Annoy recall = 100%
## 22:35:05 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:35:06 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:35:08 Commencing optimization for 500 epochs, with 307292 positive edges
## 22:35:20 Optimization finished
## [1] "LH9_3"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 6976
## Number of edges: 233344
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9326
## Number of communities: 20
## Elapsed time: 1 seconds
## 22:36:09 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:36:09 Read 6976 rows and found 30 numeric columns
## 22:36:09 Using Annoy for neighbor search, n_neighbors = 30
## 22:36:09 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:36:11 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe2b0f537d
## 22:36:11 Searching Annoy index using 1 thread, search_k = 3000
## 22:36:13 Annoy recall = 100%
## 22:36:14 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:36:15 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:36:16 Commencing optimization for 500 epochs, with 282140 positive edges
## 22:36:27 Optimization finished
## [1] "RIF1"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 22262
## Number of edges: 769073
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9033
## Number of communities: 20
## Elapsed time: 11 seconds
## 22:38:55 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:38:55 Read 22262 rows and found 30 numeric columns
## 22:38:55 Using Annoy for neighbor search, n_neighbors = 30
## 22:38:55 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:39:01 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe1484e8ba
## 22:39:01 Searching Annoy index using 1 thread, search_k = 3000
## 22:39:12 Annoy recall = 100%
## 22:39:12 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:39:14 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:39:15 Commencing optimization for 200 epochs, with 959050 positive edges
## 22:39:30 Optimization finished
## [1] "RIF10"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7406
## Number of edges: 271672
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9367
## Number of communities: 19
## Elapsed time: 1 seconds
## 22:40:15 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:40:15 Read 7406 rows and found 30 numeric columns
## 22:40:15 Using Annoy for neighbor search, n_neighbors = 30
## 22:40:15 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:40:17 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe11071bcd
## 22:40:17 Searching Annoy index using 1 thread, search_k = 3000
## 22:40:20 Annoy recall = 100%
## 22:40:20 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:40:21 Found 2 connected components, falling back to 'spca' initialization with init_sdev = 1
## 22:40:21 Using 'irlba' for PCA
## 22:40:22 PCA: 2 components explained 52.4% variance
## 22:40:22 Scaling init to sdev = 1
## 22:40:22 Commencing optimization for 500 epochs, with 312088 positive edges
## 22:40:34 Optimization finished
## [1] "RIF2"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8540
## Number of edges: 290816
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8767
## Number of communities: 14
## Elapsed time: 2 seconds
## 22:41:40 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:41:40 Read 8540 rows and found 30 numeric columns
## 22:41:40 Using Annoy for neighbor search, n_neighbors = 30
## 22:41:40 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:41:43 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe569248d7
## 22:41:43 Searching Annoy index using 1 thread, search_k = 3000
## 22:41:46 Annoy recall = 100%
## 22:41:46 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:41:48 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:41:48 Commencing optimization for 500 epochs, with 348870 positive edges
## 22:42:02 Optimization finished
## [1] "RIF3"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 9510
## Number of edges: 328576
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9243
## Number of communities: 20
## Elapsed time: 2 seconds
## 22:43:05 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:43:05 Read 9510 rows and found 30 numeric columns
## 22:43:05 Using Annoy for neighbor search, n_neighbors = 30
## 22:43:05 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:43:07 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefeef018ba
## 22:43:07 Searching Annoy index using 1 thread, search_k = 3000
## 22:43:11 Annoy recall = 100%
## 22:43:12 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:43:13 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:43:15 Commencing optimization for 500 epochs, with 393382 positive edges
## 22:43:31 Optimization finished
## [1] "RIF4"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 11375
## Number of edges: 397213
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9029
## Number of communities: 17
## Elapsed time: 3 seconds
## 22:44:51 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:44:51 Read 11375 rows and found 30 numeric columns
## 22:44:51 Using Annoy for neighbor search, n_neighbors = 30
## 22:44:51 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:44:54 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefefd96ed2
## 22:44:54 Searching Annoy index using 1 thread, search_k = 3000
## 22:44:58 Annoy recall = 100%
## 22:44:59 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:45:00 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:45:02 Commencing optimization for 200 epochs, with 466050 positive edges
## 22:45:09 Optimization finished
## [1] "RIF5"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 9081
## Number of edges: 304377
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9361
## Number of communities: 19
## Elapsed time: 1 seconds
## 22:46:18 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:46:18 Read 9081 rows and found 30 numeric columns
## 22:46:18 Using Annoy for neighbor search, n_neighbors = 30
## 22:46:18 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:46:21 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe45fbbba5
## 22:46:21 Searching Annoy index using 1 thread, search_k = 3000
## 22:46:24 Annoy recall = 100%
## 22:46:25 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:46:26 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:46:27 Commencing optimization for 500 epochs, with 374320 positive edges
## 22:46:42 Optimization finished
## [1] "RIF6"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 21163
## Number of edges: 746019
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8614
## Number of communities: 15
## Elapsed time: 11 seconds
## 22:49:02 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:49:02 Read 21163 rows and found 30 numeric columns
## 22:49:02 Using Annoy for neighbor search, n_neighbors = 30
## 22:49:02 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:49:07 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe64331abe
## 22:49:07 Searching Annoy index using 1 thread, search_k = 3000
## 22:49:15 Annoy recall = 100%
## 22:49:16 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:49:18 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:49:20 Commencing optimization for 200 epochs, with 914458 positive edges
## 22:49:35 Optimization finished
## [1] "RIF7"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 8515
## Number of edges: 290692
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9016
## Number of communities: 16
## Elapsed time: 2 seconds
## 22:50:38 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:50:38 Read 8515 rows and found 30 numeric columns
## 22:50:38 Using Annoy for neighbor search, n_neighbors = 30
## 22:50:38 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:50:40 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe73191fce
## 22:50:40 Searching Annoy index using 1 thread, search_k = 3000
## 22:50:43 Annoy recall = 100%
## 22:50:44 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:50:46 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:50:47 Commencing optimization for 500 epochs, with 349838 positive edges
## 22:51:01 Optimization finished
## [1] "RIF8"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 6715
## Number of edges: 232670
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8858
## Number of communities: 14
## Elapsed time: 1 seconds
## 22:51:52 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:51:52 Read 6715 rows and found 30 numeric columns
## 22:51:52 Using Annoy for neighbor search, n_neighbors = 30
## 22:51:52 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:51:53 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe633b2d9e
## 22:51:53 Searching Annoy index using 1 thread, search_k = 3000
## 22:51:56 Annoy recall = 100%
## 22:51:56 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:51:58 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:51:58 Commencing optimization for 500 epochs, with 277944 positive edges
## 22:52:09 Optimization finished
## [1] "RIF9"
## Normalizing layer: counts
## Finding variable features for layer counts
## Centering and scaling data matrix
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 7278
## Number of edges: 266568
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9239
## Number of communities: 20
## Elapsed time: 1 seconds
## 22:53:06 UMAP embedding parameters a = 0.9922 b = 1.112
## 22:53:06 Read 7278 rows and found 30 numeric columns
## 22:53:06 Using Annoy for neighbor search, n_neighbors = 30
## 22:53:06 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 22:53:08 Writing NN index file to temp file /tmp/RtmpmI4BPa/file3aefe69dbba4
## 22:53:08 Searching Annoy index using 1 thread, search_k = 3000
## 22:53:11 Annoy recall = 100%
## 22:53:12 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 22:53:13 Initializing from normalized Laplacian + noise (using RSpectra)
## 22:53:15 Commencing optimization for 500 epochs, with 307572 positive edges
## 22:53:28 Optimization finished

Merge the samples and look at some basic QC metrics:

colors_map = c("aquamarine1", "bisque", "blue", "brown", "brown1", "cadetblue", "cyan", "chartreuse3", "chocolate", "coral1", "darkorange", "cornflowerblue", "darkgoldenrod", "darkolivegreen", "darkmagenta", "darkolivegreen1", "deeppink1", "grey", "purple", "darkblue", "black", "pink1", "plum1", "yellow1", "olivedrab1","coral4", "hotpink", "tan1")
merged <- merge(proj[[1]], y = proj[-1])
VlnPlot(merged,features = "nCount_RNA",pt.size = 0,cols=colors_map,group.by = 'orig.ident')
VlnPlot(merged,features = "percent.mt",pt.size = 0,cols=colors_map,group.by = 'orig.ident')
VlnPlot(merged,features = "nFeature_RNA",pt.size = 0,cols=colors_map,group.by = 'orig.ident')
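
The violin plots show the distributions; a numeric per-sample summary can make outlier samples easier to spot. Below is a minimal sketch in base R, assuming the metadata carries the columns `orig.ident`, `nCount_RNA`, `nFeature_RNA` and `percent.mt` used in the plots above. The function name `qc_summary` and the toy data frame are made up for illustration.

```r
# Per-sample median of the three QC metrics (hypothetical helper, base R only).
qc_summary <- function(meta) {
  aggregate(cbind(nCount_RNA, nFeature_RNA, percent.mt) ~ orig.ident,
            data = meta, FUN = median)
}

# Toy metadata standing in for merged@meta.data (values are invented).
toy_meta <- data.frame(
  orig.ident   = c("RIF1", "RIF1", "RIF1", "LH3_1", "LH3_1", "LH3_1"),
  nCount_RNA   = c(1000, 2000, 3000, 500, 700, 900),
  nFeature_RNA = c(400, 800, 1200, 300, 350, 400),
  percent.mt   = c(2, 4, 6, 1, 3, 5)
)

qc <- qc_summary(toy_meta)
print(qc)
```

On the real object the call would be `qc_summary(merged@meta.data)`.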

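3. Removing doublets

  1. Common questions about DoubletFinder:

(1) What doublet rate should I expect? It depends on the platform (10x, Parse, etc.) and changes with the number of cells loaded. For 10x, a rough reference is given at https://github.com/chris-mcginnis-ucsf/DoubletFinder/issues/76

(2) Should DoubletFinder be run on each sample or on the integrated data? Running it on an integrated Seurat object is not recommended; run each sample separately. Ideally, run dimensionality reduction and clustering on each sample first, remove the doublets, and only then integrate the cleaned samples and rerun the workflow.

(3) The BCmvn plot may show several candidate pK values. Which one should I pick? Inspect the results for the different pK values and choose the one that is most plausible given what you know about the data.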
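
For question (3), one way to see the competing pK values side by side is to rank the BCmvn table instead of reading only the single maximum off the plot. The sketch below assumes a data frame with the `pK` (factor) and `BCmetric` columns that DoubletFinder's `find.pK()` returns. The helper name `top_pk` and the toy numbers are invented for illustration.

```r
# Return the n pK values with the highest BCmetric (hypothetical helper).
top_pk <- function(bcmvn, n = 3) {
  # pK is stored as a factor, so convert via character to keep the real values
  bcmvn$pK <- as.numeric(as.character(bcmvn$pK))
  ranked <- bcmvn[order(bcmvn$BCmetric, decreasing = TRUE), c("pK", "BCmetric")]
  head(ranked, n)
}

# Toy table with two close peaks (values are invented).
toy_bcmvn <- data.frame(
  pK       = factor(c(0.005, 0.01, 0.05, 0.1, 0.2)),
  BCmetric = c(120, 480, 510, 90, 60)
)

cands <- top_pk(toy_bcmvn, n = 2)
print(cands)
```

When the top candidates score similarly, as in the toy table, the choice between them is the judgment call described in (3).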

library(DoubletFinder)
# Load the helper: ks_detectDoublet is a custom wrapper written by KS科研分享与服务
# (its source is in the collection 单细胞scRNA-seuratv4-DoubletFinder)
source('~/data_analysis/10X_space/ks_detectDoublet.R')
# Total number of cells in each sample
total_sample_cells <- sapply(proj, ncol)
total_sample_cells
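
# The cell counts above can be turned into an expected doublet rate per sample.
# This is only a sketch: it ASSUMES the commonly quoted 10x rule of thumb of
# roughly 0.8% multiplets per 1,000 recovered cells, capped at 8%. The function
# name est_doublet_rate, the coefficient and the cap are illustrative
# assumptions, so check them against the user guide for your chemistry.

```r
# Expected doublet rate from the number of recovered cells (hypothetical helper).
est_doublet_rate <- function(n_cells, rate_per_1000 = 0.008, max_rate = 0.08) {
  stopifnot(is.numeric(n_cells), all(n_cells >= 0))
  # linear in the cell count, never above max_rate
  pmin(n_cells / 1000 * rate_per_1000, max_rate)
}

# Toy cell counts (invented); on real data pass total_sample_cells instead.
rates <- est_doublet_rate(c(5000, 7000, 21163))
print(rates)
```

# The resulting vector has one entry per sample, in the same order as the input,
# so it could be passed on wherever a per-sample rate is needed.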
sample_estDubRate = c(0.054,0.076,0.076,0.076,0.061,0.054,0.054,
                      0.061,0.046,0.076,0.076,0.061,0.054,0.054,
                      0.054,0.08,0.054,0.061,0.069,0.076,0.069,0.08,0.061,0.054,0.054)
for(i in seq_along(proj)){

  proj[[i]] <- ks_detectDoublet(proj[[i]],
                                dims = 1:30,
                                estDubRate=sample_estDubRate[i],
                                ncores = 1, 
                                SCTransform=F, # was SCTransform used in the per-sample Seurat processing?
                                Homotypic=F, 
                                annotation="seurat_clusters")
}
## Loading required package: fields
## Loading required package: spam
## Spam version 2.10-0 (2023-10-23) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction 
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
## 
## Attaching package: 'spam'
## The following objects are masked from 'package:base':
## 
##     backsolve, forwardsolve
## Loading required package: viridisLite
## 
## Try help(fields) to get started.
## Loading required package: parallel
## [1] "Creating artificial doublets for pN = 5%"
## [1] "Creating Seurat object..."
## [1] "Normalizing Seurat object..."
## Normalizing layer: counts
## [1] "Finding variable genes..."
## Finding variable features for layer counts
## [1] "Scaling data..."
## Centering and scaling data matrix
## [1] "Running PCA..."
## [1] "Calculating PC distance matrix..."
## [1] "Defining neighborhoods..."
## [1] "Computing pANN across all pK..."
## [1] "pK = 0.005..."
## [1] "pK = 0.01..."
## [1] "pK = 0.02..."
## [1] "pK = 0.03..."
## [1] "pK = 0.04..."
## [1] "pK = 0.05..."
## [1] "pK = 0.06..."
## [1] "pK = 0.07..."
## [1] "pK = 0.08..."
## [1] "pK = 0.09..."
## [1] "pK = 0.1..."
## [1] "pK = 0.11..."
## [1] "pK = 0.12..."
## [1] "pK = 0.13..."
## [1] "pK = 0.14..."
## [1] "pK = 0.15..."
## [1] "pK = 0.16..."
## [1] "pK = 0.17..."
## [1] "pK = 0.18..."
## [1] "pK = 0.19..."
## [1] "pK = 0.2..."
## [1] "pK = 0.21..."
## [1] "pK = 0.22..."
## [1] "pK = 0.23..."
## [1] "pK = 0.24..."
## [1] "pK = 0.25..."
## [1] "pK = 0.26..."
## [1] "pK = 0.27..."
## [1] "pK = 0.28..."
## [1] "pK = 0.29..."
## [1] "pK = 0.3..."
## [1] "Creating artificial doublets for pN = 10%"
## [1] "Creating Seurat object..."
## [1] "Normalizing Seurat object..."
#plot doublets UMAP
library(ggplot2)
duPlot <- list()
for (i in seq_along(proj)) {
  p = DimPlot(proj[[i]], group.by = "DF.classify")+ggtitle(unique(proj[[i]]$orig.ident))
  duPlot[[i]] <- p
}
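
# Besides the UMAPs, a per-sample tally shows how many cells the doublet filter
# would drop. A minimal sketch in base R, assuming the classification sits in a
# metadata column named DF.classify (as in the plots above) with the labels
# "Singlet" and "Doublet". The "Doublet" label, the helper name count_doublets
# and the toy data are assumptions for illustration.

```r
# Count singlets/doublets and the doublet fraction per sample (hypothetical helper).
count_doublets <- function(meta_list, col = "DF.classify") {
  t(sapply(meta_list, function(md) {
    cls <- factor(md[[col]], levels = c("Singlet", "Doublet"))
    c(table(cls), doublet_frac = mean(cls == "Doublet"))
  }))
}

# Toy stand-in for the per-sample metadata (values are invented).
toy_meta_list <- list(
  RIF1  = data.frame(DF.classify = c("Singlet", "Singlet", "Doublet", "Singlet")),
  LH3_1 = data.frame(DF.classify = c("Singlet", "Doublet"))
)

tally <- count_doublets(toy_meta_list)
print(tally)
```

# On the real list this would be called as
# count_doublets(lapply(proj, function(x) x@meta.data)), before any filtering.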
#remove doublets
for (i in seq_along(proj)) {
  proj[[i]] <- subset(proj[[i]], subset = (DF.classify == "Singlet"))
}
sapply(proj, ncol)

4. Integration

First, merge the data.

#merge
sce_edm <- merge(proj[[1]], y = proj[-1])
sce_edm <- subset(x = sce_edm, subset = nCount_RNA < 30000)
#NormalizeData & ScaleData
sce_edm <- NormalizeData(sce_edm)
sce_edm <- FindVariableFeatures(sce_edm,nfeatures=3000)
sce_edm <- ScaleData(sce_edm)
sce_edm <- RunPCA(sce_edm, verbose=F)
Seurat::ElbowPlot(sce_edm,ndims = 50)
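
The elbow plot is read by eye; a cumulative-variance cutoff offers a rough numeric cross-check. The sketch below takes the per-PC standard deviations and returns the first PC at which the cumulative share of variance reaches a target. The helper name `pcs_for_variance`, the 90% target and the toy values are assumptions for illustration, and the share is relative to the computed PCs only, not to the total variance of the data.

```r
# First PC at which the cumulative variance share reaches `target` (hypothetical helper).
pcs_for_variance <- function(stdev, target = 0.90) {
  stopifnot(is.numeric(stdev), length(stdev) > 0, target > 0, target <= 1)
  var_share <- stdev^2 / sum(stdev^2)   # variance share of each PC
  which(cumsum(var_share) >= target)[1]
}

# Toy standard deviations (invented).
toy_stdev <- c(10, 5, 2, 1, 1)
n_pcs <- pcs_for_variance(toy_stdev)
print(n_pcs)
```

On the real object the input could come from `Stdev(sce_edm, reduction = "pca")`. The result is only a guide and does not replace inspecting the plot.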

Batch correction during integration is done with Harmony.

sce_edm <- IntegrateLayers(object = sce_edm, 
                           method = HarmonyIntegration, 
                           orig.reduction = "pca", 
                           new.reduction = "integrated.Harmony",
                           verbose = FALSE)
# re-join layers after integration
sce_edm[["RNA"]] <- JoinLayers(sce_edm[["RNA"]])

Cluster the cells and run UMAP on the Harmony embedding:

sce_edm <- FindNeighbors(sce_edm, reduction = "integrated.Harmony", dims = 1:25)
sce_edm <- FindClusters(sce_edm, resolution = 0.8)
sce_edm <- RunUMAP(sce_edm, dims = 1:25, reduction = "integrated.Harmony")

Plot the results:

DimPlot(sce_edm, reduction = "umap", label = T,pt.size = 0.8)
DimPlot(sce_edm, reduction = "umap", label = T,pt.size = 0.8,split.by = 'orig.ident',ncol = 5) +NoLegend()
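
The split UMAP gives a visual impression of how well the samples mix after Harmony; a cluster-by-sample proportion table gives a rough numeric view of the same thing. This is a minimal base R sketch, where the helper name `cluster_composition` and the toy vectors are invented for illustration.

```r
# Row-wise proportions: share of each sample within each cluster (hypothetical helper).
cluster_composition <- function(cluster, sample) {
  round(prop.table(table(cluster = cluster, sample = sample), margin = 1), 3)
}

# Toy labels (invented): cluster "1" is made up of a single sample.
toy_cluster <- c("0", "0", "0", "0", "1", "1", "1", "1")
toy_sample  <- c("RIF1", "LH3_1", "RIF1", "LH3_1", "RIF1", "RIF1", "RIF1", "RIF1")

comp <- cluster_composition(toy_cluster, toy_sample)
print(comp)
```

On the real object the call would be `cluster_composition(sce_edm$seurat_clusters, sce_edm$orig.ident)`. A cluster dominated by one sample may reflect residual batch effect or a genuinely sample-specific population, so it is a prompt for a closer look, not a verdict.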

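The next step is cell type annotation. On QC, everyone has their own approach: the thresholds used to build this dataset are not a standard, they were adjusted to fit the data at hand.

If you found this useful, give it a like before you go! The next part continues with the deconvolution workflow.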

Originally published on 2025-12-04 on the WeChat official account KS科研分享与服务.

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 1、数据介绍及下载
  • 2、读取单个data
    • 1. Perform seurat pipelines in each sample
  • 3、去除双细胞
  • 4、integration
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档