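too-many-cells comes in two forms: the too-many-cells command-line program and the TooManyCellsR R package. Because an error with the R package has not yet been resolved, this tutorial covers only the too-many-cells program.
Only one installation method is shown below; see the GitHub documentation for the others.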
## Install stack
curl -sSL https://get.haskellstack.org/ | sh # curl itself can be installed with conda
stack setup
## Install too-many-cells
git clone https://github.com/GregorySchwartz/too-many-cells.git
cd too-many-cells
stack install
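At the end of the install log (nohup.out) there is a message asking you to add /blabla/.local/bin to the PATH variable, because that is where the software is installed.
As a personal habit, I create three folders under project: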
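The commands later in this tutorial read from ../input and write to ../out, so they are evidently run from a sibling folder. A minimal sketch of such a layout, assuming the working folder is named script (a hypothetical name, as is the project root used here):

```shell
# Hypothetical project root; replace with your own location
project=./project

# input: matrix and labels; out: results; script: where commands are run (assumed name)
mkdir -p "$project/input" "$project/out" "$project/script"

# List the folders that were just created
ls "$project"
```

Running the too-many-cells commands from inside script makes the relative paths ../input and ../out resolve as written.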
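# ct is assumed to be the count matrix to export
file="matrix.mtx"
# Write the MatrixMarket header lines
sink(file)
cat("%%MatrixMarket matrix coordinate integer general\n")
cat("%\n")
sink()
# Build one (row, col, value) triplet per matrix entry, one column at a time
tmp=do.call(rbind,lapply(1:ncol(ct),function(i){
return(data.frame(row=1:nrow(ct),
col=i,
exp=ct[,i]))
}))
# The remaining steps are an assumption based on the MatrixMarket coordinate format:
# keep non-zero entries, append the size line (rows, columns, entries), then the triplets
tmp=tmp[tmp$exp!=0,]
cat(sprintf("%d %d %d\n",nrow(ct),ncol(ct),nrow(tmp)),file=file,append=TRUE)
write.table(tmp,file=file,append=TRUE,quote=FALSE,row.names=FALSE,col.names=FALSE)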
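labels.csv assigns a label to each cell. If the cells are known to come from different sources, or if they need to be annotated after the analysis, this input file is the way to supply that information.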
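As a sketch of the expected shape: to my understanding of the too-many-cells documentation, the labels file is a two-column CSV with the header item,label and one row per cell barcode. The barcodes used here are taken from the clustering output elsewhere in this tutorial; the file name labels_example.csv and the label values sampleA and sampleB are hypothetical.

```shell
# Write a tiny example labels file (label values are made up for illustration)
cat > labels_example.csv <<'EOF'
item,label
AAACGGGAGGTGTTAA.1,sampleA
AACACGTTCGGCGGTT.1,sampleA
AACCGCGGTATATGAG.1,sampleB
EOF

# Sanity check: every row, header included, must have exactly two comma-separated fields
if awk -F, 'NF != 2 { bad++ } END { exit (bad > 0) }' labels_example.csv; then
  echo "labels file looks well-formed"
else
  echo "labels file has malformed rows" >&2
fi
```

The same check can be pointed at a real labels.csv before passing it to --labels-file.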
too-many-cells make-tree \
--matrix-path ../input/expr_count.csv \
--labels-file ../input/labels.csv \
--draw-collection "PieRing" \
--output ../out \
> ../out/clusters.csv
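One practical detail: the shell opens the redirection target before too-many-cells starts, so ../out must already exist or the command fails before any clustering happens. A small sketch that creates the folder first; the function name run_make_tree is hypothetical, and the call is skipped with a message when too-many-cells is not on PATH.

```shell
# run_make_tree MATRIX LABELS OUT_DIR  (hypothetical helper)
run_make_tree() {
  matrix=$1
  labels=$2
  out_dir=$3
  # The redirect at the end needs this folder to exist already
  mkdir -p "$out_dir" || return 1
  if ! command -v too-many-cells >/dev/null 2>&1; then
    echo "too-many-cells not found on PATH" >&2
    return 127
  fi
  too-many-cells make-tree \
    --matrix-path "$matrix" \
    --labels-file "$labels" \
    --draw-collection "PieRing" \
    --output "$out_dir" \
    > "$out_dir/clusters.csv"
}

# Usage, with the paths from this tutorial:
# run_make_tree ../input/expr_count.csv ../input/labels.csv ../out
```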
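The output contains two files of interest: clusters.csv, which records the clustering result, and dendrogram.svg, which visualizes it.
# After some trial and error, 1 turned out to be the most suitable --smart-cutoff value for my data
too-many-cells make-tree \
--prior ../out \
--labels-file ../input/labels.csv \
--smart-cutoff 1 \
--min-size 1 \
--draw-collection "PieChart" \
--output ../out_pruned \
> ../out_pruned/clusters_pruned.csv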
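A trailing backslash continues a command only when it is the very last character on the line, so a per-flag comment cannot follow it. One way to keep notes next to individual flags is a Bash array, where comments are allowed between elements. This is a sketch that requires Bash rather than plain sh; the array name args is illustrative, and the command is only printed, not run.

```shell
#!/usr/bin/env bash
# Collect the arguments in an array so each one can carry its own comment
args=(
  --prior ../out
  --labels-file ../input/labels.csv
  --smart-cutoff 1            # tuned by trial and error for this data set
  --min-size 1
  --draw-collection "PieChart"
  --output ../out_pruned
)

# Show the command that would run, one shell-quoted word at a time
printf '%q ' too-many-cells make-tree "${args[@]}"
echo

# To actually run it:
# mkdir -p ../out_pruned
# too-many-cells make-tree "${args[@]}" > ../out_pruned/clusters_pruned.csv
```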
clusters_pruned.csv now holds the pruned clustering:
$ head clusters_pruned.csv
cell,cluster,path
AAACGGGAGGTGTTAA.1,9,9/8/7/6/5/4/3/2/1/0
AACACGTTCGGCGGTT.1,9,9/8/7/6/5/4/3/2/1/0
AACCGCGGTATATGAG.1,9,9/8/7/6/5/4/3/2/1/0
ACACCCTTCTGGTTCC.1,9,9/8/7/6/5/4/3/2/1/0
ACCTTTAAGGTGTTAA.1,9,9/8/7/6/5/4/3/2/1/0
ACGAGGACACGTTGGC.1,9,9/8/7/6/5/4/3/2/1/0
AGGGAGTCAGGCTCAC.1,9,9/8/7/6/5/4/3/2/1/0
AGGGATGAGCGATAGC.1,9,9/8/7/6/5/4/3/2/1/0
AGTGGGAAGATGTAAC.1,9,9/8/7/6/5/4/3/2/1/0
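Each row holds a cell barcode, the cluster it was assigned to, and what appears to be the path of node ids from that cluster up to the root (node 0). A quick way to summarize such a file is to count cells per cluster. The sketch below uses a hypothetical helper named cells_per_cluster and a three-row demo file built from the rows shown above.

```shell
# cells_per_cluster FILE: print "cluster count" pairs from a clusters CSV (hypothetical helper)
cells_per_cluster() {
  # Skip the header, then tally the second column (the cluster id)
  awk -F, 'NR > 1 { n[$2]++ } END { for (c in n) print c, n[c] }' "$1" | sort -n
}

# Small self-contained demo using three rows from the output above
cat > clusters_demo.csv <<'EOF'
cell,cluster,path
AAACGGGAGGTGTTAA.1,9,9/8/7/6/5/4/3/2/1/0
AACACGTTCGGCGGTT.1,9,9/8/7/6/5/4/3/2/1/0
AACCGCGGTATATGAG.1,9,9/8/7/6/5/4/3/2/1/0
EOF

cells_per_cluster clusters_demo.csv   # prints: 9 3
```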
# Same command as before; --draw-node-number is the only addition
too-many-cells make-tree \
--prior ../out \
--labels-file ../input/labels.csv \
--smart-cutoff 1 \
--min-size 1 \
--draw-collection "PieChart" \
--draw-node-number \
--output ../out_pruned \
> ../out_pruned/clusters_pruned.csv
too-many-cells make-tree \
--prior out \
--labels-file labels.csv \
--smart-cutoff 4 \
--min-size 1 \
--draw-collection "PieChart" \
--output out_pruned \
> clusters_pruned.csv
too-many-cells make-tree \
--prior out \
--labels-file labels.csv \
--smart-cutoff 4 \
--min-size 1 \
--draw-collection "PieChart" \
--draw-max-node-size 40 \
--output out_pruned \
> clusters_pruned.csv
too-many-cells make-tree \
--prior out \
--labels-file labels.csv \
--smart-cutoff 4 \
--min-size 1 \
--draw-collection "PieChart" \
--draw-max-node-size 40 \
--draw-no-scale-nodes \
--output out_pruned \
> clusters_pruned.csv
# Without --draw-scale-saturation, expression is often so low across the board that the whole figure has no color.
# I have not yet tested which value works best.
too-many-cells make-tree \
--prior ../out \
--matrix-path ../input/expr_count.csv \
--labels-file ../input/labels.csv \
--smart-cutoff 1 \
--min-size 1 \
--feature-column 2 \
--draw-leaf "DrawItem (DrawThresholdContinuous [(\"gene1\", 0), (\"gene2\", 0)])" \
--draw-colors "[\"#e41a1c\", \"#377eb8\", \"#4daf4a\", \"#eaeaea\"]" \
--draw-scale-saturation 10 \
--output ../out_gene_expression \
> ../out_gene_expression/clusters_pruned.csv
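The backslash-escaped double quotes in --draw-leaf and --draw-colors are plain shell quoting: too-many-cells receives the inner double quotes as part of a single argument. Wrapping the argument in single quotes should produce the same string with less escaping. The check below only compares the two strings and does not call too-many-cells; the variable names are illustrative.

```shell
# The argument as written in the command above, with escaped inner quotes
escaped="DrawItem (DrawThresholdContinuous [(\"gene1\", 0), (\"gene2\", 0)])"

# The same argument in single quotes, where no escaping is needed
single='DrawItem (DrawThresholdContinuous [("gene1", 0), ("gene2", 0)])'

if [ "$escaped" = "$single" ]; then
  echo "identical"
else
  echo "different"
fi
```

Single quotes are usually easier to read here, as long as the argument itself contains no single quote.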
too-many-cells differential \
--matrix-path ../input/expr_count.csv \
-n "([110], [148])" \
+RTS -N24 \
> ../out/differential.csv
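Here -n "([110], [148])" compares node 110 against node 148, and +RTS -N24 passes options to the GHC runtime that too-many-cells is built with, asking for 24 threads. A sketch for deriving the thread count from the current machine instead of hard-coding it; the variable name cores is illustrative.

```shell
# Number of online processors; fall back to 1 if getconf cannot report it
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "+RTS -N${cores}"

# Intended use (not run here):
# too-many-cells differential \
#   --matrix-path ../input/expr_count.csv \
#   -n "([110], [148])" \
#   +RTS -N"${cores}" \
#   > ../out/differential.csv
```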
too-many-cells diversity \
--priors ../out1 \
--priors ../out2 \
-o ../out_diversity_stats
too-many-cells paths \
--prior ../out \
--labels-file ../input/labels.csv \
--bandwidth 3 \
-o ../out_paths