前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >单细胞数据复现-肺癌文章代码复现5

单细胞数据复现-肺癌文章代码复现5

原创
作者头像
小胡子刺猬的生信学习123
发布2022-05-22 17:16:40
8500
发布2022-05-22 17:16:40
举报
文章被收录于专栏:文献分享及代码学习

单细胞数据复现-肺癌文章代码复现1https://cloud.tencent.com/developer/article/1992648

单细胞数据复现-肺癌文章代码复现2https://cloud.tencent.com/developer/article/1995619

单细胞数据复现-肺癌文章代码复现3https://cloud.tencent.com/developer/article/1996043

单细胞数据复现-肺癌文章代码复现4https://cloud.tencent.com/developer/article/2006654

教程3和4主要是分别对epi细胞亚群进行的分析,也是将亚群细分,然后去找里面比较重要的基因。今天的代码是对str亚群进行的分析。

ps:我发现放上图片后所占的版面过长,因此我就不放自己做出来的图了,是基本和原文一致的,基本按照我相关的代码是可以出来的,大家可以下载这篇文章的附图,进行比较,是不是自己需要调参数,因此我在后面有一个讲解的教程里面,单独进行拼图,与原文进行比较。

R环境的包及颜色配置加载

代码语言:javascript
复制
### load libraries
library(Seurat)
library(dplyr)
library(reticulate)
library(sctransform)
library(cowplot)
library(ggplot2)
library(viridis)
library(tidyr)
library(magrittr)
library(reshape2)
library(readxl)
library(readr)
library(stringr)
library(gplots)
library(grid)
library(rlang)
library(tibble)

theme_set(theme_cowplot())

#color scheme
use_colors <- c(
  Tumor = "brown2",
  Normal = "deepskyblue2",
  G1 = "#46ACC8",
  G2M = "#E58601",
  S = "#B40F20",
  Epithelial = "seagreen",
  Immune = "darkgoldenrod2",
  Stromal = "steelblue",
  p018 = "#E2D200",
  p019 = "#46ACC8",
  p023 = "#E58601",
  p024 = "#B40F20",
  p027 = "#0B775E",
  p028 = "#E1BD6D",
  p029 = "#35274A",
  p030 = "#F2300F",
  p031 = "#7294D4",
  p032 = "#5B1A18",
  p033 = "#9C964A",
  p034 = "#FD6467",
  Endothelial1 = "#FED976",
  Endothelial2 = "#FEB24C",
  Endothelial3 = "#fd8d3C",
  Endothelial4 = "#FC4E2A",
  Endothelial5 = "#E31A1C",
  Endothelial6 = "#BD0026",
  Endothelial7 = "#800026",
  Lymphaticendothelial = "salmon",
  Fibroblast1 = "#2166AC",
  Fibroblast2 = "#4393C3",
  Myofibroblast1 = "#5AAE61",
  Myofibroblast2 = "#1B7837",
  Smoothmuscle1 = "#9970AB",
  Smoothmuscle2 = "#762A83",
  Mesothelial = "#40004B")

亚群分析

读取在一开始做分群的时候保存的rdata数据。

代码语言:javascript
复制
str_anno <- readRDS("seurat_objects/str_anno.RDS")

通过读取meta中的数据,然后选择自己需要的几种细胞的水平,来进行后面的分析。

代码语言:javascript
复制
str_anno@meta.data$cell_type_str <- factor(str_anno@meta.data$cell_type_str, levels = c("Endothelial1",
                                                                                        "Endothelial2",
                                                                                        "Endothelial3",
                                                                                        "Endothelial4",
                                                                                        "Endothelial5",
                                                                                        "Endothelial6",
                                                                                        "Endothelial7",
                                                                                        "Lymphaticendothelial",
                                                                                        "Fibroblast1",
                                                                                        "Fibroblast2",
                                                                                        "Myofibroblast1",
                                                                                        "Myofibroblast2",
                                                                                        "Smoothmuscle1",
                                                                                        "Smoothmuscle2",
                                                                                        "Mesothelial"))


##画图,主要是根据group的参数,这里选择的有组织、病人来源、细胞类型;还根据featuregene进行点图绘制,这个图也是单细胞组学文章中进行出镜的图
DimPlot(str_anno, group.by = "tissue_type", cols = use_colors)
#ggsave2("DimPlot_str_Normal_Tumor.pdf", path = "output/fig3", width = 15, height = 15, units = "cm")

DimPlot(str_anno, group.by = "patient_id", cols = use_colors, pt.size = 0.5)
ggsave2("SuppFig1C_str_patients.pdf", path = "../results", width = 15, height = 15, units = "cm")

DimPlot(str_anno, group.by = "cell_type_str", label = F, split.by = "tissue_type", cols = use_colors, pt.size = 0.5)
ggsave2("Fig3A_umap.pdf", path = "../results", width = 30, height = 15, units = "cm")

DotPlot(str_anno, features = c("WT1", "UPK3B", "MYH11", "PDGFRB", "ACTA2", "MYLK", "LUM", "PDGFRA", "CCL21", "PROX1", "PECAM1", "VWF"), group.by = "cell_type_str") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  coord_flip() + 
  scale_color_viridis()
ggsave2("Fig3B.pdf", path = "../results", width = 16, height = 12, units = "cm")
代码语言:javascript
复制
###subsetting
##前面提到了在进行亚群在细分的时候,由于将自己需要的亚群进行了提取,所以需要在进行标准化,符合后面分析的要求。
str_endo <- subset(str_anno, subset = cell_type_str %in% c("Endothelial1",
                                                           "Endothelial2",
                                                           "Endothelial3",
                                                           "Endothelial4",
                                                           "Endothelial5",
                                                           "Endothelial6",
                                                           "Endothelial7",
                                                           "Lymphaticendothelial"))
str_endo <- ScaleData(str_endo)

str_fibro <- subset(str_anno, subset = cell_type_str %in% c("Fibroblast1",
                                                            "Fibroblast2",
                                                            "Myofibroblast1",
                                                            "Myofibroblast2",
                                                            "Smoothmuscle1",
                                                            "Smoothmuscle2",
                                                            "Mesothelial"))

str_fibro <- ScaleData(str_fibro)

endo_counts <- FetchData(str_endo, vars = c("tissue_type", "cell_type_str", "sample_id", "patient_id")) %>%  
  mutate(tissue_type = factor(tissue_type, levels = c("Tumor", "Normal")))

endo_counts_tbl <- endo_counts %>%
  dplyr::count(cell_type_str, patient_id, tissue_type)
write_csv(endo_counts_tbl, path = "../results/SuppTable1.csv")

fibro_counts <- FetchData(str_fibro, vars = c("tissue_type", "cell_type_str", "sample_id", "patient_id")) %>%  
  mutate(tissue_type = factor(tissue_type, levels = c("Tumor", "Normal"))) 

fibro_counts_tbl <- fibro_counts %>%
  dplyr::count(cell_type_str, patient_id, tissue_type)
write_csv(fibro_counts_tbl, path = "../results/SuppTable2.csv")

##作者很多的绘图的代码都很好看,自己在绘图的后期,可以自己粘贴过来,进行颜色和输入的修改就可以
ggplot(data = endo_counts, aes(x = tissue_type, fill = cell_type_str)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = use_colors) +
  coord_flip() +
  scale_y_reverse()
ggsave2("Fig3A_barplot_endothelial.pdf", path = "../results", width = 20, height = 5, units = "cm")

ggplot(data = fibro_counts, aes(x = tissue_type, fill = cell_type_str)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = use_colors) +
  coord_flip() +
  scale_y_reverse()
ggsave2("Fig3A_barplot_fibroblastic.pdf", path = "../results", width = 20, height = 5, units = "cm")

endo_counts %>%
  filter(tissue_type == "Tumor") %>%
  ggplot(aes(x = sample_id, fill = cell_type_str)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = use_colors) +
  coord_flip() +
  scale_y_reverse()
ggsave2("Fig3A_barplot_endothelial_per_patient.pdf", path = "../results", width = 30, height = 30, units = "cm")

fibro_counts %>%
  filter(tissue_type == "Tumor") %>%
  ggplot(aes(x = sample_id, fill = cell_type_str)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = use_colors) +
  coord_flip() +
  scale_y_reverse()
ggsave2("Fig3A_barplot_fibroblastic_per_patient.pdf", path = "../results", width = 30, height = 30, units = "cm")

endo_counts %>%
  filter(tissue_type == "Normal") %>%
  ggplot(aes(x = sample_id, fill = cell_type_str)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = use_colors) +
  coord_flip() +
  scale_y_reverse()
ggsave2("SuppFig6A_endothelial.pdf", path = "../results", width = 30, height = 30, units = "cm")

fibro_counts %>%
  filter(tissue_type == "Normal") %>%
  ggplot(aes(x = sample_id, fill = cell_type_str)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = use_colors) +
  coord_flip() +
  scale_y_reverse()
ggsave2("SuppFig6A_fibroblastic.pdf", path = "../results", width = 30, height = 30, units = "cm")

热图绘制

在进行单细胞热图绘制的时候,seurat的heatmap只是可以看到亚群和基因号,不可以添加其他的参数,但是这个脚本可以添加其他很多的参数,如分组、样本来源等,文章作者也给出了参考来源-https://github.com/satijalab/seurat/issues/2201,主要的代码参考如下,自己在github的网站上查看源码根据自己的需求进行更改。

代码语言:javascript
复制
DoMultiBarHeatmap <- function (object, 
                               features = NULL, 
                               cells = NULL, 
                               group.by = "ident", 
                               additional.group.by = NULL, 
                               additional.group.sort.by = NULL, 
                               cols.use = NULL,
                               group.bar = TRUE, 
                               disp.min = -2.5, 
                               disp.max = NULL, 
                               slot = "scale.data", 
                               assay = NULL, 
                               label = TRUE, 
                               size = 5.5, 
                               hjust = 0, 
                               angle = 45, 
                               raster = TRUE, 
                               draw.lines = TRUE, 
                               lines.width = NULL, 
                               group.bar.height = 0.02, 
                               combine = TRUE) 
{
  cells <- cells %||% colnames(x = object)
  if (is.numeric(x = cells)) {
    cells <- colnames(x = object)[cells]
  }
  assay <- assay %||% DefaultAssay(object = object)
  DefaultAssay(object = object) <- assay
  features <- features %||% VariableFeatures(object = object)
  ## Why reverse???
  features <- rev(x = unique(x = features))
  disp.max <- disp.max %||% ifelse(test = slot == "scale.data", 
                                   yes = 2.5, no = 6)
  possible.features <- rownames(x = GetAssayData(object = object, 
                                                 slot = slot))
  if (any(!features %in% possible.features)) {
    bad.features <- features[!features %in% possible.features]
    features <- features[features %in% possible.features]
    if (length(x = features) == 0) {
      stop("No requested features found in the ", slot, 
           " slot for the ", assay, " assay.")
    }
    warning("The following features were omitted as they were not found in the ", 
            slot, " slot for the ", assay, " assay: ", paste(bad.features, 
                                                             collapse = ", "))
  }
  
  if (!is.null(additional.group.sort.by)) {
    if (any(!additional.group.sort.by %in% additional.group.by)) {
      bad.sorts <- additional.group.sort.by[!additional.group.sort.by %in% additional.group.by]
      additional.group.sort.by <- additional.group.sort.by[additional.group.sort.by %in% additional.group.by]
      if (length(x = bad.sorts) > 0) {
        warning("The following additional sorts were omitted as they were not a subset of additional.group.by : ", 
                paste(bad.sorts, collapse = ", "))
      }
    }
  }
  
  data <- as.data.frame(x = as.matrix(x = t(x = GetAssayData(object = object, 
                                                             slot = slot)[features, cells, drop = FALSE])))
  
  object <- suppressMessages(expr = StashIdent(object = object, 
                                               save.name = "ident"))
  group.by <- group.by %||% "ident"
  groups.use <- object[[c(group.by, additional.group.by[!additional.group.by %in% group.by])]][cells, , drop = FALSE]
  plots <- list()
  for (i in group.by) {
    data.group <- data
    if (!is_null(additional.group.by)) {
      additional.group.use <- additional.group.by[additional.group.by!=i]  
      if (!is_null(additional.group.sort.by)){
        additional.sort.use = additional.group.sort.by[additional.group.sort.by != i]  
      } else {
        additional.sort.use = NULL
      }
    } else {
      additional.group.use = NULL
      additional.sort.use = NULL
    }
    
    group.use <- groups.use[, c(i, additional.group.use), drop = FALSE]
    
    for(colname in colnames(group.use)){
      if (!is.factor(x = group.use[[colname]])) {
        group.use[[colname]] <- factor(x = group.use[[colname]])
      }  
    }
    
    if (draw.lines) {
      lines.width <- lines.width %||% ceiling(x = nrow(x = data.group) * 
                                                0.0025)
      placeholder.cells <- sapply(X = 1:(length(x = levels(x = group.use[[i]])) * 
                                           lines.width), FUN = function(x) {
                                             return(Seurat:::RandomName(length = 20))
                                           })
      placeholder.groups <- data.frame(rep(x = levels(x = group.use[[i]]), times = lines.width))
      group.levels <- list()
      group.levels[[i]] = levels(x = group.use[[i]])
      for (j in additional.group.use) {
        group.levels[[j]] <- levels(x = group.use[[j]])
        placeholder.groups[[j]] = NA
      }
      
      colnames(placeholder.groups) <- colnames(group.use)
      rownames(placeholder.groups) <- placeholder.cells
      
      group.use <- sapply(group.use, as.vector)
      rownames(x = group.use) <- cells
      
      group.use <- rbind(group.use, placeholder.groups)
      
      for (j in names(group.levels)) {
        group.use[[j]] <- factor(x = group.use[[j]], levels = group.levels[[j]])
      }
      
      na.data.group <- matrix(data = NA, nrow = length(x = placeholder.cells), 
                              ncol = ncol(x = data.group), dimnames = list(placeholder.cells, 
                                                                           colnames(x = data.group)))
      data.group <- rbind(data.group, na.data.group)
    }
    
    order_expr <- paste0('order(', paste(c(i, additional.sort.use), collapse=','), ')')
    group.use = with(group.use, group.use[eval(parse(text=order_expr)), , drop=F])
    
    plot <- Seurat:::SingleRasterMap(data = data.group, raster = raster, 
                                     disp.min = disp.min, disp.max = disp.max, feature.order = features, 
                                     cell.order = rownames(x = group.use), group.by = group.use[[i]])
    
    if (group.bar) {
      pbuild <- ggplot_build(plot = plot)
      group.use2 <- group.use
      cols <- list()
      na.group <- Seurat:::RandomName(length = 20)
      for (colname in rev(x = colnames(group.use2))) {
        if (colname == i) {
          colid = paste0('Identity (', colname, ')')
        } else {
          colid = colname
        }
        
        # Default
        cols[[colname]] <- c(scales::hue_pal()(length(x = levels(x = group.use[[colname]]))))  
        
        #Overwrite if better value is provided
        if (!is_null(cols.use[[colname]])) {
          req_length = length(x = levels(group.use))
          if (length(cols.use[[colname]]) < req_length){
            warning("Cannot use provided colors for ", colname, " since there aren't enough colors.")
          } else {
            if (!is_null(names(cols.use[[colname]]))) {
              if (all(levels(group.use[[colname]]) %in% names(cols.use[[colname]]))) {
                cols[[colname]] <- as.vector(cols.use[[colname]][levels(group.use[[colname]])])
              } else {
                warning("Cannot use provided colors for ", colname, " since all levels (", paste(levels(group.use[[colname]]), collapse=","), ") are not represented.")
              }
            } else {
              cols[[colname]] <- as.vector(cols.use[[colname]])[c(1:length(x = levels(x = group.use[[colname]])))]
            }
          }
        }
        
        # Add white if there's lines
        if (draw.lines) {
          levels(x = group.use2[[colname]]) <- c(levels(x = group.use2[[colname]]), na.group)  
          group.use2[placeholder.cells, colname] <- na.group
          cols[[colname]] <- c(cols[[colname]], "#FFFFFF")
        }
        names(x = cols[[colname]]) <- levels(x = group.use2[[colname]])
        
        y.range <- diff(x = pbuild$layout$panel_params[[1]]$y.range)
        y.pos <- max(pbuild$layout$panel_params[[1]]$y.range) + y.range * 0.015
        y.max <- y.pos + group.bar.height * y.range
        pbuild$layout$panel_params[[1]]$y.range <- c(pbuild$layout$panel_params[[1]]$y.range[1], y.max)
        
        plot <- suppressMessages(plot + 
                                   annotation_raster(raster = t(x = cols[[colname]][group.use2[[colname]]]),  xmin = -Inf, xmax = Inf, ymin = y.pos, ymax = y.max) + 
                                   annotation_custom(grob = grid::textGrob(label = colid, hjust = 0, gp = gpar(cex = 0.75)), ymin = mean(c(y.pos, y.max)), ymax = mean(c(y.pos, y.max)), xmin = Inf, xmax = Inf) +
                                   coord_cartesian(ylim = c(0, y.max), clip = "off")) 
        
        if ((colname == i) && label) {
          x.max <- max(pbuild$layout$panel_params[[1]]$x.range)
          x.divs <- pbuild$layout$panel_params[[1]]$x.major %||% pbuild$layout$panel_params[[1]]$x$break_positions()
          group.use$x <- x.divs
          label.x.pos <- tapply(X = group.use$x, INDEX = group.use[[colname]],
                                FUN = median) * x.max
          label.x.pos <- data.frame(group = names(x = label.x.pos), 
                                    label.x.pos)
          plot <- plot + geom_text(stat = "identity", 
                                   data = label.x.pos, aes_string(label = "group", 
                                                                  x = "label.x.pos"), y = y.max + y.max * 
                                     0.03 * 0.5, angle = angle, hjust = hjust, 
                                   size = size)
          plot <- suppressMessages(plot + coord_cartesian(ylim = c(0, 
                                                                   y.max + y.max * 0.002 * max(nchar(x = levels(x = group.use[[colname]]))) * 
                                                                     size), clip = "off"))
        }
      }
    }
    plot <- plot + theme(line = element_blank())
    plots[[i]] <- plot
  }
  if (combine) {
    plots <- CombinePlots(plots = plots)
  }
  return(plots)
}

根据上面的热图的源码进行分组的热图高表达基因的绘制。基本的代码跟前面的epi的代码是类似的。

代码语言:javascript
复制
###DEGs endothelial

Idents(str_endo) <- str_endo@meta.data$cell_type_str

endo_markers <- FindAllMarkers(str_endo, only.pos = T, min.pct = 0.25, min.diff.pct = 0.25)

top_endo_markers <- endo_markers %>% group_by(cluster) %>% top_n(10, wt = avg_log2FC)

DoMultiBarHeatmap(str_endo, features = top_endo_markers$gene, group.by = "cell_type_str", additional.group.by = "tissue_type",additional.group.sort.by = "tissue_type", cols.use = list(tissue_type = use_colors), draw.lines = F) +
  scale_fill_viridis()
ggsave2("SuppFig6B.png", path = "../results", width = 30, height = 40, units = "cm")
#ggsave2("HeatMap_Endo.pdf", path = "output/fig3", width = 30, height = 40, units = "cm")
#ggsave2("HeatMap_Endo.emf", path = "output/fig3", width = 30, height = 40, units = "cm")

这里是对另一种细胞类型进行的差异分析,如果想要求严格一些,可以放p_value值进行筛选。

代码语言:javascript
复制
###DEGs fibroblastic

Idents(str_fibro) <- str_fibro@meta.data$cell_type_str

fibro_markers <- FindAllMarkers(str_fibro, only.pos = T, min.pct = 0.25, min.diff.pct = 0.25)

top_fibro_markers <- fibro_markers %>% group_by(cluster) %>% top_n(10, wt = avg_log2FC)

DoMultiBarHeatmap(str_fibro, features = top_fibro_markers$gene, group.by = "cell_type_str", additional.group.by = "tissue_type",additional.group.sort.by = "tissue_type", cols.use = list(tissue_type = use_colors), draw.lines = F) +
  scale_fill_viridis()
ggsave2("Fig6C.png", path = "../results", width = 30, height = 40, units = "cm")
#ggsave2("HeatMap_Fibro.pdf", path = "output/fig3", width = 30, height = 40, units = "cm")
#ggsave2("HeatMap_Fibro.emf", path = "output/fig3", width = 30, height = 40, units = "cm")

主要将3个不同的细胞类型进行细分,是主要想看在不同微环境下,肿瘤基因的相关通路,所以可以发现在3个亚群最后一步分析当中都带入了progeny scores参数,具体做肿瘤的友友可以参考这篇文章-https://www.jianshu.com/p/4058050d546e,由于我是做植物的,来源也不一样,所以不进行讲解了。

代码语言:javascript
复制
###progeny scores

str_fibro2 <- subset(str_anno, subset = cell_type_str %in% c("Fibroblast1",
                                                             "Fibroblast2",
                                                             "Myofibroblast1",
                                                             "Myofibroblast2",
                                                             "Smoothmuscle1",
                                                             "Smoothmuscle2"))

progeny_scores <- as.data.frame(t(GetAssayData(str_fibro2, assay = "progeny", slot = "scale.data")))
progeny_scores$cell_id <- rownames(progeny_scores)
progeny_scores <- gather(progeny_scores, Pathway, Activity, -cell_id)

cells_clusters <- FetchData(str_anno, c("cell_type_str"))
cells_clusters$cell_id <- rownames(cells_clusters)

progeny_scores <- inner_join(progeny_scores, cells_clusters)

summarized_progeny_scores <- progeny_scores %>% 
  group_by(Pathway, cell_type_str) %>% 
  summarise(avg = mean(Activity), std = sd(Activity), .groups = 'drop') %>%
  pivot_wider(id_cols = Pathway, names_from = cell_type_str, values_from = avg) %>%
  column_to_rownames("Pathway") %>%
  as.matrix()

heatmap.2(summarized_progeny_scores, trace = "none", density.info = "none", col = bluered(100), margins = c(10,10))
ggsave("Fig3D.pdf", width = 7, height = 10)

总结

可以发现这篇与上两篇的对epi分析的思路很像,都是对亚群进行细分以及细胞通路的查看,去看一些基因的表达情况,将里面的基因根据不同表达水平进行划分。这个时候发现单细胞的分析在前面还是很像的,但是根据自己研究的样本以及生物学问题的来源不一样,后面是需要进行不同的包的调取,还有个性化分析的,所以无论是做植物还是动物的,多读一些最新的单细胞组学的文章都是能学到很多的内容的,不断的将自己的文章提高一个新的思路。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • R环境的包及颜色配置加载
  • 亚群分析
  • 热图绘制
  • 总结
相关产品与服务
命令行工具
腾讯云命令行工具 TCCLI 是管理腾讯云资源的统一工具。使用腾讯云命令行工具,您可以快速调用腾讯云 API 来管理您的腾讯云资源。此外,您还可以基于腾讯云的命令行工具来做自动化和脚本处理,以更多样的方式进行组合和重用。
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档