跟着存档教程动手学RNAseq分析(五):DESeq2基因水平差异表达分析
我们详细介绍了差异表达分析工作流程中的各个步骤,并提供了理论和示例代码。为了给运行DGE分析所需的代码提供更简洁的参考,我们总结了如下分析中的步骤:
# Run tximport
txi <- tximport(files, type="salmon", tx2gene=t2g, countsFromAbundance = "lengthScaledTPM")
# "files" is a vector wherein each element is the path to the salmon quant.sf file, and each element is named with the name of the sample.
# "t2g" is a 2 column data frame which contains transcript IDs mapped to geneIDs (in that order)
# Check that the row names of the metadata equal
# the column names of the **raw counts** data
all(colnames(txi$counts) == rownames(metadata))
# Create DESeq2Dataset object
dds <- DESeqDataSetFromTximport(txi, colData = metadata, design = ~ condition)
# Transform counts for data visualization
rld <- rlog(dds, blind=TRUE)
# Plot PCA
plotPCA(rld, intgroup="condition")
# Extract the rlog matrix from the object
# and compute pairwise correlation values
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)
# Plot heatmap
pheatmap(rld_cor, annotation = metadata)
# **Optional step** - Re-create DESeq2 dataset
# if the design formula has changed after QC analysis
# in include other sources of variation
# using "dds <- DESeqDataSetFromTximport(txi, colData = metadata, design = ~ covaraite + condition)"
# Run DESeq2 differential expression analysis
dds <- DESeq(dds)
# **Optional step** - Output normalized counts
# to save as a file to access outside RStudio
# using "normalized_counts <- counts(dds, normalized=TRUE)"
plotDispEsts(dds)
# Output results of Wald test for contrast
contrast <- c("condition", "level_to_compare", "base_level")
res <- results(dds, contrast = contrast, alpha = 0.05)
res <- lfcShrink(dds, contrast = contrast, res=res)
# Set thresholds
padj.cutoff < - 0.05
# Turn the results object into a tibble for use with tidyverse functions
res_tbl <- res %>%
data.frame() %>%
rownames_to_column(var="gene") %>%
as_tibble()
# Subset the significant results
sig_res <- filter(res_tbl, padj < padj.cutoff)
8. 可视化结果:火山图、热图、top基因的标准化计数图等。
9. 进行分析,提取结果的功能意义:GO或KEGG富集,GSEA等。