
禁止商业或二改转载,仅供自学使用,侵权必究,如需截取部分内容请后台联系作者!
教程内容
p1p2p3p4
物种系统发育树(Phylogenetic tree),也称为进化树或系统进化树,是一种以树状分支图形来表示各物种或基因之间的亲缘关系的图表。它利用生物的形态特征、分子序列(如DNA、RNA或蛋白质序列)等数据,通过数理统计算法来计算生物之间的进化关系,从而构建出一个反映物种进化历史的拓扑结构。
系统发育树由节点(node)和进化分支(branch)组成:
系统发育树可以是有根的(rooted),也可以是无根的(unrooted):
构建系统发育树的方法主要有以下几种:
系统发育树在多个领域有广泛应用:
系统发育树主要说明以下问题:
安装ggtree的时候注意R包的依赖问题,耐心等待安装。
library(ape)
library(ggnewscale)
library(ggplot2)
library(ggtree)
library(ggtreeExtra)
library(RColorBrewer)
library(tidyverse)
# devtools::install_github("kevinwolz/hisafer")
library(hisafer)R语言系统发育树数据集下载链接:

本次加载数据有四种数据类型,它们分别是
pMAGs_bact_gtdtk_midroot.tree;pMAGS_tax.tsv;pMAGS_Presence_Datasets.txt;pMAGs_cov_sum.txtBacTree <- read.tree("pMAGs_bact_gtdtk_midroot.tree")
dat <- read_tsv("pMAGS_tax.tsv")
dataset <- read_tsv("pMAGS_Presence_Datasets.txt")
covmax <- read_tsv("pMAGs_cov_sum.txt")dat$p_c <- if_else(dat$p == "p__Proteobacteria", dat$c, dat$p)
dat$p_c <- gsub(".__", "", dat$p_c, perl = T)
dat$p_c <- gsub("_.$", "", dat$p_c, perl = T)
bacDat <- dat %>%
filter(d == "d__Bacteria") %>%
mutate(Abundance = 0.2)
mostAbundantTax <- bacDat %>%
group_by(p_c) %>%
summarise(total = n()) %>%
arrange(desc(total)) %>%
slice(1:16)
list <- mostAbundantTax$p_c
list <- c(list, "Nitrospirota")
bacDat <- bacDat %>%
mutate(p_c = if_else(p_c %in% list, p_c, "Other"))
bacDatset <- dataset %>%
filter(MAGs %in% bacDat$MAGs)
bacDatset$Busi <-
gsub("Busi", "Busi et al., Nat Com, 2022", bacDatset$Busi)
bacDatset$ENSEMBLE <-
gsub("ENSEMBLE", "Michoud et al, L&O, 2023", bacDatset$ENSEMBLE)
bacDatset$Tibet <-
gsub("Tibet", "Tibetan Glacier Genome and Gene", bacDatset$Tibet)
bacDatset$Tara <- gsub("Tara", "Tara Oceans", bacDatset$Tara)
bacDatset <- as.data.frame(bacDatset)
rownames(bacDatset) <- bacDatset$MAGs
bacDatset$MAGs <- NULL
bactcov <- covmax %>%
filter(MAGs %in% bacDat$MAGs) %>%
select(MAGs, count)
bactcov <- as.data.frame(bactcov)
rownames(bactcov) <- bactcov$MAGs
bactcov$MAGs <- NULL构建和美化了一个微生物物种的系统发育树,并添加了多个图层来展示不同的数据。
p1:基础的圆形系统发育树,节点颜色根据 group 变量着色。p2:在 p1 的基础上添加了条形图,展示每个节点的 MAGs 值。p3:在 p2 的基础上添加了热图,展示 bacDatset 数据。p4:在 p3 的基础上添加了第二个热图,展示 bactcov 数据。bactcov$count <- log10(bactcov$count)
a <- split(bacDat$MAGs, bacDat$p_c)
tree <- groupOTU(BacTree, a)
getPaletteBact <- colorRampPalette(brewer.pal(9, "Set1"))
bactColor <- getPaletteBact(length(unique(bacDat$p_c)) + 1)
bactColor[1] <- "black"localfile://path?media/17369124795723/17369141309286.jpg

p1ggtree(tree, layout = "circular", aes(color = group)):使用 ggtree 函数创建一个圆形布局的系统发育树,节点颜色根据 group 变量进行着色。geom_tree():添加树的线条。theme_tree():应用 ggtree 的默认主题。geom_treescale(width = 0.1):添加一个比例尺,宽度为 0.1。scale_color_manual(values = bactColor, na.value = "transparent", guide = "none"):自定义节点颜色,缺失值透明,不显示图例。theme(legend.position = "right"):将图例位置设置在右侧。p1 <-
ggtree(tree,
layout = "circular",
aes(color = group)
) + #
geom_tree() +
theme_tree() +
geom_treescale(width = 0.1) +
scale_color_manual(
values = bactColor,
na.value = "transparent",
guide = "none"
) +
# geom_text2(aes(subset=!isTip, label=node), hjust=-.3)+
theme(legend.position = "right")
p1localfile://path?media/17369124795723/17369141670671.jpg

p2new_scale_colour() 和 new_scale_fill():重置颜色和填充尺度,以便添加新的图层。geom_fruit:在树的每个节点上添加条形图,数据来自 bacDat,条形图的宽度为 0.01,高度为 MAGs,填充颜色根据 p_c 变量。scale_fill_manual(values = bactColor[-1]):自定义条形图的填充颜色。labs(fill = "Taxa"):设置填充颜色的图例标签为 "Taxa"。p2 <- p1 +
new_scale_colour() +
new_scale_fill() +
geom_fruit(
data = bacDat,
pwidth = 0.01,
geom = geom_bar,
mapping = aes(y = MAGs, fill = p_c, x = 1),
# orientation="y",
stat = "identity",
) +
scale_fill_manual(values = bactColor[-1]) +
labs(fill = "Taxa") +
new_scale_colour() +
new_scale_fill()
p2localfile://path?media/17369124795723/17369142007084.jpg

p3gheatmap:在 p2 的基础上添加一个热图,数据来自 bacDatset,热图宽度为 0.2,偏移量为 0.1,不显示列名,颜色默认。scale_fill_discrete(na.translate = FALSE):处理缺失值,不翻译为颜色。labs(fill = "Datasets"):设置填充颜色的图例标签为 "Datasets"。p3 <-
gheatmap(
p2,
bacDatset,
width = 0.2,
offset = 0.1,
# offset=8, width=0.6,
colnames = FALSE,
color = NULL
) +
scale_fill_discrete(na.translate = F) +
labs(fill = "Datasets") +
new_scale_colour() +
new_scale_fill()
p3localfile://path?media/17369124795723/17369142210388.jpg

p4gheatmap:在 p3 的基础上添加另一个热图,数据来自 bactcov,热图宽度为 0.05,偏移量为 0.6,不显示列名,颜色默认。scale_fill_viridis_c():使用 viridis 色板填充热图。labs(fill = "Normalized log10\nabundance"):设置填充颜色的图例标签为 "Normalized log10 abundance"。 p4 <- gheatmap(
p3,
bactcov,
offset = 0.6,
width = 0.05,
colnames = FALSE,
color = NULL
) +
scale_fill_viridis_c() +
labs(fill = "Normalized log10\nabundance")
p4localfile://path?media/17369124795723/17369142461721.jpg

结果:微生物物种系统发育树图,它由以下部分组成:
将图片以PDF格式保存到本地
ggsave_fitmax <- function(
filename,
plot,
maxheight = 7,
maxwidth = maxheight,
units = "in", ...) {
if (is.null(plot)) {
return(FALSE)
}
dims <- get_dims(
ggobj = plot,
maxheight = maxheight,
maxwidth = maxwidth,
units = units
)
ggplot2::ggsave(
filename = filename,
plot = plot,
height = dims$height,
width = dims$width,
units = units, ...
)
}
get_dims <- function(
ggobj,
maxheight,
maxwidth = maxheight,
units = "in", ...) {
# Internal helper function:
# Treat all null units in a unit object as if they were inches.
# This is a bad idea in gneral, but I use it here as a workaround.
# Extracting unit names from non-atomic unit objects is a pain,
# so questions like "which rows of this table layout have null heights?"
# are hard to answer. To work around it, I exploit an (undocumented!)
# quirk: When calculating the size of a table layout inside a Grid plot,
# convertUnit(...) treats null units as zero.
# Therefore
# (convertHeight(grob_height, "in", valueOnly=TRUE)
# - convertHeight(null_as_if_inch(grob_height), "in", valueOnly=TRUE))
# does the inverse of convertUnit: It gives the sum of all *null* heights
# in the object, treating *fixed* units as zero.
#
# Warning: I repeat, this approach ONLY makes any sense if
# convertUnit(unit(1, "null"), "in", "x", valueOnly=T) == 0
# is true. Please check that it is before calling this code.
.null_as_if_inch <- function(u) {
stopifnot(packageVersion("grid") < "4.0")
if (!grid::is.unit(u)) {
return(u)
}
if (is.atomic(u)) {
if ("null" %in% attr(u, "unit")) {
d <- attr(u, "data")
u <- unit(
x = as.vector(u),
units = gsub("null", "in", attr(u, "unit")),
data = d
)
}
return(u)
}
if (inherits(u, "unit.arithmetic")) {
l <- .null_as_if_inch(u$arg1)
r <- .null_as_if_inch(u$arg2)
if (is.null(r)) {
args <- list(l)
} else {
args <- list(l, r)
}
return(do.call(u$fname, args))
}
if (inherits(u, "unit.list")) {
return(do.call(grid::unit.c, lapply(u, .null_as_if_inch)))
}
return(u)
}
if (inherits(ggobj, "ggplot") && !isTRUE(ggobj$respect) &&
is.null(ggobj$theme$aspect.ratio) && is.null(ggobj$coordinates$ratio) &&
is.null(ggplot2::theme_get()$aspect.ratio)) {
return(list(height = maxheight, width = maxwidth))
}
tmpf <- tempfile(pattern = "dispos-a-plot", fileext = ".png")
png(
filename = tmpf,
height = maxheight,
width = maxwidth,
units = units,
res = 120, ...
)
on.exit({
dev.off()
unlink(tmpf)
})
if (inherits(ggobj, "ggplot")) {
g <- ggplot2::ggplotGrob(ggobj)
} else if (inherits(ggobj, "gtable")) {
g <- ggobj
} else {
stop("Don't know how to get sizes for object of class ", deparse(class(ggobj)))
}
stopifnot(grid::convertUnit(grid::unit(1, "null"), "in", "x", valueOnly = TRUE) == 0)
known_ht <- sum(grid::convertHeight(g$heights, units, valueOnly = TRUE))
known_wd <- sum(grid::convertWidth(g$widths, units, valueOnly = TRUE))
free_ht <- maxheight - known_ht
free_wd <- maxwidth - known_wd
if (packageVersion("grid") >= "4.0.0") {
null_rowhts <- as.numeric(g$heights[grid::unitType(g$heights) == "null"])
null_colwds <- as.numeric(g$widths[grid::unitType(g$widths) == "null"])
panel_asps <- (
matrix(null_rowhts, ncol = 1)
%*% matrix(1 / null_colwds, nrow = 1))
} else {
all_null_rowhts <- (
grid::convertHeight(.null_as_if_inch(g$heights), "in", valueOnly = TRUE)
- grid::convertHeight(g$heights, "in", valueOnly = TRUE))
all_null_colwds <- (
grid::convertWidth(.null_as_if_inch(g$widths), "in", valueOnly = TRUE)
- grid::convertWidth(g$widths, "in", valueOnly = TRUE))
null_rowhts <- all_null_rowhts[all_null_rowhts > 0]
null_colwds <- all_null_colwds[all_null_colwds > 0]
panel_asps <- (matrix(null_rowhts, ncol = 1) %*% matrix(1 / null_colwds, nrow = 1))
}
panel_asps <- matrix(null_rowhts, ncol = 1) %*% matrix(1 / null_colwds, nrow = 1)
max_rowhts <- free_ht / sum(null_rowhts) * null_rowhts
max_colwds <- free_wd / sum(null_colwds) * null_colwds
rowhts_if_maxwd <- max_colwds[1] * panel_asps[, 1]
colwds_if_maxht <- max_rowhts[1] / panel_asps[1, ]
height <- min(maxheight, known_ht + sum(rowhts_if_maxwd))
width <- min(maxwidth, known_wd + sum(colwds_if_maxht))
return(list(height = height, width = width))
}
ggsave_fitmax("bacterialTree.pdf",
p4,
maxheight = 15
)本教程将指导你如何使用ggtree等一系列包在R语言环境中构建微生物物种的系统发育树。为了帮助读者更好地理解和应用,本教程提供了完整的数据和代码示例。
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Asia/Shanghai
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] hisafer_1.5.1 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[6] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 tidyverse_2.0.0
[11] RColorBrewer_1.1-3 ggtreeExtra_1.12.0 ggtree_3.8.2 ggplot2_3.5.1 ggnewscale_0.4.10
[16] ape_5.7-1
loaded via a namespace (and not attached):
[1] yulab.utils_0.1.4 utf8_1.2.4 generics_0.1.3 ggplotify_0.1.2 stringi_1.8.3
[6] lattice_0.22-6 hms_1.1.3 digest_0.6.35 magrittr_2.0.3 timechange_0.3.0
[11] grid_4.3.3 fastmap_1.1.1 jsonlite_1.8.8 fansi_1.0.6 aplot_0.2.2
[16] scales_1.3.0 lazyeval_0.2.2 cli_3.6.3 rlang_1.1.4 munsell_0.5.0
[21] tidytree_0.4.6 withr_3.0.2 cachem_1.0.8 tools_4.3.3 parallel_4.3.3
[26] tzdb_0.4.0 memoise_2.0.1 colorspace_2.1-0 vctrs_0.6.5 R6_2.5.1
[31] gridGraphics_0.5-1 lifecycle_1.0.4 fs_1.6.3 ggfun_0.1.4 treeio_1.26.0
[36] pkgconfig_2.0.3 pillar_1.9.0 gtable_0.3.4 glue_1.8.0 Rcpp_1.0.12
[41] tidyselect_1.2.1 rstudioapi_0.16.0 farver_2.1.1 nlme_3.1-164 patchwork_1.3.0
[46] compiler_4.3.3原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。