禁止商业或二改转载,仅供自学使用,侵权必究,如需截取部分内容请后台联系作者!
教程内容
p1
p2
p3
p4
物种系统发育树(Phylogenetic tree),也称为进化树或系统进化树,是一种以树状分支图形来表示各物种或基因之间的亲缘关系的图表。它利用生物的形态特征、分子序列(如DNA、RNA或蛋白质序列)等数据,通过数理统计算法来计算生物之间的进化关系,从而构建出一个反映物种进化历史的拓扑结构。
系统发育树由节点(node)和进化分支(branch)组成:
系统发育树可以是有根的(rooted),也可以是无根的(unrooted):
构建系统发育树的方法主要有以下几种:
系统发育树在多个领域有广泛应用:
系统发育树主要说明以下问题:
安装ggtree的时候注意R包的依赖问题,耐心等待安装。
library(ape)
library(ggnewscale)
library(ggplot2)
library(ggtree)
library(ggtreeExtra)
library(RColorBrewer)
library(tidyverse)
# devtools::install_github("kevinwolz/hisafer")
library(hisafer)
R语言系统发育树数据集下载链接:
本次加载数据有四种数据类型,它们分别是
pMAGs_bact_gtdtk_midroot.tree
;pMAGS_tax.tsv
;pMAGS_Presence_Datasets.txt
;pMAGs_cov_sum.txt
BacTree <- read.tree("pMAGs_bact_gtdtk_midroot.tree")
dat <- read_tsv("pMAGS_tax.tsv")
dataset <- read_tsv("pMAGS_Presence_Datasets.txt")
covmax <- read_tsv("pMAGs_cov_sum.txt")
dat$p_c <- if_else(dat$p == "p__Proteobacteria", dat$c, dat$p)
dat$p_c <- gsub(".__", "", dat$p_c, perl = T)
dat$p_c <- gsub("_.$", "", dat$p_c, perl = T)
bacDat <- dat %>%
filter(d == "d__Bacteria") %>%
mutate(Abundance = 0.2)
mostAbundantTax <- bacDat %>%
group_by(p_c) %>%
summarise(total = n()) %>%
arrange(desc(total)) %>%
slice(1:16)
list <- mostAbundantTax$p_c
list <- c(list, "Nitrospirota")
bacDat <- bacDat %>%
mutate(p_c = if_else(p_c %in% list, p_c, "Other"))
bacDatset <- dataset %>%
filter(MAGs %in% bacDat$MAGs)
bacDatset$Busi <-
gsub("Busi", "Busi et al., Nat Com, 2022", bacDatset$Busi)
bacDatset$ENSEMBLE <-
gsub("ENSEMBLE", "Michoud et al, L&O, 2023", bacDatset$ENSEMBLE)
bacDatset$Tibet <-
gsub("Tibet", "Tibetan Glacier Genome and Gene", bacDatset$Tibet)
bacDatset$Tara <- gsub("Tara", "Tara Oceans", bacDatset$Tara)
bacDatset <- as.data.frame(bacDatset)
rownames(bacDatset) <- bacDatset$MAGs
bacDatset$MAGs <- NULL
bactcov <- covmax %>%
filter(MAGs %in% bacDat$MAGs) %>%
select(MAGs, count)
bactcov <- as.data.frame(bactcov)
rownames(bactcov) <- bactcov$MAGs
bactcov$MAGs <- NULL
构建和美化了一个微生物物种的系统发育树,并添加了多个图层来展示不同的数据。
p1
:基础的圆形系统发育树,节点颜色根据 group
变量着色。p2
:在 p1
的基础上添加了条形图,展示每个节点的 MAGs
值。p3
:在 p2
的基础上添加了热图,展示 bacDatset
数据。p4
:在 p3
的基础上添加了第二个热图,展示 bactcov
数据。bactcov$count <- log10(bactcov$count)
a <- split(bacDat$MAGs, bacDat$p_c)
tree <- groupOTU(BacTree, a)
getPaletteBact <- colorRampPalette(brewer.pal(9, "Set1"))
bactColor <- getPaletteBact(length(unique(bacDat$p_c)) + 1)
bactColor[1] <- "black"
localfile://path?media/17369124795723/17369141309286.jpg
p1
ggtree(tree, layout = "circular", aes(color = group))
:使用 ggtree
函数创建一个圆形布局的系统发育树,节点颜色根据 group
变量进行着色。geom_tree()
:添加树的线条。theme_tree()
:应用 ggtree
的默认主题。geom_treescale(width = 0.1)
:添加一个比例尺,宽度为 0.1。scale_color_manual(values = bactColor, na.value = "transparent", guide = "none")
:自定义节点颜色,缺失值透明,不显示图例。theme(legend.position = "right")
:将图例位置设置在右侧。p1 <-
ggtree(tree,
layout = "circular",
aes(color = group)
) + #
geom_tree() +
theme_tree() +
geom_treescale(width = 0.1) +
scale_color_manual(
values = bactColor,
na.value = "transparent",
guide = "none"
) +
# geom_text2(aes(subset=!isTip, label=node), hjust=-.3)+
theme(legend.position = "right")
p1
localfile://path?media/17369124795723/17369141670671.jpg
p2
new_scale_colour()
和 new_scale_fill()
:重置颜色和填充尺度,以便添加新的图层。geom_fruit
:在树的每个节点上添加条形图,数据来自 bacDat
,条形图的宽度为 0.01,高度为 MAGs
,填充颜色根据 p_c
变量。scale_fill_manual(values = bactColor[-1])
:自定义条形图的填充颜色。labs(fill = "Taxa")
:设置填充颜色的图例标签为 "Taxa"。p2 <- p1 +
new_scale_colour() +
new_scale_fill() +
geom_fruit(
data = bacDat,
pwidth = 0.01,
geom = geom_bar,
mapping = aes(y = MAGs, fill = p_c, x = 1),
# orientation="y",
stat = "identity",
) +
scale_fill_manual(values = bactColor[-1]) +
labs(fill = "Taxa") +
new_scale_colour() +
new_scale_fill()
p2
localfile://path?media/17369124795723/17369142007084.jpg
p3
gheatmap
:在 p2
的基础上添加一个热图,数据来自 bacDatset
,热图宽度为 0.2,偏移量为 0.1,不显示列名,颜色默认。scale_fill_discrete(na.translate = FALSE)
:处理缺失值,不翻译为颜色。labs(fill = "Datasets")
:设置填充颜色的图例标签为 "Datasets"。p3 <-
gheatmap(
p2,
bacDatset,
width = 0.2,
offset = 0.1,
# offset=8, width=0.6,
colnames = FALSE,
color = NULL
) +
scale_fill_discrete(na.translate = F) +
labs(fill = "Datasets") +
new_scale_colour() +
new_scale_fill()
p3
localfile://path?media/17369124795723/17369142210388.jpg
p4
gheatmap
:在 p3
的基础上添加另一个热图,数据来自 bactcov
,热图宽度为 0.05,偏移量为 0.6,不显示列名,颜色默认。scale_fill_viridis_c()
:使用 viridis
色板填充热图。labs(fill = "Normalized log10\nabundance")
:设置填充颜色的图例标签为 "Normalized log10 abundance"。 p4 <- gheatmap(
p3,
bactcov,
offset = 0.6,
width = 0.05,
colnames = FALSE,
color = NULL
) +
scale_fill_viridis_c() +
labs(fill = "Normalized log10\nabundance")
p4
localfile://path?media/17369124795723/17369142461721.jpg
结果:微生物物种系统发育树图,它由以下部分组成:
将图片以PDF格式保存到本地
ggsave_fitmax <- function(
filename,
plot,
maxheight = 7,
maxwidth = maxheight,
units = "in", ...) {
if (is.null(plot)) {
return(FALSE)
}
dims <- get_dims(
ggobj = plot,
maxheight = maxheight,
maxwidth = maxwidth,
units = units
)
ggplot2::ggsave(
filename = filename,
plot = plot,
height = dims$height,
width = dims$width,
units = units, ...
)
}
get_dims <- function(
ggobj,
maxheight,
maxwidth = maxheight,
units = "in", ...) {
# Internal helper function:
# Treat all null units in a unit object as if they were inches.
# This is a bad idea in gneral, but I use it here as a workaround.
# Extracting unit names from non-atomic unit objects is a pain,
# so questions like "which rows of this table layout have null heights?"
# are hard to answer. To work around it, I exploit an (undocumented!)
# quirk: When calculating the size of a table layout inside a Grid plot,
# convertUnit(...) treats null units as zero.
# Therefore
# (convertHeight(grob_height, "in", valueOnly=TRUE)
# - convertHeight(null_as_if_inch(grob_height), "in", valueOnly=TRUE))
# does the inverse of convertUnit: It gives the sum of all *null* heights
# in the object, treating *fixed* units as zero.
#
# Warning: I repeat, this approach ONLY makes any sense if
# convertUnit(unit(1, "null"), "in", "x", valueOnly=T) == 0
# is true. Please check that it is before calling this code.
.null_as_if_inch <- function(u) {
stopifnot(packageVersion("grid") < "4.0")
if (!grid::is.unit(u)) {
return(u)
}
if (is.atomic(u)) {
if ("null" %in% attr(u, "unit")) {
d <- attr(u, "data")
u <- unit(
x = as.vector(u),
units = gsub("null", "in", attr(u, "unit")),
data = d
)
}
return(u)
}
if (inherits(u, "unit.arithmetic")) {
l <- .null_as_if_inch(u$arg1)
r <- .null_as_if_inch(u$arg2)
if (is.null(r)) {
args <- list(l)
} else {
args <- list(l, r)
}
return(do.call(u$fname, args))
}
if (inherits(u, "unit.list")) {
return(do.call(grid::unit.c, lapply(u, .null_as_if_inch)))
}
return(u)
}
if (inherits(ggobj, "ggplot") && !isTRUE(ggobj$respect) &&
is.null(ggobj$theme$aspect.ratio) && is.null(ggobj$coordinates$ratio) &&
is.null(ggplot2::theme_get()$aspect.ratio)) {
return(list(height = maxheight, width = maxwidth))
}
tmpf <- tempfile(pattern = "dispos-a-plot", fileext = ".png")
png(
filename = tmpf,
height = maxheight,
width = maxwidth,
units = units,
res = 120, ...
)
on.exit({
dev.off()
unlink(tmpf)
})
if (inherits(ggobj, "ggplot")) {
g <- ggplot2::ggplotGrob(ggobj)
} else if (inherits(ggobj, "gtable")) {
g <- ggobj
} else {
stop("Don't know how to get sizes for object of class ", deparse(class(ggobj)))
}
stopifnot(grid::convertUnit(grid::unit(1, "null"), "in", "x", valueOnly = TRUE) == 0)
known_ht <- sum(grid::convertHeight(g$heights, units, valueOnly = TRUE))
known_wd <- sum(grid::convertWidth(g$widths, units, valueOnly = TRUE))
free_ht <- maxheight - known_ht
free_wd <- maxwidth - known_wd
if (packageVersion("grid") >= "4.0.0") {
null_rowhts <- as.numeric(g$heights[grid::unitType(g$heights) == "null"])
null_colwds <- as.numeric(g$widths[grid::unitType(g$widths) == "null"])
panel_asps <- (
matrix(null_rowhts, ncol = 1)
%*% matrix(1 / null_colwds, nrow = 1))
} else {
all_null_rowhts <- (
grid::convertHeight(.null_as_if_inch(g$heights), "in", valueOnly = TRUE)
- grid::convertHeight(g$heights, "in", valueOnly = TRUE))
all_null_colwds <- (
grid::convertWidth(.null_as_if_inch(g$widths), "in", valueOnly = TRUE)
- grid::convertWidth(g$widths, "in", valueOnly = TRUE))
null_rowhts <- all_null_rowhts[all_null_rowhts > 0]
null_colwds <- all_null_colwds[all_null_colwds > 0]
panel_asps <- (matrix(null_rowhts, ncol = 1) %*% matrix(1 / null_colwds, nrow = 1))
}
panel_asps <- matrix(null_rowhts, ncol = 1) %*% matrix(1 / null_colwds, nrow = 1)
max_rowhts <- free_ht / sum(null_rowhts) * null_rowhts
max_colwds <- free_wd / sum(null_colwds) * null_colwds
rowhts_if_maxwd <- max_colwds[1] * panel_asps[, 1]
colwds_if_maxht <- max_rowhts[1] / panel_asps[1, ]
height <- min(maxheight, known_ht + sum(rowhts_if_maxwd))
width <- min(maxwidth, known_wd + sum(colwds_if_maxht))
return(list(height = height, width = width))
}
ggsave_fitmax("bacterialTree.pdf",
p4,
maxheight = 15
)
本教程将指导你如何使用ggtree等一系列包在R语言环境中构建微生物物种的系统发育树。为了帮助读者更好地理解和应用,本教程提供了完整的数据和代码示例。
R version 4.3.3 (2024-02-29)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Asia/Shanghai
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] hisafer_1.5.1 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[6] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1 tidyverse_2.0.0
[11] RColorBrewer_1.1-3 ggtreeExtra_1.12.0 ggtree_3.8.2 ggplot2_3.5.1 ggnewscale_0.4.10
[16] ape_5.7-1
loaded via a namespace (and not attached):
[1] yulab.utils_0.1.4 utf8_1.2.4 generics_0.1.3 ggplotify_0.1.2 stringi_1.8.3
[6] lattice_0.22-6 hms_1.1.3 digest_0.6.35 magrittr_2.0.3 timechange_0.3.0
[11] grid_4.3.3 fastmap_1.1.1 jsonlite_1.8.8 fansi_1.0.6 aplot_0.2.2
[16] scales_1.3.0 lazyeval_0.2.2 cli_3.6.3 rlang_1.1.4 munsell_0.5.0
[21] tidytree_0.4.6 withr_3.0.2 cachem_1.0.8 tools_4.3.3 parallel_4.3.3
[26] tzdb_0.4.0 memoise_2.0.1 colorspace_2.1-0 vctrs_0.6.5 R6_2.5.1
[31] gridGraphics_0.5-1 lifecycle_1.0.4 fs_1.6.3 ggfun_0.1.4 treeio_1.26.0
[36] pkgconfig_2.0.3 pillar_1.9.0 gtable_0.3.4 glue_1.8.0 Rcpp_1.0.12
[41] tidyselect_1.2.1 rstudioapi_0.16.0 farver_2.1.1 nlme_3.1-164 patchwork_1.3.0
[46] compiler_4.3.3
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。
如有侵权,请联系 cloudcommunity@tencent.com 删除。
扫码关注腾讯云开发者
领取腾讯云代金券
Copyright © 2013 - 2025 Tencent Cloud. All Rights Reserved. 腾讯云 版权所有
深圳市腾讯计算机系统有限公司 ICP备案/许可证号:粤B2-20090059 深公网安备号 44030502008569
腾讯云计算(北京)有限责任公司 京ICP证150476号 | 京ICP备11018762号 | 京公网安备号11010802020287
Copyright © 2013 - 2025 Tencent Cloud.
All Rights Reserved. 腾讯云 版权所有