Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice
Today's post tries to reproduce Figure 5a from the paper.
Partial screenshot of the example data I constructed myself:
image.png
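# Load packages: readxl for Excel input, tidyverse for dplyr and ggplot2
library(readxl)
library(tidyverse)

# Read the example data and build group3, the variable used for the fill colour:
# "0" where the gene is absent (group2 == 0), otherwise the group label in group1
dat <- read_excel("data/20231219/20131219.xlsx") %>%
  mutate(group3 = case_when(
    group2 == 0 ~ "0",
    TRUE ~ group1
  ))
dat %>% head()

# Keep x and y in their order of first appearance so the axes follow the data
dat %>% pull(x) %>% unique() -> x.levels
dat %>% pull(y) %>% unique() -> y.levels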
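library(ggplot2)  # already attached by tidyverse; harmless to load again

dat %>%
  # Fix the axis order; reverse y so the first row is drawn at the top
  mutate(x = factor(x, levels = x.levels),
         y = factor(y, levels = rev(y.levels))) %>%
  ggplot(aes(x = x, y = y)) +
  # One tile per x/y cell, coloured by group3, with gray cell borders
  geom_tile(aes(fill = group3),
            color = "gray") +
  theme_bw(base_size = 15) +
  theme(panel.grid = element_blank(),
        panel.border = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        axis.text.x = element_text(angle = 90, hjust = 0),
        axis.text.y = element_text(face = "italic")) +
  # Put the x labels on top and remove the padding around the tiles
  scale_x_discrete(position = "top",
                   expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  # "0" (gene absent) is white; each group gets its own colour
  scale_fill_manual(values = c("0" = "white",
                               "A" = "#fe0000",
                               "B" = "#c0a100",
                               "D" = "#00b650",
                               "E" = "#f38dd5",
                               "F" = "#7230a3",
                               "G" = "#02b5fc")) +
  theme(legend.position = "none")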
image.png
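The plotting code is not the hard part; what matters is the format the plotting data is prepared in. In the data used for the fill colour here, cells where the gene is absent are all filled with 0, and cells where the gene is present are filled with the value of the group they belong to.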
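To make the expected format concrete, here is a minimal sketch of a toy table in the same long layout, with one row per tile. The column names x, y, group1 and group2 are the ones used in the code above; everything else is an assumption made up for illustration: the gene and species names, the values, and the reading of x as genes, y as species, and group2 as a 0/1 presence flag. It uses base R only, so the ifelse() line is a stand-in for the case_when() step rather than the original code.

```r
# Hypothetical toy data: 3 genes (x) by 2 species (y), one row per cell.
# expand.grid() varies x fastest, so rows run gene1..gene3 for species1,
# then gene1..gene3 for species2.
dat <- expand.grid(
  x = c("gene1", "gene2", "gene3"),
  y = c("species1", "species2"),
  stringsAsFactors = FALSE
)

# group1: group label of each row (assumed: species1 is in group A, species2 in B).
dat$group1 <- ifelse(dat$y == "species1", "A", "B")

# group2: assumed presence flag, 1 = gene present, 0 = gene absent (made-up values).
dat$group2 <- c(1, 0, 1, 0, 1, 1)

# Base-R equivalent of the case_when() step:
# absent cells become "0", present cells keep their group label.
dat$group3 <- ifelse(dat$group2 == 0, "0", dat$group1)

print(dat)
```

A table shaped like this can be passed straight to the plotting code above: group3 then contains only "0" and group labels, which are exactly the names that scale_fill_manual() maps to colours.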