文章/答案/技术大牛

发布

社区首页 >专栏 >生信绘图与配色

生信绘图与配色

原创

用户11008504

修改于 2024-07-02 10:42:22

3940

本文资料与代码来源于医学方出品的《SCI论文绘图之道》，转载请说明。

一、前言

高水平的SCI插图有如下的特点：

1.形式丰富且合适；

2.信息直观，每张图最多反应2-3个信息；

3.根据期刊要求使用颜色、形状、大型；

4.布局不能留太多空白和拥挤；

5.必要时添加辅助线帮助阅读。

如何绘制：

1.理解数据

2.绘制草图

3.根据期刊要求准备图片配色、格式、分辨率

二、ggplot2的绘图原理

2.1 散点图

#install.packages(c('tidyverse','ggsci'))
library(tidyverse)
library(ggsci)

ggplot(mtcars,aes(mpg,qsec,color = factor(cyl)))+
 size = 3, alpha = .6)+
  theme_bw()+
  scale_color_lancet()+
  scale_x_continuous()+
  scale_y_continuous()

元素组成：

1.画布：ggplot()函数，mtcars为画图数据

2.横纵坐标：mpg,qsec两个变量

3.散点- 几何对象： geom_point(）函数，size,alpha为控制点属性的参数

4.散点颜色- 变量映射：color = factor(cyl)：把cyl这个变量因子化，不同的颜色表示变量的分类水平

5.图例- 变量映射的产物

6.背景网路：theme_bw()完成，主题函数

7.坐标：横纵坐标，包括坐标轴上的刻度。

最基本的四要素：

1)绘图数据:mtcars

2)画布：由geom_point()函数完成

3)映射：由aes()函数完成

4)几何对象：由 geom_point()函数完成

ggplot2的基本结构：

1）变量映射系统

2）几何对象系统

3）标度系统

4）主题系统

三、常见图形绘制

连续型数据：某个区间内的任意值都可以取的数据，特点是可以进行 无限的分割和测量，两个相邻的值之间可能存在无数个中间值。

离散型数据则是指只能取到有限个数或者是可数个数的数据，通常以整数表示。

3.1 单个连续型变量

常用：盒型图和小提琴图，在纵坐标上展示数据

离散型变量（分组变量）+连续变量

3.2 两个或多个连续型变量

ggplot中颜色实现两种方式：

1.变量映射（取值越大颜色越深）

2.人为定义：颜色参数设置为某个颜色

mtcars <- mtcars %>% as_tibble() %>% mutate(cyl = factor(cyl),
                                            carb = factor(carb))
set.seed(2019)
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
p1 <- ggplot(mtcars,aes(mpg,qsec,fill = cyl))+
  geom_point(size = 3,shape = 21, color = 'black')+
  theme_classic()+
  theme(legend.position = c(.1,.8),#图标的位置
        legend.background = element_rect(color = 'black'))#图标的颜色

p2 <- ggplot(mtcars,aes(mpg,qsec,fill = carb))+
  geom_point(size = 3,shape = 21, color = 'black')+
  theme_classic()+
  theme(legend.position = c(.1,.7),
        legend.background = element_rect(color = 'black'))
#plot_grid()函数用于多个 ggplot 对象（即图表）排列组合,align参数设置对齐方式 ，v表示垂直对齐，h表示水平对齐。
plot_grid(p1,p2,ncol=2,labels = c('A','B'),align = c('v','h'))

散点图颜色：标度函数-scale_fill(); scale_color()，分为连续色和离散色。

1.连续色：由scale_fill函数第三个词决定，如scale_fill_gradient表示连续色

dsamp <- dsamp %>% mutate(depth = depth - median(depth))
p1 <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(fill = depth),shape=21, color = 'black',size = 2)+
  scale_fill_gradient2(low = '#41ab5d',mid = 'white',high = '#e31a1c',
                        midpoint = 0)+
  theme(legend.position = c(.1,.8))

p2 <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(fill = depth),shape=21, color = 'black',size = 2)+
  scale_fill_gsea()+
  theme(legend.position = c(.1,.8))

plot_grid(p1,p2,ncol=2,labels = c('A','B'),align = c('v','h'))

2.离散色：使用配色包，如ggsci包，提供了一组科学期刊风格的配色方案

3.3 连续型变量绘图优化

气泡图

set.seed(2019)
dsamp <- diamonds[sample(nrow(diamonds), 500), ]
dsamp <-  dsamp %>% mutate(x = exp(x))

p1 <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(size = x,fill = cut),shape=21, 
             color = 'black')+
  scale_fill_brewer(palette = 'Set2')

p2 <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(size = log(x),fill = cut),shape=21, 
             color = 'black')+
  scale_fill_brewer(palette = 'Set2')
#RColorBrewer配色包--颜色区分较大，适合分类变量，如 Set1, Set2, Set3, Accent, Paired, Pastel1, Pastel2 等
plot_grid(p1,p2,ncol=2,labels = c('A','B'),align = c('v','h'))

二维密度图：适合大数据量，stat_density_2d()函数

a <- data.frame( x=rnorm(20000, 10, 1.9), y=rnorm(20000, 10, 1.2) )
b <- data.frame( x=rnorm(20000, 14.5, 1.9), y=rnorm(20000, 14.5, 1.9) )
c <- data.frame( x=rnorm(20000, 9.5, 1.9), y=rnorm(20000, 15.5, 1.9) )
data <- rbind(a,b,c)
p2 <- ggplot(data, aes(x=x, y=y) ) +
  stat_density_2d(aes(fill = ..level..), geom = "polygon")+
  theme_bw()+
  scale_fill_gradient(low = '#edf8e9',high = '#238b45')+
  theme(panel.grid = element_blank())

散点+密度图：ggExtra包

install.packages("ggExtra")
library(ggExtra)

piris <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(shape=21, size = 3,stroke = 1.2)+
  scale_color_npg()+
  theme_bw()+
  theme(legend.position = c(.1,.86))
ggMarginal(piris, type= 'density',
           groupFill = TRUE)

层峦叠峰：ggridges包，geom_density_ridges()函数

install.packages('ggridges')
library(ggridges)

ggplot(dsamp,aes(x = carat,y = clarity, fill = clarity))+
  geom_density_ridges(alpha = .5)+
  scale_fill_brewer(palette = 'Set3')+
  theme_classic()

3.4 两个连续型变量相关性图-ggcorrplot包

install.packages("ggcorrplot")
library(ggcorrplot)
mtcars <- mtcars %>% select(c('mpg','disp','hp','drat','wt','qsec'))
corr <- round(cor(mtcars), 1)
p1 <- ggcorrplot(
  corr,
  type = "lower",
  outline.color = "white",
  colors = c("#6D9EC1", "white", "#E46726")
)

p2 <- ggcorrplot(
  corr,
  type = "upper",
  outline.color = "white",
  colors = c("#084594", "white", "#ef3b2c")
)

p3 <- ggcorrplot(corr,
           type = "lower",
           lab = TRUE)

p.mat <- cor_pmat(mtcars)

p4 <- ggcorrplot(corr,
           type = "lower",
           p.mat = p.mat)

plot_grid(p1,p2,p3,p4,ncol=2,labels = LETTERS[1:4],align = c('v','h'))

3.5 连续型变量趋势图

3.5.1线图：geom_line()+geom_point()函数

set.seed(2020)
id <- 1:8
dat <- tibble(x = rep(id,2),
              y = c(2*id+rnorm(8,0,1),
                    6*id+rnorm(8,0,2)),
              group = rep(c('Group1','Group2'),each = 8))

p1 <- ggplot(dat,aes(x, y, color = group))+
  geom_line(size = .8)+
  geom_point(shape = 21,color = 'black',size=3,fill = 'white')+
  scale_color_d3()+
  theme_classic()+
  theme(legend.position = c(.15,.85))

p2 <- ggplot(dat,aes(x, log(y), color = group))+
  geom_line(size = .8)+
  geom_point(shape = 21,color = 'black',size=3,fill = 'white')+
  scale_color_d3()+
  theme_classic()+
  theme(legend.position = c(.15,.85))

p3 <- ggplot(dat,aes(x, y))+
  geom_line(size = .8,color = '#2b8cbe')+
  geom_point(shape = 21,color = 'black',size=3,fill = 'white')+
  theme_classic()+
  facet_wrap(~group,ncol = 2,scales = 'free_y')+
  theme(legend.position = c(.15,.85),
        strip.background = element_blank(),
        strip.text = element_text(size = 12))

p4 <- plot_grid(p1,p2,ncol=2,labels = LETTERS[1:2],align = c('v','h'))
plot_grid(p4,p3,ncol=1,labels = c('','C'),align = c('v','h'))

3.5.2配比坡度图

3.5.3面积图 geom_area()函数

3.5.4平滑曲线 geom_smooth()函数

3.5.5生存曲线 survminer,survival包

3.5.6三元图 ggtern包

与等高线，三位密度图结合

install.packages('ggtern')
library(ggtern)
set.seed(2019)
a <- tibble(x=rnorm(20000, 80, 30), 
                 y=rnorm(20000, 100, 30), 
                 z = rnorm(20000, 60, 20))

ggtern(data=a,aes(x,y,z))+
  stat_density_tern(aes(fill = ..level..,alpha = ..level..),
                    geom = 'polygon')+
  scale_fill_gradient(low = '#fcbba1',high = '#ef3b2c')+
  theme_showarrows()+
  guides(color = "none", fill = "none", alpha = "none")

3.5.7 雷达图:ggradar包

3.6 离散型数据构成可视化

3.6.1.饼图：ggplot中由条形图变形而来，条图和坐标系转换而成

injuries <- tibble(type = c('Road injury','Self-harm','CVD',
                            'Cancers','Infectious Diseases'),
                   counts = c(214,123,69,24,17),
                   share = counts/sum(counts)*100)

p1 <- ggplot(injuries,aes('', y = counts, fill = type))+
  geom_bar(width = 1, size = 1, color = 'white',stat = 'identity')+
  coord_polar(theta = 'y')+
  geom_text(aes(label = paste0(round(share,1),'%')),
            position = position_stack(vjust = .5))+
  labs(x = NULL, y = NULL, fill = NULL, 
       title = "Injury proportion in young adults")+
  scale_fill_npg()+
  theme_classic() +
  theme(axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5, color = "#666666"))

injuries2 <- injuries %>% mutate(type = factor(type,
                                              levels = type[order(counts)])) 
p2 <- ggplot(injuries2,aes('', y = counts, fill = type))+
  geom_bar(width = 1, size = 1, color = 'white',stat = 'identity')+
  coord_polar(theta = 'y')+
  geom_text(aes(label = paste0(round(share,1),'%')),
            position = position_stack(vjust = .5))+
  labs(x = NULL, y = NULL, fill = NULL, 
       title = "Injury proportion in young adults")+
  scale_fill_npg()+
  theme_classic() +
  theme(axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5, color = "#666666"))+
  guides(fill = guide_legend(reverse = TRUE))

plot_grid(p1,p2,ncol=2,labels = LETTERS[1:2],align = c('v','h'))

3.6.2玫瑰图:饼图x轴进行真实变量映射

ggplot(injuries,aes(x = type, y = counts, fill = type))+
  geom_bar(width = 1,color = 'black',stat = 'identity')+
  coord_polar(theta = 'x')+
  geom_text(aes(label = paste0(round(share,1),'%'),
                y = counts+2))+
  labs(x = NULL, y = NULL, fill = NULL, 
       title = "Injury proportion in young adults")+
  scale_fill_tron()+
  theme_bw() +
  theme(axis.text.x = element_text(size = 13, color='black',
                                   angle = seq(-30,-330,len = 5)),
        plot.title = element_text(hjust = 0.5, color = "#666666"),
        panel.border = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        legend.position = 'none')

3.6.3戒指图：设置条柱的宽度width及x轴取值范围

install.packages('ggpubr')
library(ggpubr)

ggdonutchart(injuries,'share',
             label = paste0(round(injuries$share),'%'),
             fill = 'type',color = 'white',
             palette = 'lancet')

3.6.4.树图

install.packages('treemapify')
library(treemapify)
set.seed(2019)
diseases <- data.frame(type = rep(c('CVD','Infections','Cancer',
                            'Metabolic','Digestive','CNS'),
                            times = c(5,5,5,3,4,6)),
                   diseases = c('Heart attack','Stroke','CHD',
                                'Arrhythmia','Heart failure',
                                'Hepatitis','Malaria','HIV',
                                'TB','Influenza','HCC','CRC',
                                'Lung','Gastric','Breast',
                                'Diabetes','Hypertention',
                                'Mucolipidoses','IBD','Celiac Disease',
                                "Crohn's Disease",'Diarrhea',
                                'Autism','ADHD','Depression','Meningitis',
                                'Migraine','GBM'),
                   freqs = sample(20:60,28))

ggplot(diseases,aes(area = freqs,fill = type,
                    label = diseases, subgroup = type))+
  geom_treemap(color = 'gray20')+
  geom_treemap_subgroup_border() +
  geom_treemap_subgroup_text(place = "centre", grow = T, 
                             alpha = 0.5, colour ="black", 
                             fontface = "italic", min.size = 0) +
  geom_treemap_text(colour = "white", place = "topleft", reflow = T)+
  scale_fill_brewer(palette = 'Set2')

3.7 离散型数据分布: geom_hline()+geom_bar()

3.7.1 堆栈式条图

injuries <- tibble(age = rep(c('Young adults',
                               'Middle-aged people',
                               'the elders'),each=5),
                   type = rep(c('Road injury','Self-harm','CVD',
                                'Cancers','Infectious Diseases'),3),
                   counts = c(214,123,69,24,17,
                              129,110,201,101,45,
                              56,32,212,189,78))
injuries <- injuries %>% mutate(age = factor(age, 
                                             levels = c('Young adults',
                                                        'Middle-aged people',
                                                        'the elders')))
injuries <- injuries %>%
  mutate(type = factor(type,levels = c('CVD','Road injury','Cancers',
                                       'Self-harm','Infectious Diseases')))
p1 <- ggplot(injuries,aes(type,weight = counts,fill = age))+
  geom_hline(yintercept = seq(100,400,100),color = 'gray')+
  geom_bar(color = 'black',width = .7,position = 'stack')+
  scale_fill_brewer(palette = 'Accent')+
  scale_y_continuous(expand = c(0,0))+
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45,hjust=1))

p2 <- ggplot(injuries,aes(type,weight = counts,fill = age))+
  geom_bar(color = 'black',width = .7,position = 'fill')+
  scale_fill_brewer(palette = 'Accent')+
  scale_y_continuous(expand = c(0,0))+
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45,hjust=1))

plot_grid(p1,p2,ncol=1,labels = c('A','B'))

3.7.2 条图排序，突出最重要的分组

作图前对y轴数值进行排序，order默认从小到大

set.seed(2019)
injuries <- tibble(
                   type = c('Heart attack','Stroke','CHD',
                                'Arrhythmia','Heart failure',
                                'Hepatitis','Malaria','HIV',
                                'TB','Influenza','HCC','CRC',
                                'Lung','Gastric','Breast',
                                'Diabetes','Hypertention',
                                'Mucolipidoses','IBD','Celiac Disease',
                                "Crohn's Disease",'Diarrhea',
                                'Autism','ADHD','Depression','Meningitis',
                                'Migraine','GBM'),
                   counts = sample(20:100,28))
injuries <- injuries %>% mutate(
  type = factor(type,levels = type[order(counts)]))
p1 <- ggplot(injuries,aes(type,weight = counts))+
  geom_hline(yintercept = seq(20,90,10),color = 'gray')+
  geom_bar(color = 'black',width = .7,fill = '#00bdaa',size = .3)+
  scale_y_continuous(expand = c(0,0))+
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45,hjust=1))
injuries2 <- injuries %>% mutate(
  type = factor(type,levels = rev(type[order(counts)])))
# or:
#injuries2 <- injuries %>% mutate(
# type = factor(type,levels = type[order(counts,decreasing = T)]))
p2 <- ggplot(injuries2,aes(type,weight = counts))+
  geom_hline(yintercept = seq(20,90,10),color = 'gray')+
  geom_bar(color = 'black',width = .7,fill = '#a1dd70',size = .3)+
  scale_y_continuous(expand = c(0,0))+
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45,hjust=1))
plot_grid(p1, p2, ncol = 1,labels = c('A','B'),align = c('v','h'))

当有正负值时

set.seed(2019)
df <- tibble(diseases = c('Heart attack','Stroke','CHD',
                          'Arrhythmia','Heart failure',
                          'Hepatitis','Malaria','HIV',
                          'TB','Influenza','HCC','CRC',
                          'Lung','Gastric','Breast',
                          'Diabetes','Hypertention',
                          'Mucolipidoses','IBD','Celiac Disease',
                          "Crohn's Disease",'Diarrhea',
                          'Autism','ADHD','Depression','Meningitis',
                          'Migraine','GBM'),
             trends = rnorm(28,0,2))
df <- df %>% mutate(
  diseases = factor(diseases,levels = rev(diseases[order(trends)])))

ggplot(df,aes(diseases,weight = trends))+
  geom_hline(yintercept = seq(-4,3,1),color = 'gray')+
  geom_bar(color = 'black',width = .6,fill = '#5bd1d7',size = .3)+
  scale_y_continuous(limits = c(-5,3.5))+
  theme_classic()+
  ylab('Temporal trends of diseases')+
  theme(axis.text.x = element_text(angle = 45,hjust=1))

瘦身--棒棒糖图：geom_segment()+geom_point()

ggplot(injuries,aes(y = type,x = counts))+
  geom_segment(aes(y = type, yend = type,x = 0,xend = counts),
               color = 'black')+
  geom_point(color = 'black',fill = '#28c3d4',
             size = 3,shape = 21)+
  scale_y_discrete(expand = c(0.02,0.02))+
  scale_x_continuous(expand = c(0,0.8),
                     breaks = seq(20,100,20),
                     labels = seq(20,100,20))+
  theme_light()+
  ylab('')+
  theme(
    panel.grid.major.y = element_blank(),
    panel.border = element_blank(),
    axis.ticks.y = element_blank())

一个分组由不同亚型组成

set.seed(2019)
diseases <- data.frame(type = rep(c('CVD','Infections','Cancer',
                                    'Metabolic','Digestive','CNS'),
                                  times = c(5,5,5,3,4,6)),
                       disease = c('Heart attack','Stroke','CHD',
                                    'Arrhythmia','Heart failure',
                                    'Hepatitis','Malaria','HIV',
                                    'TB','Influenza','HCC','CRC',
                                    'Lung','Gastric','Breast',
                                    'Diabetes','Hypertention',
                                    'Mucolipidoses','IBD','Celiac Disease',
                                    "Crohn's Disease",'Diarrhea',
                                    'Autism','ADHD','Depression','Meningitis',
                                    'Migraine','GBM'),
                       freqs = sample(20:60,28))
diseases2 <- diseases %>% 
  mutate(disease = 
           factor(disease,levels = disease[order(freqs,decreasing = TRUE)]))
sums <- tapply(diseases$freqs,type,sum)
diseases3 <- diseases2 %>% 
  mutate(type = factor(type,levels = 
                         names(sums)[order(sums,decreasing = T)]),
         disease = interaction(disease,type,drop = TRUE))          
ggplot(diseases3,aes(disease, weight = freqs, fill = type))+
  geom_bar(color = 'black',width = .6,size = .3)+
  scale_y_continuous(expand = c(0,0))+
  scale_x_discrete(labels = gsub('\\.[a-zA-Z]{0,20}','',
                                 levels(diseases3$disease)))+
  scale_fill_lancet()+
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45,hjust=1))

杠铃图geom_segment()+geom_point)()+

df <- tibble(Age = rep(c('0-19','20-34','35-49','50-64','65+'),2),
             Sex = rep(c('Male','Female'),each = 5),
             Values = c(18,21,20,26,29,15,18,15,22,23))
df2 <- df %>% tidyr::spread(key = Sex, value = Values)
p2 <- ggplot()+
  geom_segment(data = df2,aes(x = Female, xend = Male, 
                              y = Age, yend = Age),
               size = 1.5,color = 'gray')+
  geom_point(data = df, aes(Values, Age, fill = Sex),
             shape = 21,size = 3)+
  labs(x = 'Values')+
  scale_fill_jco()+
  theme_light()+
  theme(legend.position = 'top',
        panel.grid.minor=element_blank(),
        panel.grid.major.y=element_blank(),
        panel.grid.major.x=element_line(),
        axis.ticks=element_blank(),
        panel.border=element_blank())
plot_grid(p1, p2, ncol = 2,labels = c('A','B'),align = c('v','h'))

圆心旋转饼图

data <- data.frame(
  individual=paste( "Mister ", seq(1,60), sep=""),
  group=c( rep('A', 10), rep('B', 30), rep('C', 14), rep('D', 6)) ,
  value=sample( seq(10,100), 60, replace=T)
)

empty_bar <- 4
to_add <- data.frame( matrix(NA, empty_bar*nlevels(data$group), ncol(data)) )
colnames(to_add) <- colnames(data)
to_add$group <- rep(levels(data$group), each=empty_bar)
data <- rbind(data, to_add)
data <- data %>% arrange(group)
data$id <- seq(1, nrow(data))

label_data <- data
number_of_bar <- nrow(label_data)
angle <- 90 - 360 * (label_data$id-0.5) /number_of_bar     
label_data$hjust <- ifelse( angle < -90, 1, 0)
label_data$angle <- ifelse(angle < -90, angle+180, angle)

ggplot(data, aes(x=as.factor(id), y=value, fill=group)) +     
  geom_bar(stat="identity", alpha=0.5) +
  ylim(-100,120) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank(),
    plot.margin = unit(rep(-1,4), "cm")) +
  coord_polar() + 
  geom_text(data=label_data, aes(x=id, y=value+10, label=individual, 
                                 hjust=hjust), color="black", 
            fontface="bold",alpha=0.6, size=2.5,
            angle= label_data$angle, inherit.aes = FALSE )

金字塔图：

populations <- tibble(age = rep(c("0-4","5-9","10-14","15-19",
                              "20-24","25-29","30-34",
                              "35-39","40-44","45-49","50-54",
                              "55-59","60-64","65-69","70-74",
                              "75-79","80-84",'85+'),2),
                      sex = rep(c('Male','Female'),each = 18),
                      pops = c(91646,65397,75560,108622,245368,
                               278821,196527,153679,153202,248852,
                               388813,350713,241516,130285,103534,
                               119681,76644,47576,
                               85376,62108,72649,107474,238128,269100,
                               192116,159126,156828,240783,378479,
                               361806,225168,123183,112859,149189,
                               105804,78370
                               ))
populations <- populations %>% mutate(
  age = factor(age, levels = c("0-4","5-9","10-14","15-19",
                               "20-24","25-29","30-34",
                               "35-39","40-44","45-49","50-54",
                               "55-59","60-64","65-69","70-74",
                               "75-79","80-84",'85+')),
  pops = pops/1e3,
  pops = ifelse(sex == 'Female',-pops, pops))
library(ggthemes)
ggplot(populations,aes(x =age,y=pops,fill=sex,width=0.8)) +
  coord_flip() +
  geom_bar(data=subset(populations,sex=="Female"),stat = "identity") +
  geom_bar(data=subset(populations,sex=="Male"), stat = "identity") +
  scale_y_continuous(breaks = seq(-400,400,length=9),
                     labels = paste0(as.character
                                     (c(abs(seq(-400,400,length=9)))), "k"), 
                     limits = c(-450, 450)) +
  theme_economist_white(horizontal = FALSE) +
  scale_fill_economist() + 
  labs(title="2010",
       y="Population",x="Age") + 
  guides(fill=guide_legend(reverse = TRUE))+
  theme(
    legend.position =c(0.8,0.9),
    legend.title = element_blank(),
    plot.title = element_text(size=20),
    plot.caption = element_text(size=12,hjust=0)
  )

3.8 离散型数据相关性可视化

韦恩图-VennDiagram包

install.packages('VennDiagram')
library(VennDiagram)
set.seed(2019)
SNP_pop_1=paste(rep("SNP_" , 200) , sample(c(1:1000) , 
                                           200 , replace=F) , sep="")
SNP_pop_2=paste(rep("SNP_" , 200) , sample(c(1:1000) , 
                                           200 , replace=F) , sep="")
SNP_pop_3=paste(rep("SNP_" , 200) , sample(c(1:1000) , 
                                           200 , replace=F) , sep="")
SNP_pop_4=paste(rep("SNP_" , 200) , sample(c(1:1000) , 
                                           200 , replace=F) , sep="")

venn.diagram(
  x = list(SNP_pop_1 , SNP_pop_2 , SNP_pop_3, SNP_pop_4),
  category.names = c("SNP pop 1" , "SNP pop 2 " , "SNP pop 3","SNP_pop_4"),
  filename = 'Venn.png',
  output = T ,
  lty = 'blank',
  fill = c('#07689f','#a2d5f2','#ff7e67','#10ddc2')
)

upset图-ggupset包

install.packages('ggupset')
library(ggupset)

热图heatmap()函数或pheatmap包

set.seed(2019)
df <- tibble(cancers = rep(c('Bladder','Bone','CNS','Breast','Cervix uteri',
                         'CRC','Lung','Liver','Prostate','Kidney',
                         'Eye','Skin','Thyroid','HLM','GBM'),3),
             regions = rep(c('Region A','Region B','Region C'),each = 15),
             incidence = sample(10:100,45))
#16-rank(incidence)显示肿瘤在不同regions从大到小排列，需要根据自己的数据更改
df <- df %>% group_by(regions) %>% mutate(ranks = 16-rank(incidence))

ggplot(df, aes(cancers, regions,fill = incidence))+
  geom_tile(color = 'gray20',size = .4)+
  theme_gray()+
  scale_x_discrete(expand = c(0,0)) + 
  scale_y_discrete(expand = c(0,0))+
  theme(axis.text.x = element_text(angle = 45,hjust=1))+ 
  labs(x = "", y = "") + 
  geom_text(aes(label = ranks))+
  scale_fill_gradient(low = '#fff5f0',high = '#cb181d')

3.9 文字注释：叠加文本 text或annotate()函数

1.ggrepel包中geom_text_repel()函数加标签

set.seed(2019)
df <- tibble(gene = paste0('gene',1:5000),
             expressions = c(rnorm(4995,3,0.5),
                             5.5,6,7,8,9),
             position = sample(1:100,5000,replace = T))

df <- df %>% mutate(expressions = ifelse(expressions<0,0.01,expressions),
                    level = ifelse(expressions>5,'High','Low'))

install.packages('ggrepel')
library(ggrepel)

ggplot(df, aes(x = position, y= expressions,
               fill = level,size = level))+
  geom_hline(yintercept = 5,color = 'gray',linetype =2)+
  geom_jitter(shape = 21, color = 'black')+
  scale_fill_manual(values = c('blue','gray'))+
  scale_size_manual(values = c(4,2))+
  geom_text_repel(data = df[df$expressions>=5,],
            aes(x = position, y= expressions,label = gene))+
  theme(legend.position = 'none')

2.gghighlight包高亮文本

install.packages('gghighlight')
library(gghighlight)

gghighlight_point(df,aes(position, expressions, color = level,
                         size = level),
                  expressions > 5)+
  geom_hline(yintercept = 5,color = 'gray',linetype =2)+
  scale_color_manual(values = c('blue','gray'))+
  scale_size_manual(values = c(4,2))+
  theme(legend.position = 'none')

grid包在绘图区域外添加文本

annotate()函数可以标记多种元素，如文本、矩形、散点等

ggplot(mtcars, aes(x = wt, y = mpg)) + 
  geom_point(shape = 21, color = 'black',fill = '#ff8a5c',size=4)+
  geom_smooth(method = lm)+
  annotate("text", x = 5, y = 32, label = "italic(R) ^ 2 == 0.75",
           parse = TRUE,size = 6, color = 'blue')+
  theme_bw()

3.10 p值标注-ggsignif, ggpval, ggstatsplot包

ggsignif

install.packages("ggsignif")
library(ggsignif)

set.seed(2019)
df <- tibble(group = rep(paste('Group',LETTERS[1:4]),times = c(10,8,10,13)),
             values = c(rnorm(10,15,2),rnorm(8,14,2),
                        rnorm(10,8,1),rnorm(13,5,2)))

p1 <- ggplot(df,aes(group,values))+
  geom_boxplot()+
  geom_jitter(size = 2,shape = 21,fill = 'gray40')+
  geom_signif(comparisons = list(c('Group A','Group B'),
                                 c('Group A','Group C'),
                                 c('Group A','Group D')),
              map_signif_level = FALSE, textsize=6, #TRUE标记星号
              test = t.test,
              step_increase = 0.2) +
  ylim(NA, 25)+
  theme_classic()
  plot_grid(p1,p2,ncol=2,align = 'h',labels = c('A','B'))

ggstatsplot

install.packages("ggstatsplot")
library(ggstatsplot)
ggbetweenstats(
  data = iris,
  plot.type = 'box',
  x = Species,
  y = Sepal.Length,
  pairwise.comparisons = T,
  mean.plotting = F,
  messages = T,
  type = 'p',
  palette = 'Set1') + 
  scale_y_continuous(breaks = seq(3, 8, by = 1))+
  theme_classic()+
  theme(legend.position = c(.11,.8))

3.11 seqlogo与进化树

seqlogo（基序图）：ggseqlogo包，需要将序列转化字符串

不同位置展示该位置上碱基的构成比例，字母越大则表示该碱基占比越大。也可以用来展示蛋白序列。

install.packages('ggseqlogo')
library(ggseqlogo)

data(ggseqlogo_sample)
ggseqlogo( seqs_dna$MA0001.1 )

ggplot()+
  annotate("rect", xmin = 0.5, xmax = 3.5, ymin = -0.05, ymax = 1.9, 
           alpha=0.1, col="black", fill="yellow")+
  geom_logo(seqs_dna$MA0001.1, stack_width = 0.9,
            col_scheme = 'nucleotide2')+
  annotate("segment", x=4, xend = 8, y=1.2, yend = 1.2, size=2)+
  annotate("text", x=6, y=1.3, label="Text annotation")+
  theme_logo()

进化树-ggtree包

环形结构进化树

nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
circ <- ggtree(tree, layout = "circular")
df <- data.frame(first=c("a", "b", "a", "c", "d", "d", 
                         "a", "b", "e", "e", "f", "c", "f"),
                 second= c("z", "z", "z", "z", "y", "y", 
                           "y", "y", "x", "x", "x", "a", "a"))
rownames(df) <- tree$tip.label

set.seed(2019)
df2 <- as.data.frame(matrix(rnorm(39), ncol=3))
rownames(df2) <- tree$tip.label
colnames(df2) <- LETTERS[1:3]

p1 <- gheatmap(circ, df, offset=.8, width=.2,
               colnames_angle=95, colnames_offset_y = .25) +
  scale_fill_viridis_d(option="D", name="discrete\nvalue")

library(ggnewscale)
p2 <- p1 + new_scale_fill()
p3 <- gheatmap(p2, df2, offset=15, width=.3,
         colnames_angle=90, colnames_offset_y = .25) +
  scale_fill_viridis_c(option = 'C',name="continuous\nvalue")

plot_grid(p1,p3,ncol = 2,labels = c('A','B'))

和玹图-chordDiagram()函数，输入数据为数据框或矩阵

library(circlize)

set.seed(2019)

numbers <- sample(c(1:1000), 100, replace = T)
data <- matrix( numbers, ncol=5)
rownames(data) <- paste0("Set-", seq(1,20))
colnames(data) <- paste0("Pair-", seq(1,5))

chordDiagram(data, transparency = 0.5)

元素增多

circos.par("track.height" = 0.4)
set.seed(2019)
data = data.frame(
  factor = sample(letters[1:8], 1000, replace = TRUE),
  x = rnorm(1000),
  y = runif(1000)
)


circos.initialize( factors=data$factor, x=data$x )

circos.trackPlotRegion(factors = data$factor, y = data$y, 
                       panel.fun = function(x, y) {
                         circos.axis()
                       })
circos.trackPoints(data$factor, data$x, data$y, col="deeppink",pch = 16)
circos.trackHist(data$factor, data$x, col="#69b3a2")

4.配色-RColorBrewer, ggsci, paletteer包

配色网站：https://colorbrewer2.org/

4.1离散色-离散变量上色

ggsci包匹配期刊配色，但不一定符合杂志最新配色要求

RColorBrewer包集成到ggplot2中，使用scale_fill_brewer()函数即可。推荐使用Set2,Set3,Accent配色板

paletteer包含25个包，869种配离散色

devtools::install_github("EmilHvitfeldt/paletteer")
library(paletteer)

p1 <- ggplot(dsamp, aes(carat, price, fill = cut)) +
    geom_point(shape = 21, color = 'black', size = 3) +
    scale_fill_paletteer_d("awtools::mpalette") +
    theme_classic() +
    labs(title = 'awtools-mpalette', x = '', y = '') +
    theme(legend.position = c(.15,.8),
          legend.background = element_rect(fill = NA),
          plot.title = element_text(hjust = 0.5))
p1

scale_fill_manual()函数自定义

4.2 连续色-paletteer包中scale_fill_paletteer_c()函数即可调用。

5.主题选择与整体布局

展示数据大小，选择网格背景，方便比较大小；

展示数据相关性，选择空白背景的test和classic主题。

set.seed(2019)
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]

P1 <- ggplot(dsamp,aes(carat, price, fill = cut))+
  geom_point(shape = 21, color = 'black', size = 3)+
  scale_fill_manual(name = '',
                    values = c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99'))+
  labs(title = 'default theme',x='',y='')+
  theme(legend.position = c(.15,.8),
        legend.background = element_rect(fill = NA,color = NA),
        plot.title = element_text(hjust = 0.5))

P2 <- ggplot(dsamp,aes(carat, price, fill = cut))+
  geom_point(shape = 21, color = 'black', size = 3)+
  scale_fill_manual(name = '',
                    values = c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99'))+
  labs(title = 'bw theme',x='',y='')+
  theme_bw()+
  theme(legend.position = c(.15,.8),
        legend.background = element_rect(fill = NA,color = NA),
        plot.title = element_text(hjust = 0.5))

5.2 图例位置、大小应与插图搭配-theme()函数

set.seed(2019)
id <- 1:10
df <- tibble(x = rep(id,2),
             y = c(id*1.8+rnorm(10,0,1),
                   x=id*3+rnorm(10,0,1)),
             group = rep(c('Group A','Group B'),each = 10))

p1 <- ggplot(df, aes(x,y,fill = group,group = group))+
  geom_line()+
  geom_point(shape = 21, color = 'gray40', size = 3)+
  theme_classic()

p2 <- ggplot(df, aes(x,y,fill = group,group = group))+
  geom_line()+
  geom_point(shape = 21, color = 'gray40', size = 3)+
  theme_classic()+
  theme(legend.position = c(.15,.85))图例位置

5.3多面板图facet_wrap(~clarity,ncol = 3)分三列面板

5.4元素大小

散点图geom_point()函数中size参数调整

柱状图geom_bar()函数中width参数调整

scale_size_continuous()函数调整size属性

6.细节修饰-标度函数

7.图片拼接与导出

7.1对齐与拼接

plot_grid()函数拼接，align = c('v','h')参数纵横向对齐

patchwork包-支持多层嵌套拼图，直接+相连

7.2导出-'Cairo'包，export包(导出PPT)

CairoPDF('file_name.pdf',height = 6,width = 6)
graph2ppt(object)

原创声明：本文系作者授权腾讯云开发者社区发表，未经许可，不得转载。

如有侵权，请联系 cloudcommunity@tencent.com 删除。

学习笔记

原创声明：本文系作者授权腾讯云开发者社区发表，未经许可，不得转载。

如有侵权，请联系 cloudcommunity@tencent.com 删除。

学习笔记

登录后参与评论

暂无评论

编辑精选文章

换一批

万字详解高可用架构设计

5218

Go 开发者必备：Protocol Buffers 入门指南

亿级月活的社交 APP，陌陌如何做到 3 分钟定位故障？

3121

60页PPT全解：DeepSeek系列论文技术要点整理

4448

全网最全的R语言基础图形合集

r 语言数据可视化

直方图是一种对数据分布情况进行可视化的图形，它是二维统计图表，对应两个坐标分别是统计样本以及该样本对应的某个属性如频率等度量。

生信学习者

2024/06/12

1520

文献组图

数据分析

追风少年i

2025/01/07

1460

R绘图笔记 | 二维散点图与统计直方图组合

r 语言

前面介绍了散点图、柱状图、直方图和核密度估计图，有时候散点图不能很直观的看的出数据的分布情况，这里介绍散点图与统计直方图组合绘制。

DoubleHelix

2020/11/11

1.8K0

生物信息数据分析教程视频——10-TCGA数据库:miRNA的表达探索

视频

视频地址：http://mpvideo.qpic.cn/0bc3ueaacaaagqalujrtqfrvbiodagqqaaia.f10002.mp4? 代码： library(TCGAbioli

DoubleHelix

2022/12/15

7710

ggplot2|详解八大基本绘图要素

编程算法

ggplot2是由Hadley Wickham创建的一个十分强大的可视化R包。按照ggplot2的绘图理念，Plot(图)= data(数据集)+ Aesthetics(美学映射)+ Geometry(几何对象)。本文将从ggplot2的八大基本要素逐步介绍这个强大的R可视化包。

生信补给站

2020/08/05

7.7K0

🤒 geomtextpath | 成功让你的ggplot注释拥有傲人曲线！~

contour label lines smoothing text

1写在前面最近的世界杯结果的确是让人大跌眼镜🕶️, 日本队🇯🇵先后击败世界杯冠军, 德国队🇩🇪和西班牙队🇪🇸, 韩国队🇰🇷逆转葡萄牙🇵🇹, 踩着乌拉圭🇺🇾进入淘汰赛(请韩国队🇰🇷自觉感谢裁判), 让无数人站上天台😂. 不过大家要是看看这几十年日本足球⚽️的发展也就不会觉得奇怪了, 就算有一天日本队将梦想照进现实,捧起大力神杯🏆, 我也不觉得有什么奇怪的. 还是祝各亚洲球队取得好成绩, 也祝梅西和C罗在顶峰相遇, 人生不留遗憾😘. ---- 接着是这一期的教程, 最近用了一下geomtextpath, 是个

生信漫卷

2023/02/24

4270

R可视乎 | 散点图系列（2）

机器学习编程算法数据可视化 https 网络安全

散点图（scatter graph、point graph、X-Y plot、scatter chart ）是科研绘图中最常见的图表类型之一，通常用于显示和比较数值。散点图是使用一系列的散点在直角坐标系中展示变量的数值分布。在二维散点图中，可以通过观察两个变量的数据变化，发现两者的关系与相关性。

庄闪闪

2021/04/09

1.3K0

R语言ggplot2画热图添加分组信息的颜色条

腾讯云开发者社区

之前有人在公众号留言问文章开头这幅图如何实现，下面的B图是折线图加柱形图，相对比较容易实现，上面的A图稍微有点复杂，我想到的办法是拼图，图A可以看成三个热图，然后加一个堆积柱形图，最后将四个图组合到一起。那就按照这个思路试一下看能不能实现。最初的想法是左侧的颜色条用堆积柱形图来实现，又看了一遍Y叔公众号关于aplot这个包的推文，发现他是用geom_tile()函数实现的，仔细想想还是geom_tile()函数实现起来比较方便。首先解决昨天的遗留问题：ggplot2画图添加文字内容的时候如何添加下划线

用户7010445

2021/01/06

5.5K0

你还缺scRNA-seq的workflow吗？

数据 cell data scale workflow

之前曾老师给我看了一位在pipebio工作的生信工程师Roman Hillje的scRNA-seq的workflow，今天整理一下分享给大家。

生信菜鸟团

2024/07/31

5030

kaggle案例重复：科比的投篮选择之二

kaggle

这里用到 group_by()和 summarise()函数。一个简单的小例子理解这两个函数的用法

用户7010445

2020/03/03

7080

67-R可视化11-用ggrepel更加美观的添加标记（火山图的实现）

grep github https git 开源

上一讲我们提到了66-R可视化10-自由的在ggplot上添加文本（柱状图加计数）[2]

北野茶缸子

2021/12/17

4.6K0

ggplot2 核密度图和直方图

alpha histogram plot size

color, size, linetype: 同上 fill: 填充 alpha: 透明度

火星娃统计

2020/09/15

5.3K0

生物信息数据分析教程视频——07-TCGA数据库：基因的表达探索

sample 函数教程视频

视频地址：http://mpvideo.qpic.cn/0b2ewiaakaaahmalygztmbrvbmwdawzaabia.f10002.mp4? 参考文章：【0代码】单基因泛癌分析教程视频

DoubleHelix

2022/12/15

7510

ggplot2轻松绘制误差线点图与箱线图

element ggplot2 legend text 数据

R语言数据分析指南

2023/08/18

5900

免疫浸润结果可视化

aes cell 基础可视化数据

在之前的推文中我们介绍了2行代码实现9种免疫浸润方法，今天给大家介绍下常见的免疫浸润结果的可视化。

医学和生信笔记

2023/08/30

1.4K0

R语言聚类树图小例子

http

将层级聚类的结果转化为ggdendro作图需要的格式，用到的函数是dendro_data(hc,type="rectangle") type有两个参数可选

用户7010445

2020/03/03

1.1K0

scRNA分析| Seurat堆叠小提琴图不满足？那就ggplot2 堆叠各种元素

element ggplot2 函数数据优化

单细胞常见的可视化方式有DimPlot，FeaturePlot ，DotPlot ，VlnPlot 和 DoHeatmap几种，Seurat均可以实现，但文献中的图大多会精美很多。比如

生信补给站

2023/08/25

5K0

scRNA分析| Seurat堆叠小提琴图不满足？那就ggplot2 堆叠各种元素

[NC图表复现]全球植物病原真菌分布图

element 地图论文数据图表

R语言数据分析指南

2023/11/23

3100

Nature图表复现 | 组合绘制个性化热图

数据可视化对象论文数据图表

R语言数据分析指南

2023/09/20

6051

绘图技巧 | 不同样式华夫饼图(Waffle charts)绘制技巧

charts ggplot2 label point scale

今天这篇推文，小编为大家介绍华夫饼图(Waffle charts) 的绘制方法，这次主要介绍使用R-waffle包进行绘制。主要内容如下：

DataCharm

2022/05/24

1.8K0