前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
工具
TVP
发布
社区首页 >专栏 >生信绘图与配色

生信绘图与配色

原创
作者头像
用户11008504
修改2024-07-02 18:42:22
1110
修改2024-07-02 18:42:22

本文资料与代码来源于医学方出品的《SCI论文绘图之道》,转载请说明。

一、前言

高水平的SCI插图有如下的特点:

1.形式丰富且合适;

2.信息直观,每张图最多反应2-3个信息;

3.根据期刊要求使用颜色、形状、大型;

4.布局不能留太多空白和拥挤;

5.必要时添加辅助线帮助阅读。

如何绘制:

1.理解数据

2.绘制草图

3.根据期刊要求准备图片配色、格式、分辨率

二、ggplot2的绘图原理

2.1 散点图

代码语言:r
复制
#install.packages(c('tidyverse','ggsci'))
library(tidyverse)
library(ggsci)

ggplot(mtcars,aes(mpg,qsec,color = factor(cyl)))+
 size = 3, alpha = .6)+
  theme_bw()+
  scale_color_lancet()+
  scale_x_continuous()+
  scale_y_continuous()
元素组成
元素组成

元素组成:

1.画布:ggplot()函数,mtcars为画图数据

2.横纵坐标:mpg,qsec两个变量

3.散点- 几何对象: geom_point()函数,size,alpha为控制点属性的参数

4.散点颜色- 变量映射:color = factor(cyl):把cyl这个变量因子化,不同的颜色表示变量的分类水平

5.图例- 变量映射的产物

6.背景网路:theme_bw()完成,主题函数

7.坐标:横纵坐标,包括坐标轴上的刻度。

最基本的四要素:

1)绘图数据:mtcars

2)画布:由geom_point()函数完成

3)映射:由aes()函数完成

4)几何对象:由 geom_point()函数完成

ggplot2的基本结构:

1)变量映射系统

2)几何对象系统

3)标度系统

4)主题系统

三、常见图形绘制

连续型数据:某个区间内的任意值都可以取的数据,特点是可以进行 无限的分割和测量,两个相邻的值之间可能存在无数个中间值。

离散型数据则是指只能取到有限个数或者是可数个数的数据,通常以整数表示。

3.1 单个连续型变量

常用:盒型图和小提琴图,在纵坐标上展示数据

离散型变量(分组变量)+连续变量

3.2 两个或多个连续型变量

散点图图形-shape参数
散点图图形-shape参数

ggplot中颜色实现两种方式:

1.变量映射(取值越大颜色越深)

2.人为定义:颜色参数设置为某个颜色

代码语言:r
复制
mtcars <- mtcars %>% as_tibble() %>% mutate(cyl = factor(cyl),
                                            carb = factor(carb))
set.seed(2019)
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
p1 <- ggplot(mtcars,aes(mpg,qsec,fill = cyl))+
  geom_point(size = 3,shape = 21, color = 'black')+
  theme_classic()+
  theme(legend.position = c(.1,.8),#图标的位置
        legend.background = element_rect(color = 'black'))#图标的颜色

p2 <- ggplot(mtcars,aes(mpg,qsec,fill = carb))+
  geom_point(size = 3,shape = 21, color = 'black')+
  theme_classic()+
  theme(legend.position = c(.1,.7),
        legend.background = element_rect(color = 'black'))
#plot_grid()函数用于多个 ggplot 对象(即图表)排列组合,align参数设置对齐方式 ,v表示垂直对齐,h表示水平对齐。
plot_grid(p1,p2,ncol=2,labels = c('A','B'),align = c('v','h'))

散点图颜色:标度函数-scale_fill(); scale_color(),分为连续色和离散色。

1.连续色:由scale_fill函数第三个词决定,如scale_fill_gradient表示连续色

代码语言:r
复制
dsamp <- dsamp %>% mutate(depth = depth - median(depth))
p1 <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(fill = depth),shape=21, color = 'black',size = 2)+
  scale_fill_gradient2(low = '#41ab5d',mid = 'white',high = '#e31a1c',
                        midpoint = 0)+
  theme(legend.position = c(.1,.8))

p2 <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(fill = depth),shape=21, color = 'black',size = 2)+
  scale_fill_gsea()+
  theme(legend.position = c(.1,.8))

plot_grid(p1,p2,ncol=2,labels = c('A','B'),align = c('v','h'))

2.离散色:使用配色包,如ggsci包,提供了一组科学期刊风格的配色方案

3.3 连续型变量绘图优化

气泡图

代码语言:r
复制
set.seed(2019)
dsamp <- diamonds[sample(nrow(diamonds), 500), ]
dsamp <-  dsamp %>% mutate(x = exp(x))

p1 <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(size = x,fill = cut),shape=21, 
             color = 'black')+
  scale_fill_brewer(palette = 'Set2')

p2 <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(size = log(x),fill = cut),shape=21, 
             color = 'black')+
  scale_fill_brewer(palette = 'Set2')
#RColorBrewer配色包--颜色区分较大,适合分类变量,如 Set1, Set2, Set3, Accent, Paired, Pastel1, Pastel2 等
plot_grid(p1,p2,ncol=2,labels = c('A','B'),align = c('v','h'))

二维密度图:适合大数据量,stat_density_2d()函数

代码语言:r
复制
a <- data.frame( x=rnorm(20000, 10, 1.9), y=rnorm(20000, 10, 1.2) )
b <- data.frame( x=rnorm(20000, 14.5, 1.9), y=rnorm(20000, 14.5, 1.9) )
c <- data.frame( x=rnorm(20000, 9.5, 1.9), y=rnorm(20000, 15.5, 1.9) )
data <- rbind(a,b,c)
p2 <- ggplot(data, aes(x=x, y=y) ) +
  stat_density_2d(aes(fill = ..level..), geom = "polygon")+
  theme_bw()+
  scale_fill_gradient(low = '#edf8e9',high = '#238b45')+
  theme(panel.grid = element_blank())

散点+密度图:ggExtra包

代码语言:r
复制
install.packages("ggExtra")
library(ggExtra)

piris <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(shape=21, size = 3,stroke = 1.2)+
  scale_color_npg()+
  theme_bw()+
  theme(legend.position = c(.1,.86))
ggMarginal(piris, type= 'density',
           groupFill = TRUE)

层峦叠峰:ggridges包,geom_density_ridges()函数

代码语言:r
复制
install.packages('ggridges')
library(ggridges)

ggplot(dsamp,aes(x = carat,y = clarity, fill = clarity))+
  geom_density_ridges(alpha = .5)+
  scale_fill_brewer(palette = 'Set3')+
  theme_classic()

3.4 两个连续型变量相关性图-ggcorrplot包

代码语言:R
复制
install.packages("ggcorrplot")
library(ggcorrplot)
mtcars <- mtcars %>% select(c('mpg','disp','hp','drat','wt','qsec'))
corr <- round(cor(mtcars), 1)
p1 <- ggcorrplot(
  corr,
  type = "lower",
  outline.color = "white",
  colors = c("#6D9EC1", "white", "#E46726")
)

p2 <- ggcorrplot(
  corr,
  type = "upper",
  outline.color = "white",
  colors = c("#084594", "white", "#ef3b2c")
)

p3 <- ggcorrplot(corr,
           type = "lower",
           lab = TRUE)

p.mat <- cor_pmat(mtcars)

p4 <- ggcorrplot(corr,
           type = "lower",
           p.mat = p.mat)

plot_grid(p1,p2,p3,p4,ncol=2,labels = LETTERS[1:4],align = c('v','h'))

3.5 连续型变量趋势图

3.5.1线图:geom_line()+geom_point()函数

代码语言:R
复制
set.seed(2020)
id <- 1:8
dat <- tibble(x = rep(id,2),
              y = c(2*id+rnorm(8,0,1),
                    6*id+rnorm(8,0,2)),
              group = rep(c('Group1','Group2'),each = 8))

p1 <- ggplot(dat,aes(x, y, color = group))+
  geom_line(size = .8)+
  geom_point(shape = 21,color = 'black',size=3,fill = 'white')+
  scale_color_d3()+
  theme_classic()+
  theme(legend.position = c(.15,.85))

p2 <- ggplot(dat,aes(x, log(y), color = group))+
  geom_line(size = .8)+
  geom_point(shape = 21,color = 'black',size=3,fill = 'white')+
  scale_color_d3()+
  theme_classic()+
  theme(legend.position = c(.15,.85))

p3 <- ggplot(dat,aes(x, y))+
  geom_line(size = .8,color = '#2b8cbe')+
  geom_point(shape = 21,color = 'black',size=3,fill = 'white')+
  theme_classic()+
  facet_wrap(~group,ncol = 2,scales = 'free_y')+
  theme(legend.position = c(.15,.85),
        strip.background = element_blank(),
        strip.text = element_text(size = 12))

p4 <- plot_grid(p1,p2,ncol=2,labels = LETTERS[1:2],align = c('v','h'))
plot_grid(p4,p3,ncol=1,labels = c('','C'),align = c('v','h'))

3.5.2配比坡度图

3.5.3面积图 geom_area()函数

3.5.4平滑曲线 geom_smooth()函数

3.5.5生存曲线 survminer,survival包

3.5.6三元图 ggtern包

与等高线,三位密度图结合

代码语言:R
复制
install.packages('ggtern')
library(ggtern)
set.seed(2019)
a <- tibble(x=rnorm(20000, 80, 30), 
                 y=rnorm(20000, 100, 30), 
                 z = rnorm(20000, 60, 20))

ggtern(data=a,aes(x,y,z))+
  stat_density_tern(aes(fill = ..level..,alpha = ..level..),
                    geom = 'polygon')+
  scale_fill_gradient(low = '#fcbba1',high = '#ef3b2c')+
  theme_showarrows()+
  guides(color = "none", fill = "none", alpha = "none")  

3.5.7 雷达图:ggradar包

3.6 离散型数据构成可视化

3.6.1.饼图:ggplot中由条形图变形而来,条图和坐标系转换而成

代码语言:R
复制
injuries <- tibble(type = c('Road injury','Self-harm','CVD',
                            'Cancers','Infectious Diseases'),
                   counts = c(214,123,69,24,17),
                   share = counts/sum(counts)*100)

p1 <- ggplot(injuries,aes('', y = counts, fill = type))+
  geom_bar(width = 1, size = 1, color = 'white',stat = 'identity')+
  coord_polar(theta = 'y')+
  geom_text(aes(label = paste0(round(share,1),'%')),
            position = position_stack(vjust = .5))+
  labs(x = NULL, y = NULL, fill = NULL, 
       title = "Injury proportion in young adults")+
  scale_fill_npg()+
  theme_classic() +
  theme(axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5, color = "#666666"))

injuries2 <- injuries %>% mutate(type = factor(type,
                                              levels = type[order(counts)])) 
p2 <- ggplot(injuries2,aes('', y = counts, fill = type))+
  geom_bar(width = 1, size = 1, color = 'white',stat = 'identity')+
  coord_polar(theta = 'y')+
  geom_text(aes(label = paste0(round(share,1),'%')),
            position = position_stack(vjust = .5))+
  labs(x = NULL, y = NULL, fill = NULL, 
       title = "Injury proportion in young adults")+
  scale_fill_npg()+
  theme_classic() +
  theme(axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5, color = "#666666"))+
  guides(fill = guide_legend(reverse = TRUE))

plot_grid(p1,p2,ncol=2,labels = LETTERS[1:2],align = c('v','h'))

3.6.2玫瑰图:饼图x轴进行真实变量映射

代码语言:R
复制
ggplot(injuries,aes(x = type, y = counts, fill = type))+
  geom_bar(width = 1,color = 'black',stat = 'identity')+
  coord_polar(theta = 'x')+
  geom_text(aes(label = paste0(round(share,1),'%'),
                y = counts+2))+
  labs(x = NULL, y = NULL, fill = NULL, 
       title = "Injury proportion in young adults")+
  scale_fill_tron()+
  theme_bw() +
  theme(axis.text.x = element_text(size = 13, color='black',
                                   angle = seq(-30,-330,len = 5)),
        plot.title = element_text(hjust = 0.5, color = "#666666"),
        panel.border = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        legend.position = 'none')

3.6.3戒指图:设置条柱的宽度width及x轴取值范围

代码语言:R
复制
install.packages('ggpubr')
library(ggpubr)

ggdonutchart(injuries,'share',
             label = paste0(round(injuries$share),'%'),
             fill = 'type',color = 'white',
             palette = 'lancet')

3.6.4.树图

代码语言:r
复制
install.packages('treemapify')
library(treemapify)
set.seed(2019)
diseases <- data.frame(type = rep(c('CVD','Infections','Cancer',
                            'Metabolic','Digestive','CNS'),
                            times = c(5,5,5,3,4,6)),
                   diseases = c('Heart attack','Stroke','CHD',
                                'Arrhythmia','Heart failure',
                                'Hepatitis','Malaria','HIV',
                                'TB','Influenza','HCC','CRC',
                                'Lung','Gastric','Breast',
                                'Diabetes','Hypertention',
                                'Mucolipidoses','IBD','Celiac Disease',
                                "Crohn's Disease",'Diarrhea',
                                'Autism','ADHD','Depression','Meningitis',
                                'Migraine','GBM'),
                   freqs = sample(20:60,28))

ggplot(diseases,aes(area = freqs,fill = type,
                    label = diseases, subgroup = type))+
  geom_treemap(color = 'gray20')+
  geom_treemap_subgroup_border() +
  geom_treemap_subgroup_text(place = "centre", grow = T, 
                             alpha = 0.5, colour ="black", 
                             fontface = "italic", min.size = 0) +
  geom_treemap_text(colour = "white", place = "topleft", reflow = T)+
  scale_fill_brewer(palette = 'Set2')

3.7 离散型数据分布: geom_hline()+geom_bar()

3.7.1 堆栈式条图

代码语言:r
复制
injuries <- tibble(age = rep(c('Young adults',
                               'Middle-aged people',
                               'the elders'),each=5),
                   type = rep(c('Road injury','Self-harm','CVD',
                                'Cancers','Infectious Diseases'),3),
                   counts = c(214,123,69,24,17,
                              129,110,201,101,45,
                              56,32,212,189,78))
injuries <- injuries %>% mutate(age = factor(age, 
                                             levels = c('Young adults',
                                                        'Middle-aged people',
                                                        'the elders')))
injuries <- injuries %>%
  mutate(type = factor(type,levels = c('CVD','Road injury','Cancers',
                                       'Self-harm','Infectious Diseases')))
p1 <- ggplot(injuries,aes(type,weight = counts,fill = age))+
  geom_hline(yintercept = seq(100,400,100),color = 'gray')+
  geom_bar(color = 'black',width = .7,position = 'stack')+
  scale_fill_brewer(palette = 'Accent')+
  scale_y_continuous(expand = c(0,0))+
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45,hjust=1))

p2 <- ggplot(injuries,aes(type,weight = counts,fill = age))+
  geom_bar(color = 'black',width = .7,position = 'fill')+
  scale_fill_brewer(palette = 'Accent')+
  scale_y_continuous(expand = c(0,0))+
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45,hjust=1))

plot_grid(p1,p2,ncol=1,labels = c('A','B'))

3.7.2 条图排序,突出最重要的分组

作图前对y轴数值进行排序,order默认从小到大

代码语言:R
复制
set.seed(2019)
injuries <- tibble(
                   type = c('Heart attack','Stroke','CHD',
                                'Arrhythmia','Heart failure',
                                'Hepatitis','Malaria','HIV',
                                'TB','Influenza','HCC','CRC',
                                'Lung','Gastric','Breast',
                                'Diabetes','Hypertention',
                                'Mucolipidoses','IBD','Celiac Disease',
                                "Crohn's Disease",'Diarrhea',
                                'Autism','ADHD','Depression','Meningitis',
                                'Migraine','GBM'),
                   counts = sample(20:100,28))
injuries <- injuries %>% mutate(
  type = factor(type,levels = type[order(counts)]))
p1 <- ggplot(injuries,aes(type,weight = counts))+
  geom_hline(yintercept = seq(20,90,10),color = 'gray')+
  geom_bar(color = 'black',width = .7,fill = '#00bdaa',size = .3)+
  scale_y_continuous(expand = c(0,0))+
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45,hjust=1))
injuries2 <- injuries %>% mutate(
  type = factor(type,levels = rev(type[order(counts)])))
# or:
#injuries2 <- injuries %>% mutate(
# type = factor(type,levels = type[order(counts,decreasing = T)]))
p2 <- ggplot(injuries2,aes(type,weight = counts))+
  geom_hline(yintercept = seq(20,90,10),color = 'gray')+
  geom_bar(color = 'black',width = .7,fill = '#a1dd70',size = .3)+
  scale_y_continuous(expand = c(0,0))+
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45,hjust=1))
plot_grid(p1, p2, ncol = 1,labels = c('A','B'),align = c('v','h'))
当有正负值时
代码语言:r
复制
set.seed(2019)
df <- tibble(diseases = c('Heart attack','Stroke','CHD',
                          'Arrhythmia','Heart failure',
                          'Hepatitis','Malaria','HIV',
                          'TB','Influenza','HCC','CRC',
                          'Lung','Gastric','Breast',
                          'Diabetes','Hypertention',
                          'Mucolipidoses','IBD','Celiac Disease',
                          "Crohn's Disease",'Diarrhea',
                          'Autism','ADHD','Depression','Meningitis',
                          'Migraine','GBM'),
             trends = rnorm(28,0,2))
df <- df %>% mutate(
  diseases = factor(diseases,levels = rev(diseases[order(trends)])))

ggplot(df,aes(diseases,weight = trends))+
  geom_hline(yintercept = seq(-4,3,1),color = 'gray')+
  geom_bar(color = 'black',width = .6,fill = '#5bd1d7',size = .3)+
  scale_y_continuous(limits = c(-5,3.5))+
  theme_classic()+
  ylab('Temporal trends of diseases')+
  theme(axis.text.x = element_text(angle = 45,hjust=1))
瘦身--棒棒糖图:geom_segment()+geom_point()
代码语言:r
复制
ggplot(injuries,aes(y = type,x = counts))+
  geom_segment(aes(y = type, yend = type,x = 0,xend = counts),
               color = 'black')+
  geom_point(color = 'black',fill = '#28c3d4',
             size = 3,shape = 21)+
  scale_y_discrete(expand = c(0.02,0.02))+
  scale_x_continuous(expand = c(0,0.8),
                     breaks = seq(20,100,20),
                     labels = seq(20,100,20))+
  theme_light()+
  ylab('')+
  theme(
    panel.grid.major.y = element_blank(),
    panel.border = element_blank(),
    axis.ticks.y = element_blank()) 
一个分组由不同亚型组成
代码语言:R
复制
set.seed(2019)
diseases <- data.frame(type = rep(c('CVD','Infections','Cancer',
                                    'Metabolic','Digestive','CNS'),
                                  times = c(5,5,5,3,4,6)),
                       disease = c('Heart attack','Stroke','CHD',
                                    'Arrhythmia','Heart failure',
                                    'Hepatitis','Malaria','HIV',
                                    'TB','Influenza','HCC','CRC',
                                    'Lung','Gastric','Breast',
                                    'Diabetes','Hypertention',
                                    'Mucolipidoses','IBD','Celiac Disease',
                                    "Crohn's Disease",'Diarrhea',
                                    'Autism','ADHD','Depression','Meningitis',
                                    'Migraine','GBM'),
                       freqs = sample(20:60,28))
diseases2 <- diseases %>% 
  mutate(disease = 
           factor(disease,levels = disease[order(freqs,decreasing = TRUE)]))
sums <- tapply(diseases$freqs,type,sum)
diseases3 <- diseases2 %>% 
  mutate(type = factor(type,levels = 
                         names(sums)[order(sums,decreasing = T)]),
         disease = interaction(disease,type,drop = TRUE))          
ggplot(diseases3,aes(disease, weight = freqs, fill = type))+
  geom_bar(color = 'black',width = .6,size = .3)+
  scale_y_continuous(expand = c(0,0))+
  scale_x_discrete(labels = gsub('\\.[a-zA-Z]{0,20}','',
                                 levels(diseases3$disease)))+
  scale_fill_lancet()+
  theme_classic()+
  theme(axis.text.x = element_text(angle = 45,hjust=1))
杠铃图geom_segment()+geom_point)()+
代码语言:R
复制
df <- tibble(Age = rep(c('0-19','20-34','35-49','50-64','65+'),2),
             Sex = rep(c('Male','Female'),each = 5),
             Values = c(18,21,20,26,29,15,18,15,22,23))
df2 <- df %>% tidyr::spread(key = Sex, value = Values)
p2 <- ggplot()+
  geom_segment(data = df2,aes(x = Female, xend = Male, 
                              y = Age, yend = Age),
               size = 1.5,color = 'gray')+
  geom_point(data = df, aes(Values, Age, fill = Sex),
             shape = 21,size = 3)+
  labs(x = 'Values')+
  scale_fill_jco()+
  theme_light()+
  theme(legend.position = 'top',
        panel.grid.minor=element_blank(),
        panel.grid.major.y=element_blank(),
        panel.grid.major.x=element_line(),
        axis.ticks=element_blank(),
        panel.border=element_blank())
plot_grid(p1, p2, ncol = 2,labels = c('A','B'),align = c('v','h'))
圆心旋转饼图
整洁数据,将数据整理成这样的格式
整洁数据,将数据整理成这样的格式
代码语言:r
复制
data <- data.frame(
  individual=paste( "Mister ", seq(1,60), sep=""),
  group=c( rep('A', 10), rep('B', 30), rep('C', 14), rep('D', 6)) ,
  value=sample( seq(10,100), 60, replace=T)
)

empty_bar <- 4
to_add <- data.frame( matrix(NA, empty_bar*nlevels(data$group), ncol(data)) )
colnames(to_add) <- colnames(data)
to_add$group <- rep(levels(data$group), each=empty_bar)
data <- rbind(data, to_add)
data <- data %>% arrange(group)
data$id <- seq(1, nrow(data))

label_data <- data
number_of_bar <- nrow(label_data)
angle <- 90 - 360 * (label_data$id-0.5) /number_of_bar     
label_data$hjust <- ifelse( angle < -90, 1, 0)
label_data$angle <- ifelse(angle < -90, angle+180, angle)

ggplot(data, aes(x=as.factor(id), y=value, fill=group)) +     
  geom_bar(stat="identity", alpha=0.5) +
  ylim(-100,120) +
  theme_minimal() +
  theme(
    legend.position = "none",
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank(),
    plot.margin = unit(rep(-1,4), "cm")) +
  coord_polar() + 
  geom_text(data=label_data, aes(x=id, y=value+10, label=individual, 
                                 hjust=hjust), color="black", 
            fontface="bold",alpha=0.6, size=2.5,
            angle= label_data$angle, inherit.aes = FALSE ) 
金字塔图:
代码语言:r
复制
populations <- tibble(age = rep(c("0-4","5-9","10-14","15-19",
                              "20-24","25-29","30-34",
                              "35-39","40-44","45-49","50-54",
                              "55-59","60-64","65-69","70-74",
                              "75-79","80-84",'85+'),2),
                      sex = rep(c('Male','Female'),each = 18),
                      pops = c(91646,65397,75560,108622,245368,
                               278821,196527,153679,153202,248852,
                               388813,350713,241516,130285,103534,
                               119681,76644,47576,
                               85376,62108,72649,107474,238128,269100,
                               192116,159126,156828,240783,378479,
                               361806,225168,123183,112859,149189,
                               105804,78370
                               ))
populations <- populations %>% mutate(
  age = factor(age, levels = c("0-4","5-9","10-14","15-19",
                               "20-24","25-29","30-34",
                               "35-39","40-44","45-49","50-54",
                               "55-59","60-64","65-69","70-74",
                               "75-79","80-84",'85+')),
  pops = pops/1e3,
  pops = ifelse(sex == 'Female',-pops, pops))
library(ggthemes)
ggplot(populations,aes(x =age,y=pops,fill=sex,width=0.8)) +
  coord_flip() +
  geom_bar(data=subset(populations,sex=="Female"),stat = "identity") +
  geom_bar(data=subset(populations,sex=="Male"), stat = "identity") +
  scale_y_continuous(breaks = seq(-400,400,length=9),
                     labels = paste0(as.character
                                     (c(abs(seq(-400,400,length=9)))), "k"), 
                     limits = c(-450, 450)) +
  theme_economist_white(horizontal = FALSE) +
  scale_fill_economist() + 
  labs(title="2010",
       y="Population",x="Age") + 
  guides(fill=guide_legend(reverse = TRUE))+
  theme(
    legend.position =c(0.8,0.9),
    legend.title = element_blank(),
    plot.title = element_text(size=20),
    plot.caption = element_text(size=12,hjust=0)
  )

3.8 离散型数据相关性可视化

韦恩图-VennDiagram包

代码语言:R
复制
install.packages('VennDiagram')
library(VennDiagram)
set.seed(2019)
SNP_pop_1=paste(rep("SNP_" , 200) , sample(c(1:1000) , 
                                           200 , replace=F) , sep="")
SNP_pop_2=paste(rep("SNP_" , 200) , sample(c(1:1000) , 
                                           200 , replace=F) , sep="")
SNP_pop_3=paste(rep("SNP_" , 200) , sample(c(1:1000) , 
                                           200 , replace=F) , sep="")
SNP_pop_4=paste(rep("SNP_" , 200) , sample(c(1:1000) , 
                                           200 , replace=F) , sep="")

venn.diagram(
  x = list(SNP_pop_1 , SNP_pop_2 , SNP_pop_3, SNP_pop_4),
  category.names = c("SNP pop 1" , "SNP pop 2 " , "SNP pop 3","SNP_pop_4"),
  filename = 'Venn.png',
  output = T ,
  lty = 'blank',
  fill = c('#07689f','#a2d5f2','#ff7e67','#10ddc2')
)

upset图-ggupset包

代码语言:R
复制
install.packages('ggupset')
library(ggupset)

热图heatmap()函数或pheatmap包

代码语言:R
复制
set.seed(2019)
df <- tibble(cancers = rep(c('Bladder','Bone','CNS','Breast','Cervix uteri',
                         'CRC','Lung','Liver','Prostate','Kidney',
                         'Eye','Skin','Thyroid','HLM','GBM'),3),
             regions = rep(c('Region A','Region B','Region C'),each = 15),
             incidence = sample(10:100,45))
#16-rank(incidence)显示肿瘤在不同regions从大到小排列,需要根据自己的数据更改
df <- df %>% group_by(regions) %>% mutate(ranks = 16-rank(incidence))

ggplot(df, aes(cancers, regions,fill = incidence))+
  geom_tile(color = 'gray20',size = .4)+
  theme_gray()+
  scale_x_discrete(expand = c(0,0)) + 
  scale_y_discrete(expand = c(0,0))+
  theme(axis.text.x = element_text(angle = 45,hjust=1))+ 
  labs(x = "", y = "") + 
  geom_text(aes(label = ranks))+
  scale_fill_gradient(low = '#fff5f0',high = '#cb181d')

3.9 文字注释:叠加文本 text或annotate()函数

1.ggrepel包中geom_text_repel()函数加标签

代码语言:R
复制
set.seed(2019)
df <- tibble(gene = paste0('gene',1:5000),
             expressions = c(rnorm(4995,3,0.5),
                             5.5,6,7,8,9),
             position = sample(1:100,5000,replace = T))

df <- df %>% mutate(expressions = ifelse(expressions<0,0.01,expressions),
                    level = ifelse(expressions>5,'High','Low'))

install.packages('ggrepel')
library(ggrepel)

ggplot(df, aes(x = position, y= expressions,
               fill = level,size = level))+
  geom_hline(yintercept = 5,color = 'gray',linetype =2)+
  geom_jitter(shape = 21, color = 'black')+
  scale_fill_manual(values = c('blue','gray'))+
  scale_size_manual(values = c(4,2))+
  geom_text_repel(data = df[df$expressions>=5,],
            aes(x = position, y= expressions,label = gene))+
  theme(legend.position = 'none')

2.gghighlight包高亮文本

代码语言:R
复制
install.packages('gghighlight')
library(gghighlight)

gghighlight_point(df,aes(position, expressions, color = level,
                         size = level),
                  expressions > 5)+
  geom_hline(yintercept = 5,color = 'gray',linetype =2)+
  scale_color_manual(values = c('blue','gray'))+
  scale_size_manual(values = c(4,2))+
  theme(legend.position = 'none')

grid包在绘图区域外添加文本

annotate()函数可以标记多种元素,如文本、矩形、散点等

代码语言:R
复制
ggplot(mtcars, aes(x = wt, y = mpg)) + 
  geom_point(shape = 21, color = 'black',fill = '#ff8a5c',size=4)+
  geom_smooth(method = lm)+
  annotate("text", x = 5, y = 32, label = "italic(R) ^ 2 == 0.75",
           parse = TRUE,size = 6, color = 'blue')+
  theme_bw()

3.10 p值标注-ggsignif, ggpval, ggstatsplot包

ggsignif

代码语言:r
复制
install.packages("ggsignif")
library(ggsignif)

set.seed(2019)
df <- tibble(group = rep(paste('Group',LETTERS[1:4]),times = c(10,8,10,13)),
             values = c(rnorm(10,15,2),rnorm(8,14,2),
                        rnorm(10,8,1),rnorm(13,5,2)))

p1 <- ggplot(df,aes(group,values))+
  geom_boxplot()+
  geom_jitter(size = 2,shape = 21,fill = 'gray40')+
  geom_signif(comparisons = list(c('Group A','Group B'),
                                 c('Group A','Group C'),
                                 c('Group A','Group D')),
              map_signif_level = FALSE, textsize=6, #TRUE标记星号
              test = t.test,
              step_increase = 0.2) +
  ylim(NA, 25)+
  theme_classic()
  plot_grid(p1,p2,ncol=2,align = 'h',labels = c('A','B'))

ggstatsplot

代码语言:r
复制
install.packages("ggstatsplot")
library(ggstatsplot)
ggbetweenstats(
  data = iris,
  plot.type = 'box',
  x = Species,
  y = Sepal.Length,
  pairwise.comparisons = T,
  mean.plotting = F,
  messages = T,
  type = 'p',
  palette = 'Set1') + 
  scale_y_continuous(breaks = seq(3, 8, by = 1))+
  theme_classic()+
  theme(legend.position = c(.11,.8))
ggstatsplot
ggstatsplot

3.11 seqlogo与进化树

seqlogo(基序图):ggseqlogo包,需要将序列转化字符串

不同位置展示该位置上碱基的构成比例,字母越大则表示该碱基占比越大。也可以用来展示蛋白序列。

代码语言:R
复制
install.packages('ggseqlogo')
library(ggseqlogo)

data(ggseqlogo_sample)
ggseqlogo( seqs_dna$MA0001.1 )
MA0001.1
MA0001.1
代码语言:R
复制
ggplot()+
  annotate("rect", xmin = 0.5, xmax = 3.5, ymin = -0.05, ymax = 1.9, 
           alpha=0.1, col="black", fill="yellow")+
  geom_logo(seqs_dna$MA0001.1, stack_width = 0.9,
            col_scheme = 'nucleotide2')+
  annotate("segment", x=4, xend = 8, y=1.2, yend = 1.2, size=2)+
  annotate("text", x=6, y=1.3, label="Text annotation")+
  theme_logo()
加标注
加标注

进化树-ggtree包

环形结构进化树

代码语言:R
复制
nwk <- system.file("extdata", "sample.nwk", package="treeio")
tree <- read.tree(nwk)
circ <- ggtree(tree, layout = "circular")
df <- data.frame(first=c("a", "b", "a", "c", "d", "d", 
                         "a", "b", "e", "e", "f", "c", "f"),
                 second= c("z", "z", "z", "z", "y", "y", 
                           "y", "y", "x", "x", "x", "a", "a"))
rownames(df) <- tree$tip.label

set.seed(2019)
df2 <- as.data.frame(matrix(rnorm(39), ncol=3))
rownames(df2) <- tree$tip.label
colnames(df2) <- LETTERS[1:3]

p1 <- gheatmap(circ, df, offset=.8, width=.2,
               colnames_angle=95, colnames_offset_y = .25) +
  scale_fill_viridis_d(option="D", name="discrete\nvalue")

library(ggnewscale)
p2 <- p1 + new_scale_fill()
p3 <- gheatmap(p2, df2, offset=15, width=.3,
         colnames_angle=90, colnames_offset_y = .25) +
  scale_fill_viridis_c(option = 'C',name="continuous\nvalue")

plot_grid(p1,p3,ncol = 2,labels = c('A','B'))
环形结构进化树
环形结构进化树

和玹图-chordDiagram()函数,输入数据为数据框或矩阵

代码语言:R
复制
library(circlize)

set.seed(2019)

numbers <- sample(c(1:1000), 100, replace = T)
data <- matrix( numbers, ncol=5)
rownames(data) <- paste0("Set-", seq(1,20))
colnames(data) <- paste0("Pair-", seq(1,5))

chordDiagram(data, transparency = 0.5)

元素增多

代码语言:r
复制
circos.par("track.height" = 0.4)
set.seed(2019)
data = data.frame(
  factor = sample(letters[1:8], 1000, replace = TRUE),
  x = rnorm(1000),
  y = runif(1000)
)


circos.initialize( factors=data$factor, x=data$x )

circos.trackPlotRegion(factors = data$factor, y = data$y, 
                       panel.fun = function(x, y) {
                         circos.axis()
                       })
circos.trackPoints(data$factor, data$x, data$y, col="deeppink",pch = 16)
circos.trackHist(data$factor, data$x, col="#69b3a2")

4.配色-RColorBrewer, ggsci, paletteer包

配色网站:https://colorbrewer2.org/

4.1离散色-离散变量上色

ggsci包匹配期刊配色,但不一定符合杂志最新配色要求

RColorBrewer包集成到ggplot2中,使用scale_fill_brewer()函数即可。推荐使用Set2,Set3,Accent配色板

RColorBrewer包配色板
RColorBrewer包配色板

paletteer包含25个包,869种配离散色

代码语言:R
复制
devtools::install_github("EmilHvitfeldt/paletteer")
library(paletteer)

p1 <- ggplot(dsamp, aes(carat, price, fill = cut)) +
    geom_point(shape = 21, color = 'black', size = 3) +
    scale_fill_paletteer_d("awtools::mpalette") +
    theme_classic() +
    labs(title = 'awtools-mpalette', x = '', y = '') +
    theme(legend.position = c(.15,.8),
          legend.background = element_rect(fill = NA),
          plot.title = element_text(hjust = 0.5))
p1
paletteer包配色
paletteer包配色

scale_fill_manual()函数自定义

4.2 连续色-paletteer包中scale_fill_paletteer_c()函数即可调用。

5.主题选择与整体布局

展示数据大小,选择网格背景 ,方便比较大小;

展示数据相关性,选择空白背景的test和classic主题。

代码语言:r
复制
set.seed(2019)
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]

P1 <- ggplot(dsamp,aes(carat, price, fill = cut))+
  geom_point(shape = 21, color = 'black', size = 3)+
  scale_fill_manual(name = '',
                    values = c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99'))+
  labs(title = 'default theme',x='',y='')+
  theme(legend.position = c(.15,.8),
        legend.background = element_rect(fill = NA,color = NA),
        plot.title = element_text(hjust = 0.5))

P2 <- ggplot(dsamp,aes(carat, price, fill = cut))+
  geom_point(shape = 21, color = 'black', size = 3)+
  scale_fill_manual(name = '',
                    values = c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99'))+
  labs(title = 'bw theme',x='',y='')+
  theme_bw()+
  theme(legend.position = c(.15,.8),
        legend.background = element_rect(fill = NA,color = NA),
        plot.title = element_text(hjust = 0.5))
主题选择
主题选择

5.2 图例位置、大小应与插图搭配-theme()函数

代码语言:R
复制
set.seed(2019)
id <- 1:10
df <- tibble(x = rep(id,2),
             y = c(id*1.8+rnorm(10,0,1),
                   x=id*3+rnorm(10,0,1)),
             group = rep(c('Group A','Group B'),each = 10))

p1 <- ggplot(df, aes(x,y,fill = group,group = group))+
  geom_line()+
  geom_point(shape = 21, color = 'gray40', size = 3)+
  theme_classic()

p2 <- ggplot(df, aes(x,y,fill = group,group = group))+
  geom_line()+
  geom_point(shape = 21, color = 'gray40', size = 3)+
  theme_classic()+
  theme(legend.position = c(.15,.85))图例位置
图例位置
图例位置

5.3多面板图facet_wrap(~clarity,ncol = 3)分三列 面板

5.4元素大小

散点图geom_point()函数中size参数调整

柱状图geom_bar()函数中width参数调整

scale_size_continuous()函数调整size属性

6.细节修饰-标度函数

7.图片拼接与导出

7.1对齐与拼接

plot_grid()函数拼接,align = c('v','h')参数纵 横向对齐

patchwork包-支持多层嵌套拼图,直接+相连

7.2导出-'Cairo'包,export包(导出PPT)

代码语言:R
复制
CairoPDF('file_name.pdf',height = 6,width = 6)
graph2ppt(object)

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

原创声明:本文系作者授权腾讯云开发者社区发表,未经许可,不得转载。

如有侵权,请联系 cloudcommunity@tencent.com 删除。

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
目录
  • 一、前言
  • 二、ggplot2的绘图原理
    • 2.1 散点图
    • 三、常见图形绘制
      • 3.1 单个连续型变量
        • 3.2 两个或多个连续型变量
          • 3.3 连续型变量绘图优化
            • 气泡图
            • 二维密度图:适合大数据量,stat_density_2d()函数
            • 散点+密度图:ggExtra包
            • 层峦叠峰:ggridges包,geom_density_ridges()函数
          • 3.4 两个连续型变量相关性图-ggcorrplot包
            • 3.5 连续型变量趋势图
              • 3.5.1线图:geom_line()+geom_point()函数
              • 3.5.2配比坡度图
              • 3.5.3面积图 geom_area()函数
              • 3.5.4平滑曲线 geom_smooth()函数
              • 3.5.5生存曲线 survminer,survival包
              • 3.5.6三元图 ggtern包
              • 3.5.7 雷达图:ggradar包
            • 3.6 离散型数据构成可视化
              • 3.6.1.饼图:ggplot中由条形图变形而来,条图和坐标系转换而成
              • 3.6.2玫瑰图:饼图x轴进行真实变量映射
              • 3.6.3戒指图:设置条柱的宽度width及x轴取值范围
              • 3.6.4.树图
            • 3.7 离散型数据分布: geom_hline()+geom_bar()
              • 3.7.1 堆栈式条图
              • 3.7.2 条图排序,突出最重要的分组
            • 3.8 离散型数据相关性可视化
              • 韦恩图-VennDiagram包
              • upset图-ggupset包
              • 热图heatmap()函数或pheatmap包
            • 3.9 文字注释:叠加文本 text或annotate()函数
              • 1.ggrepel包中geom_text_repel()函数加标签
              • 2.gghighlight包高亮文本
              • grid包在绘图区域外添加文本
              • annotate()函数可以标记多种元素,如文本、矩形、散点等
            • 3.10 p值标注-ggsignif, ggpval, ggstatsplot包
              • 3.11 seqlogo与进化树
                • seqlogo(基序图):ggseqlogo包,需要将序列转化字符串
                • 进化树-ggtree包
              • 和玹图-chordDiagram()函数,输入数据为数据框或矩阵
              • 4.配色-RColorBrewer, ggsci, paletteer包
                • 4.1离散色-离散变量上色
                  • 4.2 连续色-paletteer包中scale_fill_paletteer_c()函数即可调用。
                  • 5.主题选择与整体布局
                    • 5.2 图例位置、大小应与插图搭配-theme()函数
                      • 5.3多面板图facet_wrap(~clarity,ncol = 3)分三列 面板
                        • 5.4元素大小
                        • 6.细节修饰-标度函数
                        • 7.图片拼接与导出
                          • 7.1对齐与拼接
                            • plot_grid()函数拼接,align = c('v','h')参数纵 横向对齐
                            • patchwork包-支持多层嵌套拼图,直接+相连
                          • 7.2导出-'Cairo'包,export包(导出PPT)
                          相关产品与服务
                          图数据库 KonisGraph
                          图数据库 KonisGraph(TencentDB for KonisGraph)是一种云端图数据库服务,基于腾讯在海量图数据上的实践经验,提供一站式海量图数据存储、管理、实时查询、计算、可视化分析能力;KonisGraph 支持属性图模型和 TinkerPop Gremlin 查询语言,能够帮助用户快速完成对图数据的建模、查询和可视化分析。
                          领券
                          问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档