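PyComplexHeatmap is a Python package for drawing complex heatmaps (clustered heatmaps), designed with biological data in mind. Its documentation site is:
https://dingwb.github.io/PyComplexHeatmap/build/html/index.html
All the example data used in the tutorial can be found here: https://github.com/DingWB/PyComplexHeatmap/tree/main/data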
I installed it into my conda environment named sc:
## bash commands
conda activate sc
pip install PyComplexHeatmap -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# Create an empty notebook file to run in Jupyter
touch PyComplexHeatmap.ipynb
Import the module to check that the installation worked:
import os,sys
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plt.rcParams['figure.dpi'] = 100
plt.rcParams['savefig.dpi']=300
plt.rcParams['font.family']='sans-serif' # generic family; the concrete font is chosen on the next line
plt.rcParams['font.sans-serif']='Arial' # please remove this line if Arial font is not installed
plt.rcParams['pdf.fonttype']=42
# sys.path.append(os.path.expanduser("~/Projects/Github/PyComplexHeatmap/"))
import PyComplexHeatmap as pch
print(pch.__version__)
Success: the reported version is 1.8.2.
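If you would rather have the check report a missing package than stop with an ImportError, the version can also be read from the installed package metadata. Here is a minimal sketch that uses only the standard library. The helper name `installed_version` is made up for this post and is not part of PyComplexHeatmap, and it assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical helper for illustration, not part of PyComplexHeatmap itself
print(installed_version("PyComplexHeatmap"))
```

A return value of None means pip did not install the package into the environment the notebook kernel is using, which is a common cause of confusion when several conda environments are around.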
# Generate example dataset (random)
df = pd.DataFrame(['GroupA'] * 5 + ['GroupB'] * 5, columns=['AB'])
df['CD'] = ['C'] * 3 + ['D'] * 3 + ['G'] * 4
df['EF'] = ['E'] * 6 + ['F'] * 2 + ['H'] * 2
df['F'] = np.random.normal(0, 1, 10)
df.index = ['sample' + str(i) for i in range(1, df.shape[0] + 1)]
df
The result is a data frame with 10 rows and 4 columns.
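Because column F is drawn with np.random.normal, its values change on every run. If you want a table that looks the same each time, one option is to fix a seed. The sketch below rebuilds the same table with a seeded NumPy generator; the function name `make_annotation_df` and the seed value 0 are arbitrary choices for illustration, not something the tutorial prescribes.

```python
import numpy as np
import pandas as pd


def make_annotation_df(seed=0):
    """Build the same 10 x 4 example table, but with a fixed random seed."""
    rng = np.random.default_rng(seed)  # the seed value is an arbitrary choice
    out = pd.DataFrame({
        'AB': ['GroupA'] * 5 + ['GroupB'] * 5,
        'CD': ['C'] * 3 + ['D'] * 3 + ['G'] * 4,
        'EF': ['E'] * 6 + ['F'] * 2 + ['H'] * 2,
        'F': rng.normal(0, 1, 10),
    })
    out.index = ['sample' + str(i) for i in range(1, out.shape[0] + 1)]
    return out


df_seeded = make_annotation_df()
print(df_seeded.shape)   # (10, 4)
print(df_seeded.dtypes)  # three text columns and one float column
```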
plt.figure(figsize=(6, 8))
col_ha = pch.HeatmapAnnotation(df=df,plot=True,legend=True,legend_gap=5,hgap=0.5,axis=1)
plt.show()
The default colors really are ugly.
You can either pass the whole data frame df to HeatmapAnnotation, or pass each column of df to a separate annotation function.
Data for the gene expression box plot:
df_box = pd.DataFrame(np.random.randn(10, 4), columns=['Gene' + str(i) for i in range(1, 5)])
df_box.index = ['sample' + str(i) for i in range(1, df_box.shape[0] + 1)]
df_box
Data for the bar plot:
df_bar = pd.DataFrame(np.random.uniform(0, 10, (10, 2)), columns=['TMB1', 'TMB2'])
df_bar.index = ['sample' + str(i) for i in range(1, df_bar.shape[0] + 1)]
df_bar
Data for the scatter plot:
df_scatter = pd.DataFrame(np.random.uniform(0, 10, 10), columns=['Scatter'])
df_scatter.index = ['sample' + str(i) for i in range(1, df_scatter.shape[0] + 1)]
df_scatter
Data for the bar plot Bar1:
df_bar1 = pd.DataFrame(np.random.uniform(0, 10, (10, 2)), columns=['T1-A', 'T1-B'])
df_bar1.index = ['sample' + str(i) for i in range(1, df_bar1.shape[0] + 1)]
df_bar1
Data for the bar plot Bar4 (one value is set to missing):
df_bar4 = pd.DataFrame(np.random.uniform(0, 10, (10, 1)), columns=['T4'])
df_bar4.index = ['sample' + str(i) for i in range(1, df_bar4.shape[0] + 1)]
df_bar4.iloc[7,0]=np.nan
df_bar4
All of the tables above share the same row names: the sample IDs.
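Since the annotation tracks are lined up by sample ID, it is worth confirming that every table really carries the same index, and seeing where missing values sit, before plotting. A small sketch of such a check with plain pandas follows. The helper `check_annotation_tables` is hypothetical (it is not part of PyComplexHeatmap), and the stand-in tables are rebuilt inside the block so it runs on its own.

```python
import numpy as np
import pandas as pd


def check_annotation_tables(reference_index, **tables):
    """Report, per table, whether its index equals the reference and how many NaNs it holds."""
    rows = {}
    for name, table in tables.items():
        rows[name] = {
            'same_index': table.index.equals(reference_index),
            'n_missing': int(table.isna().sum().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient='index')


# Small stand-in tables built the same way as in this post
samples = pd.Index(['sample' + str(i) for i in range(1, 11)])
demo_box = pd.DataFrame(np.random.randn(10, 4), index=samples)
demo_bar4 = pd.DataFrame(np.random.uniform(0, 10, (10, 1)), index=samples, columns=['T4'])
demo_bar4.iloc[7, 0] = np.nan  # one missing value, as in df_bar4

report = check_annotation_tables(samples, box=demo_box, bar4=demo_bar4)
print(report)
```

In the notebook you would pass the real tables instead, for example `check_annotation_tables(df.index, box=df_box, bar1=df_bar1, bar4=df_bar4)`.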
plt.figure(figsize=(5, 4))
col_ha = pch.HeatmapAnnotation(label=pch.anno_label(df.AB, merge=True, rotation=15),
                               AB=pch.anno_simple(df.AB, add_text=True, legend=True), axis=1,
                               CD=pch.anno_simple(df.CD, add_text=True, legend=True, text_kws={'color': 'black'}),
                               Exp=pch.anno_boxplot(df_box, cmap='turbo', legend=True),
                               Scatter=pch.anno_scatterplot(df_scatter, grid=True),
                               Bar1=pch.anno_barplot(df_bar1, legend=True, cmap='Dark2'),
                               Bar4=pch.anno_barplot(df_bar4, legend=True, cmap='turbo'),
                               plot=True, legend=True, legend_gap=5, hgap=0.5)
col_ha.show_ticklabels(df.index.tolist(),fontdict={'color':'blue'},rotation=-30)
plt.show()
The result is shown below:
First, generate some random data:
df_heatmap = pd.DataFrame(np.random.randn(30, 10), columns=['sample' + str(i) for i in range(1, 11)])
df_heatmap.index = ["Fea" + str(i) for i in range(1, df_heatmap.shape[0] + 1)]
df_heatmap.iloc[1, 2] = np.nan
df_heatmap.head()
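Plotting:
The columns are split into two panels according to the two groups in df.AB, and the rows are split into two clusters based on the row clustering result.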
plt.figure(figsize=(5, 6))
cm = pch.ClusterMapPlotter(data=df_heatmap,
                           col_cluster=True, row_cluster=True,
                           col_split=df.AB, row_split=2,
                           col_split_gap=0.5, row_split_gap=0.8,
                           label='values', row_dendrogram=True,
                           show_rownames=True, show_colnames=True,
                           row_names_side='right', yticklabels_kws=dict(right=True),
                           tree_kws={'row_cmap': 'Set1', 'colors': 'blue'}, verbose=0, legend_gap=5,
                           cmap='RdYlBu_r', xticklabels_kws={'labelrotation': -90, 'labelcolor': 'blue'})
plt.savefig("example0.pdf", bbox_inches='tight')
plt.show()
The result is shown below. The colors set here look much better than the ones above.
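Passing col_split=df.AB means each sample column is assigned to a panel by its label in df.AB. The sketch below shows that grouping with plain pandas, only to make clear which columns end up together. It does not reproduce how ClusterMapPlotter implements the split internally, and the stand-in tables (`groups`, `demo_heatmap`) are rebuilt here so the block runs on its own.

```python
import numpy as np
import pandas as pd

samples = ['sample' + str(i) for i in range(1, 11)]
groups = pd.Series(['GroupA'] * 5 + ['GroupB'] * 5, index=samples, name='AB')
demo_heatmap = pd.DataFrame(np.random.randn(30, 10), columns=samples)

# Collect the column names that share a group label
col_panels = {label: idx.tolist()
              for label, idx in groups.groupby(groups).groups.items()}

for label, cols in col_panels.items():
    # Each panel keeps all 30 rows and only the columns of its group
    print(label, demo_heatmap[cols].shape)
```

Note that the grouping relies on the index of the label series matching the column names of the heatmap matrix, which is why the sample IDs are kept identical across all tables in this post.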
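Now combine the two parts above, that is, add the annotation tracks to the heatmap:
First, give each row (the Fea features) some grouping information:
# Assign grouping information to each row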
df_rows = df_heatmap.apply(lambda x:x.name if x.sample4 > 0.5 else None,axis=1)
df_rows=df_rows.to_frame(name='Selected')
df_rows['XY']=df_rows.index.to_series().apply(lambda x:'A' if int(x.replace('Fea',''))>=15 else 'B')
df_rows.head()
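The two lambdas above keep the row name when sample4 is greater than 0.5, and label features numbered 15 and up as A and the rest as B. For readers who find row-wise apply hard to follow, here is a sketch of the same logic with vectorized pandas and NumPy calls on a stand-in table. One difference to be aware of: `where` leaves NaN in the unselected rows, while the lambda leaves None. Whether anno_label treats the two identically is not checked here, so take this as an illustration of the logic rather than a drop-in replacement.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame(np.random.randn(30, 10),
                    columns=['sample' + str(i) for i in range(1, 11)])
demo.index = ['Fea' + str(i) for i in range(1, demo.shape[0] + 1)]

# Row-wise version, as in the post
by_apply = demo.apply(lambda x: x.name if x.sample4 > 0.5 else None, axis=1)

# Vectorized version: keep the row name where the condition holds, NaN elsewhere
names = demo.index.to_series()
by_where = names.where(demo['sample4'] > 0.5)

# The numeric part of the feature name decides the A/B group
fea_number = names.str.replace('Fea', '', regex=False).astype(int)
xy = pd.Series(np.where(fea_number >= 15, 'A', 'B'), index=demo.index)

# Both versions select the same rows
print(by_apply.notna().equals(by_where.notna()))
print(xy.value_counts())
```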
Row and column annotations:
# Column annotation
col_ha = pch.HeatmapAnnotation(label=pch.anno_label(df.AB, merge=True, rotation=15),
                               AB=pch.anno_simple(df.AB, add_text=True, legend=True), axis=1,
                               CD=pch.anno_simple(df.CD, add_text=True, legend=True, text_kws={'color': 'black'}),
                               Exp=pch.anno_boxplot(df_box, cmap='turbo', legend=True),
                               Scatter=pch.anno_scatterplot(df_scatter),
                               Bar1=pch.anno_barplot(df_bar1, legend=True, cmap='Dark2'),
                               Bar4=pch.anno_barplot(df_bar4, legend=True, cmap='turbo'),
                               legend=True, legend_gap=5, hgap=0.5)
# Row annotation
row_ha = pch.HeatmapAnnotation(
    Scatter=pch.anno_scatterplot(df_heatmap.sample4.apply(lambda x: round(x, 2)),
                                 height=15, cmap='jet', legend=False),
    Bar=pch.anno_barplot(df_heatmap.sample4.apply(lambda x: round(x, 2)),
                         height=15, cmap='rainbow', legend=False),
    selected=pch.anno_label(df_rows, colors='red', relpos=(-0.05, 0.4)),
    label_kws={'rotation': 15, 'horizontalalignment': 'left', 'verticalalignment': 'bottom'},
    axis=0, verbose=0)
Draw the final figure:
plt.figure(figsize=(7.5, 8))
cm = pch.ClusterMapPlotter(data=df_heatmap, top_annotation=col_ha, right_annotation=row_ha,
                           col_cluster=True, row_cluster=True,
                           col_split=df.AB, row_split=2,
                           col_split_gap=0.5, row_split_gap=0.8,
                           label='values', row_dendrogram=True,
                           show_rownames=False, show_colnames=True,
                           tree_kws={'row_cmap': 'Set1'}, verbose=0, legend_gap=5,
                           cmap='RdYlBu_r', xticklabels_kws={'labelrotation': -90, 'labelcolor': 'blue'})
plt.savefig("example1.pdf", bbox_inches='tight')  # separate file name so the earlier example0.pdf is not overwritten
plt.show()
The colors in these figures are all a bit ugly. The next post will cover how to adjust them.