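Plotly is a newer-generation Python data visualization library that offers full interactivity and flexible plotting options. This article shows beginners how to install plotly, write a first plotting program, and draw five common chart types.
Compared with Matplotlib and Seaborn, Plotly takes data visualization to another level. It has built-in interactivity and editing tools, supports online and offline modes, provides a stable API for integration with existing applications, and can display charts in a web browser or save a local copy. Its only drawback is that it is so flexible that the number of available options can be overwhelming.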
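Plotly is a platform built on the JSON format; in Python this API is accessed through the plotly package. The examples in this article use the plotly.plotly module, which shipped with plotly before version 4.0; from 4.0 on, the online-hosting features live in the separate chart-studio package (chart_studio.plotly and chart_studio.tools). Open a terminal and run the following command to install plotly: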
~$ pip install plotly
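Charts drawn through the plotly.plotly module are hosted by an online web service, so you first need to create an online account to store them. To retrieve your personal API key, visit https://plot.ly/settings/api#/. Once you have the key, initialize your credentials with the set_credentials_file() function, for example: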
import plotly
plotly.tools.set_credentials_file(
    username='YourAccountName',  # account name
    api_key='YourAPIKey'         # API key
)
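As mentioned above, plotly visualizations are built on JSON data structures. The examples below rely on these two imports:
import plotly.plotly as py  # communicates with the plotly server
import plotly.graph_objs as go  # builds the graph objects
The graph_objs module contains a set of generic data structures that stay consistent across visualization types.
We start with the trace: a single layer that holds data together with drawing instructions. Here is an example of a trace structure: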
trace1 = {
    "x": ["2017-09-30", "2017-10-31", "2017-11-30", ...],
    "y": [327900.0, 329100.0, 331300.0, ...],
    "line": {
        "color": "#385965",
        "width": 1.5
    },
    "mode": "lines",
    "name": "Hawaii",
    "type": "scatter",
}
As you can see, a trace is a dictionary that holds the data to plot along with drawing information such as color and line style.
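To make the "built on JSON" point concrete, here is a minimal sketch that serializes a trace dictionary with the standard json module and reads it back. The name trace_example is only illustrative, and the data is limited to the three values the trace above shows before its elided part.

```python
import json

# Illustrative trace: only the three data points visible in the article's
# example are used; the elided values are not reconstructed.
trace_example = {
    "x": ["2017-09-30", "2017-10-31", "2017-11-30"],
    "y": [327900.0, 329100.0, 331300.0],
    "line": {"color": "#385965", "width": 1.5},
    "mode": "lines",
    "name": "Hawaii",
    "type": "scatter",
}

# A trace made only of strings, numbers, lists and dicts maps directly to JSON.
trace_json = json.dumps(trace_example, indent=2)

# Parsing the JSON text gives back an equal dictionary.
restored = json.loads(trace_json)

print(trace_json)
```

Because the round trip loses nothing, the same structure can be written by hand as a dictionary or produced by the graph_objs helpers.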
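Multiple traces are collected in a list, which is called data. The order of the traces in data determines the order in which they are placed in the final chart. A typical data list looks like this: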
data = [trace1, trace2, trace3, trace4]
The layout controls how the chart is laid out, including display properties such as the title, axis titles, and fonts. Like a trace, the layout is a dictionary:
layout = {
    "showlegend": True,
    "title": {"text": "Zillow Home Value Index for Top 5 States"},
    "xaxis": {
        "rangeslider": {"visible": True},
        "title": {"text": "Year from 1996 to 2017"},
        "zeroline": False
    },
    "yaxis": {
        "title": {"text": "ZHVI BottomTier"},
        "zeroline": False
    }
}
Finally, go.Figure() combines data and layout, and the result is passed to the plotting function of our choice.
fig = go.Figure(data = data, layout = layout)
The following code draws a bar chart:
#Bar Chart
#Mean house values by Bedrooms type and year
import plotly.graph_objs as go
import plotly.plotly as py
trace1 = go.Bar(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_1bedroom,
    name = "ZHVI_1bedroom",
    marker = dict(color = 'rgb(102,255,255)'),
    text = df_groupby_datebr['ZHVI_1bedroom'])
trace2 = go.Bar(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_2bedroom,
    name = "ZHVI_2bedroom",
    marker = dict(color = 'rgb(102,178,255)'),
    text = df_groupby_datebr['ZHVI_2bedroom'])
trace3 = go.Bar(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_3bedroom.values,
    name = "ZHVI_3bedroom",
    marker = dict(color = 'rgb(102,102,255)'),
    text = df_groupby_datebr['ZHVI_3bedroom'])
trace4 = go.Bar(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_4bedroom.values,
    name = "ZHVI_4bedroom",
    marker = dict(color = 'rgb(178, 102, 255)'),
    text = df_groupby_datebr['ZHVI_4bedroom'])
trace5 = go.Bar(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_5BedroomOrMore.values,
    name = "ZHVI_5BedroomOrMore",
    marker = dict(color = 'rgb(255, 102, 255)'),
    text = df_groupby_datebr['ZHVI_5BedroomOrMore'])
data = [trace1, trace2, trace3, trace4, trace5]
layout = go.Layout(barmode = "group", title="Bar Chart: Mean House Values by Bedrooms and Year",
                   xaxis= dict(title= 'Year',ticklen= 5,zeroline= False),
                   yaxis= dict(title= 'Mean House Values',ticklen= 5,zeroline= False))
fig = go.Figure(data = data, layout = layout)
url = py.plot(fig, validate=False)
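go.Bar() creates a bar-chart trace. With go.Layout() we can specify important settings; for example, barmode = "group" places the bars of the different bedroom types side by side for each year.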
The following code draws a line plot:
#Line Plot
#Mean house values by bedrooms and year
trace1 = go.Scatter(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_1bedroom,
    mode = "lines+markers",
    name = "ZHVI_1bedroom",
    marker = dict(color = 'rgb(102,255,255)'),
    text = df_groupby_datebr['ZHVI_1bedroom'])
trace2 = go.Scatter(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_2bedroom,
    mode = "lines+markers",
    name = "ZHVI_2bedroom",
    marker = dict(color = 'rgb(102,178,255)'),
    text = df_groupby_datebr['ZHVI_2bedroom'])
trace3 = go.Scatter(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_3bedroom.values,
    mode = "lines+markers",
    name = "ZHVI_3bedroom",
    marker = dict(color = 'rgb(102,102,255)'),
    text = df_groupby_datebr['ZHVI_3bedroom'])
trace4 = go.Scatter(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_4bedroom.values,
    mode = "lines+markers",
    name = "ZHVI_4bedroom",
    marker = dict(color = 'rgb(178, 102, 255)'),
    text = df_groupby_datebr['ZHVI_4bedroom'])
trace5 = go.Scatter(
    x = df_groupby_datebr.index.values,
    y = df_groupby_datebr.ZHVI_5BedroomOrMore.values,
    mode = "lines+markers",
    name = "ZHVI_5BedroomOrMore",
    marker = dict(color = 'rgb(255, 102, 255)'),
    text = df_groupby_datebr['ZHVI_5BedroomOrMore'])
data = [trace1, trace2, trace3, trace4, trace5]
layout = go.Layout(title = 'Line Plot: Mean House Values by Bedrooms and Year',
                   xaxis= dict(title= 'Year',ticklen= 5,zeroline= False),
                   yaxis= dict(title= 'Mean House Values',ticklen= 5,zeroline= False))
fig = go.Figure(data = data, layout = layout)
url = py.plot(fig, validate=False)
go.Scatter() initializes a line-plot trace. The mode parameter controls whether lines, markers, or both are drawn. For example:
mode = "lines+markers"
The following code draws a time series line chart:
#Time Series Line Chart
import seaborn as sns  # provides color_palette()
state_list = df_state.groupby('RegionName')[['ZHVI_BottomTier']].mean().sort_values(
    by='ZHVI_BottomTier', ascending=False)[:5].index.values.tolist()
colors = dict(zip(state_list, sns.color_palette("GnBu_d", len(state_list)).as_hex()))
trace_list = []
for state in state_list:
    trace = go.Scatter(
        y=df_state[df_state['RegionName']==state]['ZHVI_BottomTier'].tolist(),
        x=df_state[df_state['RegionName']==state]['Date'].tolist(),
        mode='lines',
        name=state,
        line = dict(
            color = colors[state],
            width = 1.5,
            # dash = 'dot'
        )
    )
    trace_list.append(trace)
layout = go.Layout(
    xaxis=dict(title='Year from 1996 to 2017', zeroline=False, rangeslider=dict(visible=True)),
    yaxis=dict(title='ZHVI BottomTier', zeroline=False),
    title='Zillow Home Value Index for Top 5 States',
    showlegend=True,
)
fig = go.Figure(data=trace_list, layout=layout)
url = py.plot(fig, validate=False, filename='ZHVI BottomTier')
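Here we added a range slider to adjust how much of the data is shown in the main chart. We also used a dictionary to assign a different color to each state, generated with seaborn's color_palette() function. Because plotly does not accept the RGB tuples that seaborn returns, we call as_hex() to convert them to hexadecimal color codes.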
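For readers without seaborn at hand, the conversion can be sketched in plain Python. The helpers rgb_float_to_hex and rgb_float_to_css below are hypothetical names written for this illustration, not part of seaborn or plotly; they assume the input is an (r, g, b) tuple of floats between 0 and 1, which is the form seaborn palettes use.

```python
def rgb_float_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a '#rrggbb' string."""
    return "#" + "".join(
        format(int(round(channel * 255)), "02x") for channel in rgb
    )


def rgb_float_to_css(rgb):
    """Convert the same kind of tuple to an 'rgb(r, g, b)' string."""
    red, green, blue = (int(round(channel * 255)) for channel in rgb)
    return "rgb({}, {}, {})".format(red, green, blue)


# Made-up palette of float tuples, standing in for a seaborn palette.
sample_palette = [(0.4, 0.6, 1.0), (0.0, 0.2, 0.8)]
hex_colors = [rgb_float_to_hex(color) for color in sample_palette]
css_colors = [rgb_float_to_css(color) for color in sample_palette]
```

Either string form can be used wherever the examples in this article pass a color, such as line color or marker color.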
The following code draws multiple scatter plots:
#Multiple Scatter Plots
from plotly import tools
trace1 = go.Scatter(x=df_sts.MedianListingPrice_1Bedroom,
                    y=df_sts.MedianListingPrice_2Bedroom, mode='markers',
                    name = "1Bedroom&2Bedroom", marker = dict(
                        color = 'rgb(102,255,255)'))
trace2 = go.Scatter(x=df_sts.MedianListingPrice_2Bedroom,
                    y=df_sts.MedianListingPrice_3Bedroom, mode='markers',
                    name = "2Bedroom&3Bedroom", marker = dict(
                        color = 'rgb(102,178,255)'))
trace3 = go.Scatter(x=df_sts.MedianListingPrice_3Bedroom,
                    y=df_sts.MedianListingPrice_4Bedroom, mode='markers',
                    name = "3Bedroom&4Bedroom", marker = dict(
                        color = 'rgb(102,102,255)'))
trace4 = go.Scatter(x=df_sts.MedianListingPrice_4Bedroom,
                    y=df_sts.MedianListingPrice_5BedroomOrMore, mode='markers',
                    name = "4Bedroom&5+Bedroom", marker = dict(
                        color = 'rgb(178, 102, 255)'))
fig = tools.make_subplots(rows=2, cols=2, subplot_titles=("1Bedroom & 2Bedroom", "2Bedroom & 3Bedroom",
                                                          "3Bedroom&4Bedroom", "4Bedroom&5+Bedroom"))
fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 2)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)
fig['layout']['xaxis3'].update(title='Median Listing Price')#showgrid=False
fig['layout']['xaxis4'].update(title='Median Listing Price')
fig['layout']['yaxis1'].update(title='Median Listing Price')
fig['layout']['yaxis3'].update(title='Median Listing Price')
fig['layout'].update(height=600, width=600, title='Multiple Scatter Plots: Median Listing Price of' +
                     ' Bedrooms')
url = py.plot(fig, validate=False)
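To build this layout, instead of collecting the traces in a single data list, we create the subplots with make_subplots() and then use append_trace() to add each trace at a given row and column.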
The following code draws a choropleth map:
#Choropleth
import plotly.figure_factory as ff
ZHVI_state_year = df_state.groupby(['RegionName','year'])[['MedianListingPricePerSqft_AllHomes']].mean()
ZHVI_county_year = df_county.groupby(['RegionName','year'])[['MedianListingPricePerSqft_AllHomes']].mean()
ZHVI_state_2017 = df_state[df_state.year==2017].groupby(['RegionName'])[['MedianListingPricePerSqft_AllHomes']].mean()
ZHVI_county_2017 = df_county[df_county.year==2017].groupby(['RegionName'])[['MedianListingPricePerSqft_AllHomes']].mean()
#%%
values = ZHVI_county_2017['MedianListingPricePerSqft_AllHomes'].tolist()
fips = ZHVI_county_2017['MedianListingPricePerSqft_AllHomes'].index.tolist()
ZHVI_county_2017['MedianListingPricePerSqft_AllHomes'].describe()
colorscale = [
    'rgb(102,255,255)',
    'rgb(102,178,255)',
    'rgb(102,102,255)',
    'rgb(178, 102, 255)',
]
fig = ff.create_choropleth(
    fips=fips, values=values, scope=['usa'],
    binning_endpoints=[80.9, 102.8, 135.5], colorscale=colorscale,
    title='United States', legend_title='ZHVI_BottomTier by County'
)
print('done')
url = py.plot(fig, validate=False, filename='ZHVI_cities')
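For choropleth maps we can take a shortcut through the figure factory module, which contains a set of convenience functions for building complex charts.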
import plotly.figure_factory as ff
When calling ff.create_choropleth(), we pass in a list of FIPS values, that is, the geographic identification codes of each county, city, or state.
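A county FIPS code is a five-digit string: two digits for the state followed by three for the county. When such codes are read from a file as numbers, leading zeros are lost, so it can help to normalize them first. The helpers normalize_fips and split_fips below are hypothetical names written for this illustration and are not part of plotly.

```python
def normalize_fips(code):
    """Return a county FIPS code as a zero-padded 5-character string."""
    return str(int(code)).zfill(5)


def split_fips(code):
    """Split a county FIPS code into its (state, county) parts."""
    fips = normalize_fips(code)
    return fips[:2], fips[2:]


# 06037 is Los Angeles County, California; read as a number it becomes 6037.
padded = normalize_fips(6037)
state_part, county_part = split_fips("06037")
```

Here padded is "06037", state_part is "06" (California) and county_part is "037".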