Python Data Analysis: Dynamic Visualization with Plotly

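2.5 Line Charts

2.5.1 Basic Example

Line charts were touched on in the discussion of scatter plots: in Plotly, a line chart is also drawn with the Scatter function. Figure 2-10 shows a simple line chart; see the file 2.5_LineChart_1.py. The example uses Pandas to generate a date series for the x-axis labels and plots the daily price change of SPD Bank (浦发银行, 600000) from March 1, 2017 to April 28, 2017. The data come from the Wind database.

Figure 2-10 Basic line chart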

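    # 2.5-1 Basic example
    import pandas as pd
    import plotly as py
    import plotly.graph_objs as go

    # Basic Line
    pyplt = py.offline.plot
    # Daily price change of SPD Bank (600000), 2017-03-01 to 2017-04-28; source: Wind
    profit_rate = [
        -0.001, -0.013, -0.004, 0.002, 0.003, -0.001, -0.009, 0.0, 0.007,
        -0.005, 0.0, 0.001, -0.006, -0.006, -0.009, -0.013, 0.005, 0.007,
        0.004, -0.006, -0.009, -0.004, 0.015, 0.007, 0.001, 0.003, -0.009,
        -0.005, 0.001, -0.008, -0.016, 0.002, -0.013, -0.009, -0.014, 0.009,
        -0.003, 0.002, -0.001, 0.011, 0.004]
    # The 41 values are trading days, so use business days rather than every
    # calendar day. Assumption: the exchange was closed on April 3-4, 2017
    # (Qingming holiday), which makes the index exactly 41 dates long.
    date = pd.bdate_range(start = '2017-03-01', end = '2017-04-28', freq = 'C',
                          holidays = ['2017-04-03', '2017-04-04'])
    trace = [go.Scatter(
        x = date,
        y = profit_rate
    )]
    layout = dict(
        title = 'SPD Bank (浦发银行) price change, 2017-03-01 to 2017-04-28',
        xaxis = dict(title = 'Date'),
        yaxis = dict(title = 'profit_rate')
    )

    fig = dict(data = trace, layout = layout)
    # Writes the chart to an HTML file; the tmp directory must already exist
    pyplt(fig, filename='tmp/basic-line.html')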
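Because a line chart and a scatter plot share the Scatter function, the difference comes down to the trace's mode attribute. The sketch below is an illustration rather than part of the book's files: it builds the same short series three ways as plain dictionaries, the same form as the fig dictionary above. The helper name make_trace, the output file name, and the shortened sample series are assumptions made for this example.

```python
# Illustrative sketch: one series drawn as a line, as points, and as both.
# make_trace and the shortened sample data are hypothetical, not from the
# book's files.

def make_trace(x, y, mode, name):
    """Return a scatter trace as a plain dict that Plotly can render."""
    return dict(type='scatter', x=list(x), y=list(y), mode=mode, name=name)

days = [1, 2, 3, 4, 5]
change = [-0.001, -0.013, -0.004, 0.002, 0.003]  # first five values of profit_rate

traces = [
    make_trace(days, change, 'lines', 'line only'),
    make_trace(days, change, 'markers', 'points only'),
    make_trace(days, change, 'lines+markers', 'line and points'),
]
fig = dict(data=traces, layout=dict(title='Scatter modes'))

# To render (requires plotly and an existing tmp directory):
# import plotly as py
# py.offline.plot(fig, filename='tmp/scatter-modes.html')
```

Only the mode string changes between the three traces; the data and the function stay the same.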

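2.5.2 Data Gaps and Connections

Real-world data sets are rarely perfect and may contain missing values. In Plotly, the connectgaps attribute of the Scatter function controls whether such gaps are shown or bridged. Figure 2-11 is adapted from an official example and combines several line traces, line styling, and control over whether gaps are kept or connected; see the file 2.5_LineChart_2.py.

Figure 2-11 Line chart showing and connecting missing data

The code for this example is as follows.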

    # 2.5-2 Application example
    # Average High and Low Temperatures in New York
    import plotly as py
    import plotly.graph_objs as go
    pyplt = py.offline.plot
    # x-axis labels
    month = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
             'August', 'September', 'October', 'November', 'December']
    # Six data series; None marks the missing value for May
    high_2000 = [32.5, 37.6, 49.9, 53.0, None, 75.4, 76.5, 76.6, 70.7, 60.6,
                 45.1, 29.3]
    low_2000 = [13.8, 22.3, 32.5, 37.2, None, 56.1, 57.7, 58.3, 51.2, 42.8,
                31.6, 15.9]
    high_2007 = [36.5, 26.6, 43.6, 52.3, None, 81.4, 80.5, 82.2, 76.0, 67.3,
                 46.1, 35.0]
    low_2007 = [23.6, 14.0, 27.0, 36.8, None, 57.7, 58.9, 61.2, 53.3, 48.5,
                31.0, 23.6]
    high_2014 = [28.8, 28.5, 37.0, 56.8, None, 79.7, 78.5, 77.8, 74.1, 62.6,
                 45.3, 39.9]
    low_2014 = [12.7, 14.3, 18.6, 35.5, None, 58.0, 60.0, 58.6, 51.7, 45.2,
                32.2, 29.1]

    # Create and style traces
    trace0 = go.Scatter(
        x = month,
        y = high_2014,
        name = 'High 2014',
        line = dict(
            color = 'rgb(205, 12, 24)',
            width = 4),
        connectgaps = True   # join the points on either side of the gap
    )
    trace1 = go.Scatter(
        x = month,
        y = low_2014,
        name = 'Low 2014',
        line = dict(
            color = 'rgb(22, 96, 167)',
            width = 4),
        connectgaps = False  # leave the gap visible
    )
    # dash: short strokes; dot: dots; dashdot: alternating dots and strokes
    trace2 = go.Scatter(
        x = month,
        y = high_2007,
        name = 'High 2007',
        line = dict(
            color = 'rgb(205, 12, 24)',
            width = 4,
            dash = 'dash'),
        connectgaps = False
    )
    trace3 = go.Scatter(
        x = month,
        y = low_2007,
        name = 'Low 2007',
        line = dict(
            color = 'rgb(22, 96, 167)',
            width = 4,
            dash = 'dash'),
        connectgaps = False
    )
    trace4 = go.Scatter(
        x = month,
        y = high_2000,
        name = 'High 2000',
        line = dict(
            color = 'rgb(205, 12, 24)',
            width = 4,
            dash = 'dot'),
        connectgaps = False
    )
    trace5 = go.Scatter(
        x = month,
        y = low_2000,
        name = 'Low 2000',
        line = dict(
            color = 'rgb(22, 96, 167)',
            width = 4,
            dash = 'dot'),
        connectgaps = False
    )
    data = [trace0, trace1, trace2, trace3, trace4, trace5]

    # Edit the layout
    layout = dict(title = 'Average High and Low Temperatures in New York',
                  xaxis = dict(title = 'Month'),
                  yaxis = dict(title = 'Temperature (degrees F)'),
                  )

    fig = dict(data=data, layout=layout)
    pyplt(fig, filename='tmp/styled-line.html')

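In the data, each missing value is set to None. In the Scatter function, setting connectgaps to False leaves the gap visible, while setting it to True joins the data points on either side of the missing value. In Figure 2-11 only the "High 2014" line is connected; the other lines show the gap.

The line attribute of the Scatter function controls the style of the line: color sets the color, width sets the width, and dash sets the line type. The value dash gives a dashed line made of short strokes, dot gives a dotted line, and dashdot gives a line that alternates dots and short strokes.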
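To make the effect of connectgaps concrete, the sketch below splits a series containing None into the runs of points that end up joined by one continuous line. It is a plain-Python illustration of the behavior described above, not Plotly's own implementation, and the function name line_segments and the abbreviated month labels are assumptions made for this example.

```python
# Illustrative sketch of what connectgaps means; not Plotly's implementation.
# line_segments is a hypothetical helper introduced for this example.

def line_segments(x, y, connectgaps):
    """Group (x, y) points into the runs that one continuous line would join."""
    segments, current = [], []
    for xi, yi in zip(x, y):
        if yi is None:
            # A missing value ends the current run unless gaps are connected.
            if not connectgaps and current:
                segments.append(current)
                current = []
            continue
        current.append((xi, yi))
    if current:
        segments.append(current)
    return segments

month = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
high_2014 = [28.8, 28.5, 37.0, 56.8, None, 79.7]

gapped = line_segments(month, high_2014, connectgaps=False)
joined = line_segments(month, high_2014, connectgaps=True)
print(len(gapped), len(joined))  # 2 1
```

With connectgaps=False the missing May value splits the line into two pieces (January to April, and June on its own); with connectgaps=True, April is joined directly to June.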

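2.5.3 Data Interpolation

The shape value in the line attribute of the Scatter function controls how the line is drawn between data points. Put simply, interpolation takes a set of scattered data points and finds a curve, subject to certain conditions, that passes through all of them. Plotly offers six shapes: 'linear', 'spline', 'hv', 'vh', 'hvh', and 'vhv'. For example, shape='spline' interpolates the data points with a cubic spline. Figure 2-12 is an official example that shows all six; see the file 2.5_LineChart_3.py.

Figure 2-12 Comparison of the interpolation methods

The code for this example is as follows.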

    # 2.5-3 Application example
    import plotly as py
    import plotly.graph_objs as go
    pyplt = py.offline.plot
    trace1 = go.Scatter(
        x=[1, 2, 3, 4, 5],
        y=[1, 3, 2, 3, 1],
        mode='lines+markers',
        name="'linear'",
        hoverinfo='name',
        line=dict(
            shape='linear'
        )
    )
    trace2 = go.Scatter(
        x=[1, 2, 3, 4, 5],
        y=[6, 8, 7, 8, 6],
        mode='lines+markers',
        name="'spline'",
        text=["tweak line smoothness<br>with 'smoothing' in line object"],
        hoverinfo='text+name',
        line=dict(
            shape='spline'
        )
    )
    trace3 = go.Scatter(
        x=[1, 2, 3, 4, 5],
        y=[11, 13, 12, 13, 11],
        mode='lines+markers',
        name="'vhv'",
        hoverinfo='name',
        line=dict(
            shape='vhv'
        )
    )
    trace4 = go.Scatter(
        x=[1, 2, 3, 4, 5],
        y=[16, 18, 17, 18, 16],
        mode='lines+markers',
        name="'hvh'",
        hoverinfo='name',
        line=dict(
            shape='hvh'
        )
    )
    trace5 = go.Scatter(
        x=[1, 2, 3, 4, 5],
        y=[21, 23, 22, 23, 21],
        mode='lines+markers',
        name="'vh'",
        hoverinfo='name',
        line=dict(
            shape='vh'
        )
    )
    trace6 = go.Scatter(
        x=[1, 2, 3, 4, 5],
        y=[26, 28, 27, 28, 26],
        mode='lines+markers',
        name="'hv'",
        hoverinfo='name',
        line=dict(
            shape='hv'
        )
    )
    data = [trace1, trace2, trace3, trace4, trace5, trace6]
    layout = dict(
        legend=dict(
            y=0.5,
            traceorder='reversed',
            font=dict(
                size=16
            )
        )
    )
    fig = dict(data=data, layout=layout)
    pyplt(fig, filename='tmp/line-shapes.html')
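Of the six shapes, 'hv', 'vh', 'hvh', and 'vhv' draw steps rather than smooth curves. The sketch below computes the corner points such a step path passes through between consecutive data points, to show how the four differ. It is a geometric illustration under the usual reading of the names (h for a horizontal move, v for a vertical move), not Plotly's rendering code; step_path is a hypothetical helper, and 'spline' is deliberately left out.

```python
# Illustrative geometry of the step shapes; not Plotly's rendering code.
# step_path is a hypothetical helper introduced for this example.

def step_path(x, y, shape):
    """Return the vertices of a step line through the points (x[i], y[i])."""
    path = [(x[0], y[0])]
    for (x0, y0), (x1, y1) in zip(zip(x, y), zip(x[1:], y[1:])):
        if shape == 'hv':      # horizontal first, then vertical
            path += [(x1, y0)]
        elif shape == 'vh':    # vertical first, then horizontal
            path += [(x0, y1)]
        elif shape == 'hvh':   # the vertical step sits at the midpoint in x
            xm = (x0 + x1) / 2
            path += [(xm, y0), (xm, y1)]
        elif shape == 'vhv':   # the horizontal run sits at the midpoint in y
            ym = (y0 + y1) / 2
            path += [(x0, ym), (x1, ym)]
        elif shape != 'linear':
            raise ValueError('unsupported shape: ' + shape)
        path.append((x1, y1))
    return path

for shape in ['linear', 'hv', 'vh', 'hvh', 'vhv']:
    print(shape, step_path([1, 2], [1, 3], shape))
```

For the two points (1, 1) and (2, 3), 'hv' passes through the corner (2, 1) while 'vh' passes through (1, 3), which is why the two traces in Figure 2-12 look like mirror images of each other.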

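2.5.4 Filled Line Charts

A filled line chart is a variant of the line chart, built by selectively showing lines and filling the area between them. Figure 2-13 shows the opening price together with the daily high and low for three stocks, Hengbao (恒宝股份), Xiangtan Electrochemical (湘潭电化), and Dagang (大港股份), over a period of time. Each visible line is a stock's opening price; the upper edge of the shaded band around it is the day's high and the lower edge is the day's low. See the file 2.5_LineChart_4.py.

Figure 2-13 Filled line chart

To draw this figure, split it into two parts: the three visible lines (the opening prices) and the three filled bands. The following fragment, taken from one Scatter call, draws a band.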

    x = x + x_rev,
    y = y1_upper + y1_lower,
    fill = 'tozerox',
    fillcolor = 'rgba(0,0,205,0.2)',
    line = dict(color = 'rgba(255,255,255,0)'),  # fully transparent outline

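Here x + x_rev is the sequence running from 1 to 10 and then back from 10 to 1, and y1_upper + y1_lower runs from the high on day 1 to the high on day 10 and then from the low on day 10 back to the low on day 1. Note that y1_lower has already been reversed in the data section. Together these trace out an upper and a lower boundary, and the fill attribute colors the area between them. Finally, the line color is made fully transparent so that the boundary itself is hidden. The result is shown in Figure 2-13.

The code for this example is as follows.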

    # 2.5-4 Application example
    import plotly as py
    import plotly.graph_objs as go
    pyplt = py.offline.plot
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    x_rev = x[::-1]

    # Line 1: 002104 恒宝股份, 2017-05-18 to 2017-06-02
    y1 = [8.86, 8.85, 8.69, 8.4, 8.62, 9, 8.99, 8.85, 8.59, 9.31]
    y1_upper = [9.05, 9.03, 9.08, 8.76, 8.63, 9.04, 9.09, 9.16, 8.9, 9.45]
    y1_lower = [8.86, 8.85, 8.64, 8.36, 8.33, 8.43, 8.93, 8.84, 8.53, 8.52]
    y1_lower = y1_lower[::-1]  # reversed, to run back from day 10 to day 1

    # Line 2: 002125 湘潭电化, 2017-05-18 to 2017-06-02
    y2 = [10.39, 10.35, 9.85, 9.73, 9.77, 9.8, 9.75, 9.65, 9.16, 9.34]
    y2_upper = [10.58, 10.52, 10.34, 10.14, 9.87, 9.87, 9.94, 9.6, 9.42, 9.5]
    y2_lower = [10.15, 10.21, 9.72, 9.68, 9.24, 9.48, 9.62, 9.12, 9.12, 9.34]
    y2_lower = y2_lower[::-1]

    # Line 3: 002077 大港股份, 2017-05-18 to 2017-06-02
    y3 = [11.88, 13.07, 12.75, 12.02, 12.1, 12.61, 12.42, 12.42, 11.18, 10.72]
    y3_upper = [11.98, 13.07, 13.4, 12.91, 12.45, 13.1, 12.61, 12.65, 12.45,
                11.16]
    y3_lower = [11.6, 11.75, 12.75, 12.02, 11.8, 11.92, 12.17, 12.29, 11.18,
                10.35]
    y3_lower = y3_lower[::-1]

    # Filled bands: the outline is hidden with a fully transparent color.
    # Plain dicts and lists replace the deprecated go.Line, go.Data,
    # go.XAxis and go.YAxis classes.
    trace1 = go.Scatter(
        x = x + x_rev,
        y = y1_upper + y1_lower,
        fill = 'tozerox',
        fillcolor = 'rgba(0,0,205,0.2)',
        line = dict(color = 'rgba(255,255,255,0)'),
        showlegend = False,
        name = '恒宝股份',
    )
    trace2 = go.Scatter(
        x = x + x_rev,
        y = y2_upper + y2_lower,
        fill = 'tozerox',
        fillcolor = 'rgba(30,144,255,0.2)',
        line = dict(color = 'rgba(255,255,255,0)'),
        name = '湘潭电化',
        showlegend = False,
    )
    trace3 = go.Scatter(
        x = x + x_rev,
        y = y3_upper + y3_lower,
        fill = 'tozerox',
        fillcolor = 'rgba(112,128,144,0.2)',
        line = dict(color = 'rgba(255,255,255,0)'),
        showlegend = False,
        name = '大港股份',
    )
    # Visible lines: the opening prices
    trace4 = go.Scatter(
        x = x,
        y = y1,
        line = dict(color = 'rgb(0,0,205)'),
        mode = 'lines',
        name = '恒宝股份',
    )
    trace5 = go.Scatter(
        x = x,
        y = y2,
        line = dict(color = 'rgb(30,144,255)'),
        mode = 'lines',
        name = '湘潭电化',
    )
    trace6 = go.Scatter(
        x = x,
        y = y3,
        line = dict(color = 'rgb(112,128,144)'),
        mode = 'lines',
        name = '大港股份',
    )

    data = [trace1, trace2, trace3, trace4, trace5, trace6]

    layout = go.Layout(
        paper_bgcolor = 'rgb(255,255,255)',
        plot_bgcolor = 'rgb(229,229,229)',
        xaxis = dict(
            gridcolor = 'rgb(255,255,255)',
            range = [1, 10],
            showgrid = True,
            showline = False,
            showticklabels = True,
            tickcolor = 'rgb(127,127,127)',
            ticks = 'outside',
            zeroline = False
        ),
        yaxis = dict(
            gridcolor = 'rgb(255,255,255)',
            showgrid = True,
            showline = False,
            showticklabels = True,
            tickcolor = 'rgb(127,127,127)',
            ticks = 'outside',
            zeroline = False
        ),
    )
    fig = go.Figure(data = data, layout = layout)
    pyplt(fig, filename = 'tmp/shaded_lines.html')
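The band construction in this listing repeats the same pattern for each stock, so it can be wrapped in a small helper. The sketch below returns the band as a plain dictionary; band_trace is a hypothetical name, and it uses fill='toself', which closes the outline on itself, in place of the 'tozerox' setting used in the book's code. Unlike the listing, it takes the lower boundary in its natural order and reverses it internally.

```python
# Illustrative helper for a shaded band; band_trace is a hypothetical name.
# Assumption: fill='toself' is used here instead of the book's 'tozerox'.

def band_trace(x, upper, lower, fillcolor, name):
    """Build a closed outline: along the upper values, back along the lower."""
    if not (len(x) == len(upper) == len(lower)):
        raise ValueError('x, upper and lower must have the same length')
    return dict(
        type='scatter',
        x=list(x) + list(x)[::-1],
        y=list(upper) + list(lower)[::-1],
        fill='toself',
        fillcolor=fillcolor,
        line=dict(color='rgba(255,255,255,0)'),  # hide the outline itself
        showlegend=False,
        name=name,
    )

x = [1, 2, 3]
band = band_trace(x, upper=[9.05, 9.03, 9.08], lower=[8.86, 8.85, 8.64],
                  fillcolor='rgba(0,0,205,0.2)', name='band')
print(band['x'])
print(band['y'])
```

Reversing inside the helper removes the need to remember that the lower series must be flipped before it is concatenated, which is the step most easily forgotten when writing the traces by hand.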

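2.5.5 Application Example

Figure 2-14 shows the output of a line chart of news-source statistics; the code is in the file 2.5_LineChart_5.py.

Figure 2-14 Line chart of news sources

The code for this example is as follows.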

    # 2.5-5 Application example
    import plotly as py
    import plotly.graph_objs as go
    pyplt = py.offline.plot
    title = 'Main Source for News'

    labels = ['Television', 'Newspaper', 'Internet', 'Radio']

    colors = ['rgba(67,67,67,1)', 'rgba(115,115,115,1)',
              'rgba(49,130,189,1)', 'rgba(189,189,189,1)']

    mode_size = [8, 8, 12, 8]

    line_size = [2, 2, 4, 2]

    x_data = [
        [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2013],
        [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2013],
        [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2013],
        [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2013],
    ]

    y_data = [
        [74, 82, 80, 74, 73, 72, 74, 70, 70, 66, 66, 69],
        [45, 42, 50, 46, 36, 36, 34, 35, 32, 31, 31, 28],
        [13, 14, 20, 24, 20, 24, 24, 40, 35, 41, 43, 50],
        [18, 21, 18, 21, 16, 14, 13, 18, 17, 16, 19, 23],
    ]

    traces = []

    for i in range(0, 4):
        # The line itself
        traces.append(go.Scatter(
            x = x_data[i],
            y = y_data[i],
            mode = 'lines',
            line = dict(color = colors[i], width = line_size[i]),
            connectgaps = True,
        ))

        # Markers on the first and last points only
        traces.append(go.Scatter(
            x = [x_data[i][0], x_data[i][11]],
            y = [y_data[i][0], y_data[i][11]],
            mode = 'markers',
            marker = dict(color = colors[i], size = mode_size[i])
        ))

    layout = go.Layout(
        xaxis = dict(
            showline = True,
            showgrid = False,
            showticklabels = True,             # True shows the tick labels
            linecolor = 'rgb(204, 204, 204)',  # color of the x-axis line
            linewidth = 2,
            # The original used autotick = False to keep every tick; newer
            # Plotly versions no longer accept autotick, so a linear tick
            # mode with a step of 1 is used here instead.
            tickmode = 'linear',
            dtick = 1,
            ticks = 'outside',                 # ticks drawn inside or outside the plot
            tickcolor = 'rgb(204, 204, 204)',  # tick color
            tickwidth = 2,                     # tick width
            ticklen = 10,                      # tick length
            tickfont = dict(                   # font family, size and color of tick labels
                family = 'Arial',
                size = 12,
                color = 'rgb(82, 82, 82)',
            ),
        ),
        yaxis = dict(
            showgrid = False,
            zeroline = False,
            showline = False,
            showticklabels = False,
        ),
        autosize = False,
        margin = dict(
            autoexpand = False,
            l = 100,
            r = 20,
            t = 110,
        ),
        showlegend = False,
    )

    annotations = []

    # Adding labels
    for y_trace, label, color in zip(y_data, labels, colors):
        # labeling the left_side of the plot
        annotations.append(dict(xref = 'paper', x = 0.05, y = y_trace[0],
                                xanchor = 'right', yanchor = 'middle',
                                text = label + ' {}%'.format(y_trace[0]),
                                font = dict(family = 'Arial',
                                            size = 16,
                                            color = color),  # this series' color
                                showarrow = False))
        # labeling the right_side of the plot
        annotations.append(dict(xref = 'paper', x = 0.95, y = y_trace[11],
                                xanchor = 'left', yanchor = 'middle',
                                text = '{}%'.format(y_trace[11]),
                                font = dict(family = 'Arial',
                                            size = 16,
                                            color = color),
                                showarrow = False))
    # Title
    annotations.append(dict(xref = 'paper', yref = 'paper', x = 0.0, y = 1.05,
                            xanchor = 'left', yanchor = 'bottom',
                            text = title,
                            font = dict(family = 'Arial',
                                        size = 30,
                                        color = 'rgb(37,37,37)'),
                            showarrow = False))
    # Source
    annotations.append(dict(xref = 'paper', yref = 'paper', x = 0.5, y = -0.1,
                            xanchor = 'center', yanchor = 'top',
                            text = 'Source: PewResearch Center & ' +
                                   'Storytelling with data',
                            font = dict(family = 'Arial',
                                        size = 12,
                                        color = 'rgb(150,150,150)'),
                            showarrow = False))

    layout['annotations'] = annotations

    fig = go.Figure(data = traces, layout = layout)
    pyplt(fig, filename = 'tmp/news-source.html')
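The labeling loop in the listing is the part most worth reusing: each series gets one annotation at its first value and one at its last. The sketch below isolates that step as a function returning plain dictionaries. The name end_labels and the shortened sample data are assumptions made for this example; the x positions 0.05 and 0.95 and the font settings are copied from the listing.

```python
# Illustrative sketch of the end-of-line labels; end_labels is hypothetical.

def end_labels(y_data, labels, colors):
    """Return two annotation dicts per series: left (name + value), right (value)."""
    annotations = []
    for y_trace, label, color in zip(y_data, labels, colors):
        font = dict(family='Arial', size=16, color=color)
        # Left side: series name and first value
        annotations.append(dict(
            xref='paper', x=0.05, y=y_trace[0],
            xanchor='right', yanchor='middle',
            text='{} {}%'.format(label, y_trace[0]),
            font=font, showarrow=False))
        # Right side: last value only
        annotations.append(dict(
            xref='paper', x=0.95, y=y_trace[-1],
            xanchor='left', yanchor='middle',
            text='{}%'.format(y_trace[-1]),
            font=font, showarrow=False))
    return annotations

notes = end_labels(
    y_data=[[74, 82, 69], [13, 14, 50]],
    labels=['Television', 'Internet'],
    colors=['rgba(67,67,67,1)', 'rgba(49,130,189,1)'])
for note in notes:
    print(note['text'])
```

Indexing the last value with -1 rather than 11 keeps the helper independent of how many years the series covers.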

2.5.6 Parameter Reference

Line charts are drawn the same way as scatter plots, with the Scatter function, so they take the same parameters; see Section 2.3.4 for details.