How to plot a box plot in Python

created at 07-01-2021

Introduction to box plots:

A box plot is a graphical form of the five-number summary, so before introducing the box plot, let's briefly introduce the five-number summary.

The five-number summary describes a set of data with the following five numbers:

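  1. Minimum value (in a box plot, the lower whisker stops at the smallest value that is not below Q1-1.5IQR)
  2. The first quartile (Q1)
  3. Median (Q2)
  4. The third quartile (Q3)
  5. Maximum value (in a box plot, the upper whisker stops at the largest value that is not above Q3+1.5IQR)

Here IQR = Q3-Q1 is the interquartile range. In a box plot, points that fall below Q1-1.5IQR or above Q3+1.5IQR are drawn individually as outliers.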
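The five numbers, and the whisker ends a box plot actually draws, can be computed directly. The sketch below assumes NumPy's default (linear) percentile interpolation and the 1.5 IQR whisker rule that matplotlib, and therefore pandas, uses by default; `box_summary` is a hypothetical helper, and other tools may interpolate quartiles slightly differently.

```python
import numpy as np

def box_summary(values, whis=1.5):
    """Five-number summary plus the whisker ends a default box plot draws."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])  # NumPy's default linear interpolation
    iqr = q3 - q1                                     # interquartile range
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    return {
        "min": x.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": x.max(),
        "iqr": iqr,
        # whiskers stop at the most extreme observations still inside the fences
        "lower_whisker": x[x >= low_fence].min(),
        "upper_whisker": x[x <= high_fence].max(),
    }

summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```

For the sample [1, 2, ..., 8, 100], Q1 = 3, the median is 5, Q3 = 7 and IQR = 4, so the upper whisker stops at 8, while the maximum, 100, lies beyond Q3 + 1.5 IQR = 13 and would be drawn as an outlier.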

The box plot is a graphical representation of the five-number summary, as shown below:

Applications of box plots

  1. Detecting outliers;
  2. Summarizing several sets of data graphically, so that they can be compared and analyzed visually side by side.
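The outlier rule behind point 1 can be sketched with pandas: any value below Q1 - 1.5 IQR or above Q3 + 1.5 IQR is flagged. `iqr_outlier_mask` is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd

def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)  # linear interpolation by default
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])
mask = iqr_outlier_mask(s)
print(s[mask])    # the points a default box plot would mark as outliers
print(s[~mask])   # the remaining data
```

This is the same cut-off a default box plot uses to decide which points to draw individually; it is a data-driven alternative to a hand-picked threshold such as the `tip_pct < 0.5` filter used in the examples below.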

The main methods of drawing box plots

  • Method 1: use the Series.plot(), DataFrame.plot() or DataFrame.boxplot() method in the pandas package;
  • Method 2: use catplot() or boxplot() in the seaborn package; seaborn.catplot() with kind='box' draws the same box plots as seaborn.boxplot(), but as a figure-level plot (a FacetGrid) rather than on a single Axes;
  • Method 3: use the boxplot() method of a matplotlib Axes object.
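To see that the three routes draw the same kind of chart, the sketch below plots one sample with each of them on a single figure. The random data and the `matplotlib.use("Agg")` line are assumptions for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
s = pd.Series(rng.normal(size=200), name="value")  # synthetic sample data

fig, axes = plt.subplots(1, 3, figsize=(9, 3), sharey=True)
s.plot(kind="box", ax=axes[0])   # Method 1: pandas
sns.boxplot(y=s, ax=axes[1])     # Method 2: seaborn (axes-level function)
axes[2].boxplot(s.to_numpy())    # Method 3: matplotlib Axes.boxplot
for ax, title in zip(axes, ["pandas", "seaborn", "matplotlib"]):
    ax.set_title(title)
fig.tight_layout()
```

Because pandas draws through matplotlib's `boxplot`, the first and third panels share the same default whisker rule.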


Import the packages needed and prepare the data:

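# Jupyter magic for interactive figures; leave this line out in a plain .py script
%matplotlib notebook
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

tips = pd.read_csv('examples/tips.csv')
# tip relative to the bill minus the tip
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])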
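If `examples/tips.csv` is not at hand, a random table with a few of the same column names can stand in for it so the snippets can be run. The values below are synthetic, not the real tips data; `sns.load_dataset('tips')` downloads a similar real table but needs internet access.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200
total_bill = rng.uniform(5, 50, n).round(2)
tip = (total_bill * rng.uniform(0.05, 0.25, n)).round(2)

# Synthetic stand-in for the tips table (made-up values)
tips = pd.DataFrame({
    "total_bill": total_bill,
    "tip": tip,
    "smoker": rng.choice(["Yes", "No"], n),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
    "size": rng.integers(1, 7, n),
})
tips["tip_pct"] = tips["tip"] / (tips["total_bill"] - tips["tip"])
print(tips.head())
```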

The structure of the “tips” table is:

Specific example of method 1:

fig, axes = plt.subplots()
# Plot tip_pct, leaving out the few extreme values of 0.5 and above
tips.loc[tips.tip_pct < 0.5, 'tip_pct'].plot(kind='box', ax=axes)
axes.set_ylabel('values of tip_pct')
fig.savefig('p1.png')  # Save the drawn figure as p1.png

The code above draws a single box plot of tip_pct and saves it as p1.png.

For the full list of parameters, see the pandas documentation for Series.plot().
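Series.plot(kind='box') also has an accessor form, `Series.plot.box()`. Extra keyword arguments are passed on to matplotlib's `boxplot`, so, for example, `whis` can change the whisker rule. The sketch uses synthetic data in place of `tip_pct`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
tip_pct = pd.Series(rng.gamma(4.0, 0.04, 300), name="tip_pct")  # synthetic data

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
tip_pct.plot.box(ax=axes[0])                # same as .plot(kind='box'); whiskers at 1.5 IQR
tip_pct.plot.box(ax=axes[1], whis=(5, 95))  # whiskers at the 5th and 95th percentiles
axes[0].set_title("whis=1.5 (default)")
axes[1].set_title("whis=(5, 95)")
axes[0].set_ylabel("values of tip_pct")
```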

DataFrame.plot() example:

fig, axes = plt.subplots(1, 4)
color = dict(boxes='DarkGreen', whiskers='DarkOrange',
             medians='DarkBlue', caps='Red')
# boxes is the box, whiskers are the lines extending from the box,
# medians is the median line, caps are the end caps of the whiskers

# subplots=True draws each numeric column (the four labelled below)
# in its own subplot; one Axes per column must be passed in ax
tips.plot(kind='box', subplots=True, ax=axes,
          title='Different boxplots', color=color, sym='r+')
# sym sets the marker for outliers ('r+' = red plus signs)

axes[0].set_ylabel('values of total_bill')
axes[1].set_ylabel('values of tip')
axes[2].set_ylabel('values of size')
axes[3].set_ylabel('values of tip_pct')

fig.subplots_adjust(wspace=1, hspace=1)  # Adjust the spacing between subplots
fig.savefig('p2.png')  # Save the drawn figure as p2.png


DataFrame.plot() example

For the full list of parameters, see the pandas documentation for DataFrame.plot().
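Without `subplots=True`, `DataFrame.plot(kind='box')` draws every numeric column as a box on one shared Axes. That works when the columns are on a comparable scale (here, two synthetic money columns); columns with very different ranges, such as size and tip_pct, are better kept in separate subplots as in the four-panel example above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({"total_bill": rng.uniform(5, 50, 100),
                   "tip": rng.uniform(1, 10, 100)})  # synthetic data

fig, ax = plt.subplots()
df.plot(kind="box", ax=ax)  # one box per column, side by side on the same Axes
ax.set_ylabel("values")
```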

DataFrame.boxplot() example:

fig, axes = plt.subplots()
# column: the data to draw as box plots; one or more columns
# by: the column(s) to group by, giving one box per group
# ('smoker' is an assumed grouping column for this example)
tips.boxplot(column='tip_pct', by='smoker', ax=axes)
axes.set_ylabel('values of tip_pct')
fig.savefig('p3.png')  # Save the drawn figure as p3.png


DataFrame.boxplot() example

For the full list of parameters, see the pandas documentation for DataFrame.boxplot().
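When `by` is given, `DataFrame.boxplot()` draws one box per group and also adds a figure-level title of the form "Boxplot grouped by ..."; the sketch below clears it with `fig.suptitle('')`. The grouping column `smoker` and the data are synthetic, chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 120
tips = pd.DataFrame({
    "tip_pct": rng.gamma(4.0, 0.04, n),
    "smoker": rng.choice(["No", "Yes"], n),
})  # synthetic stand-in for the tips table

fig, ax = plt.subplots()
tips.boxplot(column="tip_pct", by="smoker", ax=ax, grid=False)  # one box per smoker value
ax.set_ylabel("values of tip_pct")
fig.suptitle("")  # remove the automatic "Boxplot grouped by smoker" super-title
```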

Specific example of method 2:

Example of seaborn.catplot():

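# catplot is figure-level: it returns a FacetGrid, which is saved with its own savefig()
# (x='day' and hue='smoker' are assumed examples)
g = sns.catplot(x='day', y='tip_pct', hue='smoker', kind='box',
                data=tips[tips.tip_pct < 0.5])
# hue indicates the basis of grouping
g.savefig('p4.png')  # Save the drawn figure as p4.png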


Example of seaborn.catplot()

For the full list of parameters, see the seaborn documentation for seaborn.catplot().
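Because `catplot()` is figure-level, it creates its own figure rather than drawing onto an existing Axes. Its `col` parameter splits the data into side-by-side panels, which `seaborn.boxplot()` alone does not do. The column names follow the tips table, but the data below are synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
n = 200
tips = pd.DataFrame({
    "tip_pct": rng.gamma(4.0, 0.04, n),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
    "smoker": rng.choice(["No", "Yes"], n),
})  # synthetic stand-in data

# One panel per smoker value, one box per day inside each panel
g = sns.catplot(data=tips, x="day", y="tip_pct", col="smoker", kind="box",
                order=["Thur", "Fri", "Sat", "Sun"], col_order=["No", "Yes"])
g.set_axis_labels("day", "values of tip_pct")
```

The grid can then be written to disk with `g.savefig(...)`, as in the catplot example above.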

Example of seaborn.boxplot():

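fig, axes = plt.subplots()
# (x='day' and hue='smoker' are assumed examples)
sns.boxplot(x='day', y='tip_pct', hue='smoker',
            data=tips[tips.tip_pct < 0.5], orient='v', ax=axes)
# orient sets the direction of the boxes: 'v' (vertical) or 'h' (horizontal)

axes.set_title('Boxplots grouped by smoker')
fig.savefig('p5.png')  # Save the drawn figure as p5.png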

Example of seaborn.boxplot()

For the full list of parameters, see the seaborn documentation for seaborn.boxplot().
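For horizontal boxes, put the numeric column on `x`, the category on `y`, and set `orient='h'`. Synthetic data again stands in for the tips table.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(5)
n = 200
tips = pd.DataFrame({
    "tip_pct": rng.gamma(4.0, 0.04, n),
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
    "smoker": rng.choice(["No", "Yes"], n),
})  # synthetic stand-in data

fig, ax = plt.subplots()
# Numeric values on x, categories on y; orient="h" makes the direction explicit
sns.boxplot(data=tips, x="tip_pct", y="day", hue="smoker", orient="h", ax=ax,
            order=["Thur", "Fri", "Sat", "Sun"])
ax.set_title("Boxplots grouped by smoker")
```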

Specific example of Method 3:

axes.boxplot() example:

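fig, axes = plt.subplots()
# one box per group; splitting tip_pct by smoker is an assumed example
data = [tips.loc[tips.smoker == 'No', 'tip_pct'],
        tips.loc[tips.smoker == 'Yes', 'tip_pct']]
axes.boxplot(data, sym='r+', positions=[1, 2])
# sym sets the marker for outliers; positions sets where each box sits on the x-axis
axes.set_xticklabels(['No', 'Yes'])
fig.savefig('p6.png')  # Save the drawn figure as p6.png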


axes.boxplot() example

For the full list of parameters, see the matplotlib documentation for Axes.boxplot().
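`Axes.boxplot()` returns a dictionary of the artists it drew (keys such as 'boxes', 'medians', 'whiskers', 'caps' and 'fliers'), which can be styled afterwards. With `patch_artist=True` the boxes are filled patches whose face colour can be set. The two groups below are synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(6)
groups = {"No": rng.gamma(4.0, 0.04, 150),
          "Yes": rng.gamma(3.0, 0.05, 90)}  # synthetic data

fig, ax = plt.subplots()
# patch_artist=True draws each box as a filled patch so its colour can be changed
bp = ax.boxplot(list(groups.values()), patch_artist=True, sym="r+")
for box, colour in zip(bp["boxes"], ["lightblue", "lightgreen"]):
    box.set_facecolor(colour)
ax.set_xticks([1, 2])
ax.set_xticklabels(list(groups.keys()))
ax.set_ylabel("values of tip_pct")
```

Each box contributes two whiskers and two caps, so with two groups `bp['whiskers']` holds four lines.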

edited at: 07-01-2021