How to plot box plots in Python

created at 07-01-2021

Introduction to box plots:

A box plot is a graphical summary built on the five-number summary, so before introducing the box plot, let's briefly review the five-number summary.

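The five-number summary describes a data set with the following five numbers:

  1. Minimum
  2. The first quartile (Q1)
  3. Median (Q2)
  4. The third quartile (Q3)
  5. Maximum

In a box plot, the interquartile range IQR = Q3 - Q1 controls how far the whiskers reach: the lower whisker stops at the smallest observation that is not below Q1-1.5IQR, and the upper whisker stops at the largest observation that is not above Q3+1.5IQR. Observations beyond these limits are drawn individually as outliers.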
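These numbers are easy to compute directly. Below is a minimal NumPy sketch; it assumes NumPy's default linear-interpolation percentiles and the common 1.5·IQR whisker rule (matplotlib's default `whis=1.5`). The helper name `five_number_summary` is hypothetical, not a library function.

```python
import numpy as np


def five_number_summary(data, whis=1.5):
    """Hypothetical helper: quartiles plus box-plot whisker ends for 1-D data."""
    x = np.sort(np.asarray(data, dtype=float))
    # Quartiles via NumPy's default (linear interpolation) percentile method
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme observations still inside the fences
    lower = x[x >= low_fence].min()
    upper = x[x <= high_fence].max()
    return {"lower": lower, "q1": q1, "median": median, "q3": q3,
            "upper": upper, "iqr": iqr}


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

For this sample, Q1 = 3, the median is 5, Q3 = 7 and IQR = 4, so the fences are -3 and 13: the whiskers end at 1 and 8, and 30 lies outside them as an outlier.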

A box plot draws the five-number summary graphically, as shown below:

Applications of box plots

  1. Detecting outliers;
  2. Summarizing several data sets graphically, which makes them easy to compare and analyze side by side.
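For the first application, the same 1.5·IQR rule can flag outliers without drawing anything. A minimal pandas sketch, assuming the default linear interpolation of `Series.quantile`; `iqr_outliers` is a hypothetical helper name.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, whis: float = 1.5) -> pd.Series:
    """Hypothetical helper: return the values a box plot would draw as outliers."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    # Anything outside [Q1 - whis*IQR, Q3 + whis*IQR] is an outlier
    outside = (s < q1 - whis * iqr) | (s > q3 + whis * iqr)
    return s[outside]


values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(iqr_outliers(values))
```

Because the flagged values keep their original index, they can be traced back to the rows they came from.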

The main ways to draw box plots

  • Method 1: use the Series.plot(), DataFrame.plot(), or DataFrame.boxplot() method in pandas;
  • Method 2: use catplot() or boxplot() in seaborn; calling seaborn.catplot() with kind='box' draws the same plot as seaborn.boxplot(), arranged on a FacetGrid;
  • Method 3: use the boxplot() method of a matplotlib Axes object.

Plotting

Import the required packages and prepare the data:

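# Jupyter-only magic for interactive figures; remove it in a plain Python script
%matplotlib notebook
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

tips = pd.read_csv('examples/tips.csv')
# Tip as a fraction of the bill before the tip
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])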
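The tip_pct formula divides the tip by total_bill minus tip, which amounts to treating total_bill as already including the tip; the result is the tip as a fraction of the pre-tip amount. A quick arithmetic check on two hypothetical rows (not taken from tips.csv):

```python
import pandas as pd

# Two hypothetical rows, used only to check the arithmetic of the tip_pct formula
demo = pd.DataFrame({"total_bill": [12.0, 25.0], "tip": [2.0, 5.0]})
demo["tip_pct"] = demo["tip"] / (demo["total_bill"] - demo["tip"])
print(demo)
```

Here 2 / (12 - 2) = 0.2 and 5 / (25 - 5) = 0.25.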

The structure of the tips table is:

Method 1 examples:

fig, axes = plt.subplots()
# Box plot of tip_pct, keeping only values below 0.5
tips.loc[tips['tip_pct'] < 0.5, 'tip_pct'].plot(kind='box', ax=axes)
axes.set_ylabel('values of tip_pct')
fig.savefig('p1.png')  # Save the figure as p1.png

The code above produces the following figure:

For details on Series.plot(), see:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.plot.html#pandas.Series.plot
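`%matplotlib notebook` only works inside Jupyter. As a sketch of running Method 1 as a plain script, the version below selects matplotlib's non-interactive Agg backend and uses the `Series.plot.box()` accessor, which is equivalent to `plot(kind='box')`. The randomly generated `tip_pct` series and the output file name are hypothetical stand-ins.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for tips['tip_pct']: random positive values
rng = np.random.default_rng(0)
tip_pct = pd.Series(rng.gamma(shape=4.0, scale=0.04, size=200), name="tip_pct")

fig, ax = plt.subplots()
# .plot.box() is the accessor form of .plot(kind='box'); it returns the Axes it drew on
returned_ax = tip_pct[tip_pct < 0.5].plot.box(ax=ax)
ax.set_ylabel("values of tip_pct")
fig.savefig("p1_script.png")  # hypothetical output file name
plt.close(fig)  # free the figure once it is saved
```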

DataFrame.plot() example:

fig, axes = plt.subplots(1, 4)  # one subplot per numeric column of tips
color = dict(boxes='DarkGreen', whiskers='DarkOrange',
             medians='DarkBlue', caps='Red')
# boxes: the box outline; whiskers: the lines extending from the box
# medians: the median line; caps: the short bars at the whisker ends

tips.plot(kind='box', ax=axes, subplots=True,
          title='Different boxplots', color=color, sym='r+')
# sym sets the outlier marker ('r+' = red plus signs)
# Non-numeric columns (smoker, day, time) are skipped

axes[0].set_ylabel('values of total_bill')
axes[1].set_ylabel('values of tip')
axes[2].set_ylabel('values of size')
axes[3].set_ylabel('values of tip_pct')

fig.subplots_adjust(wspace=1, hspace=1)  # Adjust the spacing between subplots
fig.savefig('p2.png')  # Save the figure as p2.png

Result:

[Figure: DataFrame.plot() example]

For details on DataFrame.plot(), see:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html
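DataFrame.plot(kind='box', subplots=True) only draws numeric columns, which is why plt.subplots(1, 4) fits the tips table: total_bill, tip, size and tip_pct are numeric, while smoker, day and time are skipped. When an array of Axes is passed, pandas expects exactly one Axes per plotted column and raises a ValueError otherwise. A sketch on a small hypothetical table that derives the subplot count from the data instead of hard-coding it:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs as a script
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical mini-table mixing numeric and text columns, like tips.csv
df = pd.DataFrame({
    "total_bill": [10.0, 15.0, 20.0, 25.0, 30.0, 60.0],
    "tip": [1.0, 2.0, 3.0, 3.5, 4.0, 5.0],
    "smoker": ["No", "Yes", "No", "No", "Yes", "No"],
    "size": [2, 2, 3, 4, 2, 6],
})

# Box plots only use numeric columns, so size the grid from them
numeric_cols = df.select_dtypes("number").columns
fig, axes = plt.subplots(1, len(numeric_cols))
df.plot(kind="box", ax=axes, subplots=True, sym="r+")
for ax, col in zip(axes, numeric_cols):
    ax.set_ylabel(f"values of {col}")
fig.subplots_adjust(wspace=1)
```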

DataFrame.boxplot() example:

fig, axes = plt.subplots()
tips.boxplot(column='tip_pct', by=['smoker', 'time'], ax=axes)
# column: the column(s) to draw as box plots (one or more)
# by: the column(s) to group by; one box is drawn per group

axes.set_ylabel('values of tip_pct')
fig.savefig('p3.png')  # Save the figure as p3.png

Result:

[Figure: DataFrame.boxplot() example]

For details on DataFrame.boxplot(), see:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.boxplot.html?highlight=dataframe%20boxplot#pandas.DataFrame.boxplot
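Each box in the grouped plot summarizes one (smoker, time) combination. To see the numbers behind the boxes, the quartiles can be computed per group with groupby. A sketch on hypothetical values (not the real tips data), assuming pandas' default linear quantile interpolation:

```python
import pandas as pd

# Hypothetical tip_pct values for two (smoker, time) groups
df = pd.DataFrame({
    "smoker": ["No"] * 5 + ["Yes"] * 3,
    "time": ["Dinner"] * 5 + ["Lunch"] * 3,
    "tip_pct": [0.10, 0.15, 0.20, 0.25, 0.30, 0.05, 0.10, 0.40],
})

# One row per group, one column per quartile: the box edges and the median line
quartiles = (df.groupby(["smoker", "time"])["tip_pct"]
               .quantile([0.25, 0.5, 0.75])
               .unstack())
print(quartiles)
```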

Method 2 examples:

Example of seaborn.catplot():

g = sns.catplot(x='tip_pct', y='day', hue='smoker', kind='box',
                data=tips[tips.tip_pct < 0.5])
# hue: the column used to split each box into colored subgroups
# catplot creates its own figure and returns a FacetGrid

g.savefig('p4.png')  # Save the figure as p4.png

Result:

[Figure: Example of seaborn.catplot()]

For details on seaborn.catplot(), see:

http://seaborn.pydata.org/generated/seaborn.catplot.html?highlight=seaborn%20catplot#seaborn.catplot

Example of seaborn.boxplot():

fig, axes = plt.subplots()
sns.boxplot(x='day', y='tip_pct', hue='smoker',
            data=tips[tips.tip_pct < 0.5], orient='v', ax=axes)
# orient sets the direction of the boxes ('v' = vertical, 'h' = horizontal)

axes.set_title('Boxplots grouped by smoker')
fig.savefig('p5.png')  # Save the figure as p5.png

Result:

[Figure: Example of seaborn.boxplot()]

For details on seaborn.boxplot(), see:

http://seaborn.pydata.org/generated/seaborn.boxplot.html#seaborn.boxplot

Method 3 example:

axes.boxplot() example:

fig, axes = plt.subplots()
axes.boxplot(x=tips.tip_pct, sym='rd', positions=[2])
# sym sets the outlier marker ('rd' = red diamonds)
# positions sets the x-coordinate of each box (by default 1, 2, ..., N)

axes.set_xlabel('tip_pct')
fig.savefig('p6.png')  # Save the figure as p6.png

Result:

[Figure: axes.boxplot() example]

For details on axes.boxplot(), see:

https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.boxplot.html?highlight=axes%20boxplot#matplotlib.axes.Axes.boxplot
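Axes.boxplot() also accepts a list of data sets and returns a dict of the artists it drew (keys such as 'boxes', 'whiskers', 'caps', 'medians', 'fliers'), which is handy for checking where the whiskers ended up and which points were marked as outliers. A sketch with two hypothetical samples; it names the boxes with set_xticks()/set_xticklabels() rather than the labels argument, which newer matplotlib versions rename to tick_labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs as a script
import matplotlib.pyplot as plt
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 30], dtype=float)  # hypothetical sample with one large value
b = np.array([2, 3, 3, 4, 4, 5, 5, 6, 6], dtype=float)   # hypothetical sample without outliers

fig, ax = plt.subplots()
artists = ax.boxplot([a, b], positions=[1, 2], sym="rd", whis=1.5)
ax.set_xticks([1, 2])
ax.set_xticklabels(["a", "b"])

# Each box adds two whisker lines: [Q1, lower end] and [Q3, upper end]
low_a = artists["whiskers"][0].get_ydata()[1]
high_a = artists["whiskers"][1].get_ydata()[1]
fliers_a = artists["fliers"][0].get_ydata()  # points drawn as outliers for sample a
print(low_a, high_a, fliers_a)
```

For sample a, the whiskers end at 1 and 8 and only 30 is drawn as an outlier, matching the 1.5·IQR rule.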
