A box plot is a graphical summary built on the five-number summary, so before introducing the box plot, here is a brief review of that summary.
The five-number summary describes a data set with five numbers: the minimum, the first quartile (Q1), the median (Q2), the third quartile (Q3), and the maximum.
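The five numbers are the minimum, the first quartile, the median, the third quartile, and the maximum. As a minimal sketch, they can be computed with NumPy as follows. The function name five_number_summary and the sample values are made up for illustration, and the quartiles use NumPy's default linear interpolation, which can differ slightly from other quartile conventions.

```python
import numpy as np

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a 1-D sequence of numbers."""
    arr = np.asarray(values, dtype=float)
    # np.percentile interpolates linearly between data points by default
    q1, median, q3 = np.percentile(arr, [25, 50, 75])
    return tuple(float(v) for v in (arr.min(), q1, median, q3, arr.max()))

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up values for illustration
summary = five_number_summary(sample)
print(summary)  # (1.0, 3.0, 5.0, 7.0, 9.0)
```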
The box plot is a graphical representation of the five-number summary, as shown below:
In Python, box plots can be drawn with:
the Series.plot(), DataFrame.plot(), or DataFrame.boxplot() methods in the pandas package;
catplot() or boxplot() in the seaborn package, where seaborn.boxplot() corresponds to seaborn.catplot() with the parameter kind='box';
the boxplot() method of the Axes object in the matplotlib package.
%matplotlib notebook
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
tips = pd.read_csv('examples/tips.csv')
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])
The structure of the tips table is:
fig,axes = plt.subplots()
tips['tip_pct'][tips.tip_pct <0.5].plot(kind='box',ax=axes)
axes.set_ylabel('values of tip_pct')
fig.savefig('p1.png') # Save the drawn figure as p1.png
The diagram drawn by the above code is:
Specific reference for the usage of Series.plot():
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.plot.html#pandas.Series.plot
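The Series.plot() example above removes tip_pct values of 0.5 or more by hand before plotting. A box plot also flags extreme values on its own. By default matplotlib, which pandas plots through, treats any point more than 1.5 times the interquartile range (IQR) beyond the quartiles as an outlier. The sketch below reproduces that rule with NumPy; the function name iqr_outliers and the sample values are made up for illustration.

```python
import numpy as np

def iqr_outliers(values, whis=1.5):
    """Return the values lying outside [Q1 - whis*IQR, Q3 + whis*IQR]."""
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return arr[(arr < lower) | (arr > upper)]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up values; 30 is far from the rest
outliers = iqr_outliers(sample)
print(outliers.tolist())  # [30.0]
```

Here Q1 is 3.25 and Q3 is 7.75, so the IQR is 4.5 and the limits are -3.5 and 14.5; only 30 falls outside them.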
DataFrame.plot() example:
fig,axes = plt.subplots(1,4)
color = dict(boxes='DarkGreen', whiskers='DarkOrange',
medians='DarkBlue', caps='Red')
# boxes colors the boxes, whiskers colors the whiskers,
# medians colors the median lines, caps colors the caps at the whisker ends
tips.plot(kind='box',ax=axes,subplots=True,
title='Different boxplots',color=color,sym='r+')
# sym sets the marker style for outliers ('r+' draws red plus signs)
axes[0].set_ylabel('values of total_bill')
axes[1].set_ylabel('values of tip')
axes[2].set_ylabel('values of size')
axes[3].set_ylabel('values of tip_pct')
fig.subplots_adjust(wspace=1,hspace=1) # Adjust the spacing between subplots
fig.savefig('p2.png') # Save the drawn picture as p2.png
result:
Specific reference for the usage of DataFrame.plot():
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html
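The keys of the color dict in the DataFrame.plot() example (boxes, whiskers, medians, caps) match the groups of artists that matplotlib's Axes.boxplot() returns. The following is a minimal sketch of that return value, assuming synthetic random data and the non-interactive Agg backend so that it also runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend, not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=100)  # synthetic data for illustration only

fig, ax = plt.subplots()
artists = ax.boxplot(data)  # dict mapping artist group name -> list of Line2D objects

# One box has 1 box outline, 2 whiskers, 1 median line and 2 caps
counts = {key: len(artists[key]) for key in ('boxes', 'whiskers', 'medians', 'caps')}
print(counts)

# Each group can be styled separately, which is what the color dict does in pandas
plt.setp(artists['boxes'], color='darkgreen')
plt.setp(artists['whiskers'], color='darkorange')
plt.setp(artists['medians'], color='darkblue')
plt.setp(artists['caps'], color='red')
plt.close(fig)
```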
DataFrame.boxplot() example:
fig,axes = plt.subplots()
tips.boxplot(column='tip_pct',by=['smoker','time'],ax=axes)
# column selects the data to draw as box plots; it can be one or more columns
# by sets the column(s) used to group the data
axes.set_ylabel('values of tip_pct')
fig.savefig('p3.png') # Save the drawn figure as p3.png
result:
Specific reference for the usage of DataFrame.boxplot():
Example of seaborn.catplot():
g = sns.catplot(x='tip_pct',y='day',hue='smoker',kind='box',
data=tips[tips.tip_pct <0.5])
# hue indicates the basis of grouping
# catplot returns a FacetGrid, so save through it rather than the earlier fig
g.savefig('p4.png') # Save the drawn figure as p4.png
result:
Specific reference for the usage of seaborn.catplot():
http://seaborn.pydata.org/generated/seaborn.catplot.html?highlight=seaborn%20catplot#seaborn.catplot
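Both the by parameter of DataFrame.boxplot() and the hue parameter in seaborn split the data into groups and draw one box per group. The numbers behind each box can be inspected with a pandas groupby, as in this minimal sketch. The small table below is made up for illustration and is not the tips data.

```python
import pandas as pd

# Made-up values standing in for the tips table
df = pd.DataFrame({
    'smoker': ['No', 'No', 'No', 'Yes', 'Yes', 'Yes'],
    'tip_pct': [0.10, 0.15, 0.20, 0.12, 0.18, 0.30],
})

# describe() reports count, mean, std, min, 25%, 50%, 75% and max per group;
# keep only the five-number summary columns
group_summary = df.groupby('smoker')['tip_pct'].describe()[
    ['min', '25%', '50%', '75%', 'max']
]
print(group_summary)
```

Each row of group_summary holds the five numbers that one box in the grouped plot is built from.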
Example of seaborn.boxplot():
fig,axes = plt.subplots()
sns.boxplot(x='day',y='tip_pct',hue='smoker',
data=tips[tips.tip_pct <0.5],orient='v',ax=axes)
# orient parameter indicates the direction of the box plot
axes.set_title('Boxplots grouped by smoker')
fig.savefig('p5.png') # Save the drawn figure as p5.png
Specific reference for the usage of seaborn.boxplot():
http://seaborn.pydata.org/generated/seaborn.boxplot.html#seaborn.boxplot
axes.boxplot() example:
fig,axes = plt.subplots()
axes.boxplot(x=tips.tip_pct,sym='rd',positions=[2])
# sym sets the marker style for outliers ('rd' draws red thin-diamond markers)
# positions sets where the box is placed along the x-axis
axes.set_xlabel('tip_pct')
fig.savefig('p6.png') # Save the drawn figure as p6.png
result:
Specific reference for the usage of axes.boxplot():