5 Pandas table beautification styles

created at 07-13-2021

Introduction

Pandas is an efficient data processing library. Its two basic data types are the Series (one-dimensional) and the DataFrame (two-dimensional, similar to an Excel sheet).

In Jupyter, Pandas output is rendered as an HTML table rather than the plain text an IDE console shows, so the style of the table can be modified through CSS.

When we make Excel tables, we often highlight important data or use different colors to indicate the size of values. The same can be done in Pandas, and quite concisely.

Pandas provides the DataFrame.style property, which returns a Styler object for controlling how the data is displayed.

Pandas styles

Generally, we pass a style function as a parameter to one of the following methods to beautify the table.

  • Styler.applymap: acts on individual elements (renamed to Styler.map in pandas 2.1, where applymap is deprecated)
  • Styler.apply: acts on rows, columns, or the entire table

The examples below show the most commonly used styles in detail.

Highlight

For ease of presentation, the examples use data on the ten most populous countries in the world in 2021.

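import pandas as pd
# Raw string, so single backslashes are enough in the Windows path
# (the file name means "2021 world population data")
data = pd.read_excel(r"E:\jupyter_notebook\2021世界人口数据.xlsx")
data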

[Figure: data structure]

Let's first look at the information of the table:

data.info()

[Figure: data information]

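Except for the first two columns, all columns are numeric. The column names used in the code below are 2021人口 (2021 population), 2020人口 (2020 population), 面积 (area), 单位面积人口 (population per unit area), 人口增幅 (population growth rate), and 世界占比 (share of world population).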

Now highlight the maximum value in each of the specified columns:

def highlight_max(s):
    '''
    Highlight the maximum value of a column with a yellow background.
    '''
    is_max = s == s.max()
    return ['background-color: yellow' if v else '' for v in is_max]

# subset restricts the styling to the numeric columns
data.style.apply(highlight_max, subset=['2021人口', '2020人口', '面积', '单位面积人口', '人口增幅', '世界占比'])

[Figure: highlight result]

If you don't want to change the background, you can instead change the text color of specific elements to make the key values stand out.

Mark in red the elements whose population per unit area is greater than 200:

def color_red(s):
    # True where the value exceeds the threshold of 200
    is_dense = s > 200
    return ['color: red' if v else '' for v in is_dense]

data.style.apply(color_red, subset=['单位面积人口'])

[Figure: mark specific elements]

Data bar display

Excel's conditional formatting offers data bars, which express the size of a value visually.

Pandas supports data bars as well, through Styler.bar (called as df.style.bar).

Using the same population data, let's see how to draw the data bars.

import pandas as pd
data = pd.read_excel(r"E:\jupyter_notebook\2021世界人口数据.xlsx")
# Draw a data bar in each cell of the specified columns, scaled to the value
data.style.bar(subset=['2021人口', '2020人口'], color='#FFA500')

[Figure: data bar display]

Color scale display

A color scale, also known as a heat map, serves the same purpose as the data bar: it expresses the size of the data, here through the background color of each cell.

Color scales are just as simple to use in Pandas, through df.style.background_gradient. This method requires matplotlib to be installed.

import seaborn as sns

# Use seaborn to build a light-to-green colormap
cm = sns.light_palette("green", as_cmap=True)
# Apply the color scale to the numeric columns
data.style.background_gradient(cmap=cm, subset=['2021人口', '2020人口', '面积', '单位面积人口', '人口增幅', '世界占比'])

[Figure: color scale display]

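You can adjust the range of colors used with the low and high parameters, which extend the data range below the minimum and above the maximum (as a multiple of max - min) so that the ends of the colormap are not reached.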

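Before adjustment:

# Color scale with the built-in 'viridis' colormap, color range not adjusted
data.style.background_gradient(cmap='viridis', subset=['2021人口', '2020人口', '面积', '单位面积人口', '人口增幅', '世界占比'])

[Figure: color scale before adjusting the color range]

After adjustment:

# Same color scale, with the color range compressed through high and low
data.style.background_gradient(cmap='viridis', high=0.2, low=0.1, subset=['2021人口', '2020人口', '面积', '单位面积人口', '人口增幅', '世界占比'])

[Figure: color scale after adjustment]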
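The effect of `low` and `high` is easier to see as arithmetic. As far as the pandas documentation describes it, the data range is extended by `low` times the range below the minimum and by `high` times the range above the maximum before values are mapped onto the colormap, so the smallest and largest values in a column no longer land on the extreme colors. The helper below (`gradient_position`, a hypothetical name, not a pandas function) reproduces only that mapping on made-up values. It is a sketch of the idea, not pandas' actual implementation.

```python
def gradient_position(value, vmin, vmax, low=0.0, high=0.0):
    """Position of `value` along the colormap, from 0.0 (start) to 1.0 (end).

    Hypothetical helper: mirrors how low/high widen the normalization range.
    """
    span = vmax - vmin
    lower = vmin - low * span    # range extended below the minimum
    upper = vmax + high * span   # range extended above the maximum
    return (value - lower) / (upper - lower)

values = [0, 5, 10]  # made-up column values

# Default: the minimum and maximum use the two ends of the colormap
default = [gradient_position(v, 0, 10) for v in values]

# low=0.1, high=0.2: both ends are pulled inward
adjusted = [round(gradient_position(v, 0, 10, low=0.1, high=0.2), 3) for v in values]

print(default)   # [0.0, 0.5, 1.0]
print(adjusted)  # [0.077, 0.462, 0.846]
```

With these settings the darkest and brightest ends of the colormap go unused, which can keep the cell text readable against strong colormaps such as viridis.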

Percentage display

Some figures are best expressed as percentages, such as the population growth rate and the share of world population in this data.

Pandas can display values as percentages in the data frame through Styler.format.

data.style.format("{:.2%}", subset=['人口增幅', '世界占比'])

[Figure: percentage display]

Mark missing values

There may be missing values in the data set. How can we make them stand out?

There are two commonly used methods: one is to replace them with a - symbol, and the other is to highlight them.

First create a table with missing values, again using the population data.

import pandas as pd
import numpy as np
data = pd.read_excel(r"E:\jupyter_notebook\2021世界人口数据.xlsx")
# Set three cells to NaN to simulate missing values
data.iloc[1, 4] = np.nan
data.iloc[3, 1] = np.nan
data.iloc[6, 6] = np.nan
data

[Figure: table with missing values]

There are three missing values in the above data. We use the - symbol to replace the missing values:

data.style.format(None, na_rep="-")

[Figure: missing values replaced with -]

Now highlight the missing values instead:

# Recent pandas versions take color=; versions before 1.5 used null_color='red'
data.style.highlight_null(color='red')

[Figure: highlight missing values]

Appendix: export the styles to Excel

The styles applied in Pandas can not only be displayed in the notebook but also exported to Excel.

This uses the Styler's to_excel method, with openpyxl as the engine.

import pandas as pd
data = pd.read_excel(r"E:\jupyter_notebook\2021世界人口数据.xlsx")
# Apply the color scale, then write the styled table to an Excel file
styled = data.style.background_gradient(cmap='viridis', subset=['2021人口', '2020人口', '面积', '单位面积人口', '人口增幅', '世界占比'])
styled.to_excel('style.xlsx', engine='openpyxl')

[Figure: final result]

edited at: 07-13-2021