Colorizing with Data

Heat maps, which represent data values with color gradients, are common in data visualization. The technique is great for identifying patterns, trends, outliers, and missing data when there’s lots of data. Tables can have this sort of treatment as well! Typically, formatted numeric values are shown along with a color treatment that corresponds to the underlying data values.

We can make this possible in Great Tables by using the data_color() method. Let’s start with a simple example, using a Polars DataFrame with three columns of values. We can introduce that data to GT and use data_color() without any arguments.

from great_tables import GT
import polars as pl

simple_df = pl.DataFrame(
    {
        "integer": [1, 2, 3, 4, 5],
        "float": [2.3, 1.3, 5.1, None, 4.4],
        "category": ["one", "two", "three", "one", "three"],
    }
)

GT(simple_df).data_color()
integer float category
1 2.3 one
2 1.3 two
3 5.1 three
4 None one
5 4.4 three

This works but doesn’t look all that appealing. However, we can take note of a few things straight away. The first is that data_color() doesn’t format the values; rather, it applies color fills to the cells. The second is that you don’t have to intervene and modify the text color to ensure enough contrast: Great Tables will do that for you (though this behavior can be deactivated with the autocolor_text= argument).
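To give a feel for what "enough contrast" means, here is a minimal sketch of one common way to choose between black and white text for a given background, based on the WCAG contrast ratio. The function names are hypothetical and this is not necessarily the rule Great Tables applies internally; it only illustrates the idea.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG relative luminance of a '#RRGGBB' color: 0.0 is black, 1.0 is white."""
    h = hex_color.lstrip("#")
    channels = [int(h[i : i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma curve for each channel before weighting
    r, g, b = [
        c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio between two colors (ranges from 1 to 21)."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def pick_text_color(background: str) -> str:
    """Return whichever of black or white contrasts more with the background."""
    return max(("#000000", "#FFFFFF"), key=lambda fg: contrast_ratio(background, fg))


print(pick_text_color("#0000FF"))  # blue background -> "#FFFFFF"
print(pick_text_color("#FFE4C4"))  # pale background -> "#000000"
```

With a rule like this, dark fills get light text and light fills get dark text, which is the effect you see in the table above.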

Setting palette colors

While this first example illustrated some basic things, the common thing to do in practice is to provide a list of colors to the palette= argument. Let’s choose two colors, "blue" and "red", and place them in that order.

GT(simple_df).data_color(palette=["blue", "red"])
integer float category
1 2.3 one
2 1.3 two
3 5.1 three
4 None one
5 4.4 three

Now that we’ve moved away from the default palette and specified colors, we can see that lower numerical values are closer to blue and higher values are closer to red (those in the middle have colors that are a blend of the two; in this case, more in the purple range). Categorical values behave similarly: they take on ordinal values based on their first appearance (from top to bottom), and those values are used to generate the background colors.
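The blending described above can be sketched in plain Python. This is an illustration only: it assumes simple linear interpolation in RGB space, whereas Great Tables may interpolate differently, and the helper names are hypothetical.

```python
BLUE, RED = (0, 0, 255), (255, 0, 0)


def lerp_color(start, end, t):
    """Blend two (r, g, b) tuples; t=0 gives start, t=1 gives end."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def color_numeric(values, start=BLUE, end=RED):
    """Map each value to a color by its position between the min and max.

    Assumes at least two distinct non-missing values; None stays None.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    return [
        None if v is None else lerp_color(start, end, (v - lo) / (hi - lo))
        for v in values
    ]


def ordinal_codes(labels):
    """Number each category by order of first appearance (1, 2, 3, ...)."""
    codes = {}
    for label in labels:
        codes.setdefault(label, len(codes) + 1)
    return [codes[label] for label in labels]


print(color_numeric([1, 2, 3, 4, 5])[2])  # middle value -> (128, 0, 128), a purple
print(ordinal_codes(["one", "two", "three", "one", "three"]))  # [1, 2, 3, 1, 3]
```

Passing the ordinal codes to color_numeric() then colors the category column the same way as a numeric one, so "one" ends up at the blue end and "three" at the red end.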

Coloring missing values with na_color

There is a lone "None" value in the float column, and it has a gray background. Throughout the Great Tables package, missing values are treated in different ways and, in this case, the cell is given a default color. We can change that with the na_color= argument. Let’s try it now:

GT(simple_df).data_color(palette=["blue", "red"], na_color="#FFE4C4")
integer float category
1 2.3 one
2 1.3 two
3 5.1 three
4 None one
5 4.4 three

Now, the gray color has been changed to Bisque. Note that when it comes to colors, you can use any combination of CSS/X11 color names and hexadecimal color codes.
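As a rough mental model of the two points above (missing values fall back to na_color, and names and hex codes are interchangeable), consider the following sketch. The lookup table and function names are hypothetical and cover only the colors used on this page; the "gray" default simply mirrors the behavior described in the text.

```python
# A tiny, illustrative subset of the CSS/X11 named colors
NAMED_COLORS = {
    "blue": "#0000FF",
    "red": "#FF0000",
    "white": "#FFFFFF",
    "gray": "#808080",
    "bisque": "#FFE4C4",
}


def to_hex(color: str) -> str:
    """Accept either a color name or a '#RRGGBB' code and return uppercase hex."""
    if color.startswith("#"):
        return color.upper()
    return NAMED_COLORS[color.lower()]


def fill_or_na(value, fill: str, na_color: str = "gray") -> str:
    """Use the computed fill, unless the value is missing; then use na_color."""
    return to_hex(na_color) if value is None else to_hex(fill)


print(to_hex("bisque") == to_hex("#ffe4c4"))  # True: the name and the code agree
print(fill_or_na(None, "red"))  # "#808080", the gray fallback
print(fill_or_na(None, "red", na_color="#FFE4C4"))  # "#FFE4C4"
```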

Using domain= to color values across columns

In the previous examples, the color range spanned exactly the minimum and maximum of the data values. That can be changed with the domain= argument, which expects a list of two values (a lower and an upper value). Let’s use the range [0, 10] on the first two columns, integer and float, and not the third (since a numerical domain is incompatible with string-based values). Here’s the table code for that:

(
    GT(simple_df)
    .data_color(
        columns=["integer", "float"],
        palette=["blue", "red"],
        domain=[0, 10],
        na_color="white"
    )
)
integer float category
1 2.3 one
2 1.3 two
3 5.1 three
4 None one
5 4.4 three

Nice! We can clearly see that the color ramp in the first column (integer) only proceeds from blue (value: 1) to purple (value: 5) and there isn’t a reddish color in sight (would need a value close to 10).
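The effect of domain= is easier to see as arithmetic. The sketch below (with a hypothetical helper name, and assuming a simple linear rescaling) computes where each integer value falls along the palette, first with a domain taken from the data and then with the fixed domain [0, 10].

```python
def position_in_domain(value, domain):
    """Fraction of the way from domain[0] to domain[1]; 0.0 is the first
    palette color and 1.0 is the last."""
    lo, hi = domain
    return (value - lo) / (hi - lo)


integers = [1, 2, 3, 4, 5]

# Domain inferred from the data: the values span the whole palette
data_domain = [min(integers), max(integers)]
print([position_in_domain(v, data_domain) for v in integers])
# [0.0, 0.25, 0.5, 0.75, 1.0]

# Fixed domain of [0, 10]: the largest value only reaches the midpoint
print([position_in_domain(v, [0, 10]) for v in integers])
# [0.1, 0.2, 0.3, 0.4, 0.5]
```

With the fixed domain, the value 5 sits at 0.5, halfway between blue and red, which is why the column tops out at purple.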

Bringing it all together

For a more advanced treatment of data colorization in the table, let’s take the sza dataset (available in the great_tables.data submodule) and vigorously reshape it with Polars so that solar zenith angles are arranged as rows by month, and the half-hourly clock times are the columns (from early morning to solar noon).

Once the pivot()ing is done, we can introduce that table to the GT class, placing the names of the months in the table stub. We will use data_color() with a domain that runs from 90 to 0 (here, 90° is sunrise, and 0° represents the sun being directly overhead). There are months where the sun rises later in the morning; for the times before sunrise we’ll see missing values in the dataset, and na_color="white" will handle those cases. Okay, that’s the plan, and now here’s the code:

from great_tables import html
from great_tables.data import sza
import polars.selectors as cs

sza_pivot = (
    pl.from_pandas(sza)
    .filter((pl.col("latitude") == "20") & (pl.col("tst") <= "1200"))
    .select(pl.col("*").exclude("latitude"))
    .drop_nulls()
    .pivot(values="sza", index="month", on="tst", sort_columns=True)
)

(
    GT(sza_pivot, rowname_col="month")
    .data_color(
        domain=[90, 0],
        palette=["rebeccapurple", "white", "orange"],
        na_color="white",
    )
    .tab_header(
        title="Solar Zenith Angles from 05:30 to 12:00",
        subtitle=html("Average monthly values at latitude of 20&deg;N."),
    )
)
Solar Zenith Angles from 05:30 to 12:00
Average monthly values at latitude of 20°N.
0530 0600 0630 0700 0730 0800 0830 0900 0930 1000 1030 1100 1130 1200
jan None None None 84.9 78.7 72.7 66.1 61.5 56.5 52.1 48.3 45.5 43.6 43.0
feb None None 88.9 82.5 75.8 69.6 63.3 57.7 52.2 47.4 43.1 40.0 37.8 37.2
mar None None 85.7 78.8 72.0 65.2 58.6 52.3 46.2 40.5 35.5 31.4 28.6 27.7
apr None 88.5 81.5 74.4 67.4 60.3 53.4 46.5 39.7 33.2 26.9 21.3 17.2 15.5
may None 85.0 78.2 71.2 64.3 57.2 50.2 43.2 36.1 29.1 26.1 15.2 8.8 5.0
jun 89.2 82.7 76.0 69.3 62.5 55.7 48.8 41.9 35.0 28.1 21.1 14.2 7.3 2.0
jul 88.8 82.3 75.7 69.1 62.3 55.5 48.7 41.8 35.0 28.1 21.2 14.3 7.7 3.1
aug None 83.8 77.1 70.2 63.3 56.4 49.4 42.4 35.4 28.3 21.3 14.3 7.3 1.9
sep None 87.2 80.2 73.2 66.1 59.1 52.1 45.1 38.1 31.3 24.7 18.6 13.7 11.6
oct None None 84.1 77.1 70.2 63.3 56.5 49.9 43.5 37.5 32.0 27.4 24.3 23.1
nov None None 87.8 81.3 74.5 68.3 61.8 56.0 50.2 45.3 40.7 37.4 35.1 34.4
dec None None None 84.3 78.0 71.8 66.1 60.5 55.6 50.9 47.2 44.2 42.4 41.8

Because this is a table for presentation, we can’t neglect using tab_header(). A title and subtitle can provide just enough information to guide the reader through your table visualization.
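To see how the descending domain [90, 0] lines up with the three-color palette used above, here is a minimal sketch. It assumes the palette colors are spaced evenly along the domain and blended linearly in RGB, which may differ from what Great Tables does internally; the function name is hypothetical, and values are assumed to lie within the domain.

```python
# RGB values for the CSS names used in the example above
PALETTE = [
    (102, 51, 153),   # rebeccapurple
    (255, 255, 255),  # white
    (255, 165, 0),    # orange
]


def palette_color(value, domain, palette=PALETTE):
    """Color for a value, blending between evenly spaced palette stops."""
    lo, hi = domain
    # The same formula works for a descending domain such as [90, 0]
    t = (value - lo) / (hi - lo)
    scaled = t * (len(palette) - 1)
    # Index of the stop at or below the value (capped so i + 1 is valid)
    i = min(int(scaled), len(palette) - 2)
    frac = scaled - i
    start, end = palette[i], palette[i + 1]
    return tuple(round(s + (e - s) * frac) for s, e in zip(start, end))


print(palette_color(90, [90, 0]))  # sunrise  -> (102, 51, 153), rebeccapurple
print(palette_color(45, [90, 0]))  # halfway  -> (255, 255, 255), white
print(palette_color(0, [90, 0]))   # overhead -> (255, 165, 0), orange
```

Listing the domain from 90 down to 0 means large zenith angles (sun near the horizon) take the first palette color and small angles (sun high in the sky) take the last.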
