Colorizing with Data

Heat maps are common in data visualization, where color gradients are used to represent data values. This technique is great for identifying patterns, trends, outliers, and missing data when there’s lots of data. Tables can have this sort of treatment as well! Typically, formatted numeric values are shown along with a color fill that corresponds to the underlying data values.

We can make this possible in Great Tables by using the data_color() method. Let’s start with a simple example, using a Polars DataFrame with three columns of values. We can introduce that data to GT() and use data_color() without any arguments.

from great_tables import GT
import polars as pl

simple_df = pl.DataFrame(
    {
        "integer": [1, 2, 3, 4, 5],
        "float": [2.3, 1.3, 5.1, None, 4.4],
        "category": ["one", "two", "three", "one", "three"],
    }
)

GT(simple_df).data_color()

integer	float	category
1	2.3	one
2	1.3	two
3	5.1	three
4	None	one
5	4.4	three

This works but doesn’t look all that appealing. However, we can take note of a few things straight away. The first is that data_color() doesn’t format the values; it only applies color fills to the cells. The second is that you don’t have to intervene and modify the text color so that there’s enough contrast: Great Tables will do that for you (though this behavior can be deactivated with the autocolor_text= argument).
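To see why automatic text recoloring matters, here is a minimal sketch of one way to pick a readable text color for a given cell fill. It uses the WCAG relative-luminance and contrast-ratio formulas, which is an assumption made for illustration; Great Tables may compute contrast differently, and the function names below are hypothetical.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG relative luminance of a '#RRGGBB' color (0.0 is black, 1.0 is white)."""

    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (int(hex_color[i : i + 2], 16) for i in (1, 3, 5))
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio between two colors (ranges from 1 to 21)."""
    lum_a, lum_b = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def pick_text_color(background: str) -> str:
    """Return black or white, whichever contrasts more with the background."""
    return max(("#000000", "#FFFFFF"), key=lambda fg: contrast_ratio(fg, background))


print(pick_text_color("#0000FF"))  # dark fill -> #FFFFFF
print(pick_text_color("#FFE4C4"))  # light fill -> #000000
```

With a rule like this, a dark blue cell gets white text and a pale cell gets black text, which is the kind of adjustment you would otherwise have to make by hand.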

Setting palette colors

While this first example illustrated some basic things, the common thing to do in practice is to provide a list of colors to the palette= argument. Let’s choose two colors, "blue" and "red", and place them in that order.

GT(simple_df).data_color(palette=["blue", "red"])

integer	float	category
1	2.3	one
2	1.3	two
3	5.1	three
4	None	one
5	4.4	three

Now that we’ve moved away from the default palette and specified colors, we can see that lower numerical values are closer to blue and higher values are closer to red (those in the middle have colors that are a blend of the two; in this case, more in the purple range). Categorical values behave similarly: they take on ordinal values based on their first appearance (from top to bottom), and those values are used to generate the background colors.
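The mapping described here can be sketched in plain Python. This is a minimal illustration that assumes simple linear interpolation in RGB space between the two palette colors; the color space Great Tables actually interpolates in may differ, and the helper names are hypothetical.

```python
def blend(low_rgb, high_rgb, t):
    """Linearly interpolate between two RGB tuples; t=0 gives low, t=1 gives high."""
    mixed = (round(lo + (hi - lo) * t) for lo, hi in zip(low_rgb, high_rgb))
    return "#{:02x}{:02x}{:02x}".format(*mixed)


def ordinal_by_first_appearance(labels):
    """Number each distinct label by where it first appears, top to bottom."""
    order = {}
    for label in labels:
        order.setdefault(label, len(order) + 1)
    return [order[label] for label in labels]


def color_column(values, low_rgb=(0, 0, 255), high_rgb=(255, 0, 0)):
    """Scale values to the column's own min/max, then blend blue -> red.

    Assumes the column holds at least two distinct, non-missing values.
    """
    lowest, highest = min(values), max(values)
    return [blend(low_rgb, high_rgb, (v - lowest) / (highest - lowest)) for v in values]


integers = [1, 2, 3, 4, 5]
categories = ["one", "two", "three", "one", "three"]

print(color_column(integers))
print(color_column(ordinal_by_first_appearance(categories)))
```

In this sketch the middle integer (3) and the second category ("two") both land on `#800080`, a purple halfway between blue and red.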

Coloring missing values with `na_color`

There is a lone "None" value in the float column, and it has a gray background. Throughout the Great Tables package, missing values are treated in different ways and, in this case, the missing value is given a default color. We can change that with the na_color= argument. Let’s try it now:

GT(simple_df).data_color(palette=["blue", "red"], na_color="#FFE4C4")

integer	float	category
1	2.3	one
2	1.3	two
3	5.1	three
4	None	one
5	4.4	three

Now, the gray color has been changed to Bisque. Note that when it comes to colors, you can use any combination of CSS/X11 color names and hexadecimal color codes.
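Since color names and hex codes can be mixed freely, it can help to see how the two relate. The sketch below uses a small hand-written lookup (a hypothetical helper, not part of Great Tables) that covers only the names used on this page; the hex values are the standard CSS definitions.

```python
# Standard CSS hex values for the named colors used on this page.
CSS_NAMED_COLORS = {
    "blue": "#0000FF",
    "red": "#FF0000",
    "white": "#FFFFFF",
    "bisque": "#FFE4C4",
    "orange": "#FFA500",
    "rebeccapurple": "#663399",
}


def to_hex(color: str) -> str:
    """Normalize a CSS color name or '#RRGGBB' code to an uppercase hex string."""
    if color.startswith("#"):
        return color.upper()
    return CSS_NAMED_COLORS[color.lower()]  # raises KeyError for names not listed here


print(to_hex("bisque"))
print(to_hex("bisque") == to_hex("#ffe4c4"))
```

In other words, "bisque" and "#FFE4C4" name the same color, so either form should describe the same fill when passed as a color value.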

Using `domain=` to color values across columns

In the previous uses of the data_color() method, the color ranges spanned the bounds of the data values. That can be changed with the domain= argument, which expects a list of two values (a lower and an upper value). Let’s use the range [0, 10] on the first two columns, integer and float, and not the third (since a numerical domain is incompatible with string-based values). Here’s the table code for that:

(
    GT(simple_df)
    .data_color(
        columns=["integer", "float"],
        palette=["blue", "red"],
        domain=[0, 10],
        na_color="white"
    )
)

integer	float	category
1	2.3	one
2	1.3	two
3	5.1	three
4	None	one
5	4.4	three

Nice! We can clearly see that the color ramp in the first column (integer) only proceeds from blue (value: 1) to purple (value: 5) and there isn’t a reddish color in sight (that would need a value close to 10).
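The effect of a fixed domain comes down to how each value is scaled before a color is picked. As a rough sketch (the function name is hypothetical, and it assumes every value lies inside the domain), compare where the integer values fall along the palette with and without a domain of [0, 10]:

```python
def palette_position(value, domain):
    """Return where a value falls along the palette: 0.0 is the first color, 1.0 the last."""
    lower, upper = domain
    return (value - lower) / (upper - lower)


integers = [1, 2, 3, 4, 5]

# Without a domain, the range comes from the data itself: 1 is fully blue, 5 fully red.
data_range = [min(integers), max(integers)]
print([palette_position(v, data_range) for v in integers])

# With a domain of [0, 10], the same values cover only the lower half of the ramp.
print([palette_position(v, [0, 10]) for v in integers])
```

The second list tops out at 0.5, the midpoint between blue and red, which is why the largest integer appears purple rather than red.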

Bringing it all together

For a more advanced treatment of data colorization in the table, let’s take the sza dataset (available in the great_tables.data submodule) and vigorously reshape it with Polars so that solar zenith angles are arranged as rows by month, and the half-hourly clock times are the columns (from early morning to solar noon).

Once the pivot()ing is done, we can introduce that table to the GT() class, placing the names of the months in the table stub. We will use data_color() with a domain that runs from 90 to 0 (here, 90° is sunrise, and 0° represents the sun being directly overhead). In some months the sun rises later in the morning; before those sunrise times we’ll see missing values in the dataset, and na_color="white" will handle those cases. Okay, that’s the plan, and now here’s the code:

from great_tables import html
from great_tables.data import sza
import polars.selectors as cs

sza_pivot = (
    pl.from_pandas(sza)
    .filter((pl.col("latitude") == "20") & (pl.col("tst") <= "1200"))
    .select(pl.col("*").exclude("latitude"))
    .drop_nulls()
    .pivot(values="sza", index="month", on="tst", sort_columns=True)
)

(
    GT(sza_pivot, rowname_col="month")
    .data_color(
        domain=[90, 0],
        palette=["rebeccapurple", "white", "orange"],
        na_color="white",
    )
    .tab_header(
        title="Solar Zenith Angles from 05:30 to 12:00",
        subtitle=html("Average monthly values at latitude of 20&deg;N."),
    )
)

Solar Zenith Angles from 05:30 to 12:00
Average monthly values at latitude of 20°N.
	0530	0600	0630	0700	0730	0800	0830	0900	0930	1000	1030	1100	1130	1200
jan	None	None	None	84.9	78.7	72.7	66.1	61.5	56.5	52.1	48.3	45.5	43.6	43.0
feb	None	None	88.9	82.5	75.8	69.6	63.3	57.7	52.2	47.4	43.1	40.0	37.8	37.2
mar	None	None	85.7	78.8	72.0	65.2	58.6	52.3	46.2	40.5	35.5	31.4	28.6	27.7
apr	None	88.5	81.5	74.4	67.4	60.3	53.4	46.5	39.7	33.2	26.9	21.3	17.2	15.5
may	None	85.0	78.2	71.2	64.3	57.2	50.2	43.2	36.1	29.1	26.1	15.2	8.8	5.0
jun	89.2	82.7	76.0	69.3	62.5	55.7	48.8	41.9	35.0	28.1	21.1	14.2	7.3	2.0
jul	88.8	82.3	75.7	69.1	62.3	55.5	48.7	41.8	35.0	28.1	21.2	14.3	7.7	3.1
aug	None	83.8	77.1	70.2	63.3	56.4	49.4	42.4	35.4	28.3	21.3	14.3	7.3	1.9
sep	None	87.2	80.2	73.2	66.1	59.1	52.1	45.1	38.1	31.3	24.7	18.6	13.7	11.6
oct	None	None	84.1	77.1	70.2	63.3	56.5	49.9	43.5	37.5	32.0	27.4	24.3	23.1
nov	None	None	87.8	81.3	74.5	68.3	61.8	56.0	50.2	45.3	40.7	37.4	35.1	34.4
dec	None	None	None	84.3	78.0	71.8	66.1	60.5	55.6	50.9	47.2	44.2	42.4	41.8

Because this is a table for presentation, we can’t neglect using tab_header(). A title and subtitle can provide just enough information to guide the reader through your table visualization.
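To make the reversed domain concrete, here is a minimal sketch of how a zenith angle could be placed on the three-color palette used in the table code. It assumes the palette colors are evenly spaced along the domain and blended linearly in RGB, which may not match the interpolation Great Tables performs; the function name is hypothetical.

```python
# RGB values of the CSS colors "rebeccapurple", "white", and "orange".
PALETTE = [(102, 51, 153), (255, 255, 255), (255, 165, 0)]


def color_for_angle(angle, domain=(90, 0), palette=PALETTE):
    """Map a zenith angle inside the domain to a hex color on a multi-stop ramp."""
    start, end = domain
    t = (angle - start) / (end - start)  # 0.0 at 90 degrees, 1.0 at 0 degrees
    scaled = t * (len(palette) - 1)
    index = min(int(scaled), len(palette) - 2)  # which pair of neighboring colors
    local = scaled - index  # position between that pair
    low, high = palette[index], palette[index + 1]
    mixed = (round(a + (b - a) * local) for a, b in zip(low, high))
    return "#{:02x}{:02x}{:02x}".format(*mixed)


print(color_for_angle(90))  # sunrise -> #663399
print(color_for_angle(45))  # midpoint of the domain -> #ffffff
print(color_for_angle(0))   # sun directly overhead -> #ffa500
```

Because the domain is written as [90, 0], large angles near sunrise sit at the purple end of the palette and small angles near solar noon sit at the orange end, with white in between.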
