Great Tables for Scientific Publishing

Author

Rich Iannone

Published

July 8, 2024

Great Tables version 0.10.0 has been released today, and it contains a host of new features to support tables meant for scientific publishing.

In this post, we’ll review the big pieces that scientific tables need:

  • Unit notation: rendering units and chemical formulas (e.g., °C or C₆H₆).
  • Scientific notation: formatting for very large and small numbers (e.g., 3.50 × 10⁻¹¹).
  • Nanoplots: compact visualizations for revealing trends.

We’ve also added six new datasets to help show off these scientific publishing features quickly! We’ll use the new reactions and gibraltar datasets to create examples in the fields of Atmospheric Chemistry and Meteorology, respectively.

Tip

Rich presented on this topic at SciPy 2024!

At SciPy 2024 (on July 11, 2024), Rich delivered a talk called Great Tables for Everyone, which presented some of the tables shown in this post. If you weren’t in attendance, that’s okay: you can watch the recorded talk, and the materials are available on GitHub.

Unit and scientific notation

We added the reactions dataset to serve as the basis for examples in the discipline of Atmospheric Chemistry. The dataset contains reaction rate constants for gas-phase reactions of 1,683 organic compounds. Each of these compounds can potentially undergo reaction with hydroxyl radicals (OH), nitrate radicals (NO₃), or chlorine atoms (Cl). These reaction rate constants are typically very small values in units of cm³ molecules⁻¹ s⁻¹. In the upcoming example, we’ll pare down this massive dataset to only 11 rows representing the class of organic compounds known as mercaptans.

To make this table work well in a scientific reporting context, we need three pieces:

  • a way to represent units, like cm³
  • a method for typesetting chemical formulae, as in CH₄
  • formatting for very small numbers in scientific notation

Great Tables provides the necessary functionality for all three requirements. Here is a summary table that tabulates rate constants for mercaptan compounds undergoing reaction with OH, NO₃, and Cl:

Show the Code
from great_tables import GT
from great_tables.data import reactions
import polars as pl
import polars.selectors as ps

reactions_mini = (
    pl.from_pandas(reactions)
    .filter(pl.col("cmpd_type") == "mercaptan")
    .select([
        "cmpd_name",
        "cmpd_formula",
        ps.ends_with("k298")
    ])
    .with_columns(
        cmpd_formula=pl.concat_str(
            "%" + pl.col("cmpd_formula") + "%"
        )
    )
)

(
    GT(reactions_mini, rowname_col="cmpd_name")
    .tab_header(title="Gas-phase reactions of selected mercaptan compounds")
    .tab_spanner(
        columns=ps.ends_with("k298"),
        label="Reaction Rate Constant (298 K),<br>{{cm^3 molecules^–1 s^–1}}"
    )
    .fmt_units(columns="cmpd_formula")
    .fmt_scientific(columns=ps.ends_with("k298"))
    .sub_missing()
    .cols_hide(columns="O3_k298")
    .cols_label(
        cmpd_formula="",
        OH_k298="OH",
        NO3_k298="{{%NO3%}}",
        Cl_k298="Cl",
    )
    .opt_stylize(style=1, color="blue")
    .opt_horizontal_padding(scale=3)
    .opt_table_font(stack="humanist")
)
Gas-phase reactions of selected mercaptan compounds
Reaction Rate Constant (298 K), cm³ molecules⁻¹ s⁻¹

| | | OH | NO₃ | Cl |
| methanethiol | CH₄S | 3.50 × 10⁻¹¹ | 9.20 × 10⁻¹³ | 2.00 × 10⁻¹⁰ |
| ethanethiol | C₂H₆S | 4.50 × 10⁻¹¹ | 1.21 × 10⁻¹² | 1.75 × 10⁻¹⁰ |
| propanethiol | C₃H₈S | 5.30 × 10⁻¹¹ | — | 2.14 × 10⁻¹⁰ |
| 2-propanethiol | C₃H₈S | 3.90 × 10⁻¹¹ | — | 2.70 × 10⁻¹⁰ |
| 1-butanethiol | C₄H₁₀S | 5.60 × 10⁻¹¹ | — | — |
| 2-methyl-1-propanethiol | C₄H₁₀S | 4.60 × 10⁻¹¹ | — | — |
| 2-butanethiol | C₄H₁₀S | 3.80 × 10⁻¹¹ | — | 1.65 × 10⁻¹⁰ |
| t-butylsulfide | C₄H₁₀S | 2.90 × 10⁻¹¹ | — | — |
| 2-methylbutanethiol | C₅H₁₂S | 5.20 × 10⁻¹¹ | — | — |
| n-pentanethiol | C₅H₁₂S | — | — | 1.97 × 10⁻¹⁰ |
| 1,2-ethanedithiol | C₂H₆S₂ | 3.80 × 10⁻¹¹ | — | — |

This is a nice-looking table! And note these pieces:

  • The label= argument to methods like .tab_spanner() supports the use of double curly braces ({{/}}) for the specialized units notation. So "{{cm^3 molecules^–1 s^–1}}" in the input becomes cm³ molecules⁻¹ s⁻¹ in the output.
  • The .fmt_units() method converts values that are already in units notation in the table body. For example, a cell with the text "%CH4S%" becomes CH₄S (the surrounding % signs indicate that the text should be interpreted as chemistry notation).
  • The .fmt_scientific() method formats values (in this case, very small values) in scientific notation (e.g., 3.50 × 10⁻¹¹). Without it, the table would look very strange to a researcher who is familiar with this sort of data.
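To make the scientific notation step concrete, here is a minimal standard-library sketch of the idea. It is not how Great Tables implements .fmt_scientific(); the helper name to_scientific and its decimals parameter are assumptions made up for this illustration, and the exponent is written with a caret rather than as a true superscript.

```python
def to_scientific(value, decimals=2):
    """Format a number as 'mantissa × 10^exponent' (illustrative helper only)."""
    # Python's "e" format gives e.g. "3.50e-11"; split it into its two parts.
    mantissa, exponent = f"{value:.{decimals}e}".split("e")
    # int() drops the explicit "+" sign and any zero padding in the exponent.
    return f"{mantissa} × 10^{int(exponent)}"


# Rate constants like the ones in the table above.
for k in (3.5e-11, 9.2e-13, 2.0e-10):
    print(to_scientific(k))
```

Compare this with printing the raw floats, where the mantissa precision varies from value to value and the magnitudes are harder to scan down a column.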

Together, units notation and chemistry notation (which is a part of it) make this table complete and understandable to a practitioner in the field. Great Tables supports units notation in spanner labels (with .tab_spanner()) and in column labels (with .cols_label()). The column label ‘NO₃’ was created with the latter method by supplying the text "{{%NO3%}}" as the column label for the NO3_k298 column.
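As a rough mental model of what units notation does, the sketch below converts a tiny subset of it to HTML using only the standard library. This is an assumption-laden illustration, not the Great Tables parser: the function name render_units is hypothetical, it only handles a string that is entirely units notation, and it only knows about ^ exponents and %-wrapped chemical formulas.

```python
import re


def render_units(text):
    """Turn a small subset of units notation into HTML (illustrative only)."""
    inner = text.strip()
    # Labels wrap units notation in double curly braces; table cells do not.
    if inner.startswith("{{") and inner.endswith("}}"):
        inner = inner[2:-2]
    # Text wrapped in % signs is treated as chemistry: digits become subscripts.
    if len(inner) > 2 and inner.startswith("%") and inner.endswith("%"):
        return re.sub(r"(\d+)", r"<sub>\1</sub>", inner[1:-1])
    # Otherwise, whatever follows a caret (up to the next space) is an exponent.
    return re.sub(r"\^(\S+)", r"<sup>\1</sup>", inner)


print(render_units("{{cm^3 molecules^-1 s^-1}}"))
print(render_units("%CH4S%"))
print(render_units("{{%NO3%}}"))
```

The real notation covers far more than this, so treat the sketch only as a way to see why "%CH4S%" and "{{cm^3}}" are handled differently.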

Nanoplots

We added the nanoplots feature to Great Tables in v0.4.0 (check out the intro blog post for a quick explainer) so that tables can contain small, info-packed plots that fit reasonably well into a table context. They are interactive in that hovering over the data points provides additional plot information. This approach brings together the advantages of plots (elucidation of trends in data) and tables (access to numerical values representing the data points) in a single summary visualization.

Version 0.10.0 of Great Tables adds the gibraltar dataset, which provides meteorological data (temperature, humidity, wind speed, etc.) for the entire month of May 2023 at Gibraltar Airport Station.

Nanoplots, as mentioned, are great for condensing a lot of information into a small area. Our example here with the gibraltar dataset takes all of the temperature and humidity data for the first 10 days of May 2023 and displays them in easy-to-explore nanoplots across two columns:

Show the Code
from great_tables import GT, nanoplot_options
from great_tables.data import gibraltar
import polars as pl

nano_opts = nanoplot_options(
    data_point_radius=4,
    data_point_stroke_width=4,
    data_point_stroke_color="black",
    data_point_fill_color="white",
    data_line_stroke_width=4,
    data_line_stroke_color="gray",
    show_data_line=True,
    show_data_points=True,
    show_data_area=False,
)

gibraltar_mini = (
    pl.from_pandas(gibraltar)
    .filter(pl.col("date") <= "2023-05-10")
    .with_columns(pl.col("humidity") * 100)
    .select(["date", "temp", "humidity"])
    .group_by("date")
    .agg(pl.col("temp"), pl.col("humidity"))
    .sort("date")
)

(
  GT(gibraltar_mini)
  .tab_header(
    title="Meteorological Summary of Gibraltar Station",
    subtitle="Data taken from May 1-10, 2023."
  )
  .fmt_nanoplot(
    columns="temp", autoscale=True, options=nano_opts
  )
  .fmt_nanoplot(
    columns="humidity", autoscale=True, options=nano_opts
  )
  .fmt_date(
    columns="date",
    date_style="wd_m_day_year"
  )
  .cols_label(
    date="Date",
    temp="Temperature, {{:degree:C}}",
    humidity="Humidity, % (RH)",
  )
  .cols_align(
    align="left",
    columns=["temp", "humidity"]
  )
)
Meteorological Summary of Gibraltar Station
Data taken from May 1-10, 2023.
Date | Temperature, °C | Humidity, % (RH)
[Rendered table: one row per day from Mon, May 1, 2023 through Wed, May 10, 2023, each with a temperature nanoplot and a humidity nanoplot.]

Once we have the data aggregated in the form of list columns, the .fmt_nanoplot() method shows us the trends of temperature and relative humidity values throughout the day (from 00:00 to 24:00). One interesting observation that can be made from the table is that on May 9, 2023 there was a late-day temperature increase that coincided with a corresponding decrease in relative humidity. Making such an observation without nanoplots would be quite a bit more difficult: it would require some serious determination and a careful scanning of numbers across a row of cells.
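The "list column" shape is the key preparation step, so here is a small standard-library sketch of it. The readings are made-up values rather than rows from the gibraltar dataset, and to_list_column is a hypothetical helper that mimics the filter, group, and sort steps of the Polars pipeline in plain Python.

```python
from collections import defaultdict

# Made-up (date, temperature in °C) readings; not values from the gibraltar dataset.
readings = [
    ("2023-05-02", 17.2),
    ("2023-05-01", 18.9),
    ("2023-05-01", 17.8),
    ("2023-05-02", 18.9),
    ("2023-05-11", 21.1),
]


def to_list_column(rows, last_date):
    """Collect each day's readings into one list, keeping dates up to last_date."""
    grouped = defaultdict(list)
    for date, value in rows:
        # ISO-formatted date strings compare in chronological order.
        if date <= last_date:
            grouped[date].append(value)
    # Sort by date so the rows appear in calendar order.
    return dict(sorted(grouped.items()))


daily = to_list_column(readings, "2023-05-10")
print(daily)
```

Each key corresponds to one table row, and each list holds the series that a nanoplot in that row would draw.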

Units notation is useful here too: it is applied in one of the column labels of this table. Even simple things like the units of temperature can be awkward to format. In this case, we wanted to add the temperature units of °C to the label for the temperature column. Units notation has a collection of symbols available for insertion within units notation text, including ":degree:" (colons surround each symbol keyword). The example takes advantage of this, so having °C as part of a label is not hard to express.
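The keyword mechanism can be pictured as a simple lookup. The sketch below is only an illustration under stated assumptions: the SYMBOLS table is hypothetical and contains just the one keyword mentioned in this post, and expand_symbols is a made-up helper, not a Great Tables function.

```python
import re

# Hypothetical symbol table: only "degree" is taken from the post.
SYMBOLS = {"degree": "°"}


def expand_symbols(text):
    """Replace :keyword: markers with their symbol; leave unknown keywords alone."""
    def lookup(match):
        return SYMBOLS.get(match.group(1), match.group(0))

    return re.sub(r":([a-z]+):", lookup, text)


print(expand_symbols("Temperature, :degree:C"))
```

Leaving unknown keywords untouched is a design choice for this sketch, so that a typo shows up visibly in the label instead of silently disappearing.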

Hope all your (science-y) tables are great!

We did a lot of scientific work in the past, so we understand that great tables for scientific publication are something that could and should be possible. We’ll keep working to make this even better in upcoming releases.