Substituting Values

Real-world data often contains missing values, zeros, or extreme numbers that can look awkward or misleading when displayed directly in a table. The sub_*() family of methods lets you replace these problematic values with more meaningful text, improving readability without altering your underlying data.

Setting Up the Example Data

For the examples on this page, we will use a small DataFrame with a mix of values that includes missing data, zeros, and both very small and very large numbers.

import pandas as pd
from great_tables import GT

df = pd.DataFrame({
    "item": ["Widget A", "Widget B", "Widget C", "Widget D", "Widget E"],
    "count": [150, 0, 42, None, 3],
    "rate": [0.003, 0.0, 0.542, 0.871, None],
    "revenue": [4500.00, 0.00, 1e13, 75.50, None],
})

gt_tbl = GT(df, rowname_col="item")
gt_tbl
count rate revenue
Widget A 150.0 0.003 4500.0
Widget B 0.0 0.0 0.0
Widget C 42.0 0.542 10000000000000.0
Widget D 0.871 75.5
Widget E 3.0

Notice how the table displays None values and raw numbers in a way that may not be ideal for a presentation table. The sub_*() methods let us address each of these cases.

Substituting Missing Values

The sub_missing() method replaces None (or NaN) values with a text string of your choice. By default, it inserts an em dash, but you can provide any replacement text through the missing_text= argument.

gt_tbl.sub_missing(missing_text="N/A")
count rate revenue
Widget A 150.0 0.003 4500.0
Widget B 0.0 0.0 0.0
Widget C 42.0 0.542 10000000000000.0
Widget D N/A 0.871 75.5
Widget E 3.0 N/A N/A

You can also target specific columns, leaving other columns to display their missing values as-is.

gt_tbl.sub_missing(columns="count", missing_text="not reported")
count rate revenue
Widget A 150.0 0.003 4500.0
Widget B 0.0 0.0 0.0
Widget C 42.0 0.542 10000000000000.0
Widget D not reported 0.871 75.5
Widget E 3.0

Only the count column has its missing value replaced with our custom text. The rate and revenue columns still show their default missing representation.

Substituting Zero Values

When zeros are not meaningful in context (for example, in a column that tracks incidents or errors), you can use sub_zero() to replace them with explanatory text. The default replacement is "nil", but this is customizable through the zero_text= argument.

gt_tbl.sub_zero(columns="count", zero_text="none")
count rate revenue
Widget A 150.0 0.003 4500.0
Widget B none 0.0 0.0
Widget C 42.0 0.542 10000000000000.0
Widget D 0.871 75.5
Widget E 3.0

Here the zero in the count column now reads as "none", which is clearer for the reader.

Substituting Small Values

Very small numbers can be distracting in a table, especially when they fall below a meaningful threshold. The sub_small_vals() method replaces positive values between zero and a given threshold with indicator text like "<0.01".

gt_tbl.sub_small_vals(columns="rate", threshold=0.01)
count rate revenue
Widget A 150.0 <0.01 4500.0
Widget B 0.0 0.0 0.0
Widget C 42.0 0.542 10000000000000.0
Widget D 0.871 75.5
Widget E 3.0

The value 0.003 in the rate column is now displayed as "<0.01" since it falls below the threshold. All other values remain unchanged.

You can also handle negative small values by setting sign="-". This substitutes values that are between 0 and the negative of the threshold.

df_neg = pd.DataFrame({
    "label": ["X", "Y", "Z"],
    "change": [-0.002, -5.3, 0.7],
})

GT(df_neg, rowname_col="label").sub_small_vals(columns="change", threshold=0.01, sign="-")
change
X >-0.01
Y -5.3
Z 0.7

The -0.002 value is replaced since its absolute value falls below the threshold. The sign="-" argument tells the method to look for small negative values rather than small positive ones.

Substituting Large Values

In some datasets, extremely large values can skew the reader’s perception of the data. The sub_large_vals() method lets you cap the displayed values at a threshold, replacing anything above it with indicator text.

gt_tbl.sub_large_vals(columns="revenue", threshold=1e10)
count rate revenue
Widget A 150.0 0.003 4500.0
Widget B 0.0 0.0 0.0
Widget C 42.0 0.542 >=10000000000.0
Widget D 0.871 75.5
Widget E 3.0

The value 1e13 in the revenue column is now shown as ">=10000000000.0" rather than displaying the full number. You can customize the pattern with the large_pattern= argument.

gt_tbl.sub_large_vals(columns="revenue", threshold=1e10, large_pattern="OVER {x}")
count rate revenue
Widget A 150.0 0.003 4500.0
Widget B 0.0 0.0 0.0
Widget C 42.0 0.542 OVER 10000000000.0
Widget D 0.871 75.5
Widget E 3.0

The {x} placeholder in the pattern is replaced with the threshold value, giving you full control over how the capped text reads.

General Value Substitution

For more flexible replacement logic, sub_values() provides three modes of matching: by exact value, by regex pattern, or by a custom function.

Matching by Value

You can supply a specific value (or list of values) to match against. Any cell containing that value gets replaced.

GT(df, rowname_col="item").sub_values(columns="count", values=0, replacement="zero")
count rate revenue
Widget A 150.0 0.003 4500.0
Widget B zero 0.0 0.0
Widget C 42.0 0.542 10000000000000.0
Widget D 0.871 75.5
Widget E 3.0

Every cell in the count column that contains exactly 0 is replaced with the string "zero". You can also pass a list of values to match against multiple targets.

Matching by Pattern

A regex pattern can target string-based cell content for replacement.

df_text = pd.DataFrame({
    "code": ["PASS-001", "FAIL-002", "PASS-003"],
    "result": ["ok", "error", "ok"],
})

GT(df_text).sub_values(columns="code", pattern=r"^FAIL.*", replacement="FLAGGED")
code result
PASS-001 ok
FLAGGED error
PASS-003 ok

The regex matches any cell starting with "FAIL" and replaces the entire content with "FLAGGED". This is particularly useful for cleaning up status codes or categorized identifiers.

Matching by Function

The most flexible approach uses a custom function. The function receives a cell value and should return True for values that need to be replaced.

GT(df, rowname_col="item").sub_values(
    columns="revenue",
    fn=lambda x: x is not None and x > 10000,
    replacement="HIGH"
)
count rate revenue
Widget A 150.0 0.003 4500.0
Widget B 0.0 0.0 0.0
Widget C 42.0 0.542 HIGH
Widget D 0.871 75.5
Widget E 3.0

The function evaluates each cell value individually. When it returns True, that cell is replaced with the specified text. This mode handles complex logic that cannot be expressed as a simple value match or regex pattern.

Combining Substitution Methods

You can chain multiple sub_*() calls together to handle several cases in a single table. The methods are applied in the order they are called.

(
    GT(df, rowname_col="item")
    .sub_missing(missing_text="N/A")
    .sub_zero(columns=["count", "rate"], zero_text="none")
    .sub_small_vals(columns="rate", threshold=0.01)
    .sub_large_vals(columns="revenue", threshold=1e10, large_pattern=">10B")
)
count rate revenue
Widget A 150.0 <0.01 4500.0
Widget B none none 0.0
Widget C 42.0 0.542 >10B
Widget D N/A 0.871 75.5
Widget E 3.0 N/A N/A

This combination addresses missing data, zero values, very small numbers, and very large numbers all at once, producing a clean and informative table.

The sub_*() methods work as a pre-processing step before formatting. This means you can apply fmt_*() methods to the same columns and the substituted text will remain in place for cells that were already replaced, while the formatter handles the remaining values.