Real-world data often contains missing values, zeros, or extreme numbers that can look awkward or misleading when displayed directly in a table. The sub_*() family of methods lets you replace these problematic values with more meaningful text, improving readability without altering your underlying data.
Setting Up the Example Data
For the examples on this page, we will use a small DataFrame with a mix of values that includes missing data, zeros, and both very small and very large numbers.
import pandas as pd
from great_tables import GT
df = pd.DataFrame({
"item": ["Widget A", "Widget B", "Widget C", "Widget D", "Widget E"],
"count": [150, 0, 42, None, 3],
"rate": [0.003, 0.0, 0.542, 0.871, None],
"revenue": [4500.00, 0.00, 1e13, 75.50, None],
})
gt_tbl = GT(df, rowname_col="item")
gt_tbl
|
count |
rate |
revenue |
| Widget A |
150.0 |
0.003 |
4500.0 |
| Widget B |
0.0 |
0.0 |
0.0 |
| Widget C |
42.0 |
0.542 |
10000000000000.0 |
| Widget D |
|
0.871 |
75.5 |
| Widget E |
3.0 |
|
|
Notice how the table displays None values and raw numbers in a way that may not be ideal for a presentation table. The sub_*() methods let us address each of these cases.
Substituting Missing Values
The sub_missing() method replaces None (or NaN) values with a text string of your choice. By default, it inserts an em dash, but you can provide any replacement text through the missing_text= argument.
gt_tbl.sub_missing(missing_text="N/A")
|
count |
rate |
revenue |
| Widget A |
150.0 |
0.003 |
4500.0 |
| Widget B |
0.0 |
0.0 |
0.0 |
| Widget C |
42.0 |
0.542 |
10000000000000.0 |
| Widget D |
N/A |
0.871 |
75.5 |
| Widget E |
3.0 |
N/A |
N/A |
You can also target specific columns, leaving other columns to display their missing values as-is.
gt_tbl.sub_missing(columns="count", missing_text="not reported")
|
count |
rate |
revenue |
| Widget A |
150.0 |
0.003 |
4500.0 |
| Widget B |
0.0 |
0.0 |
0.0 |
| Widget C |
42.0 |
0.542 |
10000000000000.0 |
| Widget D |
not reported |
0.871 |
75.5 |
| Widget E |
3.0 |
|
|
Only the count column has its missing value replaced with our custom text. The rate and revenue columns still show their default missing representation.
Substituting Zero Values
When zeros are not meaningful in context (for example, in a column that tracks incidents or errors), you can use sub_zero() to replace them with explanatory text. The default replacement is "nil", but this is customizable through the zero_text= argument.
gt_tbl.sub_zero(columns="count", zero_text="none")
|
count |
rate |
revenue |
| Widget A |
150.0 |
0.003 |
4500.0 |
| Widget B |
none |
0.0 |
0.0 |
| Widget C |
42.0 |
0.542 |
10000000000000.0 |
| Widget D |
|
0.871 |
75.5 |
| Widget E |
3.0 |
|
|
Here the zero in the count column now reads as "none", which is clearer for the reader.
Substituting Small Values
Very small numbers can be distracting in a table, especially when they fall below a meaningful threshold. The sub_small_vals() method replaces positive values between zero and a given threshold with indicator text like "<0.01".
gt_tbl.sub_small_vals(columns="rate", threshold=0.01)
|
count |
rate |
revenue |
| Widget A |
150.0 |
<0.01 |
4500.0 |
| Widget B |
0.0 |
0.0 |
0.0 |
| Widget C |
42.0 |
0.542 |
10000000000000.0 |
| Widget D |
|
0.871 |
75.5 |
| Widget E |
3.0 |
|
|
The value 0.003 in the rate column is now displayed as "<0.01" since it falls below the threshold. All other values remain unchanged.
You can also handle negative small values by setting sign="-". This substitutes values that are between 0 and the negative of the threshold.
df_neg = pd.DataFrame({
"label": ["X", "Y", "Z"],
"change": [-0.002, -5.3, 0.7],
})
GT(df_neg, rowname_col="label").sub_small_vals(columns="change", threshold=0.01, sign="-")
|
change |
| X |
>-0.01 |
| Y |
-5.3 |
| Z |
0.7 |
The -0.002 value is replaced since its absolute value falls below the threshold. The sign="-" argument tells the method to look for small negative values rather than small positive ones.
Substituting Large Values
In some datasets, extremely large values can skew the reader’s perception of the data. The sub_large_vals() method lets you cap the displayed values at a threshold, replacing anything above it with indicator text.
gt_tbl.sub_large_vals(columns="revenue", threshold=1e10)
|
count |
rate |
revenue |
| Widget A |
150.0 |
0.003 |
4500.0 |
| Widget B |
0.0 |
0.0 |
0.0 |
| Widget C |
42.0 |
0.542 |
>=10000000000.0 |
| Widget D |
|
0.871 |
75.5 |
| Widget E |
3.0 |
|
|
The value 1e13 in the revenue column is now shown as ">=10000000000.0" rather than displaying the full number. You can customize the pattern with the large_pattern= argument.
gt_tbl.sub_large_vals(columns="revenue", threshold=1e10, large_pattern="OVER {x}")
|
count |
rate |
revenue |
| Widget A |
150.0 |
0.003 |
4500.0 |
| Widget B |
0.0 |
0.0 |
0.0 |
| Widget C |
42.0 |
0.542 |
OVER 10000000000.0 |
| Widget D |
|
0.871 |
75.5 |
| Widget E |
3.0 |
|
|
The {x} placeholder in the pattern is replaced with the threshold value, giving you full control over how the capped text reads.
General Value Substitution
For more flexible replacement logic, sub_values() provides three modes of matching: by exact value, by regex pattern, or by a custom function.
Matching by Value
You can supply a specific value (or list of values) to match against. Any cell containing that value gets replaced.
GT(df, rowname_col="item").sub_values(columns="count", values=0, replacement="zero")
|
count |
rate |
revenue |
| Widget A |
150.0 |
0.003 |
4500.0 |
| Widget B |
zero |
0.0 |
0.0 |
| Widget C |
42.0 |
0.542 |
10000000000000.0 |
| Widget D |
|
0.871 |
75.5 |
| Widget E |
3.0 |
|
|
Every cell in the count column that contains exactly 0 is replaced with the string "zero". You can also pass a list of values to match against multiple targets.
Matching by Pattern
A regex pattern can target string-based cell content for replacement.
df_text = pd.DataFrame({
"code": ["PASS-001", "FAIL-002", "PASS-003"],
"result": ["ok", "error", "ok"],
})
GT(df_text).sub_values(columns="code", pattern=r"^FAIL.*", replacement="FLAGGED")
| code |
result |
| PASS-001 |
ok |
| FLAGGED |
error |
| PASS-003 |
ok |
The regex matches any cell starting with "FAIL" and replaces the entire content with "FLAGGED". This is particularly useful for cleaning up status codes or categorized identifiers.
Matching by Function
The most flexible approach uses a custom function. The function receives a cell value and should return True for values that need to be replaced.
GT(df, rowname_col="item").sub_values(
columns="revenue",
fn=lambda x: x is not None and x > 10000,
replacement="HIGH"
)
|
count |
rate |
revenue |
| Widget A |
150.0 |
0.003 |
4500.0 |
| Widget B |
0.0 |
0.0 |
0.0 |
| Widget C |
42.0 |
0.542 |
HIGH |
| Widget D |
|
0.871 |
75.5 |
| Widget E |
3.0 |
|
|
The function evaluates each cell value individually. When it returns True, that cell is replaced with the specified text. This mode handles complex logic that cannot be expressed as a simple value match or regex pattern.
Combining Substitution Methods
You can chain multiple sub_*() calls together to handle several cases in a single table. The methods are applied in the order they are called.
(
GT(df, rowname_col="item")
.sub_missing(missing_text="N/A")
.sub_zero(columns=["count", "rate"], zero_text="none")
.sub_small_vals(columns="rate", threshold=0.01)
.sub_large_vals(columns="revenue", threshold=1e10, large_pattern=">10B")
)
|
count |
rate |
revenue |
| Widget A |
150.0 |
<0.01 |
4500.0 |
| Widget B |
none |
none |
0.0 |
| Widget C |
42.0 |
0.542 |
>10B |
| Widget D |
N/A |
0.871 |
75.5 |
| Widget E |
3.0 |
N/A |
N/A |
This combination addresses missing data, zero values, very small numbers, and very large numbers all at once, producing a clean and informative table.
The sub_*() methods work as a pre-processing step before formatting. This means you can apply fmt_*() methods to the same columns and the substituted text will remain in place for cells that were already replaced, while the formatter handles the remaining values.