Thresholds

Thresholds are a key concept in Pointblank that allow you to define acceptable limits for failing validation tests. Rather than a simple pass/fail model, thresholds enable you to signal failure at different severity levels (‘warning’, ‘error’, and ‘critical’), giving you fine-grained control over how data quality issues are reported and handled.

When used with actions (covered in the next section), thresholds create a robust system for responding to data quality issues based on their severity. This approach allows you to:

set different tolerance levels for different types of validation checks
escalate responses based on the severity of data quality issues
configure different notification strategies for different threshold levels
create a more nuanced data validation workflow than simple pass/fail tests

A Simple Example

Let’s start with a basic example that demonstrates how thresholds work in practice:

import pointblank as pb

(
    pb.Validate(data=pb.load_dataset(dataset="small_table", tbl_type="polars"))
    .col_vals_not_null(
        columns="c",

        # Set thresholds for the validation step ---
        thresholds=pb.Thresholds(warning=1, error=0.2)
    )
    .interrogate()
)

		STEP	COLUMNS	VALUES	EVAL	UNITS	PASS	FAIL	W	E	C
Pointblank Validation
2025-07-28\|14:07:27 Polars
#AAAAAA	1	col_vals_not_null()	c	—	✓	13	11 0.85	2 0.15	●	○	—

In this example, we’re validating that column c contains no Null values. We’ve set:

A ‘warning’ threshold of 1 (triggers when 1 or more values are Null)
An ‘error’ threshold of 0.2 (triggers when 20% or more values are Null)

Looking at the results:

the FAIL column shows that 2 test units have failed
the W column (for ‘warning’) shows a filled gray circle, indicating the warning threshold has been exceeded
the E column (for ‘error’) shows an open yellow circle, indicating the error threshold has not been exceeded
the C column (for ‘critical’) shows a dash since we didn’t set a critical threshold

Types of Threshold Values

Thresholds in Pointblank can be specified in two different ways:

Absolute Thresholds

Absolute thresholds are specified as integers and represent a fixed number of failing test units:

# Warning threshold of exactly 5 failing test units
thresholds_absolute = pb.Thresholds(warning=5)

With this configuration, the ‘warning’ threshold would be triggered if 5 or more test units fail.

Proportional Thresholds

Proportional thresholds are specified as decimals between 0 and 1, representing a percentage of the total test units:

# Error threshold of 10% of test units failing
thresholds_proportional = pb.Thresholds(error=0.1)

With this configuration, the ‘error’ threshold would be triggered if 10% or more of the test units fail.

Understanding Severity Levels

The three threshold levels in Pointblank (‘warning’, ‘error’, and ‘critical’) are inspired by traditional logging levels used in software development. These names suggest a progression of severity:

‘warning’ (level 30): indicates potential issues that don’t necessarily prevent normal operation
‘error’ (level 40): suggests more serious problems that might impact data quality
‘critical’ (level 50): represents the most severe issues that likely require immediate attention

These numerical values (30, 40, 50) are used internally by Pointblank when determining threshold hierarchy and can be accessed through the {level_num} field in action metadata (covered in the next User Guide article).

While these names imply certain severity levels, they’re ultimately just convenient labels for different thresholds. You have complete flexibility in how you use them:

you could use ‘warning’ for issues that should block a pipeline
you might configure ‘critical’ for minor issues that just need documentation
the ‘error’ level could trigger informational emails rather than actual error handling

The naming is primarily a suggestion to help organize your validation strategy. What matters most is how you configure actions for each threshold level to suit your specific data quality requirements.

Threshold Behavior

It’s important to understand a few key behaviors of thresholds:

thresholds are inclusive: a value equal to or exceeding the threshold will trigger the associated level
thresholds can be mixed: you can use absolute values for some levels and proportional for others
threshold levels are hierarchical: ‘critical’ is more severe than ‘error’, which is more severe than ‘warning’
when a test fails, all applicable threshold levels are marked in the report (though actions may only execute for the highest level by default)

Setting Global Thresholds

You can set thresholds globally for all validation steps in a workflow using the thresholds= parameter in Validate:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),

        # Setting thresholds for all validation steps ---
        thresholds=pb.Thresholds(warning=1, error=0.1)
    )
    .col_vals_not_null(columns="a")
    .col_vals_gt(columns="a", value=2)
    .interrogate()
)

		STEP	COLUMNS	VALUES	EVAL	UNITS	PASS	FAIL	W	E	C	EXT
Pointblank Validation
2025-07-28\|14:07:27 PolarsWARNING1ERROR0.1CRITICAL—
#4CA64C	1	col_vals_not_null()	a	—	✓	13	13 1.00	0 0.00	○	○	—	—
#EBBC14	2	col_vals_gt()	a	2	✓	13	9 0.69	4 0.31	●	●	—

With this approach, the same thresholds are applied to every validation step in the workflow.

Overriding Thresholds for Specific Steps

You can override global thresholds for specific validation steps by providing the thresholds= parameter in individual validation methods:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),

        # Setting global thresholds ---
        thresholds=pb.Thresholds(warning=1, error=0.1)
    )
    .col_vals_not_null(columns="a")
    .col_vals_gt(
        columns="a", value=2,

        # Step-specific threshold that overrides global ---
        thresholds=pb.Thresholds(warning=3)
    )
    .interrogate()
)

		STEP	COLUMNS	VALUES	EVAL	UNITS	PASS	FAIL	W	E	C	EXT
Pointblank Validation
2025-07-28\|14:07:27 PolarsWARNING1ERROR0.1CRITICAL—
#4CA64C	1	col_vals_not_null()	a	—	✓	13	13 1.00	0 0.00	○	○	—	—
#AAAAAA	2	col_vals_gt()	a	2	✓	13	9 0.69	4 0.31	●	—	—

In this example, the second validation step uses its own ‘warning’ threshold of 3, overriding the global setting of 1.

Ways to Define Thresholds

Pointblank offers multiple ways to define thresholds to accommodate different coding styles and requirements.

1. Using the `Thresholds` Class (Recommended)

The most explicit and flexible approach is using the Thresholds class:

# Set individual thresholds for different levels
thresholds_all_levels = pb.Thresholds(warning=0.05, error=0.1, critical=0.25)

# Set only specific levels
thresholds_error_only = pb.Thresholds(error=0.15)

This approach allows you to:

set any combination of threshold levels
use descriptive parameter names for clarity
skip levels you don’t need to set

2. Using a Tuple

For concise code, you can use a tuple where positions represent ‘warning’, ‘error’, and ‘critical’ levels in that order:

# (warning, error, critical)
thresholds_tuple = (1, 0.1, 0.25)

# Shorter tuples are also allowed
thresholds_tuple_warning = (3,)            # Only the 'warning' threshold
thresholds_tuple_warning_error = (3, 0.2)  # Both 'warning' and 'error' thresholds

While concise, this approach requires you to start with the ‘warning’ level and add levels in order.

3. Using a Dictionary

You can also use a dictionary with keys that match the threshold level names:

# Can use any combination of threshold levels
thresholds_dict = {"warning": 1, "critical": 0.15}

The dictionary must use the exact keys "warning", "error", and/or "critical".

4. Using a Single Value

The simplest approach is using a single numeric value, which sets just the ‘warning’ threshold:

# Sets 'warning' threshold to `5`
thresholds_single = 5

This is equivalent to pb.Thresholds(warning=5).

Thresholds and Validation Steps

Let’s look at a more complete validation workflow that demonstrates different threshold configurations:

# Create a validation workflow with global and step-specific thresholds
(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),

        # Global thresholds applied to all steps unless overridden ---
        thresholds=pb.Thresholds(warning=0.05, error=0.1, critical=0.2)
    )

    # Step 1: Uses global thresholds ---
    .col_vals_not_null(columns="b")

    # Step 2: Overrides with step-specific thresholds ---
    .col_vals_gt(
        columns="a", value=2,
        thresholds=pb.Thresholds(warning=1, critical=0.3) # No 'error' threshold
    )

    # Step 3: Uses a simplified tuple notation ---
    .col_vals_not_null(columns="c", thresholds=(2, 0.15))

    .interrogate()
)

		STEP	COLUMNS	VALUES	EVAL	UNITS	PASS	FAIL	W	E	C	EXT
Pointblank Validation
2025-07-28\|14:07:27 PolarsWARNING0.05ERROR0.1CRITICAL0.2
#4CA64C	1	col_vals_not_null()	b	—	✓	13	13 1.00	0 0.00	○	○	○	—
#FF3300	2	col_vals_gt()	a	2	✓	13	9 0.69	4 0.31	●	—	●
#EBBC14	3	col_vals_not_null()	c	—	✓	13	11 0.85	2 0.15	●	●	—

Thresholds and Actions

While thresholds by themselves provide visual indicators of validation severity in reports, their real power emerges when combined with Actions. The Actions system (covered in the next article) allows you to specify what happens when a threshold is exceeded.

For example, you might configure:

A ‘warning’ threshold that logs a message
An ‘error’ threshold that sends an email notification
A ‘critical’ threshold that blocks a data pipeline

Here’s a simple preview of how thresholds and actions work together:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),

        # Define thresholds for all three severity levels ---
        thresholds=pb.Thresholds(warning=1, error=2, critical=3),

        # Define actions for different threshold levels ---
        actions=pb.Actions(
            warning="Warning: {step} has {FAIL} failing values",
            error="ERROR: Step {step} exceeded the 'error' threshold",
            critical="CRITICAL: Data quality issue in column {col}"
        )
    )
    .col_vals_not_null(columns="c")
    .interrogate()
)

ERROR: Step 1 exceeded the 'error' threshold

		STEP	COLUMNS	VALUES	EVAL	UNITS	PASS	FAIL	W	E	C
Pointblank Validation
2025-07-28\|14:07:27 PolarsWARNING1ERROR2CRITICAL3
#EBBC14	1	col_vals_not_null()	c	—	✓	13	11 0.85	2 0.15	●	●	○

Conclusion

Thresholds are a powerful feature that transform Pointblank from a simple validation tool into a sophisticated data quality monitoring system. By setting appropriate thresholds, you can:

Define different severity levels for data quality issues
Customize tolerance levels for different types of validation checks
Create a more nuanced approach to data validation than binary pass/fail
Enable targeted actions based on the severity of issues detected

In the next article, we’ll explore the Actions system in depth, showing you how to define automatic responses when thresholds are exceeded.