Thresholds

Thresholds are a key concept in Pointblank that allow you to define acceptable limits for failing validation tests. Rather than a simple pass/fail model, thresholds enable you to signal failure at different severity levels (‘warning’, ‘error’, and ‘critical’), giving you fine-grained control over how data quality issues are reported and handled.

When used with Actions (covered in the next section), thresholds create a robust system for responding to data quality issues based on their severity. This approach allows you to:

A Simple Example

Let’s start with a basic example that demonstrates how thresholds work in practice:

import pointblank as pb

(
    pb.Validate(data=pb.load_dataset(dataset="small_table", tbl_type="polars"))
    .col_vals_not_null(
        columns="c",

        # Set thresholds for the validation step ---
        thresholds=pb.Thresholds(warning=1, error=0.2)
    )
    .interrogate()
)
Pointblank Validation
2025-05-19|17:22:10
Polars
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#AAAAAA 1
col_vals_not_null
col_vals_not_null()
c 13 11
0.85
2
0.15

In this example, we’re validating that column c contains no Null values. We’ve set:

  • A ‘warning’ threshold of 1 (triggers when 1 or more values are Null)
  • An ‘error’ threshold of 0.2 (triggers when 20% or more values are Null)

Looking at the results:

  • the FAIL column shows that 2 test units have failed
  • the W column (for ‘warning’) shows a filled gray circle, indicating the warning threshold has been exceeded
  • the E column (for ‘error’) shows an open yellow circle, indicating the error threshold has not been exceeded
  • the C column (for ‘critical’) shows a dash since we didn’t set a critical threshold

Types of Threshold Values

Thresholds in Pointblank can be specified in two different ways:

Absolute Thresholds

Absolute thresholds are specified as integers and represent a fixed number of failing test units:

# Warning threshold of exactly 5 failing test units
thresholds_absolute = pb.Thresholds(warning=5)

With this configuration, the ‘warning’ threshold would be triggered if 5 or more test units fail.

Proportional Thresholds

Proportional thresholds are specified as decimals between 0 and 1, representing a percentage of the total test units:

# Error threshold of 10% of test units failing
thresholds_proportional = pb.Thresholds(error=0.1)

With this configuration, the ‘error’ threshold would be triggered if 10% or more of the test units fail.

Understanding Severity Levels

The three threshold levels in Pointblank (‘warning’, ‘error’, and ‘critical’) are inspired by traditional logging levels used in software development. These names suggest a progression of severity:

  • ‘warning’ (level 30): indicates potential issues that don’t necessarily prevent normal operation
  • ‘error’ (level 40): suggests more serious problems that might impact data quality
  • ‘critical’ (level 50): represents the most severe issues that likely require immediate attention

These numerical values (30, 40, 50) are used internally by Pointblank when determining threshold hierarchy and can be accessed through the {level_num} field in action metadata (covered in the next User Guide article).

While these names imply certain severity levels, they’re ultimately just convenient labels for different thresholds. You have complete flexibility in how you use them:

  • you could use ‘warning’ for issues that should block a pipeline
  • you might configure ‘critical’ for minor issues that just need documentation
  • the ‘error’ level could trigger informational emails rather than actual error handling

The naming is primarily a suggestion to help organize your validation strategy. What matters most is how you configure actions for each threshold level to suit your specific data quality requirements.

Threshold Behavior

It’s important to understand a few key behaviors of thresholds:

  • thresholds are inclusive: a value equal to or exceeding the threshold will trigger the associated level
  • thresholds can be mixed: you can use absolute values for some levels and proportional for others
  • threshold levels are hierarchical: ‘critical’ is more severe than ‘error’, which is more severe than ‘warning’
  • when a test fails, all applicable threshold levels are marked in the report (though actions may only execute for the highest level by default)

Setting Global Thresholds

You can set thresholds globally for all validation steps in a workflow using the thresholds= parameter in Validate:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),

        # Setting thresholds for all validation steps ---
        thresholds=pb.Thresholds(warning=1, error=0.1)
    )
    .col_vals_not_null(columns="a")
    .col_vals_gt(columns="a", value=2)
    .interrogate()
)
Pointblank Validation
2025-05-19|17:22:10
PolarsWARNING1ERROR0.1CRITICAL
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#4CA64C 1
col_vals_not_null
col_vals_not_null()
a 13 13
1.00
0
0.00
#EBBC14 2
col_vals_gt
col_vals_gt()
a 2 13 9
0.69
4
0.31

With this approach, the same thresholds are applied to every validation step in the workflow.

Overriding Thresholds for Specific Steps

You can override global thresholds for specific validation steps by providing the thresholds= parameter in individual validation methods:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),

        # Setting global thresholds ---
        thresholds=pb.Thresholds(warning=1, error=0.1)
    )
    .col_vals_not_null(columns="a")
    .col_vals_gt(
        columns="a", value=2,

        # Step-specific threshold that overrides global ---
        thresholds=pb.Thresholds(warning=3)
    )
    .interrogate()
)
Pointblank Validation
2025-05-19|17:22:10
PolarsWARNING1ERROR0.1CRITICAL
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#4CA64C 1
col_vals_not_null
col_vals_not_null()
a 13 13
1.00
0
0.00
#AAAAAA 2
col_vals_gt
col_vals_gt()
a 2 13 9
0.69
4
0.31

In this example, the second validation step uses its own ‘warning’ threshold of 3, overriding the global setting of 1.

Ways to Define Thresholds

Pointblank offers multiple ways to define thresholds to accommodate different coding styles and requirements.

2. Using a Tuple

For concise code, you can use a tuple where positions represent ‘warning’, ‘error’, and ‘critical’ levels in that order:

# (warning, error, critical)
thresholds_tuple = (1, 0.1, 0.25)

# Shorter tuples are also allowed
thresholds_tuple_warning = (3,)            # Only the 'warning' threshold
thresholds_tuple_warning_error = (3, 0.2)  # Both 'warning' and 'error' thresholds

While concise, this approach requires you to start with the ‘warning’ level and add levels in order.

3. Using a Dictionary

You can also use a dictionary with keys that match the threshold level names:

# Can use any combination of threshold levels
thresholds_dict = {"warning": 1, "critical": 0.15}

The dictionary must use the exact keys "warning", "error", and/or "critical".

4. Using a Single Value

The simplest approach is using a single numeric value, which sets just the ‘warning’ threshold:

# Sets 'warning' threshold to `5`
thresholds_single = 5

This is equivalent to pb.Thresholds(warning=5).

Thresholds and Validation Steps

Let’s look at a more complete validation workflow that demonstrates different threshold configurations:

# Create a validation workflow with global and step-specific thresholds
(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),

        # Global thresholds applied to all steps unless overridden ---
        thresholds=pb.Thresholds(warning=0.05, error=0.1, critical=0.2)
    )

    # Step 1: Uses global thresholds ---
    .col_vals_not_null(columns="b")

    # Step 2: Overrides with step-specific thresholds ---
    .col_vals_gt(
        columns="a", value=2,
        thresholds=pb.Thresholds(warning=1, critical=0.3) # No 'error' threshold
    )

    # Step 3: Uses a simplified tuple notation ---
    .col_vals_not_null(columns="c", thresholds=(2, 0.15))

    .interrogate()
)
Pointblank Validation
2025-05-19|17:22:10
PolarsWARNING0.05ERROR0.1CRITICAL0.2
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#4CA64C 1
col_vals_not_null
col_vals_not_null()
b 13 13
1.00
0
0.00
#FF3300 2
col_vals_gt
col_vals_gt()
a 2 13 9
0.69
4
0.31
#EBBC14 3
col_vals_not_null
col_vals_not_null()
c 13 11
0.85
2
0.15

Thresholds and Actions

While thresholds by themselves provide visual indicators of validation severity in reports, their real power emerges when combined with Actions. The Actions system (covered in the next article) allows you to specify what happens when a threshold is exceeded.

For example, you might configure:

  • A ‘warning’ threshold that logs a message
  • An ‘error’ threshold that sends an email notification
  • A ‘critical’ threshold that blocks a data pipeline

Here’s a simple preview of how thresholds and actions work together:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),

        # Define thresholds for all three severity levels ---
        thresholds=pb.Thresholds(warning=1, error=2, critical=3),

        # Define actions for different threshold levels ---
        actions=pb.Actions(
            warning="Warning: {step} has {FAIL} failing values",
            error="ERROR: Step {step} exceeded the 'error' threshold",
            critical="CRITICAL: Data quality issue in column {col}"
        )
    )
    .col_vals_not_null(columns="c")
    .interrogate()
)
ERROR: Step 1 exceeded the 'error' threshold
Pointblank Validation
2025-05-19|17:22:10
PolarsWARNING1ERROR2CRITICAL3
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#EBBC14 1
col_vals_not_null
col_vals_not_null()
c 13 11
0.85
2
0.15

Conclusion

Thresholds are a powerful feature that transform Pointblank from a simple validation tool into a sophisticated data quality monitoring system. By setting appropriate thresholds, you can:

  1. Define different severity levels for data quality issues
  2. Customize tolerance levels for different types of validation checks
  3. Create a more nuanced approach to data validation than binary pass/fail
  4. Enable targeted actions based on the severity of issues detected

In the next article, we’ll explore the Actions system in depth, showing you how to define automatic responses when thresholds are exceeded.