Thresholds

Thresholds enable you to signal failure at different severity levels. They also allow for the triggering of custom actions, a topic which is covered in the next section. For instance you might be testing a column for null/missing values. When doing so, you’d want to know when there are at least 10% missing values in the column. Alternatively, it could be the case that even a single missing value is critical to your work. Threshold settings in Pointblank give you the flexibility to devise data-failure signaling to whatever tolerances are important to you.

Let’s start with the basics though. Here’s an example of a simple validation where threshold values are set in the col_vals_not_null() validation step (this type of validation expects that there are no null/missing values in a particular column):

import pointblank as pb

validation_1 = (
    pb.Validate(data=pb.load_dataset(dataset="small_table"))
    .col_vals_not_null(
        columns="c", thresholds=(1, 0.2)
    )
    .interrogate()
)

validation_1
Pointblank Validation
2025-03-07|19:49:40
Polars
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W E C EXT
#AAAAAA 1
col_vals_not_null
col_vals_not_null()
c 13 11
0.85
2
0.15

The code uses thresholds=(1, 0.2) to set a ‘warning’ threshold of 1 and an ‘error’ threshold of 0.2 (which is 20%) failing test units. You might notice the following in the validation table:

The one final threshold, C (‘critical’), wasn’t set so appears on the validation table as a dash.

Two Types of Threshold Values: Proportional and Absolute

Threshold values can be specified in two ways:

  • proportional: a decimal value like 0.1 is taken to mean 10% of all test units failed
  • absolute: a whole number represents a fixed number of test units failed

Threshold values act as cutoffs and are inclusive. So, any value of failing test units greater than or equal to the threshold value will result in exceeding the threshold. So if a threshold is defined with a value of 5, then 5 failing test units will result in an exceedence.

Using the Validation(thresholds=) Argument

We can also define thresholds globally. This means that every validation step will re-use the same set of threshold values.

import pointblank as pb

validation_2 = (
    pb.Validate(data=pb.load_dataset(dataset="small_table"), thresholds=(1, 0.1))
    .col_vals_not_null(columns="a")
    .col_vals_gt(columns="b", value=2)
    .interrogate()
)

validation_2

In this, both the col_vals_not_null() and col_vals_gt() steps will use the thresholds= value set in the Validate call. Now, if you want to override these global threshold values for a given validation step, you can always use the threshold= argument when calling a validation method (this argument is present within every validation method).

Ways to Define Thresholds

There are more than a few ways to set threshold levels. We provide this flexibility because it’s often useful to have simple shorthand methods for such a common task.

Using a Tuple or a Single Value

The fastest way to define a threshold is to use a tuple with positional entries for the ‘warning’, ‘error’, and ‘critical’ levels.

thresholds_tuple = (1, 0.1, 0.25) # (warning, error, critical)

Note that a shorter tuple is also allowed:

  • (1, ): ‘warning’ state at 1 failing test unit
  • (1, 0.1): ‘warning’ state at 1 failing test unit, error state at 10% failing test units

You can even use a scalar value (float between 0 and 1 or an integer). That single value represents the threshold value for the ‘warning’ level:

thresholds_single = 1

While using a tuple or a scalar can be very succinct, a problem that arises is that the ordering of values always begins at the ‘warning’ level. This means you cannot define a threshold level for just the ‘error’ level, for example. This is fine for many cases, however, there are other ways to express thresholds without these constraints.

Using the Thresholds Class

Using the Thresholds class lets you define the threshold levels using the warning=, error=, and critical= arguments. And unlike the method of setting thresholds with a tuple, any of the threshold levels can be unset. Here’s an example where you might want to set the ‘error’ and ‘critical’ levels (leaving the ‘warning’ level unset):

thresholds_class = pb.Thresholds(error=0.3, critical=0.5)

Using a Dictionary to Set Thresholds

A specially-crafted dictionary is acceptable as input to any thresholds= argument. You need to ensure that the keys are named using either "warning", "error", or "critical". Any combination of keys is fine, but be careful to use only the aforementioned names (otherwise, you’ll receive a ValueError). Here’s an example that sets the ‘warning’ and ‘critical’ levels:

thresholds_dict = {"warning": 1, "critical": 0.1}