Thresholds

This is a work in progress. And some of this article is just an outline for now.

Thresholds enable you to signal failure at different severity levels. In the near future, thresholds will be able to trigger custom actions. For example, when testing a column for Null/missing values with col_vals_not_null() you might want to warn on any missing values and stop where there are 10% missing values in the column.

import pointblank as pb

validation_1 = (
    pb.Validate(data=pb.load_dataset(dataset="small_table"))
    .col_vals_not_null(columns="c", thresholds=(1, 0.1))
    .interrogate()
)

validation_1
Pointblank Validation
2025-01-31|16:28:51
Polars
STEP COLUMNS VALUES TBL EVAL UNITS PASS FAIL W S N EXT
#CF142B 1
col_vals_not_null
col_vals_not_null()
c 13 11
0.85
2
0.15
2025-01-31 16:28:51 UTC< 1 s2025-01-31 16:28:51 UTC

The code uses thresholds=(1, 0.1) to set a warn threshold of 1 and a stop threshold of 0.1 (which is 10%) failing test units. Notice these pieces in the validation table:

The one final threshold, N (notify), wasn’t set so appears on the validation table as a dash.

Using the Validation(threshold=) Argument

We can also define thresholds globally. This means that every validation step will re-use the same set of threshold values.

import pointblank as pb

validation_2 = (
    pb.Validate(data=pb.load_dataset(dataset="small_table"), thresholds=(1, 0.1))
    .col_vals_not_null(columns="a")
    .col_vals_gt(columns="b", value=2)
    .interrogate()
)

validation_2

In this, both the col_vals_not_null() and col_vals_gt() steps will use the thresholds= value set in the Validate() call. Now, if you want to override these global threshold values for a given validation step, you can always use the threshold= argument when calling a validation method (the argument is present in every validation method).

Ways to Define Thresholds

Using a Tuple

The fastest way to define a threshold is to use a tuple with entries for warn, stop, and nofify levels.

# (warn, stop, notify)
threshold = (1, 2, 3)

Validate(data=..., threshold=threshold)

Note that a shorter tuple or even single values are also allowed:

  • (1, 2): warn state at 1 failing test unit, stop state at 2 failing test units
  • 1 or (1, ): warn state at 1 failing test unit

The Threshold Class

Threshold Cutoff Values

Threshold values can be specified in two ways:

  • percentage: a decimal value like 0.1 to mean 10% test units failed
  • number: a fixed number of test units failed

Threshold cutoffs are inclusive so any value of failing test units greater than or equal to the cutoff will result in triggering the threshold. So if a threshold is defined with a cutoff value of 5, then 5 failing test units will result in threshold.

Triggering Actions

This is not currently implemented.

Use Case: Stopping on any Failures

Use Case: Global tolerance bands