Thresholds

Thresholds(self, warn_at=None, stop_at=None, notify_at=None)

Definition of threshold values.

Parameters

warn_at : int | float | bool | None = None

The threshold for the ‘warn’ level. This can be an absolute count of failing test units or a fraction of the total number of test units. Using True will set this threshold to 1.

stop_at : int | float | bool | None = None

The threshold for the ‘stop’ level. This can be an absolute count of failing test units or a fraction of the total number of test units. Using True will set this threshold to 1.

notify_at : int | float | bool | None = None

The threshold for the ‘notify’ level. This can be an absolute count of failing test units or a fraction of the total number of test units. Using True will set this threshold to 1.
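
All three parameters accept the same kinds of value. As a rough illustration of how such a value could be interpreted, the sketch below uses a hypothetical helper, threshold_reached(), which is not part of Pointblank. It assumes that values strictly between 0 and 1 are fractions of the total, that any other number is an absolute count, and that a level is reached when failures meet or exceed the threshold; the library's own rules may differ in the details.

```python
from typing import Optional, Union

ThresholdValue = Union[int, float, bool, None]


def threshold_reached(threshold: ThresholdValue, n_failed: int, n_total: int) -> Optional[bool]:
    """Hypothetical helper: report whether a threshold level is reached.

    Returns None when the level is unset.
    """
    # None means the level is unset; treating False the same way is an assumption
    if threshold is None or threshold is False:
        return None
    # True is shorthand for a threshold of 1 (as stated in the parameter docs)
    if threshold is True:
        threshold = 1
    if 0 < threshold < 1:
        # Assumed: a value strictly between 0 and 1 is a fraction of the total
        return n_total > 0 and n_failed / n_total >= threshold
    # Assumed: any other number is an absolute count of failing test units
    return n_failed >= threshold


print(threshold_reached(0.1, n_failed=2, n_total=13))    # fraction: 2/13 against 0.1
print(threshold_reached(5, n_failed=2, n_total=13))      # count: 2 against 5
print(threshold_reached(True, n_failed=1, n_total=13))   # True acts as 1
print(threshold_reached(None, n_failed=13, n_total=13))  # unset level
```

Returning None for an unset level keeps "not configured" distinct from "configured but not reached", which is a design choice of this sketch rather than documented library behavior.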

Returns

Thresholds

A Thresholds object. Pass it to the Validate class to set thresholds globally, or to one of Validate’s validation methods to scope the threshold values to that individual step, overriding any global thresholds.

Examples

In a data validation workflow, you can set thresholds for the number of failing test units at different levels. For example, you can set a threshold to warn when the number of failing test units exceeds 10% of the total number of test units:

import pointblank as pb

thresholds = pb.Thresholds(warn_at=0.1)

thresholds
Thresholds(warn_at=0.1, stop_at=None, notify_at=None)

You can also set thresholds for the ‘stop’ and ‘notify’ levels:

thresholds = pb.Thresholds(warn_at=0.1, stop_at=0.2, notify_at=0.05)

thresholds
Thresholds(warn_at=0.1, stop_at=0.2, notify_at=0.05)

Thresholds can also be set as absolute counts. Here’s an example where the ‘warn’ level is set to 5 failing test units:

thresholds = pb.Thresholds(warn_at=5)

thresholds
Thresholds(warn_at=5, stop_at=None, notify_at=None)

A Thresholds object can set global thresholds for all validation steps, or it can be passed to an individual validation step, where it overrides the global thresholds. The following data validation workflow sets global thresholds and then overrides them with different thresholds at the col_vals_gt() step:

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table"),
        label="Example Validation",
        thresholds=pb.Thresholds(warn_at=0.1, stop_at=0.2, notify_at=0.3)
    )
    .col_vals_not_null(columns=["c", "d"])
    .col_vals_gt(columns="a", value=3, thresholds=pb.Thresholds(warn_at=5))
    .interrogate()
)

validation
Pointblank Validation
Example Validation
Polars   WARN 0.1   STOP 0.2   NOTIFY 0.3

STEP                    COLUMNS  VALUES  UNITS  PASS       FAIL
1  col_vals_not_null()  c                13     11 (0.85)  2 (0.15)
2  col_vals_not_null()  d                13     13 (1.00)  0 (0.00)
3  col_vals_gt()        a        3       13     6 (0.46)   7 (0.54)

The last step (col_vals_gt()) has its own thresholds (warn_at=5), which override the global thresholds passed to Validate at the beginning of the workflow. Its 7 failing test units are therefore compared against an absolute count of 5 rather than against the global fractions.
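
The override behavior can be sketched without the library. The snippet below is an illustration only: Levels, effective_levels() and levels_reached() are hypothetical names, the step-level object is assumed to replace the global one wholesale (rather than merging field by field), and a level is assumed to be reached when failures meet or exceed its threshold. The failure counts are taken from the report above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

Number = Union[int, float]


@dataclass(frozen=True)
class Levels:
    """Hypothetical stand-in for a set of threshold values (not pb.Thresholds)."""
    warn_at: Optional[Number] = None
    stop_at: Optional[Number] = None
    notify_at: Optional[Number] = None


def effective_levels(global_levels: Optional[Levels], step_levels: Optional[Levels]) -> Optional[Levels]:
    # Assumed: a step's own thresholds replace the global ones entirely
    return step_levels if step_levels is not None else global_levels


def levels_reached(levels: Levels, n_failed: int, n_total: int) -> Dict[str, Optional[bool]]:
    def reached(threshold: Optional[Number]) -> Optional[bool]:
        if threshold is None:
            return None  # level not set
        if 0 < threshold < 1:
            return n_failed / n_total >= threshold  # fraction of the total
        return n_failed >= threshold  # absolute count

    return {
        "warn": reached(levels.warn_at),
        "stop": reached(levels.stop_at),
        "notify": reached(levels.notify_at),
    }


global_levels = Levels(warn_at=0.1, stop_at=0.2, notify_at=0.3)

# Step 1: col_vals_not_null() on "c", 2 of 13 test units fail, no step-level thresholds
step1 = levels_reached(effective_levels(global_levels, None), n_failed=2, n_total=13)

# Step 3: col_vals_gt() on "a", 7 of 13 test units fail, step-level warn_at=5
step3 = levels_reached(effective_levels(global_levels, Levels(warn_at=5)), n_failed=7, n_total=13)

print(step1)
print(step3)
```

Under these assumptions, step 1 reaches only the ‘warn’ level of the global thresholds, while step 3 is checked against warn_at=5 alone, even though its failing fraction (0.54) is above every global value.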