import pointblank as pb
= (
validation_1 =pb.load_dataset(dataset="small_table"))
pb.Validate(data="c", thresholds=(1, 0.1))
.col_vals_not_null(columns
.interrogate()
)
validation_1
Thresholds
Thresholds enable you to signal failure at different severity levels. In the near future, thresholds will be able to trigger custom actions. For example, when testing a column for Null/missing values with col_vals_not_null()
you might want to warn on any missing values and stop where there are 10% missing values in the column.
The code uses thresholds=(1, 0.1)
to set a warn
threshold of 1
and a stop
threshold of 0.1
(which is 10%) failing test units. Notice these pieces in the validation table:
- The
FAIL
column shows that 2 tests units have failed - The
W
column (short forwarn
) shows a filled yellow circle indicating it’s reached threshold - The
S
column (stop
) shows an open red circle indicating it’s below threshold
The one final threshold, N
(notify
), wasn’t set so appears on the validation table as a dash.
Using the Validation(threshold=)
Argument
We can also define thresholds globally. This means that every validation step will re-use the same set of threshold values.
import pointblank as pb
= (
validation_2 =pb.load_dataset(dataset="small_table"), thresholds=(1, 0.1))
pb.Validate(data="a")
.col_vals_not_null(columns="b", value=2)
.col_vals_gt(columns
.interrogate()
)
validation_2
In this, both the col_vals_not_null()
and col_vals_gt()
steps will use the thresholds=
value set in the Validate()
call. Now, if you want to override these global threshold values for a given validation step, you can always use the threshold=
argument when calling a validation method (the argument is present in every validation method).
Ways to Define Thresholds
Using a Tuple
The fastest way to define a threshold is to use a tuple with entries for warn
, stop
, and nofify
levels.
# (warn, stop, notify)
= (1, 2, 3)
threshold
=..., threshold=threshold) Validate(data
Note that a shorter tuple or even single values are also allowed:
(1, 2)
:warn
state at 1 failing test unit,stop
state at 2 failing test units1
or(1, )
:warn
state at 1 failing test unit
The Threshold
Class
Threshold Cutoff Values
Threshold values can be specified in two ways:
- percentage: a decimal value like 0.1 to mean 10% test units failed
- number: a fixed number of test units failed
Threshold cutoffs are inclusive so any value of failing test units greater than or equal to the cutoff will result in triggering the threshold. So if a threshold is defined with a cutoff value of 5
, then 5 failing test units will result in threshold.
Triggering Actions
This is not currently implemented.