import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [5, 6, 5, 7, 5, 5],
        "b": [2, 3, 6, 4, 3, 6],
        "c": [9, 8, 8, 9, 9, 7],
    }
)

pb.preview(tbl)
Validate.col_vals_outside
Validate.col_vals_outside(
    columns,
    left,
    right,
    inclusive=(True, True),
    na_pass=False,
    pre=None,
    thresholds=None,
    active=True,
)
Do column data lie outside of two specified values or data in other columns?
The col_vals_outside() validation method checks whether column values in a table do not fall within a certain range. The range is specified with three arguments: left=, right=, and inclusive=. The left= and right= values specify the lower and upper bounds. These bounds can be specified as literal values or as column names provided within col(). The validation operates over a number of test units equal to the number of rows in the table (determined after any pre= mutation has been applied).
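As a rough illustration of what a single test unit checks, the sketch below expresses the comparison for one value in plain Python. It is not pointblank's implementation: the function name is_outside() is hypothetical, and it assumes that a True entry in inclusive means the corresponding bound belongs to the range, so a value equal to that bound counts as inside and fails.

```python
def is_outside(value, left, right, inclusive=(True, True)):
    """Return True when `value` lies outside the range bounded by `left` and `right`.

    Hypothetical helper (not part of pointblank). Assumption: inclusive[0] and
    inclusive[1] say whether the left and right bounds belong to the range, so
    with the default (True, True) a value equal to either bound is "inside"
    and the test unit fails.
    """
    left_closed, right_closed = inclusive
    # Below the range: strictly below a closed bound, or at/below an open one
    below = value < left if left_closed else value <= left
    # Above the range: strictly above a closed bound, or at/above an open one
    above = value > right if right_closed else value >= right
    return below or above


# A value sitting exactly on the left bound (1) under each `inclusive` setting
for inc in [(True, True), (False, True), (True, False), (False, False)]:
    print(inc, is_outside(1, left=1, right=4, inclusive=inc))
```

Under this assumption, only the settings that open the left bound, (False, True) and (False, False), let the value 1 pass.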
Parameters
columns : str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals
A single column or a list of columns to validate. Can also use col() with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column.

left : float | int | Column
The lower bound of the range. This can be a single numeric value or a single column name given in col(). The latter option allows for a column-column comparison for the lower bound.

right : float | int | Column
The upper bound of the range. This can be a single numeric value or a single column name given in col(). The latter option allows for a column-column comparison for the upper bound.

inclusive : tuple[bool, bool] = (True, True)
A tuple of two boolean values indicating whether each bound of the comparison is inclusive. The positions of the boolean values correspond to the left= and right= values, respectively. By default, both values are True.

na_pass : bool = False
Should any encountered None, NA, or Null values be considered as passing test units? By default, this is False. Set to True to pass test units with missing values.

pre : Callable | None = None
A pre-processing function or lambda to apply to the data table for the validation step.

thresholds : int | float | bool | tuple | dict | Thresholds = None
Failure threshold levels so that the validation step can react accordingly when exceeding the set levels for different states (warn, stop, and notify). This can be created simply as an integer or float denoting the absolute number or fraction of failing test units for the ‘warn’ level. Otherwise, you can use a tuple of 1-3 values, a dictionary of 1-3 entries, or a Thresholds object.

active : bool = True
A boolean value indicating whether the validation step should be active. Using False will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged).
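The na_pass= and thresholds= parameters act on the results of the individual test units. The sketch below is a simplified, hypothetical model of both in plain Python, not pointblank's code. It assumes that None stands in for a missing value, that bounds are closed on both sides (a value equal to a bound fails, as assumed for the default inclusive=(True, True)), that an integer threshold is an absolute count of failing units while a float is a fraction, and that the warn level is reached when failures meet or exceed the threshold. The helper names evaluate_units() and warn_reached() are invented for this illustration, and pointblank's exact comparisons may differ.

```python
def evaluate_units(values, left, right, na_pass=False):
    """Return one pass/fail result per value (hypothetical helper).

    None marks a missing value; it passes only when na_pass=True.
    Bounds are treated as closed, so a value equal to a bound fails.
    """
    results = []
    for v in values:
        if v is None:
            results.append(na_pass)
        else:
            results.append(v < left or v > right)
    return results


def warn_reached(n_failed, n_units, threshold):
    """Hypothetical check of the 'warn' level.

    Assumption: an int is an absolute count of failing units and a float is a
    fraction of all units; the level is reached when failures meet or exceed it.
    """
    if isinstance(threshold, float):
        return n_failed / n_units >= threshold
    return n_failed >= threshold


values = [5, None, 2, 7]
strict = evaluate_units(values, left=1, right=4)
lenient = evaluate_units(values, left=1, right=4, na_pass=True)
print(strict)   # [True, False, False, True]
print(lenient)  # [True, True, False, True]

n_failed = strict.count(False)
print(warn_reached(n_failed, len(strict), 0.5))  # True (2 of 4 units failed)
print(warn_reached(n_failed, len(strict), 3))    # False (only 2 failures)
```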
Returns
Validate
The Validate object with the added validation step.
Examples
For the examples here, we’ll use a simple Polars DataFrame with three numeric columns (a, b, and c). The table is created, and displayed with pb.preview(), by the code at the top of this page.
Let’s validate that values in column a are all outside the fixed boundary values of 1 and 4. We’ll determine if this validation had any failing test units (there are six test units, one for each row).
validation = (
    pb.Validate(data=tbl)
    .col_vals_outside(columns="a", left=1, right=4)
    .interrogate()
)

validation
| STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | a | 1, 4 | | ✓ | 6 | 6 (1.00) | 0 (0.00) | — | — | — | — |
Printing the validation object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using col_vals_outside(). All test units passed, and there are no failing test units.
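The outcome of this first validation can be checked by hand. The plain-Python sketch below (no pointblank involved) evaluates the six values of column a against the bounds 1 and 4. It treats a value equal to a bound as failing, which is assumed to match the default inclusive=(True, True); since no value in a sits on a bound, that choice does not affect this result.

```python
# Column "a" from the example table
a = [5, 6, 5, 7, 5, 5]
left, right = 1, 4

# One test unit per row: a unit passes when its value is below `left` or above `right`
passed = [v < left or v > right for v in a]

n_units = len(passed)
n_pass = sum(passed)
n_fail = n_units - n_pass
print(n_units, n_pass, n_fail)  # 6 6 0
print(f"{n_pass / n_units:.2f} {n_fail / n_units:.2f}")  # 1.00 0.00
```

The counts and fractions line up with the six test units, six passes, and zero failures reported for this step.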
Aside from checking a column against two literal values representing the lower and upper bounds, we can also provide column names to the left= and/or right= arguments (by using the helper function col()). In this way, we can perform three additional comparison types:

- left=column, right=column
- left=literal, right=column
- left=column, right=literal
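To make the two mixed forms (a literal on one side, a column on the other) concrete, the sketch below models a bound that may be either a literal or a reference to another column. Col, resolve(), and outside_results() are hypothetical stand-ins written for this illustration: Col plays the role that col() plays in pointblank but is not pointblank's class. A column bound is looked up row by row, and bounds are treated as closed on both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    """Hypothetical column reference, standing in for pointblank's col()."""
    name: str


def resolve(bound, table, row):
    """Return the bound for one row: a literal as-is, a Col as that row's value."""
    return table[bound.name][row] if isinstance(bound, Col) else bound


def outside_results(table, column, left, right):
    """Return one pass/fail result per row (closed bounds, so equality fails)."""
    results = []
    for i, value in enumerate(table[column]):
        lo = resolve(left, table, i)
        hi = resolve(right, table, i)
        results.append(value < lo or value > hi)
    return results


tbl = {
    "a": [5, 6, 5, 7, 5, 5],
    "b": [2, 3, 6, 4, 3, 6],
    "c": [9, 8, 8, 9, 9, 7],
}

# left=literal, right=column
print(outside_results(tbl, "b", 3, Col("c")))  # [True, False, False, False, False, False]
# left=column, right=literal
print(outside_results(tbl, "b", Col("a"), 5))  # [True, True, True, True, True, True]
```

Keeping the literal/column decision inside a single resolve() step means the comparison itself is written once, whichever combination of bounds is used.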
For the next example, we’ll use col_vals_outside() to check whether the values in column b are outside of the range formed by the corresponding values in columns a (lower bound) and c (upper bound).
validation = (
    pb.Validate(data=tbl)
    .col_vals_outside(columns="b", left=pb.col("a"), right=pb.col("c"))
    .interrogate()
)

validation
The validation table reports two failing test units. The specific failing cases are (using zero-based row indices, i.e., the third and sixth rows):

- Row 2: b is 6 and the bounds are 5 (a) and 8 (c).
- Row 5: b is 6 and the bounds are 5 (a) and 7 (c).
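The two failing rows can be recovered with a few lines of plain Python, independent of pointblank. The sketch below assumes, as for the default inclusive=(True, True), that a value lying on or between its row’s bounds fails, and it reports zero-based row indices, matching the row numbers listed above.

```python
tbl = {
    "a": [5, 6, 5, 7, 5, 5],
    "b": [2, 3, 6, 4, 3, 6],
    "c": [9, 8, 8, 9, 9, 7],
}

# A row fails when b lies within [a, c] (both bounds closed)
failing = [
    i
    for i, (lo, value, hi) in enumerate(zip(tbl["a"], tbl["b"], tbl["c"]))
    if lo <= value <= hi
]

for i in failing:
    print(f"Row {i}: b is {tbl['b'][i]}, bounds are {tbl['a'][i]} (a) and {tbl['c'][i]} (c)")
# Row 2: b is 6, bounds are 5 (a) and 8 (c)
# Row 5: b is 6, bounds are 5 (a) and 7 (c)
```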