import pointblank as pb
import polars as pl
tbl = pl.DataFrame(
    {
        "a": ["apple", "banana", "cherry", "date"],
        "b": [1, 6, 3, 5],
    }
)
pb.preview(tbl)
Validate.col_exists
Validate.col_exists(columns, thresholds=None, active=True)
Validate whether one or more columns exist in the table.
The col_exists() method checks whether one or more columns exist in the target table. The only requirement is specification of the column names. Each validation step or expectation will operate over a single test unit, which is whether the column exists or not.
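As a rough mental model (not Pointblank's actual implementation), each requested column becomes its own step with exactly one test unit: whether that name is present in the table. The sketch below uses plain Python; the `check_columns_exist` helper and the list of column names standing in for a table are hypothetical and exist only for illustration.

```python
def check_columns_exist(table_columns, columns):
    """Return one result per requested column (hypothetical helper).

    Each result represents a single test unit: does the column exist?
    """
    # Accept a single column name as well as a list of names
    if isinstance(columns, str):
        columns = [columns]

    present = set(table_columns)
    return [
        {"step": i, "column": name, "units": 1, "passed": name in present}
        for i, name in enumerate(columns, start=1)
    ]


# A table with columns "a" and "b", checked for "b" and "c"
results = check_columns_exist(["a", "b"], ["b", "c"])
for r in results:
    print(r)
```

Supplying two column names yields two independent results, which mirrors how a single `col_exists()` call can expand into several validation steps.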
Parameters
columns : str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals
- A single column or a list of columns to validate. Can also use col() with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column.
thresholds : int | float | bool | tuple | dict | Thresholds = None
- Failure threshold levels so that the validation step can react accordingly when exceeding the set levels for different states (warn, stop, and notify). This can be created simply as an integer or float denoting the absolute number or fraction of failing test units for the ‘warn’ level. Otherwise, you can use a tuple of 1-3 values, a dictionary of 1-3 entries, or a Thresholds object.
active : bool = True
- A boolean value indicating whether the validation step should be active. Using False will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged).
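To make the accepted shapes of `thresholds=` more concrete, here is a small sketch of how a numeric or tuple spec might map onto the three levels. It is an illustration only: `normalize_thresholds` is a hypothetical helper, and the positional order (warn, stop, notify) for tuples is an assumption based on the order in which the levels are listed above, not a confirmed detail of Pointblank's internals. Dictionary, bool, and Thresholds inputs are deliberately left out.

```python
def normalize_thresholds(spec):
    """Map a thresholds spec to warn/stop/notify levels (hypothetical helper)."""
    levels = {"warn": None, "stop": None, "notify": None}

    if spec is None:
        return levels

    # bool is a subclass of int, so exclude it explicitly; bool, dict, and
    # Thresholds inputs are outside the scope of this sketch
    if isinstance(spec, bool):
        raise TypeError("bool specs are not covered by this sketch")

    # A bare number sets only the 'warn' level:
    # int = absolute count of failing units, float = fraction failing
    if isinstance(spec, (int, float)):
        levels["warn"] = spec
        return levels

    # ASSUMPTION: tuple values are positional in the order warn, stop, notify
    if isinstance(spec, tuple) and 1 <= len(spec) <= 3:
        for name, value in zip(("warn", "stop", "notify"), spec):
            levels[name] = value
        return levels

    raise TypeError(f"unsupported thresholds spec: {spec!r}")


print(normalize_thresholds(1))
print(normalize_thresholds((0.1, 0.5)))
```

Because every `col_exists()` step has a single test unit, an absolute threshold of 1 and any fraction up to 1.0 describe the same condition here: the one unit failed.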
Returns
Validate
- The Validate object with the added validation step.
Examples
For the examples here, we’ll use a simple Polars DataFrame with a string column (a) and a numeric column (b). This is the tbl object created and previewed in the code at the top of this page.
Let’s validate that the columns a and b actually exist in the table. We’ll determine if this validation had any failing test units (each validation will have a single test unit).
validation = (
    pb.Validate(data=tbl)
    .col_exists(columns=["a", "b"])
    .interrogate()
)
validation
| STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | a | | | ✓ | 1 | 1 (1.00) | 0 (0.00) | — | — | — | — |
| 2 | b | | | ✓ | 1 | 1 (1.00) | 0 (0.00) | — | — | — | — |
Printing the validation object shows the validation table in an HTML viewing environment. The validation table shows two entries (one check per column) generated by the col_exists() validation step. Both steps passed since both columns provided in columns= are present in the table.
Now, let’s check for the existence of a different set of columns.
validation = (
    pb.Validate(data=tbl)
    .col_exists(columns=["b", "c"])
    .interrogate()
)
validation
| STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | b | | | ✓ | 1 | 1 (1.00) | 0 (0.00) | — | — | — | — |
| 2 | c | | | ✓ | 1 | 0 (0.00) | 1 (1.00) | — | — | — | — |
The validation table reports one passing validation step (the check for column b) and one failing validation step (the check for column c, which doesn’t exist).