has_columns() function

Check whether one or more columns exist in a table.

USAGE

has_columns(*columns)

This function returns a callable that, when given a table, checks whether all specified columns are present. It is primarily designed for use with the active= parameter of validation methods. When a validation step has active=has_columns("col_a", "col_b"), the step will be skipped (made inactive) if either col_a or col_b is missing from the target table.

The callable is evaluated against the original table before any pre= processing is applied. This means the column check is performed on the raw input data, not on a pre-processed version of it.
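The ordering described above can be illustrated with a small standard-library sketch. It does not use pointblank: `has_columns_sketch`, `run_step`, `add_total`, and the dict-based table are hypothetical stand-ins. They are meant only to show why a check for a column that pre-processing would create still fails.

```python
from typing import Callable

Table = dict[str, list[int]]  # hypothetical stand-in: column name -> values


def has_columns_sketch(*columns: str) -> Callable[[Table], bool]:
    """Return a check that is True only if every named column is in the table."""
    return lambda table: all(col in table for col in columns)


def run_step(
    table: Table,
    active: Callable[[Table], bool],
    pre: Callable[[Table], Table],
) -> str:
    """Mimic the documented order: evaluate `active` on the raw table, then `pre`."""
    if not active(table):  # the check sees the original input
        return "skipped"
    prepared = pre(table)  # pre-processing only happens for active steps
    return f"ran on columns {sorted(prepared)}"


def add_total(table: Table) -> Table:
    """Hypothetical pre-processing step that derives a new column `c`."""
    return {**table, "c": [x + y for x, y in zip(table["a"], table["b"])]}


raw = {"a": [1, 2, 3], "b": [4, 5, 6]}

# `c` only exists after pre-processing, so a check for it fails on the raw table.
status_derived = run_step(raw, has_columns_sketch("c"), add_total)

# `a` and `b` exist in the raw table, so the step proceeds.
status_raw = run_step(raw, has_columns_sketch("a", "b"), add_total)
```

In this sketch `status_derived` is `"skipped"` even though `add_total` would have produced `c`. This mirrors the documented behavior: a column created by pre= cannot satisfy the check.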

A note is attached to any skipped step in the validation report explaining which columns were not found.
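As a rough illustration of how such a note could be assembled, the sketch below collects the missing column names and formats them in the style of the report notes shown in the examples on this page. The helpers `missing_columns` and `skip_note` are hypothetical and are not part of pointblank; the library's internal implementation may differ.

```python
def missing_columns(table_columns: list[str], required: list[str]) -> list[str]:
    """Return the required columns absent from the table, in the order requested."""
    present = set(table_columns)
    return [col for col in required if col not in present]


def skip_note(missing: list[str]) -> str:
    """Format the missing names as a comma-separated note (assumed wording)."""
    return f"Column check failed: missing column(s) {', '.join(missing)}."


# A table with columns a and b, checked for a, x, and y.
note = skip_note(missing_columns(["a", "b"], ["a", "x", "y"]))
```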

Parameters

*columns : str | list[str] = ()

One or more column names to check for in the table. Each argument can be a string or a list of strings. All specified columns must be present for the callable to return True.

Returns

Callable[[Any], bool]

A callable that accepts a table and returns True if every column in columns exists in the table, False otherwise.

Raises

ValueError

If no column names are provided.

TypeError

If any of the provided column names is not a string or list of strings.
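The contract described in the Parameters, Returns, and Raises sections can be summarized in a minimal pure-Python sketch. This is not pointblank's implementation: `has_columns_sketch` and `FakeTable` are hypothetical names, the error messages are invented, and the sketch assumes the table exposes a `.columns` sequence (as Polars and pandas DataFrames do). Treating an empty list as "no column names" is also an assumption.

```python
from __future__ import annotations

from typing import Any, Callable


def has_columns_sketch(*columns: str | list[str]) -> Callable[[Any], bool]:
    """Hypothetical re-implementation of the documented contract."""
    names: list[str] = []
    for arg in columns:
        if isinstance(arg, str):
            names.append(arg)
        elif isinstance(arg, list) and all(isinstance(item, str) for item in arg):
            names.extend(arg)  # flatten list arguments into one list of names
        else:
            raise TypeError(f"Expected str or list of str, got {type(arg).__name__}.")

    if not names:
        # Assumption: an empty list counts as "no column names provided".
        raise ValueError("At least one column name must be provided.")

    def check(table: Any) -> bool:
        # True only when every requested column is present in the table.
        return all(name in table.columns for name in names)

    return check


class FakeTable:
    """Minimal stand-in for a DataFrame; only the `columns` attribute is modeled."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns


tbl = FakeTable(["a", "b"])

both_present = has_columns_sketch("a", "b")(tbl)     # strings as separate arguments
one_missing = has_columns_sketch(["a", "z"])(tbl)    # names supplied as a list
```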

Examples


Using has_columns() with the active= parameter to conditionally run a validation step:

import pointblank as pb
import polars as pl

tbl = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

validation = (
    pb.Validate(data=tbl)
    .col_vals_gt(columns="a", value=0, active=pb.has_columns("a"))
    .col_vals_gt(columns="a", value=0, active=pb.has_columns("z"))
    .interrogate()
)

validation
STEP              COLUMNS  VALUES  UNITS  PASS      FAIL
1  col_vals_gt()  a        0       3      3 (1.00)  0 (0.00)
2  col_vals_gt()  a        0

Notes

Step 2 (active_check) Step skipped — Column check failed: missing column(s) z.

The first step ran because column a exists. The second step was skipped because column z is missing, and the report note explains which column was not found.

When checking for multiple columns, the step is only active when all columns are present:

validation = (
    pb.Validate(data=tbl)
    .col_vals_gt(columns="a", value=0, active=pb.has_columns("a", "b"))
    .col_vals_gt(columns="a", value=0, active=pb.has_columns("a", "x", "y"))
    .interrogate()
)

validation
STEP              COLUMNS  VALUES  UNITS  PASS      FAIL
1  col_vals_gt()  a        0       3      3 (1.00)  0 (0.00)
2  col_vals_gt()  a        0

Notes

Step 2 (active_check) Step skipped — Column check failed: missing column(s) x, y.

The first step is active because both a and b exist. The second step is skipped because x and y are missing.

Column names can also be provided as a list:

validation = (
    pb.Validate(data=tbl)
    .col_vals_gt(columns="a", value=0, active=pb.has_columns(["a", "b"]))
    .interrogate()
)

validation
STEP              COLUMNS  VALUES  UNITS  PASS      FAIL
1  col_vals_gt()  a        0       3      3 (1.00)  0 (0.00)