import pointblank as pb
import polars as pl
= pl.DataFrame(
tbl
{"a": ["apple", "banana", "cherry", "date"],
"b": [1, 6, 3, 5],
"c": [1.1, 2.2, 3.3, 4.4],
}
)
pb.preview(tbl)
Validate.col_schema_match
Validate.col_schema_match(
schema,=True,
complete=True,
in_order=True,
case_sensitive_colnames=True,
case_sensitive_dtypes=True,
full_match_dtypes=None,
pre=None,
thresholds=None,
actions=None,
brief=True,
active )
Do columns in the table (and their types) match a predefined schema?
The col_schema_match()
method works in conjunction with an object generated by the Schema
class. That class object is the expectation for the actual schema of the target table. The validation step operates over a single test unit, which is whether the schema matches that of the table (within the constraints enforced by the complete=
, and in_order=
options).
Parameters
schema : Schema
-
A
Schema
object that represents the expected schema of the table. This object is generated by theSchema
class. complete :
bool
= True-
Should the schema match be complete? If
True
, then the target table must have all columns specified in the schema. IfFalse
, then the table can have additional columns not in the schema (i.e., the schema is a subset of the target table’s columns). in_order :
bool
= True-
Should the schema match be in order? If
True
, then the columns in the schema must appear in the same order as they do in the target table. IfFalse
, then the order of columns in the schema and the target table can differ. case_sensitive_colnames :
bool
= True-
Should the schema match be case-sensitive with regard to column names? If
True
, then the column names in the schema and the target table must match exactly. IfFalse
, then the column names are compared in a case-insensitive manner. case_sensitive_dtypes :
bool
= True-
Should the schema match be case-sensitive with regard to column data types? If
True
, then the column data types in the schema and the target table must match exactly. IfFalse
, then the column data types are compared in a case-insensitive manner. full_match_dtypes :
bool
= True-
Should the schema match require a full match of data types? If
True
, then the column data types in the schema and the target table must match exactly. IfFalse
then substring matches are allowed, so a schema data type ofInt
would match a target table data type ofInt64
. pre :
Callable
| None = None-
A optional preprocessing function or lambda to apply to the data table during interrogation.
thresholds :
int
|float
|bool
|tuple
|dict
| Thresholds = None-
Optional failure threshold levels for the validation step(s), so that the interrogation can react accordingly when exceeding the set levels for different states (‘warning’, ‘error’, and ‘critical’). This can be created using the
Thresholds
class or more simply as (1) an integer or float denoting the absolute number or fraction of failing test units for the ‘warn’ level, (2) a tuple of 1-3 values, or (3) a dictionary of 1-3 entries. actions : Actions | None = None
-
Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the
Actions
class should be used to define the actions. brief :
str
| None = None-
An optional brief description of the validation step. The templating elements
"{col}"
and"{step}"
can be used to insert the column name and step number, respectively. active :
bool
= True-
A boolean value indicating whether the validation step should be active. Using
False
will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged).
Returns
: Validate
-
The
Validate
object with the added validation step.
Examples
For the examples here, we’ll use a simple Polars DataFrame with three columns (string, integer, and float). The table is shown below:
Let’s validate that the columns in the table match a predefined schema. A schema can be defined using the Schema
class.
= pb.Schema(
schema =[("a", "String"), ("b", "Int64"), ("c", "Float64")]
columns )
You can print the schema object to verify that the expected schema is as intended.
print(schema)
Pointblank Schema
a: String
b: Int64
c: Float64
Now, we’ll use the col_schema_match()
method to validate the table against the expected schema
object. There is a single test unit for this validation step (whether the schema matches the table or not).
= (
validation =tbl)
pb.Validate(data=schema)
.col_schema_match(schema
.interrogate()
)
validation
STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | E | C | EXT | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
#4CA64C | 1 |
|
✓ | 1 | 1 1.00 |
0 0.00 |
— | — | — | — |
The validation table shows that the schema matches the table. The single test unit passed since the table columns and their types match the schema.