import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": ["apple", "banana", "cherry", "date"],
        "b": [1, 6, 3, 5],
        "c": [1.1, 2.2, 3.3, 4.4],
    }
)

pb.preview(tbl)
Validate.col_schema_match
Validate.col_schema_match(
    schema,
    complete=True,
    in_order=True,
    case_sensitive_colnames=True,
    case_sensitive_dtypes=True,
    full_match_dtypes=True,
    pre=None,
    thresholds=None,
    active=True,
)
Do columns in the table (and their types) match a predefined schema?
The col_schema_match() method works in conjunction with an object generated by the Schema class. That object is the expectation for the actual schema of the target table. The validation step operates over a single test unit: whether the schema matches that of the table (within the constraints enforced by the complete= and in_order= options).
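To make the effect of complete= and in_order= concrete, here is a minimal pure-Python sketch of a column-name check. It is not pointblank's implementation: the helper name columns_match is hypothetical, column names are assumed to be unique, and the sketch assumes that complete=True forbids extra table columns and that in_order=True compares the relative order of the schema columns within the table.

```python
def columns_match(schema_cols, table_cols, complete=True, in_order=True):
    """Illustrative column-name check (hypothetical helper, not pointblank code)."""
    schema_cols = list(schema_cols)
    table_cols = list(table_cols)

    # In every mode, each schema column must exist in the table.
    if any(col not in table_cols for col in schema_cols):
        return False

    # Assumption: complete=True means the table has no columns beyond the schema.
    if complete and len(table_cols) != len(schema_cols):
        return False

    if in_order:
        # Assumption: only the relative order of the schema columns matters.
        table_order = [col for col in table_cols if col in schema_cols]
        return table_order == schema_cols

    return True


table_cols = ["a", "b", "c"]
print(columns_match(["a", "b", "c"], table_cols))                  # True
print(columns_match(["b", "a", "c"], table_cols))                  # False
print(columns_match(["b", "a", "c"], table_cols, in_order=False))  # True
print(columns_match(["a", "b"], table_cols))                       # False
print(columns_match(["a", "b"], table_cols, complete=False))       # True
```

Whatever combination of options is used, the real validation step still reports one test unit: the overall match either passes or fails.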
Parameters
schema : Schema
A Schema object that represents the expected schema of the table. This object is generated by the Schema class.

complete : bool = True
Should the schema match be complete? If True, then the target table must have all columns specified in the schema. If False, then the table can have additional columns not in the schema (i.e., the schema is a subset of the target table’s columns).

in_order : bool = True
Should the schema match be in order? If True, then the columns in the schema must appear in the same order as they do in the target table. If False, then the order of columns in the schema and the target table can differ.

case_sensitive_colnames : bool = True
Should the schema match be case-sensitive with regard to column names? If True, then the column names in the schema and the target table must match exactly. If False, then the column names are compared in a case-insensitive manner.

case_sensitive_dtypes : bool = True
Should the schema match be case-sensitive with regard to column data types? If True, then the column data types in the schema and the target table must match exactly. If False, then the column data types are compared in a case-insensitive manner.

full_match_dtypes : bool = True
Should the schema match require a full match of data types? If True, then the column data types in the schema and the target table must match exactly. If False, then substring matches are allowed, so a schema data type of Int would match a target table data type of Int64.

pre : Callable | None = None
A pre-processing function or lambda to apply to the data table for the validation step.

thresholds : int | float | bool | tuple | dict | Thresholds = None
Failure threshold levels so that the validation step can react accordingly when exceeding the set levels for different states (warn, stop, and notify). This can be created simply as an integer or float denoting the absolute number or fraction of failing test units for the ‘warn’ level. Otherwise, you can use a tuple of 1-3 values, a dictionary of 1-3 entries, or a Thresholds object.

active : bool = True
A boolean value indicating whether the validation step should be active. Using False will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged).
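The two data type options can be easier to reason about with a small example. The sketch below is a pure-Python illustration of the behavior described for case_sensitive_dtypes= and full_match_dtypes=. The helper dtype_matches is hypothetical and is not part of pointblank; it only mirrors the rules stated above, assuming data types are compared as plain strings.

```python
def dtype_matches(schema_dtype, table_dtype,
                  case_sensitive_dtypes=True, full_match_dtypes=True):
    """Illustrative dtype comparison (hypothetical helper, not pointblank code)."""
    if not case_sensitive_dtypes:
        # Case-insensitive comparison: normalize both sides first.
        schema_dtype = schema_dtype.lower()
        table_dtype = table_dtype.lower()

    if full_match_dtypes:
        # Full match: the two dtype strings must be identical.
        return schema_dtype == table_dtype

    # Partial match: the schema dtype only needs to be a substring.
    return schema_dtype in table_dtype


print(dtype_matches("Int64", "Int64"))                               # True
print(dtype_matches("Int", "Int64"))                                 # False
print(dtype_matches("Int", "Int64", full_match_dtypes=False))        # True
print(dtype_matches("int64", "Int64"))                               # False
print(dtype_matches("int64", "Int64", case_sensitive_dtypes=False))  # True
```

Relaxing both options at once is the most permissive setting: in this sketch, a schema data type of int would then match a table data type of Int64.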
Returns
Validate
The Validate object with the added validation step.
Examples
For the examples here, we’ll use a simple Polars DataFrame with three columns (string, integer, and float). This is the table tbl created and previewed with pb.preview() in the code at the top of this page.
Let’s validate that the columns in the table match a predefined schema. A schema can be defined using the Schema class.

schema = pb.Schema(
    columns=[("a", "String"), ("b", "Int64"), ("c", "Float64")]
)

You can print the schema object to verify that the expected schema is as intended.
print(schema)
Pointblank Schema
a: String
b: Int64
c: Float64
Now, we’ll use the col_schema_match() method to validate the table against the expected schema object. There is a single test unit for this validation step (whether the schema matches the table or not).

validation = (
    pb.Validate(data=tbl)
    .col_schema_match(schema=schema)
    .interrogate()
)

validation

| | STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | S | N | EXT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | col_schema_match() | | | | ✓ | 1 | 1 (1.00) | 0 (0.00) | — | — | — | — |

The validation table shows that the schema matches the table. The single test unit passed since the table columns and their types match the schema.