API Reference

Validate

When performing data validation, you’ll need the Validate class to get the process started. It takes the target table, and you can optionally provide some metadata and/or failure thresholds (using the Thresholds class or through shorthands). The Validate class has numerous methods for defining validation steps and for obtaining post-interrogation metrics and data.

Validate
Workflow for defining a set of validations on a table and interrogating for results.
Thresholds
Definition of threshold values.
Actions
Definition of action values.
FinalActions
Define actions to be taken after validation is complete.
Schema
Definition of a schema object.
DraftValidation
Draft a validation plan for a given table using an LLM.

Validation Steps

Validation steps can be thought of as sequential validations on the target data. We call Validate’s validation methods to build up a validation plan: a collection of steps that, in the aggregate, provides good validation coverage.

Validate.col_vals_gt()
Are column data greater than a fixed value or data in another column?
Validate.col_vals_lt()
Are column data less than a fixed value or data in another column?
Validate.col_vals_ge()
Are column data greater than or equal to a fixed value or data in another column?
Validate.col_vals_le()
Are column data less than or equal to a fixed value or data in another column?
Validate.col_vals_eq()
Are column data equal to a fixed value or data in another column?
Validate.col_vals_ne()
Are column data not equal to a fixed value or data in another column?
Validate.col_vals_between()
Do column data lie between two specified values or data in other columns?
Validate.col_vals_outside()
Do column data lie outside of two specified values or data in other columns?
Validate.col_vals_in_set()
Validate whether column values are in a set of values.
Validate.col_vals_not_in_set()
Validate whether column values are not in a set of values.
Validate.col_vals_null()
Validate whether values in a column are Null.
Validate.col_vals_not_null()
Validate whether values in a column are not Null.
Validate.col_vals_regex()
Validate whether column values match a regular expression pattern.
Validate.col_vals_expr()
Validate column values using a custom expression.
Validate.rows_distinct()
Validate whether rows in the table are distinct.
Validate.rows_complete()
Validate whether row data are complete by having no missing values.
Validate.col_exists()
Validate whether one or more columns exist in the table.
Validate.col_schema_match()
Do columns in the table (and their types) match a predefined schema?
Validate.row_count_match()
Validate whether the row count of the table matches a specified count.
Validate.col_count_match()
Validate whether the column count of the table matches a specified count.
Validate.conjointly()
Perform multiple row-wise validations for joint validity.
Validate.specially()
Perform a specialized validation with customized logic.

Column Selection

A flexible way to select columns for validation is to use the col() function along with column selection helper functions. Combining col() with starts_with(), matches(), etc., allows for the selection of multiple target columns (mapping a validation across many steps). Furthermore, the col() function can be used to declare a comparison column (e.g., for the value= argument in many col_vals_*() methods) when the comparison should be made against another column rather than a fixed value.

col()
Helper function for referencing a column in the input table.
starts_with()
Select columns that start with specified text.
ends_with()
Select columns that end with specified text.
contains()
Select columns that contain specified text.
matches()
Select columns that match a specified regular expression pattern.
everything()
Select all columns.
first_n()
Select the first n columns in the column list.
last_n()
Select the last n columns in the column list.
expr_col()
Create a column expression for use in conjointly() validation.

Interrogation and Reporting

The validation plan is put into action when interrogate() is called. The workflow for performing a comprehensive validation is then: (1) create a Validate object, (2) add validation steps, and (3) call interrogate(). After interrogation of the data, we can view a validation report table (by printing the object or using get_tabular_report()), extract key metrics, or split the data based on the validation results (with get_sundered_data()).

Validate.interrogate()
Execute each validation step against the table and store the results.
Validate.get_tabular_report()
Validation report as a GT table.
Validate.get_step_report()
Get a detailed report for a single validation step.
Validate.get_json_report()
Get a report of the validation results as a JSON-formatted string.
Validate.get_sundered_data()
Get the data that passed or failed the validation steps.
Validate.get_data_extracts()
Get the rows that failed for each validation step.
Validate.all_passed()
Determine if every validation step passed perfectly, with no failing test units.
Validate.assert_passing()
Raise an AssertionError if not all tests are passing.
Validate.assert_below_threshold()
Raise an AssertionError if validation steps exceed a specified threshold level.
Validate.above_threshold()
Check if any validation steps exceed a specified threshold level.
Validate.n()
Provides a dictionary of the number of test units for each validation step.
Validate.n_passed()
Provides a dictionary of the number of test units that passed for each validation step.
Validate.n_failed()
Provides a dictionary of the number of test units that failed for each validation step.
Validate.f_passed()
Provides a dictionary of the fraction of test units that passed for each validation step.
Validate.f_failed()
Provides a dictionary of the fraction of test units that failed for each validation step.
Validate.warning()
Get the ‘warning’ level status for each validation step.
Validate.error()
Get the ‘error’ level status for each validation step.
Validate.critical()
Get the ‘critical’ level status for each validation step.

Inspection and Assistance

The Inspection and Assistance group contains functions that are helpful for getting to grips with a new data table. Use the DataScan class to get a quick overview of the data, preview() to see the first and last few rows of a table, col_summary_tbl() for a column-level summary of a table, and missing_vals_tbl() to see where there are missing values in a table. Several datasets included in the package can be accessed via the load_dataset() function. On the assistance side, the assistant() function can be used to get help with Pointblank.

DataScan
Get a summary of a dataset.
preview()
Display a table preview that shows some rows from the top and some from the bottom.
col_summary_tbl()
Generate a column-level summary table of a dataset.
missing_vals_tbl()
Display a table that shows the missing values in the input table.
assistant()
Chat with the PbA (Pointblank Assistant) about your data validation needs.
load_dataset()
Load a dataset hosted in the library as a specified table type.

Utility Functions

The Utility Functions group contains functions that are useful for accessing metadata about the target data. Use get_column_count() or get_row_count() to get the number of columns or rows in a table. The get_action_metadata() function is useful when building custom actions since it returns metadata about the validation step that’s triggering the action, and get_validation_summary() plays a similar role when authoring final actions by providing validation summary information. Lastly, the config() utility lets us set global configuration parameters.

get_column_count()
Get the number of columns in a table.
get_row_count()
Get the number of rows in a table.
get_action_metadata()
Access step-level metadata when authoring custom actions.
get_validation_summary()
Access validation summary information when authoring final actions.
config()
Configuration settings for the Pointblank library.

Prebuilt Actions

The Prebuilt Actions group contains send_slack_notification(), which creates a function that sends a Slack notification when validation steps exceed failure threshold levels, or that simply provides a summary of the validation results (including the status, number of steps, passing and failing steps, table information, and timing details).

send_slack_notification()
Create a Slack notification function using a webhook URL.