API Reference
Validate
When performing data validation, you’ll need the Validate class to get the process started. It takes the target table, and you can optionally provide some metadata and/or failure thresholds (using the Thresholds class or through shorthands for this task). The Validate class has numerous methods for defining validation steps and for obtaining post-interrogation metrics and data.
- Validate: Workflow for defining a set of validations on a table and interrogating for results.
- Thresholds: Definition of threshold values.
- Actions: Definition of action values.
- FinalActions: Define actions to be taken after validation is complete.
- Schema: Definition of a schema object.
- DraftValidation: Draft a validation plan for a given table using an LLM.
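To make the threshold idea concrete, here is a plain-Python model of how failing test units might map onto warning, error, and critical levels. It is not Pointblank's implementation. It assumes that an integer threshold is an absolute count of failing test units, that a float threshold is a fraction of all test units, and that a level is reached once the failing amount meets or exceeds its threshold. The helper name `highest_level` is hypothetical.

```python
from typing import Optional, Union

Threshold = Union[int, float, None]


def highest_level(n_failed: int, n_total: int,
                  warning: Threshold = None,
                  error: Threshold = None,
                  critical: Threshold = None) -> Optional[str]:
    """Return the most severe level reached, or None (hypothetical helper).

    Assumption: an int threshold is a count of failing test units and a
    float threshold is a fraction of all test units.
    """
    fraction = n_failed / n_total if n_total else 0.0
    reached = None
    # Check levels from least to most severe; later matches overwrite earlier ones.
    for name, limit in (("warning", warning), ("error", error), ("critical", critical)):
        if limit is None:
            continue
        value = n_failed if isinstance(limit, int) else fraction
        if value >= limit:
            reached = name
    return reached


# 3 of 20 test units fail: 3 >= 1 (warning count) and 0.15 >= 0.1 (error fraction),
# but 0.15 < 0.25 (critical fraction), so the highest level reached is "error".
print(highest_level(3, 20, warning=1, error=0.1, critical=0.25))  # error
```

Mixing counts and fractions in one set of thresholds is deliberate here: a small absolute count makes a good early warning, while fractions scale with table size for the more severe levels.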
Validation Steps
Validation steps can be thought of as sequential validations on the target data. We call Validate’s validation methods to build up a validation plan: a collection of steps that, in the aggregate, provides good validation coverage.
- Validate.col_vals_gt(): Are column data greater than a fixed value or data in another column?
- Validate.col_vals_lt(): Are column data less than a fixed value or data in another column?
- Validate.col_vals_ge(): Are column data greater than or equal to a fixed value or data in another column?
- Validate.col_vals_le(): Are column data less than or equal to a fixed value or data in another column?
- Validate.col_vals_eq(): Are column data equal to a fixed value or data in another column?
- Validate.col_vals_ne(): Are column data not equal to a fixed value or data in another column?
- Validate.col_vals_between(): Do column data lie between two specified values or data in other columns?
- Validate.col_vals_outside(): Do column data lie outside of two specified values or data in other columns?
- Validate.col_vals_in_set(): Validate whether column values are in a set of values.
- Validate.col_vals_not_in_set(): Validate whether column values are not in a set of values.
- Validate.col_vals_null(): Validate whether values in a column are Null.
- Validate.col_vals_not_null(): Validate whether values in a column are not Null.
- Validate.col_vals_regex(): Validate whether column values match a regular expression pattern.
- Validate.col_vals_expr(): Validate column values using a custom expression.
- Validate.rows_distinct(): Validate whether rows in the table are distinct.
- Validate.rows_complete(): Validate whether row data are complete by having no missing values.
- Validate.col_exists(): Validate whether one or more columns exist in the table.
- Validate.col_schema_match(): Do columns in the table (and their types) match a predefined schema?
- Validate.row_count_match(): Validate whether the row count of the table matches a specified count.
- Validate.col_count_match(): Validate whether the column count of the table matches a specified count.
- Validate.conjointly(): Perform multiple row-wise validations for joint validity.
- Validate.specially(): Perform a specialized validation with customized logic.
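The step-building pattern described above can be modeled in a few lines of plain Python. This sketch is not Pointblank: the class `MiniPlan`, its methods, and the list-of-dicts table are hypothetical stand-ins. It illustrates two ideas. First, each method only records a step and returns the plan, so calls chain. Second, when the plan is interrogated, every row of the target column is one test unit. In this sketch a missing value (None) counts as a failing unit.

```python
from typing import Any, Callable

Row = dict[str, Any]


class MiniPlan:
    """Hypothetical stand-in for a validation plan built from chained steps."""

    def __init__(self, rows: list[Row]):
        self.rows = rows
        self.steps: list[tuple[str, str, Callable[[Any], bool]]] = []

    def _add(self, name: str, column: str, check: Callable[[Any], bool]) -> "MiniPlan":
        self.steps.append((name, column, check))
        return self  # returning self allows method chaining

    def col_vals_gt(self, column: str, value: float) -> "MiniPlan":
        return self._add("col_vals_gt", column, lambda x: x is not None and x > value)

    def col_vals_between(self, column: str, left: float, right: float) -> "MiniPlan":
        # Inclusive bounds in this sketch.
        return self._add("col_vals_between", column,
                         lambda x: x is not None and left <= x <= right)

    def col_vals_in_set(self, column: str, allowed: set) -> "MiniPlan":
        return self._add("col_vals_in_set", column, lambda x: x in allowed)

    def col_vals_not_null(self, column: str) -> "MiniPlan":
        return self._add("col_vals_not_null", column, lambda x: x is not None)

    def interrogate(self) -> list[dict[str, Any]]:
        """Run every recorded step; each row is one test unit per step."""
        results = []
        for number, (name, column, check) in enumerate(self.steps, start=1):
            passed = sum(1 for row in self.rows if check(row.get(column)))
            results.append({"step": number, "type": name, "column": column,
                            "n": len(self.rows), "n_passed": passed,
                            "n_failed": len(self.rows) - passed})
        return results


rows = [
    {"a": 5, "f": "low"},
    {"a": 12, "f": "high"},
    {"a": None, "f": "mid"},
]
results = (
    MiniPlan(rows)
    .col_vals_gt("a", 4)
    .col_vals_in_set("f", {"low", "high"})
    .interrogate()
)
for r in results:
    print(r["step"], r["type"], r["n_passed"], r["n_failed"])
# 1 col_vals_gt 2 1
# 2 col_vals_in_set 2 1
```

Deferring all work to interrogate() keeps plan construction cheap and means the same plan could, in principle, be run against fresh data.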
Column Selection
A flexible way to select columns for validation is to use the col() function along with column selection helper functions. Combining col() with starts_with(), matches(), and similar helpers allows for the selection of multiple target columns, mapping a validation across many steps. Furthermore, the col() function can be used to declare a comparison column (e.g., for the value= argument in many col_vals_*() methods) when you can’t use a fixed value for comparison.
- col(): Helper function for referencing a column in the input table.
- starts_with(): Select columns that start with specified text.
- ends_with(): Select columns that end with specified text.
- contains(): Select columns that contain specified text.
- matches(): Select columns that match a specified regular expression pattern.
- everything(): Select all columns.
- first_n(): Select the first n columns in the column list.
- last_n(): Select the last n columns in the column list.
- expr_col(): Create a column expression for use in conjointly() validation.
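The selection helpers behave like filters over the table's column names. The standard-library sketch below imitates that behavior with hypothetical functions (`select_starts_with`, `select_matches`, and the rest are local stand-ins, not Pointblank's helpers, which are used through col()). It then maps a single check across the selected columns so that each column becomes its own step, as described above.

```python
import re
from typing import Callable

Selector = Callable[[list[str]], list[str]]


def select_starts_with(text: str) -> Selector:
    return lambda cols: [c for c in cols if c.startswith(text)]


def select_ends_with(text: str) -> Selector:
    return lambda cols: [c for c in cols if c.endswith(text)]


def select_contains(text: str) -> Selector:
    return lambda cols: [c for c in cols if text in c]


def select_matches(pattern: str) -> Selector:
    # re.search: the pattern may match anywhere in the column name.
    return lambda cols: [c for c in cols if re.search(pattern, c)]


def select_first_n(n: int) -> Selector:
    return lambda cols: cols[:n]


def select_last_n(n: int) -> Selector:
    # Guard n == 0, since cols[-0:] would return every column.
    return lambda cols: cols[-n:] if n > 0 else []


def expand_steps(check_name: str, selector: Selector,
                 cols: list[str]) -> list[tuple[str, str]]:
    """Map one check across every selected column: one (check, column) step each."""
    return [(check_name, c) for c in selector(cols)]


columns = ["id", "price_usd", "price_eur", "qty", "updated_at"]
print(expand_steps("col_vals_gt", select_starts_with("price"), columns))
# [('col_vals_gt', 'price_usd'), ('col_vals_gt', 'price_eur')]
```

Returning selectors as functions, rather than applying them immediately, mirrors the idea that a selection is resolved against the actual column list only when the validation runs.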
Interrogation and Reporting
The validation plan is put into action when interrogate() is called. The workflow for performing a comprehensive validation is then: (1) create a Validate() object, (2) add validation steps, and (3) call interrogate(). After interrogation of the data, we can view a validation report table (by printing the object or using get_tabular_report()), extract key metrics, or split the data based on the validation results (with get_sundered_data()).
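Splitting data by validation results can be pictured as partitioning rows: a row lands in the passing set only if it satisfies every row-level check. The plain-Python sketch below shows that partition; the `sunder` function and its list-of-dicts input are hypothetical and are not get_sundered_data() itself. Each check here is a row predicate, and a missing value fails any check that compares it.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[Row], bool]


def sunder(rows: list[Row], checks: list[Check]) -> tuple[list[Row], list[Row]]:
    """Partition rows into (passing every check, failing at least one)."""
    passed: list[Row] = []
    failed: list[Row] = []
    for row in rows:
        (passed if all(check(row) for check in checks) else failed).append(row)
    return passed, failed


rows = [{"a": 3, "b": "x"}, {"a": 8, "b": "y"}, {"a": None, "b": "x"}]
checks = [
    lambda r: r["a"] is not None and r["a"] > 2,  # like a "greater than 2" step
    lambda r: r["b"] in {"x", "z"},               # like an "in set" step
]
good, bad = sunder(rows, checks)
print(len(good), len(bad))  # 1 2
```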
- Validate.interrogate(): Execute each validation step against the table and store the results.
- Validate.get_tabular_report(): Get the validation report as a GT table.
- Validate.get_step_report(): Get a detailed report for a single validation step.
- Validate.get_json_report(): Get a report of the validation results as a JSON-formatted string.
- Validate.get_sundered_data(): Get the data that passed or failed the validation steps.
- Validate.get_data_extracts(): Get the rows that failed for each validation step.
- Validate.all_passed(): Determine if every validation step passed perfectly, with no failing test units.
- Validate.assert_passing(): Raise an AssertionError if not all tests are passing.
- Validate.assert_below_threshold(): Raise an AssertionError if validation steps exceed a specified threshold level.
- Validate.above_threshold(): Check if any validation steps exceed a specified threshold level.
- Validate.n(): Provides a dictionary of the number of test units for each validation step.
- Validate.n_passed(): Provides a dictionary of the number of test units that passed for each validation step.
- Validate.n_failed(): Provides a dictionary of the number of test units that failed for each validation step.
- Validate.f_passed(): Provides a dictionary of the fraction of test units that passed for each validation step.
- Validate.f_failed(): Provides a dictionary of the fraction of test units that failed for each validation step.
- Validate.warning(): Get the ‘warning’ level status for each validation step.
- Validate.error(): Get the ‘error’ level status for each validation step.
- Validate.critical(): Get the ‘critical’ level status for each validation step.
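The per-step metric methods listed above can all be derived from two counts per step: the number of test units and the number of passing test units. The hypothetical helper `summarize` below (not part of Pointblank) shows that arithmetic. Keying each dictionary by step number starting at 1 is an assumption of the sketch, as is reporting fractions of 0.0 for a step with zero test units.

```python
from typing import Any


def summarize(counts: list[tuple[int, int]]) -> dict[str, dict[int, Any]]:
    """counts holds (n, n_passed) per step, in step order (hypothetical input)."""
    n = {i: total for i, (total, _) in enumerate(counts, start=1)}
    n_passed = {i: ok for i, (_, ok) in enumerate(counts, start=1)}
    n_failed = {i: n[i] - n_passed[i] for i in n}
    # Guard: a step with zero test units gets fractions of 0.0 in this sketch.
    f_passed = {i: (n_passed[i] / n[i] if n[i] else 0.0) for i in n}
    f_failed = {i: (n_failed[i] / n[i] if n[i] else 0.0) for i in n}
    return {"n": n, "n_passed": n_passed, "n_failed": n_failed,
            "f_passed": f_passed, "f_failed": f_failed}


def all_passed(counts: list[tuple[int, int]]) -> bool:
    """True only when no step has any failing test unit."""
    return all(total == ok for total, ok in counts)


counts = [(13, 13), (13, 11), (4, 4)]
m = summarize(counts)
print(m["n_failed"])       # {1: 0, 2: 2, 3: 0}
print(all_passed(counts))  # False
```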
Inspection and Assistance
The Inspection and Assistance group contains functions that are helpful for getting to grips with a new data table. Use the DataScan class to get a quick overview of the data, preview() to see the first and last few rows of a table, col_summary_tbl() for a column-level summary of a table, and missing_vals_tbl() to see where there are missing values in a table. Several datasets included in the package can be accessed via the load_dataset() function. On the assistance side, the assistant() function can be used to get help with Pointblank.
- DataScan: Get a summary of a dataset.
- preview(): Display a table preview that shows some rows from the top, some from the bottom.
- col_summary_tbl(): Generate a column-level summary table of a dataset.
- missing_vals_tbl(): Display a table that shows the missing values in the input table.
- assistant(): Chat with the PbA (Pointblank Assistant) about your data validation needs.
- load_dataset(): Load a dataset hosted in the library as the specified table type.
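Two of these inspection ideas, a head-and-tail preview and a per-column count of missing values, are simple enough to sketch without any library. The functions `head_tail` and `missing_by_column` below are hypothetical, much simpler stand-ins for what preview() and missing_vals_tbl() convey. They operate on a list of dicts with None as the missing marker, whereas the library functions display tables.

```python
from typing import Any

Row = dict[str, Any]


def head_tail(rows: list[Row], n_head: int = 5, n_tail: int = 5) -> list[Row]:
    """Return the first n_head and last n_tail rows without repeating any row."""
    if n_head + n_tail >= len(rows):
        return list(rows)
    # rows[len(rows) - n_tail:] is empty when n_tail == 0 (unlike rows[-0:]).
    return rows[:n_head] + rows[len(rows) - n_tail:]


def missing_by_column(rows: list[Row]) -> dict[str, int]:
    """Count None values per column, including columns absent from some rows."""
    columns: list[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return {name: sum(1 for row in rows if row.get(name) is None) for name in columns}


rows = [{"id": i, "score": None if i % 4 == 0 else i * 10} for i in range(12)]
print([r["id"] for r in head_tail(rows, 2, 2)])  # [0, 1, 10, 11]
print(missing_by_column(rows))                   # {'id': 0, 'score': 3}
```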
Utility Functions
The Utility Functions group contains functions that are useful for accessing metadata about the target data. Use get_column_count() or get_row_count() to get the number of columns or rows in a table. The get_action_metadata() function is useful when building custom actions since it returns metadata about the validation step that’s triggering the action, and get_validation_summary() plays a similar role for final actions by providing validation summary information. Lastly, the config() utility lets us set global configuration parameters.
- get_column_count(): Get the number of columns in a table.
- get_row_count(): Get the number of rows in a table.
- get_action_metadata(): Access step-level metadata when authoring custom actions.
- get_validation_summary(): Access validation summary information when authoring final actions.
- config(): Configuration settings for the Pointblank library.
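A custom action is a callable that runs when a step reaches a threshold level, and get_action_metadata() lets it find out which step triggered it without receiving arguments. The sketch below models that handoff in plain Python using a context variable. The `run_actions` driver, the `current_metadata` getter, and the metadata keys (`step`, `column`, `level`) are hypothetical choices for illustration, not necessarily what Pointblank provides.

```python
import contextvars
from typing import Any, Callable, Optional

# Hypothetical context slot holding the metadata of the step being handled.
_current = contextvars.ContextVar("current_step_metadata", default=None)


def current_metadata() -> Optional[dict[str, Any]]:
    """Hypothetical getter an action can call to learn what triggered it."""
    return _current.get()


def run_actions(step_results: list[dict[str, Any]],
                actions: dict[str, Callable[[], None]]) -> None:
    """Call the action for each step's level, exposing that step's metadata."""
    for meta in step_results:
        action = actions.get(meta.get("level"))
        if action is None:
            continue
        token = _current.set(meta)
        try:
            action()
        finally:
            _current.reset(token)  # metadata is only visible during the call


log: list[str] = []


def warn_action() -> None:
    meta = current_metadata()
    log.append(f"step {meta['step']} ({meta['column']}) reached {meta['level']}")


run_actions(
    [{"step": 1, "column": "a", "level": None},
     {"step": 2, "column": "b", "level": "warning"}],
    {"warning": warn_action},
)
print(log)  # ['step 2 (b) reached warning']
```

Using a context variable rather than a plain global means the metadata is reset after each action, so nothing leaks into code that runs outside an action.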
Prebuilt Actions
The Prebuilt Actions group contains a function that creates a Slack notifier. It can send a notification when validation steps exceed failure threshold levels, or simply provide a summary of the validation results, including the status, number of steps, passing and failing steps, table information, and timing details.
- send_slack_notification(): Create a Slack notification function using a webhook URL.
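Slack incoming webhooks accept an HTTP POST whose JSON body carries a `text` field, which is the kind of request a webhook-based notifier sends. The sketch below builds and posts such a message using only the standard library. The function names, the summary fields, and the elided placeholder URL are hypothetical and do not reflect how send_slack_notification() formats its messages.

```python
import json
import urllib.request
from typing import Any


def build_summary_payload(table: str, n_steps: int, n_passing: int,
                          elapsed_s: float) -> dict[str, Any]:
    """Assemble a plain-text summary message (hypothetical fields)."""
    n_failing = n_steps - n_passing
    status = "all steps passed" if n_failing == 0 else f"{n_failing} step(s) failing"
    text = (f"Validation of {table}: {status}\n"
            f"Steps: {n_steps} (passing: {n_passing}, failing: {n_failing})\n"
            f"Duration: {elapsed_s:.1f} s")
    return {"text": text}


def post_to_webhook(url: str, payload: dict[str, Any], timeout: float = 10.0) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_summary_payload("small_table", n_steps=4, n_passing=3, elapsed_s=1.23)
print(payload["text"])
# Sending requires a real webhook URL, e.g.:
# post_to_webhook("https://hooks.slack.com/services/...", payload)
```

Keeping payload construction separate from the network call makes the message easy to check without contacting Slack.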