API Reference
Validate
When performing data validation, use the Validate class to get the process started. It takes the target table and options for metadata and failure thresholds (using the Thresholds class or shorthands). The Validate class has numerous methods for defining validation steps and for obtaining post-interrogation metrics and data.
- Validate
-
Workflow for defining a set of validations on a table and interrogating for results.
- Thresholds
-
Definition of threshold values.
- Actions
-
Definition of action values.
- FinalActions
-
Define actions to be taken after validation is complete.
- Schema
-
Definition of a schema object.
- DraftValidation
-
Draft a validation plan for a given table using an LLM.
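The failure thresholds mentioned above can be pictured as cutoffs on the fraction of failing test units in a step. The following is a minimal plain-Python sketch of that idea, not Pointblank's implementation: the function name `threshold_status` and the at-or-above comparison are assumptions made for illustration.

```python
from typing import Dict, Optional


def threshold_status(
    n_failed: int,
    n_total: int,
    warning: Optional[float] = None,
    error: Optional[float] = None,
    critical: Optional[float] = None,
) -> Dict[str, bool]:
    """Report which hypothetical threshold levels a step has reached.

    Each threshold is a fraction of failing test units in [0, 1]; a level set
    to None is never reached. Assumption: a level is reached when the failing
    fraction is at or above its threshold.
    """
    f_failed = n_failed / n_total if n_total else 0.0
    levels = {"warning": warning, "error": error, "critical": critical}
    return {
        name: (limit is not None and f_failed >= limit)
        for name, limit in levels.items()
    }


# 3 failing units out of 20 is a failing fraction of 0.15
status = threshold_status(n_failed=3, n_total=20, warning=0.1, error=0.25)
print(status)
```

In this sketch the 'warning' level is reached while 'error' and 'critical' are not, which is the kind of per-step status the warning(), error(), and critical() methods are described as reporting.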
Validation Steps
Validation steps are sequential validations on the target data. Call Validate’s validation methods to build up a validation plan: a collection of steps that provides good validation coverage.
- Validate.col_vals_gt()
-
Are column data greater than a fixed value or data in another column?
- Validate.col_vals_lt()
-
Are column data less than a fixed value or data in another column?
- Validate.col_vals_ge()
-
Are column data greater than or equal to a fixed value or data in another column?
- Validate.col_vals_le()
-
Are column data less than or equal to a fixed value or data in another column?
- Validate.col_vals_eq()
-
Are column data equal to a fixed value or data in another column?
- Validate.col_vals_ne()
-
Are column data not equal to a fixed value or data in another column?
- Validate.col_vals_between()
-
Do column data lie between two specified values or data in other columns?
- Validate.col_vals_outside()
-
Do column data lie outside of two specified values or data in other columns?
- Validate.col_vals_in_set()
-
Validate whether column values are in a set of values.
- Validate.col_vals_not_in_set()
-
Validate whether column values are not in a set of values.
- Validate.col_vals_increasing()
-
Are column data increasing by row?
- Validate.col_vals_decreasing()
-
Are column data decreasing by row?
- Validate.col_vals_null()
-
Validate whether values in a column are Null.
- Validate.col_vals_not_null()
-
Validate whether values in a column are not Null.
- Validate.col_vals_regex()
-
Validate whether column values match a regular expression pattern.
- Validate.col_vals_within_spec()
-
Validate whether column values fit within a specification.
- Validate.col_vals_expr()
-
Validate column values using a custom expression.
- Validate.col_exists()
-
Validate whether one or more columns exist in the table.
- Validate.col_pct_null()
-
Validate whether a column has a specific percentage of Null values.
- Validate.rows_distinct()
-
Validate whether rows in the table are distinct.
- Validate.rows_complete()
-
Validate whether row data are complete by having no missing values.
- Validate.col_schema_match()
-
Do columns in the table (and their types) match a predefined schema?
- Validate.row_count_match()
-
Validate whether the row count of the table matches a specified count.
- Validate.col_count_match()
-
Validate whether the column count of the table matches a specified count.
- Validate.data_freshness()
-
Validate that data in a datetime column is not older than a specified maximum age.
- Validate.tbl_match()
-
Validate whether the target table matches a comparison table.
- Validate.conjointly()
-
Perform multiple row-wise validations for joint validity.
- Validate.specially()
-
Perform a specialized validation with customized logic.
- Validate.prompt()
-
Validate rows using AI/LLM-powered analysis.
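To make the idea of a row-wise validation step concrete, here is a small plain-Python sketch of a "greater than" check that accepts either a fixed value or another column. It is illustrative only and does not use Pointblank: `ColumnRef` and `vals_gt` are hypothetical names, and treating missing values as failures is an assumption of the sketch.

```python
from typing import Any, Dict, List, Union

Row = Dict[str, Any]


class ColumnRef:
    """Hypothetical marker meaning 'compare against this column's value'."""

    def __init__(self, name: str) -> None:
        self.name = name


def vals_gt(rows: List[Row], column: str, value: Union[float, ColumnRef]) -> List[bool]:
    """Return one pass/fail result (a test unit) per row.

    Assumption of this sketch: a missing value (None) on either side fails.
    """
    results = []
    for row in rows:
        left = row[column]
        right = row[value.name] if isinstance(value, ColumnRef) else value
        results.append(left is not None and right is not None and left > right)
    return results


rows = [
    {"a": 5, "b": 6},
    {"a": 3, "b": 1},
    {"a": None, "b": 1},
]

# Compare column "a" with a fixed value, then with column "b"
fixed = vals_gt(rows, column="a", value=2)
by_column = vals_gt(rows, column="a", value=ColumnRef("b"))
print(fixed)
print(by_column)
```

Each row contributes one test unit, so a plan made of several such steps yields a list of results per step.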
Aggregation Steps
These validation methods check aggregated column values (sums, averages, standard deviations) against fixed values or column references.
- Validate.col_sum_gt()
-
Does the column sum satisfy a greater than comparison?
- Validate.col_sum_lt()
-
Does the column sum satisfy a less than comparison?
- Validate.col_sum_ge()
-
Does the column sum satisfy a greater than or equal to comparison?
- Validate.col_sum_le()
-
Does the column sum satisfy a less than or equal to comparison?
- Validate.col_sum_eq()
-
Does the column sum satisfy an equal to comparison?
- Validate.col_avg_gt()
-
Does the column average satisfy a greater than comparison?
- Validate.col_avg_lt()
-
Does the column average satisfy a less than comparison?
- Validate.col_avg_ge()
-
Does the column average satisfy a greater than or equal to comparison?
- Validate.col_avg_le()
-
Does the column average satisfy a less than or equal to comparison?
- Validate.col_avg_eq()
-
Does the column average satisfy an equal to comparison?
- Validate.col_sd_gt()
-
Does the column standard deviation satisfy a greater than comparison?
- Validate.col_sd_lt()
-
Does the column standard deviation satisfy a less than comparison?
- Validate.col_sd_ge()
-
Does the column standard deviation satisfy a greater than or equal to comparison?
- Validate.col_sd_le()
-
Does the column standard deviation satisfy a less than or equal to comparison?
- Validate.col_sd_eq()
-
Does the column standard deviation satisfy an equal to comparison?
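The aggregation checks differ from row-wise checks in that the whole column collapses to a single number before the comparison is made. A minimal sketch of that pattern using only the standard library follows; the `aggregate_check` helper is hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption rather than a statement about what Pointblank computes.

```python
import operator
import statistics
from typing import Callable, Dict, Sequence

# Comparison suffixes used in the method names above, mapped to operators
COMPARISONS: Dict[str, Callable[[float, float], bool]] = {
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
    "eq": operator.eq,
}

# Assumption: "sd" is the sample standard deviation (n - 1 denominator)
AGGREGATES: Dict[str, Callable[[Sequence[float]], float]] = {
    "sum": sum,
    "avg": statistics.mean,
    "sd": statistics.stdev,
}


def aggregate_check(values: Sequence[float], agg: str, comparison: str, target: float) -> bool:
    """Aggregate the column first, then apply a single comparison."""
    observed = AGGREGATES[agg](values)
    return COMPARISONS[comparison](observed, target)


revenue = [10, 20, 30, 40]
sum_ok = aggregate_check(revenue, "sum", "eq", 100)
avg_ok = aggregate_check(revenue, "avg", "ge", 25)
sd_ok = aggregate_check(revenue, "sd", "lt", 15)
print(sum_ok, avg_ok, sd_ok)
```

Because the comparison is applied once to the aggregate, a step like this produces a single test unit rather than one per row. Exact equality on floating-point aggregates is fragile, so the "eq" case is best reserved for integer-valued data in a sketch like this.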
Column Selection
Use the col() function along with column selection helpers to flexibly select columns for validation. Combine col() with starts_with(), matches(), etc. for selecting multiple target columns.
- col()
-
Helper function for referencing a column in the input table.
- starts_with()
-
Select columns that start with specified text.
- ends_with()
-
Select columns that end with specified text.
- contains()
-
Select columns that contain specified text.
- matches()
-
Select columns that match a specified regular expression pattern.
- everything()
-
Select all columns.
- first_n()
-
Select the first n columns in the column list.
- last_n()
-
Select the last n columns in the column list.
- expr_col()
-
Create a column expression for use in conjointly() validation.
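The selection helpers listed above each reduce a table's column list to a subset. The sketch below mimics that behavior on a plain list of names; the `*_sketch` functions are hypothetical stand-ins, and details such as case-sensitive matching and using `re.search` for pattern matching are assumptions rather than documented Pointblank behavior.

```python
import re
from typing import List

columns = ["id", "paid_2023", "paid_2024", "name_first", "name_last", "notes"]


def starts_with_sketch(cols: List[str], text: str) -> List[str]:
    return [c for c in cols if c.startswith(text)]


def ends_with_sketch(cols: List[str], text: str) -> List[str]:
    return [c for c in cols if c.endswith(text)]


def contains_sketch(cols: List[str], text: str) -> List[str]:
    return [c for c in cols if text in c]


def matches_sketch(cols: List[str], pattern: str) -> List[str]:
    # Assumption: the pattern may match anywhere in the column name
    return [c for c in cols if re.search(pattern, c)]


def first_n_sketch(cols: List[str], n: int) -> List[str]:
    return cols[:n]


def last_n_sketch(cols: List[str], n: int) -> List[str]:
    return cols[-n:] if n > 0 else []


print(starts_with_sketch(columns, "paid"))
print(ends_with_sketch(columns, "_last"))
print(contains_sketch(columns, "name"))
print(matches_sketch(columns, r"\d{4}$"))
print(first_n_sketch(columns, 2))
print(last_n_sketch(columns, 1))
```

A validation method given a multi-column selection would then apply the same check to each selected column in turn.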
Segment Groups
Combine multiple values into a single segment using seg_*() helper functions.
- seg_group()
-
Group together values for segmentation.
Interrogation and Reporting
The validation plan is executed when interrogate() is called. After interrogation, view validation reports, extract metrics, or split data based on results.
- Validate.interrogate()
-
Execute each validation step against the table and store the results.
- Validate.set_tbl()
-
Set or replace the table associated with the Validate object.
- Validate.get_tabular_report()
-
Validation report as a GT table.
- Validate.get_step_report()
-
Get a detailed report for a single validation step.
- Validate.get_json_report()
-
Get a report of the validation results as a JSON-formatted string.
- Validate.get_dataframe_report()
-
Get a report of the validation results as a DataFrame.
- Validate.get_sundered_data()
-
Get the data that passed or failed the validation steps.
- Validate.get_data_extracts()
-
Get the rows that failed for each validation step.
- Validate.all_passed()
-
Determine if every validation step passed perfectly, with no failing test units.
- Validate.assert_passing()
-
Raise an AssertionError if not all tests are passing.
- Validate.assert_below_threshold()
-
Raise an AssertionError if validation steps exceed a specified threshold level.
- Validate.above_threshold()
-
Check if any validation steps exceed a specified threshold level.
- Validate.n()
-
Provides a dictionary of the number of test units for each validation step.
- Validate.n_passed()
-
Provides a dictionary of the number of test units that passed for each validation step.
- Validate.n_failed()
-
Provides a dictionary of the number of test units that failed for each validation step.
- Validate.f_passed()
-
Provides a dictionary of the fraction of test units that passed for each validation step.
- Validate.f_failed()
-
Provides a dictionary of the fraction of test units that failed for each validation step.
- Validate.warning()
-
Get the ‘warning’ level status for each validation step.
- Validate.error()
-
Get the ‘error’ level status for each validation step.
- Validate.critical()
-
Get the ‘critical’ level status for each validation step.
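The count and fraction methods above all describe per-step dictionaries derived from the same pass/fail results. As a rough illustration in plain Python (not Pointblank code), the relationships between them can be written out as follows; keying the dictionaries by step number starting at 1, and the assumption that both steps evaluate the same rows, are choices made for this sketch.

```python
from typing import Dict, List

# Hypothetical interrogation results: one boolean per test unit, per step
step_results: Dict[int, List[bool]] = {
    1: [True, True, False, True],
    2: [True, True, True, True],
}

n = {step: len(units) for step, units in step_results.items()}
n_passed = {step: sum(units) for step, units in step_results.items()}
n_failed = {step: n[step] - n_passed[step] for step in step_results}
f_passed = {step: n_passed[step] / n[step] for step in step_results}
f_failed = {step: n_failed[step] / n[step] for step in step_results}

# Every step passes perfectly only when no step has a failing test unit
all_passed = all(count == 0 for count in n_failed.values())

# Splitting ("sundering") rows: a row passes only if it passes in every step
n_rows = len(step_results[1])
passed_rows = [i for i in range(n_rows) if all(u[i] for u in step_results.values())]
failed_rows = [i for i in range(n_rows) if i not in passed_rows]

print(n_passed, f_failed, all_passed)
print(passed_rows, failed_rows)
```

The fractions in `f_failed` are the quantities that threshold levels would be compared against.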
Inspection and Assistance
Functions for getting to grips with a new data table. Use DataScan for a quick overview, preview() for first/last rows, col_summary_tbl() for column summaries, and missing_vals_tbl() for missing value analysis.
- DataScan
-
Get a summary of a dataset.
- preview()
-
Display a table preview that shows some rows from the top, some from the bottom.
- col_summary_tbl()
-
Generate a column-level summary table of a dataset.
- missing_vals_tbl()
-
Display a table that shows the missing values in the input table.
- load_dataset()
-
Load a dataset hosted in the library as a specified table type.
- get_data_path()
-
Get the file path to a dataset included with the Pointblank package.
- connect_to_table()
-
Connect to a database table using a connection string.
- print_database_tables()
-
List all tables in a database from a connection string.
Table Pre-checks
Helper functions for use with the active= parameter of validation methods. These inspect the target table before a step runs and conditionally skip the step when preconditions are not met.
- has_columns()
-
Check whether one or more columns exist in a table.
- has_rows()
-
Check whether a table has a certain number of rows.
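The conditional-skip behavior described above can be sketched as a predicate that is evaluated against the table before the step itself runs. The code below is a plain-Python illustration of that control flow, not Pointblank's implementation: the table representation, the `*_sketch` helpers, the `minimum` parameter, and `run_step` are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Table = Dict[str, List[Any]]  # column name -> column values
Check = Callable[[Table], bool]


def has_columns_sketch(*names: str) -> Check:
    """Build a pre-check that is True when every named column exists."""
    def check(table: Table) -> bool:
        return all(name in table for name in names)
    return check


def has_rows_sketch(minimum: int = 1) -> Check:
    """Build a pre-check that is True when the table has enough rows."""
    def check(table: Table) -> bool:
        n_rows = len(next(iter(table.values()), []))
        return n_rows >= minimum
    return check


def run_step(table: Table, step: Check, active: Check) -> str:
    """Skip the step when the pre-check fails; otherwise evaluate it."""
    if not active(table):
        return "skipped"
    return "passed" if step(table) else "failed"


table = {"a": [1, 2, 3]}
ran = run_step(table, lambda t: all(v > 0 for v in t["a"]), has_columns_sketch("a"))
no_column = run_step(table, lambda t: all(v > 0 for v in t["b"]), has_columns_sketch("b"))
no_rows = run_step({"a": []}, lambda t: True, has_rows_sketch(1))
print(ran, no_column, no_rows)
```

Note that the second step would raise a KeyError if it ran, since column "b" is absent; the pre-check prevents that by skipping it.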
YAML
Functions for using YAML to orchestrate validation workflows.
- yaml_interrogate()
-
Execute a YAML-based validation workflow.
- validate_yaml()
-
Validate YAML configuration against the expected structure.
- yaml_to_python()
-
Convert YAML validation configuration to equivalent Python code.
Utility Functions
Functions for accessing metadata about the target data and managing configuration.
- get_column_count()
-
Get the number of columns in a table.
- get_row_count()
-
Get the number of rows in a table.
- get_action_metadata()
-
Access step-level metadata when authoring custom actions.
- get_validation_summary()
-
Access validation summary information when authoring final actions.
- write_file()
-
Write a Validate object to disk as a serialized file.
- read_file()
-
Read a Validate object from disk that was previously saved with write_file().
- ref()
-
Reference a column from the reference data for aggregate comparisons.
Test Data Generation
Generate synthetic test data based on schema definitions. Use generate_dataset() to create data from a Schema object.
- generate_dataset()
-
Generate synthetic test data from a schema.
- int_field()
-
Create an integer column specification for use in a schema.
- float_field()
-
Create a floating-point column specification for use in a schema.
- string_field()
-
Create a string column specification for use in a schema.
- bool_field()
-
Create a boolean column specification for use in a schema.
- date_field()
-
Create a date column specification for use in a schema.
- datetime_field()
-
Create a datetime column specification for use in a schema.
- time_field()
-
Create a time column specification for use in a schema.
- duration_field()
-
Create a duration column specification for use in a schema.
- profile_fields()
-
Create a dict of string field specifications representing a person profile.
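Schema-driven data generation can be pictured as pairing each column name with a small value generator and drawing one value per column for every row. The following standard-library sketch shows that idea under stated assumptions: the `*_field_sketch` helpers, their parameters, and `generate_rows` are hypothetical and do not reflect the signatures of the Pointblank functions listed above.

```python
import random
from typing import Any, Callable, Dict, List, Sequence

FieldSpec = Callable[[random.Random], Any]


def int_field_sketch(min_val: int, max_val: int) -> FieldSpec:
    """Integers drawn uniformly from [min_val, max_val]."""
    return lambda rng: rng.randint(min_val, max_val)


def bool_field_sketch() -> FieldSpec:
    """True or False with equal probability."""
    return lambda rng: rng.random() < 0.5


def string_field_sketch(choices: Sequence[str]) -> FieldSpec:
    """One of a fixed set of strings."""
    return lambda rng: rng.choice(choices)


def generate_rows(schema: Dict[str, FieldSpec], n: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Generate n rows; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [{name: spec(rng) for name, spec in schema.items()} for _ in range(n)]


schema = {
    "qty": int_field_sketch(1, 10),
    "in_stock": bool_field_sketch(),
    "region": string_field_sketch(["north", "south"]),
}

rows = generate_rows(schema, n=5, seed=42)
print(rows)
```

Seeding the generator is a design choice worth keeping in any real use: reproducible test data makes a failing validation easier to investigate.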
Prebuilt Actions
Prebuilt action functions for common notification patterns.
- send_slack_notification()
-
Create a Slack notification function using a webhook URL.
- emit_otel()
-
Create an OTel export action for use in FinalActions.
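Both entries above are described as functions that create an action, which suggests a factory pattern: configuration goes in once, and a callable comes back for later use. The sketch below illustrates that pattern for a webhook-style notification using only the standard library. It is not the implementation of send_slack_notification(): the `make_slack_notifier` name, the `summary` keys, the injectable `send` parameter, and the placeholder URL are all assumptions made for illustration.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional


def make_slack_notifier(
    webhook_url: str,
    send: Optional[Callable[[urllib.request.Request], object]] = None,
) -> Callable[[Dict[str, int]], urllib.request.Request]:
    """Return a function that posts a short summary to a webhook URL.

    `send` defaults to urllib.request.urlopen; passing a stand-in makes the
    notifier usable without network access (for example, in tests).
    """
    def notify(summary: Dict[str, int]) -> urllib.request.Request:
        text = (
            f"Validation finished: {summary['n_failed']} of "
            f"{summary['n_steps']} steps had failures"
        )
        body = json.dumps({"text": text}).encode("utf-8")
        request = urllib.request.Request(
            webhook_url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        (send or urllib.request.urlopen)(request)
        return request

    return notify


# Record requests in a list instead of sending them over the network
sent = []
notify = make_slack_notifier("https://hooks.slack.com/services/EXAMPLE", send=sent.append)
request = notify({"n_steps": 4, "n_failed": 1})
print(json.loads(request.data))
```

Separating creation from invocation lets the same configured callable be handed to whatever runs after validation completes.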
Integrations
Classes for integrating Pointblank with external observability and monitoring systems. Use OTelExporter to export validation results as OpenTelemetry metrics, traces, and logs.
- integrations.otel.OTelExporter
-
Export Pointblank validation results as OpenTelemetry signals.