Overview
This article provides a quick overview of the data validation features in Pointblank. It introduces the key concepts and shows examples of the main functionality, giving you a foundation for using the library effectively.
Later articles in the User Guide will expand on each section covered here, providing more explanations and examples.
Validation Methods
Pointblank’s core functionality revolves around validation steps, which are individual checks that verify different aspects of your data. These steps are created by calling validation methods from the Validate class. When combined, they create a comprehensive validation plan for your data.
Here’s an example of a validation that incorporates three different validation methods:
import pointblank as pb

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Three different validation methods."
    )
    .col_vals_gt(columns="a", value=0)
    .rows_distinct()
    .col_exists(columns="date")
    .interrogate()
)
This example showcases how you can combine different types of validations in a single validation plan:
- a column value validation with col_vals_gt()
- a row-based validation with rows_distinct()
- a table structure validation with col_exists()
Most validation methods share common parameters that enhance their flexibility and power. These shared parameters (overviewed in the next few sections) create a consistent interface across all validation steps while allowing you to customize validation behavior for specific needs.
Column Selection Patterns
You can apply the same validation logic to multiple columns at once through the use of column selection patterns (used in the columns= parameter). This reduces repetitive code and makes your validation plans more maintainable:
import narwhals.selectors as nws

# Map validations across multiple columns
(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Applying column mapping in `columns`."
    )
    # Apply validation rules to multiple columns ---
    .col_vals_not_null(
        columns=["a", "b", "c"]
    )
    # Apply to numeric columns only with a Narwhals selector ---
    .col_vals_gt(
        columns=nws.numeric(),
        value=0
    )
    .interrogate()
)
This technique is particularly valuable when working with wide datasets containing many similarly-structured columns or when applying standard quality checks across an entire table. It also ensures consistency in how validation rules are applied across related data columns.
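Pointblank also ships its own column selector helpers, so you can target columns by name pattern without importing Narwhals. Here is a minimal sketch, assuming the pb.starts_with() selector helper; the column names mentioned are those of small_table:

# Check for missing values in every column whose name starts with "d" ---
# (in `small_table` this would select `date_time`, `date`, and `d`)
(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Selecting columns by name prefix."
    )
    .col_vals_not_null(columns=pb.starts_with("d"))
    .interrogate()
)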
Preprocessing
Preprocessing (with the pre= parameter) allows you to transform or modify your data before applying validation checks, enabling you to validate derived or modified data without altering the original dataset:
import polars as pl

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Preprocessing validation steps via `pre=`."
    )
    .col_vals_gt(
        columns="a", value=5,
        # Apply transformation before validation ---
        pre=lambda df: df.with_columns(
            pl.col("a") * 2  # Double values before checking
        )
    )
    .col_vals_lt(
        columns="c", value=100,
        # Apply more complex transformation ---
        pre=lambda df: df.with_columns(
            pl.col("c").pow(2)  # Square values before checking
        )
    )
    .interrogate()
)
Preprocessing enables validation of transformed data without modifying your original dataset, making it ideal for checking derived metrics or validating normalized values. This approach keeps your validation code clean while allowing for sophisticated data quality checks on calculated results.
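Because pre= accepts any callable that takes the table and returns a transformed table, a named function works just as well as a lambda and keeps longer transformations readable. A minimal sketch; the helper name and derived column below are illustrative, not part of the library:

import polars as pl

def add_d_per_a(df: pl.DataFrame) -> pl.DataFrame:
    # Derive an illustrative `d_per_a` column; it exists only for the step that uses it
    return df.with_columns((pl.col("d") / pl.col("a")).alias("d_per_a"))

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Preprocessing with a named function."
    )
    .col_vals_gt(
        columns="d_per_a", value=0,
        pre=add_d_per_a  # Transformation applied before this step's check
    )
    .interrogate()
)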
Segmentation
Segmentation (through the segments= parameter) allows you to validate data across different groups, enabling you to identify segment-specific quality issues that might be hidden in aggregate analyses:
(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Segmenting validation steps via `segments=`."
    )
    .col_vals_gt(
        columns="c", value=3,
        # Split into steps by categorical values in column 'f' ---
        segments="f"
    )
    .interrogate()
)
Segmentation is powerful for detecting patterns of quality issues that may exist only in specific data subsets, such as certain time periods, categories, or geographical regions. It helps ensure that all significant segments of your data meet quality standards, not just the data as a whole.
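Segmentation doesn't have to split on every value of a column. As a sketch, under the assumption that segments= also accepts a tuple of a column name and the specific values to segment on, you could restrict the steps to selected groups:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Segmenting on selected values of column 'f'."
    )
    .col_vals_gt(
        columns="c", value=3,
        # Create steps only for the "high" and "low" groups in column 'f' ---
        segments=("f", ["high", "low"])
    )
    .interrogate()
)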
Thresholds
Thresholds (set through the thresholds= parameter) let you set acceptable levels of failure before triggering warnings, errors, or critical notifications for individual validation steps:
(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Using thresholds."
    )
    # Add validation steps with different thresholds ---
    .col_vals_gt(
        columns="a", value=1,
        thresholds=pb.Thresholds(warning=0.1, error=0.2, critical=0.3)
    )
    # Add another step with stricter thresholds ---
    .col_vals_lt(
        columns="c", value=10,
        thresholds=pb.Thresholds(warning=0.05, error=0.1)
    )
    .interrogate()
)
Thresholds provide a nuanced way to monitor data quality, allowing you to set different severity levels based on the importance of each validation and your organization’s tolerance for specific types of data issues.
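Thresholds also don't have to be repeated on every step. A minimal sketch, assuming a threshold set passed to Validate() acts as the default for the whole plan, with any per-step thresholds= overriding it:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Table-level default thresholds.",
        # Defaults inherited by every step below ---
        thresholds=pb.Thresholds(warning=0.1, error=0.25, critical=0.35)
    )
    .col_vals_gt(columns="a", value=1)   # Uses the table-level thresholds
    .col_vals_lt(columns="c", value=10)  # Same here: no per-step override
    .interrogate()
)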
Actions
Actions (which can be configured in the actions= parameter) allow you to define specific responses when validation thresholds are crossed. You can use simple string messages or custom functions for more complex behavior:
# Example 1: Action with a string message ---
(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Using actions with a string message."
    )
    .col_vals_gt(
        columns="c", value=2,
        thresholds=pb.Thresholds(warning=0.1, error=0.2),
        # Add a print-to-console action for the 'warning' threshold ---
        actions=pb.Actions(
            warning="WARNING: Values below `{value}` detected in column 'c'."
        )
    )
    .interrogate()
)
WARNING: Values below `2` detected in column 'c'.
# Example 2: Action with a callable function ---
def custom_action():
    from datetime import datetime
    print(f"Data quality issue found ({datetime.now()}).")

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Using actions with a callable function."
    )
    .col_vals_gt(
        columns="a", value=5,
        thresholds=pb.Thresholds(warning=0.1, error=0.2),
        # Apply the function to the 'error' threshold ---
        actions=pb.Actions(error=custom_action)
    )
    .interrogate()
)
Data quality issue found (2025-05-19 17:22:23.695340).
With custom action functions, you can implement sophisticated responses like sending notifications or logging to external systems.
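For example, an action callable can hand the event to Python's standard logging module rather than printing to the console. The logger name and message below are illustrative:

import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("data_quality")  # Illustrative logger name

def log_error():
    # Route the failure to the application's log instead of stdout
    logger.error("Validation threshold exceeded; see the Pointblank report for details.")

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Logging as an action."
    )
    .col_vals_gt(
        columns="a", value=5,
        thresholds=pb.Thresholds(error=0.2),
        actions=pb.Actions(error=log_error)
    )
    .interrogate()
)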
Briefs
Briefs (which can be set through the brief= parameter) allow you to customize descriptions associated with validation steps, making validation results more understandable to stakeholders. Briefs can either be automatically generated by setting brief=True or defined as custom messages for more specific explanations:
(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Using `brief=` for displaying brief messages."
    )
    .col_vals_gt(
        columns="a", value=0,
        # Use `True` for automatic generation of briefs ---
        brief=True
    )
    .col_exists(
        columns=["date", "date_time"],
        # Add a custom brief for this validation step ---
        brief="Verify required date columns exist for time-series analysis"
    )
    .interrogate()
)
Pointblank Validation
Using `brief=` for displaying brief messages. (Polars)

| STEP | EVAL | UNITS | PASS | FAIL | W | E | C | EXT |
|---|---|---|---|---|---|---|---|---|
| 1 col_vals_gt(): Expect that values in … | ✓ | 13 | 13 (1.00) | 0 (0.00) | — | — | — | — |
| 2 col_exists(): Verify required date columns exist for time-series analysis | ✓ | 1 | 1 (1.00) | 0 (0.00) | — | — | — | — |
| 3 col_exists(): Verify required date columns exist for time-series analysis | ✓ | 1 | 1 (1.00) | 0 (0.00) | — | — | — | — |
Briefs make validation results more meaningful by providing context about why each check matters. They’re particularly valuable in shared reports where stakeholders from various disciplines need to understand validation results in domain-specific terms.
Getting More Information
Each validation step can be further customized and has additional options. See these pages for more information:
- Validation Methods: A closer look at the more common validation methods
- Column Selection Patterns: Techniques for targeting specific columns
- Preprocessing: Transform data before validation
- Segmentation: Apply validations to specific segments of your data
- Thresholds: Set quality standards and trigger severity levels
- Actions: Respond to threshold exceedances with notifications or custom functions
- Briefs: Add context to validation steps
Conclusion
Validation steps are the building blocks of data validation in Pointblank. By combining steps from different categories and leveraging common features like thresholds, actions, and preprocessing, you can create comprehensive data quality checks tailored to your specific needs.
The next sections of this guide will dive deeper into each of these topics, providing detailed explanations and examples.