Overview

This article provides a quick overview of the data validation features in Pointblank. It introduces the key concepts and shows examples of the main functionality, giving you a foundation for using the library effectively.

Later articles in the User Guide will expand on each section covered here, providing more explanations and examples.

Validation Methods

Pointblank’s core functionality revolves around validation steps: individual checks that verify different aspects of your data. These steps are created by calling validation methods from the Validate class. When combined, they form a comprehensive validation plan for your data.

Here’s an example of a validation that incorporates three different validation methods:

import pointblank as pb
import polars as pl

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Three different validation methods."
    )
    .col_vals_gt(columns="a", value=0)
    .rows_distinct()
    .col_exists(columns="date")
    .interrogate()
)

Pointblank Validation: Three different validation methods. (Polars)

STEP  VALIDATION       COLUMNS      VALUES  UNITS  PASS       FAIL
1     col_vals_gt()    a            0       13     13 (1.00)  0 (0.00)
2     rows_distinct()  ALL COLUMNS          13     11 (0.85)  2 (0.15)
3     col_exists()     date                 1      1 (1.00)   0 (0.00)

This example showcases how you can combine different types of validation in a single plan: a column value check (col_vals_gt()), a row-level distinctness check (rows_distinct()), and a table structure check (col_exists()).

Most validation methods share common parameters that enhance their flexibility and power. These shared parameters (covered in the next few sections) create a consistent interface across all validation steps while letting you customize validation behavior for specific needs.

Column Selection Patterns

You can apply the same validation logic to multiple columns at once through column selection patterns (used in the columns= parameter). This reduces repetitive code and makes your validation plans more maintainable:

import narwhals.selectors as nws

# Map validations across multiple columns
(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Applying column mapping in `columns`."
    )

    # Apply validation rules to multiple columns ---
    .col_vals_not_null(
        columns=["a", "b", "c"]
    )

    # Apply to numeric columns only with a Narwhals selector ---
    .col_vals_gt(
        columns=nws.numeric(),
        value=0
    )
    .interrogate()
)

Pointblank Validation: Applying column mapping in `columns`. (Polars)

STEP  VALIDATION           COLUMNS  VALUES  UNITS  PASS       FAIL
1     col_vals_not_null()  a                13     13 (1.00)  0 (0.00)
2     col_vals_not_null()  b                13     13 (1.00)  0 (0.00)
3     col_vals_not_null()  c                13     11 (0.85)  2 (0.15)
4     col_vals_gt()        a        0       13     13 (1.00)  0 (0.00)
5     col_vals_gt()        c        0       13     11 (0.85)  2 (0.15)
6     col_vals_gt()        d        0       13     13 (1.00)  0 (0.00)

This technique is particularly valuable when working with wide datasets containing many similarly structured columns, or when applying standard quality checks across an entire table. It also ensures consistency in how validation rules are applied across related data columns.
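
Beyond lists and Narwhals selectors, Pointblank ships its own tidyselect-style helpers such as starts_with(), ends_with(), contains(), and matches(). As a minimal sketch (assuming starts_with() behaves as its name suggests), the two date columns in small_table can be targeted by prefix:

# Select columns by shared name prefix ---
(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Selecting columns by prefix."
    )
    .col_vals_not_null(
        # Resolves to the 'date' and 'date_time' columns
        columns=pb.starts_with("date")
    )
    .interrogate()
)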

Preprocessing

Preprocessing (with the pre= parameter) allows you to transform or modify your data before applying validation checks, enabling you to validate derived or modified data without altering the original dataset:

import polars as pl

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Preprocessing validation steps via `pre=`."
    )
    .col_vals_gt(
        columns="a", value=5,

        # Apply transformation before validation ---
        pre=lambda df: df.with_columns(
            pl.col("a") * 2  # Double values before checking
        )
    )
    .col_vals_lt(
        columns="c", value=100,

        # Apply more complex transformation ---
        pre=lambda df: df.with_columns(
            pl.col("c").pow(2)  # Square values before checking
        )
    )
    .interrogate()
)

Pointblank Validation: Preprocessing validation steps via `pre=`. (Polars)

STEP  VALIDATION     COLUMNS  VALUES  UNITS  PASS       FAIL
1     col_vals_gt()  a        5       13     9 (0.69)   4 (0.31)
2     col_vals_lt()  c        100     13     11 (0.85)  2 (0.15)

Preprocessing enables validation of transformed data without modifying your original dataset, making it ideal for checking derived metrics or validating normalized values. This approach keeps your validation code clean while allowing for sophisticated data quality checks on calculated results.
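
Because pre= swaps in the transformed table for that step, you can even validate a column that only exists after preprocessing. Here's a minimal sketch of that idea; the add_ratio() helper and the ratio column are illustrative, not part of the library:

def add_ratio(df: pl.DataFrame) -> pl.DataFrame:
    # Derive a new column from two existing ones
    return df.with_columns((pl.col("d") / pl.col("a")).alias("ratio"))

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Validating a derived column via `pre=`."
    )
    .col_vals_gt(
        # 'ratio' exists only in the preprocessed table ---
        columns="ratio", value=0,
        pre=add_ratio
    )
    .interrogate()
)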

Segmentation

Segmentation (through the segments= parameter) allows you to validate data across different groups, enabling you to identify segment-specific quality issues that might be hidden in aggregate analyses:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Segmenting validation steps via `segments=`."
    )
    .col_vals_gt(
        columns="c", value=3,

        # Split into steps by categorical values in column 'f' ---
        segments="f"
    )
    .interrogate()
)

Pointblank Validation: Segmenting validation steps via `segments=`. (Polars)

STEP  SEGMENT   VALIDATION     COLUMNS  VALUES  UNITS  PASS      FAIL
1     f / high  col_vals_gt()  c        3       6      2 (0.33)  4 (0.67)
2     f / low   col_vals_gt()  c        3       5      4 (0.80)  1 (0.20)
3     f / mid   col_vals_gt()  c        3       2      1 (0.50)  1 (0.50)

Segmentation is powerful for detecting patterns of quality issues that may exist only in specific data subsets, such as certain time periods, categories, or geographical regions. It helps ensure that all significant segments of your data meet quality standards, not just the data as a whole.
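
The segments= parameter isn't limited to a single column name: it also accepts a tuple pairing a column with a subset of its values, so segmentation can be restricted to just the groups you care about. A minimal sketch (the 'low'/'high' subset is chosen purely for illustration):

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Segmenting on a subset of values."
    )
    .col_vals_gt(
        columns="c", value=3,

        # Only create segments for the 'low' and 'high' groups ---
        segments=("f", ["low", "high"])
    )
    .interrogate()
)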

Thresholds

Thresholds (set through the thresholds= parameter) let you set acceptable levels of failure before triggering warnings, errors, or critical notifications for individual validation steps:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Using thresholds."
    )

    # Add validation steps with different thresholds ---
    .col_vals_gt(
        columns="a", value=1,
        thresholds=pb.Thresholds(warning=0.1, error=0.2, critical=0.3)
    )

    # Add another step with stricter thresholds ---
    .col_vals_lt(
        columns="c", value=10,
        thresholds=pb.Thresholds(warning=0.05, error=0.1)
    )
    .interrogate()
)

Pointblank Validation: Using thresholds. (Polars)

STEP  VALIDATION     COLUMNS  VALUES  UNITS  PASS       FAIL
1     col_vals_gt()  a        1       13     12 (0.92)  1 (0.08)
2     col_vals_lt()  c        10      13     11 (0.85)  2 (0.15)

Step 1's failure rate (0.08) stays below its warning level (0.1), while step 2's failure rate (0.15) exceeds both its warning (0.05) and error (0.1) thresholds, so those severity levels are flagged in the report.

Thresholds provide a nuanced way to monitor data quality, allowing you to set different severity levels based on the importance of each validation and your organization’s tolerance for specific types of data issues.
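
Thresholds don't have to be repeated on every step: they can also be set once on the Validate object, where they serve as defaults for all steps (a step-level thresholds= still takes precedence). The sketch below also uses the tuple shorthand, where positional values map to warning, error, and critical:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Default thresholds set on `Validate`.",

        # Tuple shorthand: (warning, error, critical) ---
        thresholds=(0.1, 0.2, 0.3)
    )
    .col_vals_gt(columns="a", value=1)   # inherits the default thresholds
    .col_vals_lt(columns="c", value=10)  # inherits the default thresholds
    .interrogate()
)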

Actions

Actions (which can be configured in the actions= parameter) allow you to define specific responses when validation thresholds are crossed. You can use simple string messages or custom functions for more complex behavior:

# Example 1: Action with a string message ---

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Using actions with a string message."
    )
    .col_vals_gt(
        columns="c", value=2,
        thresholds=pb.Thresholds(warning=0.1, error=0.2),

        # Add a print-to-console action for the 'warning' threshold ---
        actions=pb.Actions(
            warning="WARNING: Values below `{value}` detected in column 'c'."
        )
    )
    .interrogate()
)
WARNING: Values below `2` detected in column 'c'.

Pointblank Validation: Using actions with a string message. (Polars)

STEP  VALIDATION     COLUMNS  VALUES  UNITS  PASS       FAIL
1     col_vals_gt()  c        2       13     10 (0.77)  3 (0.23)

# Example 2: Action with a callable function ---

def custom_action():
    from datetime import datetime
    print(f"Data quality issue found ({datetime.now()}).")

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Using actions with a callable function."
    )
    .col_vals_gt(
        columns="a", value=5,
        thresholds=pb.Thresholds(warning=0.1, error=0.2),

        # Apply the function to the 'error' threshold ---
        actions=pb.Actions(error=custom_action)
    )
    .interrogate()
)
Data quality issue found (2025-05-19 17:22:23.695340).

Pointblank Validation: Using actions with a callable function. (Polars)

STEP  VALIDATION     COLUMNS  VALUES  UNITS  PASS      FAIL
1     col_vals_gt()  a        5       13     3 (0.23)  10 (0.77)

With custom action functions, you can implement sophisticated responses like sending notifications or logging to external systems.
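
For instance, an action callable can feed Python's standard logging machinery. The sketch below assumes pb.get_action_metadata(), a helper that (when called inside an action callable) returns details about the step that triggered the action; the exact metadata keys used here are assumptions:

import logging

logging.basicConfig(level=logging.WARNING)

def log_issue():
    # Ask Pointblank which step triggered this action
    meta = pb.get_action_metadata()
    logging.warning(
        "Step %s crossed its '%s' threshold.",
        meta.get("step"), meta.get("level")
    )

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Logging failures with an action."
    )
    .col_vals_gt(
        columns="a", value=5,
        thresholds=pb.Thresholds(warning=0.1),
        actions=pb.Actions(warning=log_issue)
    )
    .interrogate()
)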

Briefs

Briefs (which can be set through the brief= parameter) allow you to customize descriptions associated with validation steps, making validation results more understandable to stakeholders. Briefs can be either automatically generated by setting brief=True or defined as custom messages for more specific explanations:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Using `brief=` for displaying brief messages."
    )
    .col_vals_gt(
        columns="a", value=0,

        # Use `True` for automatic generation of briefs ---
        brief=True
    )
    .col_exists(
        columns=["date", "date_time"],

        # Add a custom brief for this validation step ---
        brief="Verify required date columns exist for time-series analysis"
    )
    .interrogate()
)

Pointblank Validation: Using `brief=` for displaying brief messages. (Polars)

STEP  VALIDATION     COLUMNS    VALUES  UNITS  PASS       FAIL      BRIEF
1     col_vals_gt()  a          0       13     13 (1.00)  0 (0.00)  Expect that values in a should be > 0.
2     col_exists()   date               1      1 (1.00)   0 (0.00)  Verify required date columns exist for time-series analysis
3     col_exists()   date_time          1      1 (1.00)   0 (0.00)  Verify required date columns exist for time-series analysis

Briefs make validation results more meaningful by providing context about why each check matters. They’re particularly valuable in shared reports where stakeholders from various disciplines need to understand validation results in domain-specific terms.
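
Custom briefs can also use templating placeholders so that one message adapts across all the steps a validation method generates; this sketch assumes the {col} and {value} placeholders work in briefs the same way {value} worked in the action string earlier:

(
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        label="Templated briefs."
    )
    .col_vals_gt(
        columns=["a", "d"], value=0,

        # One templated message serves both generated steps ---
        brief="Column {col} must stay above {value}."
    )
    .interrogate()
)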

Getting More Information

Each validation step can be further customized and has additional options; the later articles in the User Guide cover these in more detail.

Conclusion

Validation steps are the building blocks of data validation in Pointblank. By combining steps from different categories and leveraging common features like thresholds, actions, and preprocessing, you can create comprehensive data quality checks tailored to your specific needs.

The next sections of this guide will dive deeper into each of these topics, providing detailed explanations and examples.