Data validation toolkit for assessing and monitoring data quality.

Pointblank is a data validation framework for Python that makes data quality checks beautiful, powerful, and stakeholder-friendly. Instead of cryptic error messages, get stunning interactive reports that turn data issues into conversations.

Here’s what a validation looks like:

import pointblank as pb
import polars as pl

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="polars"),
        tbl_name="game_revenue",
        label="Comprehensive validation of game revenue data",
        thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35),
        brief=True
    )
    .col_vals_regex(columns="player_id", pattern=r"^[A-Z]{12}[0-9]{3}$")        # STEP 1
    .col_vals_gt(columns="session_duration", value=20)                          # STEP 2
    .col_vals_ge(columns="item_revenue", value=0.20)                            # STEP 3
    .col_vals_in_set(columns="item_type", set=["iap", "ad"])                    # STEP 4
    .col_vals_in_set(                                                           # STEP 5
        columns="acquisition",
        set=["google", "facebook", "organic", "crosspromo", "other_campaign"]
    )
    .col_vals_not_in_set(columns="country", set=["Mongolia", "Germany"])        # STEP 6
    .col_vals_between(                                                          # STEP 7
        columns="session_duration",
        left=10, right=50,
        pre=lambda df: df.select(pl.median("session_duration")),
        brief="Expect that the median of `session_duration` should be between `10` and `50`."
    )
    .rows_distinct(columns_subset=["player_id", "session_id", "time"])          # STEP 8
    .row_count_match(count=2000)                                                # STEP 9
    .col_count_match(count=11)                                                  # STEP 10
    .col_vals_not_null(columns="item_type")                                     # STEP 11
    .col_exists(columns="start_day")                                            # STEP 12
    .interrogate()
)

validation.get_tabular_report(title="Game Revenue Validation Report")

Game Revenue Validation Report

Comprehensive validation of game revenue data
Polars | game_revenue | WARNING 0.1 | ERROR 0.25 | CRITICAL 0.35

Step 1: col_vals_regex()
  Expect that values in player_id should match the regular expression: ^[A-Z]{12}[0-9]{3}$.
  Columns: player_id | Values: ^[A-Z]{12}[0-9]{3}$ | Units: 2000 | Pass: 2000 (1.00) | Fail: 0 (0.00)

Step 2: col_vals_gt()
  Expect that values in session_duration should be > 20.
  Columns: session_duration | Values: 20 | Units: 2000 | Pass: 1418 (0.71) | Fail: 582 (0.29)

Step 3: col_vals_ge()
  Expect that values in item_revenue should be >= 0.2.
  Columns: item_revenue | Values: 0.2 | Units: 2000 | Pass: 1192 (0.60) | Fail: 808 (0.40)

Step 4: col_vals_in_set()
  Expect that values in item_type should be in the set of iap, ad.
  Columns: item_type | Values: iap, ad | Units: 2000 | Pass: 2000 (1.00) | Fail: 0 (0.00)

Step 5: col_vals_in_set()
  Expect that values in acquisition should be in the set of google, facebook, organic, and 2 more.
  Columns: acquisition | Values: google, facebook, organic, crosspromo, other_campaign | Units: 2000 | Pass: 1975 (0.99) | Fail: 25 (0.01)

Step 6: col_vals_not_in_set()
  Expect that values in country should not be in the set of Mongolia, Germany.
  Columns: country | Values: Mongolia, Germany | Units: 2000 | Pass: 1775 (0.89) | Fail: 225 (0.11)

Step 7: col_vals_between()
  Expect that the median of session_duration should be between 10 and 50.
  Columns: session_duration | Values: [10, 50] | Units: 1 | Pass: 1 (1.00) | Fail: 0 (0.00)

Step 8: rows_distinct()
  Expect entirely distinct rows across player_id, session_id, time.
  Columns: player_id, session_id, time | Units: 2000 | Pass: 1978 (0.99) | Fail: 22 (0.01)

Step 9: row_count_match()
  Expect that the row count is exactly 2000.
  Values: 2000 | Units: 1 | Pass: 1 (1.00) | Fail: 0 (0.00)

Step 10: col_count_match()
  Expect that the column count is exactly 11.
  Values: 11 | Units: 1 | Pass: 1 (1.00) | Fail: 0 (0.00)

Step 11: col_vals_not_null()
  Expect that all values in item_type should not be Null.
  Columns: item_type | Units: 2000 | Pass: 2000 (1.00) | Fail: 0 (0.00)

Step 12: col_exists()
  Expect that column start_day exists.
  Columns: start_day | Units: 1 | Pass: 1 (1.00) | Fail: 0 (0.00)

Started: 2026-07-03 20:38:33 UTC | Duration: < 1 s | Ended: 2026-07-03 20:38:33 UTC

Notes

Step 7 (pre_applied) Precondition applied: table dimensions [2,000 rows, 11 columns] → [1 row, 1 column].

That’s the kind of report you get from Pointblank: clear, interactive, and designed for everyone on your team. And if you need help getting started or want to work faster, Pointblank has built-in AI support through the assistant() function to guide you along the way. You can also use DraftValidation to quickly generate a validation plan from your existing data (great for getting started fast), or the AI Validation Editor to refine an existing plan with plain-English instructions.

Ready to validate? Start with our Installation guide or jump straight to the Quickstart.

By the way, Pointblank is made with 💙 by Posit.

What is Data Validation?

Data validation ensures your data meets quality standards before it’s used in analysis, reports, or downstream systems. Pointblank provides a structured way to define validation rules, execute them, and communicate results to both technical and non-technical stakeholders.

With Pointblank you can:

  • Validate data through a fluent, chainable API with 25+ validation methods
  • Set thresholds to define acceptable levels of data quality (warning, error, critical)
  • Take actions when thresholds are exceeded (notifications, logging, custom functions)
  • Generate reports that make data quality issues immediately understandable
  • Inspect data with built-in tools for previewing, summarizing, and finding missing values
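
The points above come down to counting test units. As a rough mental model only (plain Python, not Pointblank's implementation; the `StepResult` class, the `check_gt` helper, and the sample rows are all hypothetical), a single `col_vals_gt`-style step treats each value as one test unit and reports how many pass and fail:

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Summary of one validation step (hypothetical structure)."""
    units: int      # number of values checked
    n_passed: int   # values that satisfied the rule
    n_failed: int   # values that did not

    @property
    def f_failed(self) -> float:
        # Fraction of failing test units; 0.0 when there is nothing to check
        return self.n_failed / self.units if self.units else 0.0


def check_gt(rows: list[dict], column: str, value: float) -> StepResult:
    """Count rows whose `column` is strictly greater than `value`."""
    passed = sum(1 for row in rows if row[column] > value)
    return StepResult(units=len(rows), n_passed=passed, n_failed=len(rows) - passed)


# Hypothetical sample data, invented for this illustration
rows = [
    {"session_duration": 5.0},
    {"session_duration": 25.5},
    {"session_duration": 31.2},
    {"session_duration": 18.9},
]

result = check_gt(rows, column="session_duration", value=20)
print(result.n_passed, result.n_failed, result.f_failed)
```

The pass and fail counts, together with the failing fraction, are the numbers that a validation report surfaces for each step.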

Why Pointblank?

Pointblank is designed for the entire data team, not just engineers:

  • 🎨 Beautiful Reports: Interactive validation reports that stakeholders actually want to read
  • 📊 Threshold Management: Define quality standards with warning, error, and critical levels
  • 🔍 Error Drill-Down: Inspect failing data to get to root causes quickly
  • 🔗 Universal Compatibility: Works with Polars, Pandas, DuckDB, MySQL, PostgreSQL, SQLite, and more
  • 🌍 Multilingual Support: Reports available in 40 languages for global teams
  • 📝 YAML Support: Write validations in YAML for version control and team collaboration
  • CLI Tools: Run validations from the command line for CI/CD pipelines or as quick checks
  • 📋 Rich Inspection: Preview data, analyze columns, and visualize missing values

Quick Examples

Threshold-Based Quality

Set expectations and react when data quality degrades (with alerts, logging, or custom functions):

validation = (
    pb.Validate(data=sales_data, thresholds=(0.01, 0.02, 0.05))  # warning, error, critical
    .col_vals_not_null(columns="customer_id")
    .col_vals_in_set(columns="status", set=["pending", "shipped", "delivered"])
    .interrogate()
)
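
To make the three levels concrete, here is a minimal plain-Python sketch (not Pointblank's code) of how a failing fraction maps onto the warning, error, and critical levels. It assumes a level is entered once the fraction of failing test units is at or above its threshold, and it reuses the thresholds and fail counts from the game revenue report shown earlier. The `levels_reached` helper is hypothetical.

```python
def levels_reached(n_failed: int, units: int, thresholds: dict[str, float]) -> list[str]:
    """Return the names of all threshold levels met by the failing fraction."""
    fraction = n_failed / units
    return [name for name, limit in thresholds.items() if fraction >= limit]


# Thresholds from the game revenue example
thresholds = {"warning": 0.10, "error": 0.25, "critical": 0.35}

# Step number -> failing test units, taken from the report (2000 units per step)
failures = {2: 582, 3: 808, 5: 25, 6: 225}

for step, n_failed in failures.items():
    print(step, levels_reached(n_failed, 2000, thresholds))
```

Under that assumption, step 5 (25 failures out of 2000) stays below every level, step 6 reaches only the warning level, and step 3 reaches all three.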

YAML Workflows

Works wonderfully for CI/CD pipelines and team collaboration:

validate:
  data: sales_data
  tbl_name: "sales_data"
  thresholds: [0.01, 0.02, 0.05]

steps:
  - col_vals_not_null:
      columns: "customer_id"
  - col_vals_in_set:
      columns: "status"
      set: ["pending", "shipped", "delivered"]

Then run the workflow from Python:

validation = pb.yaml_interrogate("validation.yaml")

Command Line Power

Run validations without writing code:

# Quick validation
pb validate sales_data.csv --check col-vals-not-null --column customer_id

# Run YAML workflows
pb run validation.yaml --exit-code  # <- Great for CI/CD!

# Explore your data
pb scan sales_data.csv
pb missing sales_data.csv

Installation

Install Pointblank using pip or conda:

pip install pointblank
# or
conda install conda-forge::pointblank

For specific backends:

pip install "pointblank[pl]"       # Polars support
pip install "pointblank[pd]"       # Pandas support
pip install "pointblank[duckdb]"   # DuckDB support
pip install "pointblank[postgres]" # PostgreSQL support

See the Installation guide for more details.

Join the Community

We’d love to hear from you!


License: MIT | © 2024-2026 Posit Software, PBC