Pointblank

Find out if your data is what you think it is.


Developers

Rich Iannone, maintainer (Posit, PBC)

Posit Software, PBC, copyright holder and funder

Community

Contributing guide
Code of conduct
Security policy
Full license (MIT)
Citing pointblank

Meta

Requires: Python >=3.10
Provides-Extra: pd, pl, pyspark, generate, mcp, otel, excel, bigquery, databricks, duckdb, mysql, mssql, postgres, snowflake, sqlite, docs

Pointblank is a data validation framework for Python that makes data quality checks beautiful, powerful, and stakeholder-friendly. Instead of cryptic error messages, get stunning interactive reports that turn data issues into conversations.

Here’s what a validation plan looks like:
import pointblank as pb
import polars as pl

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="polars"),
        tbl_name="game_revenue",
        label="Comprehensive validation of game revenue data",
        thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35),
        brief=True
    )
    .col_vals_regex(columns="player_id", pattern=r"^[A-Z]{12}[0-9]{3}$")
    .col_vals_gt(columns="session_duration", value=20)
    .col_vals_ge(columns="item_revenue", value=0.20)
    .col_vals_in_set(columns="item_type", set=["iap", "ad"])
    .col_vals_in_set(
        columns="acquisition",
        set=["google", "facebook", "organic", "crosspromo", "other_campaign"]
    )
    .col_vals_not_in_set(columns="country", set=["Mongolia", "Germany"])
    .col_vals_between(
        columns="session_duration",
        left=10, right=50,
        pre=lambda df: df.select(pl.median("session_duration")),
        brief="Expect that the median of `session_duration` should be between `10` and `50`."
    )
    .rows_distinct(columns_subset=["player_id", "session_id", "time"])
    .row_count_match(count=2000)
    .col_count_match(count=11)
    .col_vals_not_null(columns="item_type")
    .col_exists(columns="start_day")
    .interrogate()
)

validation.get_tabular_report(title="Game Revenue Validation Report")

Game Revenue Validation Report

Comprehensive validation of game revenue data
Table: game_revenue (Polars) · Thresholds: WARNING 0.10 · ERROR 0.25 · CRITICAL 0.35

| Step | Validation            | Columns                     | Values                                                | Units | Pass        | Fail       |
|------|-----------------------|-----------------------------|-------------------------------------------------------|-------|-------------|------------|
| 1    | col_vals_regex()      | player_id                   | ^[A-Z]{12}[0-9]{3}$                                   | 2000  | 2000 (1.00) | 0 (0.00)   |
| 2    | col_vals_gt()         | session_duration            | 20                                                    | 2000  | 1418 (0.71) | 582 (0.29) |
| 3    | col_vals_ge()         | item_revenue                | 0.2                                                   | 2000  | 1192 (0.60) | 808 (0.40) |
| 4    | col_vals_in_set()     | item_type                   | iap, ad                                               | 2000  | 2000 (1.00) | 0 (0.00)   |
| 5    | col_vals_in_set()     | acquisition                 | google, facebook, organic, crosspromo, other_campaign | 2000  | 1975 (0.99) | 25 (0.01)  |
| 6    | col_vals_not_in_set() | country                     | Mongolia, Germany                                     | 2000  | 1775 (0.89) | 225 (0.11) |
| 7    | col_vals_between()    | session_duration (median)   | [10, 50]                                              | 1     | 1 (1.00)    | 0 (0.00)   |
| 8    | rows_distinct()       | player_id, session_id, time |                                                       | 2000  | 1978 (0.99) | 22 (0.01)  |
| 9    | row_count_match()     |                             | 2000                                                  | 1     | 1 (1.00)    | 0 (0.00)   |
| 10   | col_count_match()     |                             | 11                                                    | 1     | 1 (1.00)    | 0 (0.00)   |
| 11   | col_vals_not_null()   | item_type                   |                                                       | 2000  | 2000 (1.00) | 0 (0.00)   |
| 12   | col_exists()          | start_day                   |                                                       | 1     | 1 (1.00)    | 0 (0.00)   |

Interrogation started and completed 2026-04-16 03:10:35 UTC (duration: < 1 s).

Notes

Step 7 (pre_applied): precondition applied; table dimensions went from [2,000 rows, 11 columns] to [1 row, 1 column].

That’s the kind of report you get from Pointblank: clear, interactive, and designed for everyone on your team.

What is Data Validation?

Data validation ensures your data meets quality standards before it’s used in analysis, reports, or downstream systems. Pointblank provides a structured way to define validation rules, execute them, and communicate results to both technical and non-technical stakeholders.

With Pointblank you can:

  • Validate data through a fluent, chainable API with 25+ validation methods
  • Set thresholds to define acceptable levels of data quality (warning, error, critical)
  • Take actions when thresholds are exceeded (notifications, logging, custom functions)
  • Generate reports that make data quality issues immediately understandable
  • Inspect data with built-in tools for previewing, summarizing, and finding missing values
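The threshold idea can be illustrated in plain Python. This is a simplified sketch of the concept behind `pb.Thresholds`, not Pointblank's actual implementation: each severity level fires once the fraction of failing test units meets or exceeds its cutoff.

```python
def threshold_status(n_failed, n_units, warning=0.10, error=0.25, critical=0.35):
    """Report which severity levels a failure fraction has crossed.

    Simplified illustration of threshold semantics: a level fires when
    the failing fraction meets or exceeds that level's cutoff.
    """
    frac = n_failed / n_units
    return {
        "warning": frac >= warning,
        "error": frac >= error,
        "critical": frac >= critical,
    }

# Step 2 of the report above: 582 of 2000 session_duration values failed (29%)
print(threshold_status(582, 2000))
# -> {'warning': True, 'error': True, 'critical': False}

# Step 3: 808 of 2000 item_revenue values failed (40%), so every level fires
print(threshold_status(808, 2000))
# -> {'warning': True, 'error': True, 'critical': True}
```

Fractional cutoffs like these scale with table size; Pointblank also accepts absolute counts in `pb.Thresholds` for cases where any fixed number of failures is significant.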

Why Pointblank?

Pointblank is designed for the entire data team, not just engineers:

  • 🎨 Beautiful Reports: Interactive validation reports that stakeholders actually want to read
  • 📊 Threshold Management: Define quality standards with warning, error, and critical levels
  • 🔍 Error Drill-Down: Inspect failing data to get to root causes quickly
  • 🔗 Universal Compatibility: Works with Polars, Pandas, DuckDB, MySQL, PostgreSQL, SQLite, and more
  • 🌍 Multilingual Support: Reports available in 40 languages for global teams
  • 📝 YAML Support: Write validations in YAML for version control and team collaboration
  • CLI Tools: Run validations from the command line for CI/CD pipelines or as quick checks
  • 📋 Rich Inspection: Preview data, analyze columns, and visualize missing values
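As a sketch of the YAML workflow, a subset of the validation above might be written like this. The key names mirror the Python API, but treat the exact schema as an assumption and consult the YAML documentation for the authoritative format:

```yaml
tbl: game_revenue                 # table to validate
tbl_name: game_revenue
label: Comprehensive validation of game revenue data
thresholds:
  warning: 0.10
  error: 0.25
  critical: 0.35
steps:
  - col_vals_gt:
      columns: session_duration
      value: 20
  - col_vals_in_set:
      columns: item_type
      set: [iap, ad]
  - rows_distinct:
      columns_subset: [player_id, session_id, time]
```

A file like this can live next to the data pipeline in version control, so reviewers see quality rules change in the same diffs as the code they govern.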