Pointblank

Find out if your data is what you think it is.


Developers

Rich Iannone, maintainer (Posit, PBC)

Posit Software, PBC, copyright holder and funder

Community

Contributing guide
Code of conduct
Security policy
Full license (MIT)
Citing pointblank

Meta

Requires: Python >=3.10
Provides-Extra: pd, pl, pyspark, generate, mcp, otel, excel, bigquery, databricks, duckdb, mysql, mssql, postgres, snowflake, sqlite, docs

Pointblank is a data validation framework for Python that makes data quality checks beautiful, powerful, and stakeholder-friendly. Instead of cryptic error messages, get stunning interactive reports that turn data issues into conversations.

Here’s what a validation plan looks like:
import pointblank as pb
import polars as pl

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="polars"),
        tbl_name="game_revenue",
        label="Comprehensive validation of game revenue data",
        thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35),
        brief=True
    )
    .col_vals_regex(columns="player_id", pattern=r"^[A-Z]{12}[0-9]{3}$")
    .col_vals_gt(columns="session_duration", value=20)
    .col_vals_ge(columns="item_revenue", value=0.20)
    .col_vals_in_set(columns="item_type", set=["iap", "ad"])
    .col_vals_in_set(
        columns="acquisition",
        set=["google", "facebook", "organic", "crosspromo", "other_campaign"]
    )
    .col_vals_not_in_set(columns="country", set=["Mongolia", "Germany"])
    .col_vals_between(
        columns="session_duration",
        left=10, right=50,
        pre=lambda df: df.select(pl.median("session_duration")),
        brief="Expect that the median of `session_duration` should be between `10` and `50`."
    )
    .rows_distinct(columns_subset=["player_id", "session_id", "time"])
    .row_count_match(count=2000)
    .col_count_match(count=11)
    .col_vals_not_null(columns="item_type")
    .col_exists(columns="start_day")
    .interrogate()
)

validation.get_tabular_report(title="Game Revenue Validation Report")

Game Revenue Validation Report

Comprehensive validation of game revenue data
Table: game_revenue (Polars) · Thresholds: WARNING 0.10 · ERROR 0.25 · CRITICAL 0.35

| Step | Validation            | Columns                     | Values                                                | Units | Pass        | Fail       |
|------|-----------------------|-----------------------------|-------------------------------------------------------|-------|-------------|------------|
| 1    | col_vals_regex()      | player_id                   | ^[A-Z]{12}[0-9]{3}$                                   | 2000  | 2000 (1.00) | 0 (0.00)   |
| 2    | col_vals_gt()         | session_duration            | 20                                                    | 2000  | 1418 (0.71) | 582 (0.29) |
| 3    | col_vals_ge()         | item_revenue                | 0.2                                                   | 2000  | 1192 (0.60) | 808 (0.40) |
| 4    | col_vals_in_set()     | item_type                   | iap, ad                                               | 2000  | 2000 (1.00) | 0 (0.00)   |
| 5    | col_vals_in_set()     | acquisition                 | google, facebook, organic, crosspromo, other_campaign | 2000  | 1975 (0.99) | 25 (0.01)  |
| 6    | col_vals_not_in_set() | country                     | Mongolia, Germany                                     | 2000  | 1775 (0.89) | 225 (0.11) |
| 7    | col_vals_between()    | session_duration (median)   | [10, 50]                                              | 1     | 1 (1.00)    | 0 (0.00)   |
| 8    | rows_distinct()       | player_id, session_id, time |                                                       | 2000  | 1978 (0.99) | 22 (0.01)  |
| 9    | row_count_match()     |                             | 2000                                                  | 1     | 1 (1.00)    | 0 (0.00)   |
| 10   | col_count_match()     |                             | 11                                                    | 1     | 1 (1.00)    | 0 (0.00)   |
| 11   | col_vals_not_null()   | item_type                   |                                                       | 2000  | 2000 (1.00) | 0 (0.00)   |
| 12   | col_exists()          | start_day                   |                                                       | 1     | 1 (1.00)    | 0 (0.00)   |

Interrogation started and completed 2026-04-16 03:10:35 UTC (duration: < 1 s).

Notes

Step 7 (pre_applied): precondition applied; table dimensions went from [2,000 rows, 11 columns] to [1 row, 1 column].

That’s the kind of report you get from Pointblank: clear, interactive, and designed for everyone on your team.

What is Data Validation?

Data validation ensures your data meets quality standards before it’s used in analysis, reports, or downstream systems. Pointblank provides a structured way to define validation rules, execute them, and communicate results to both technical and non-technical stakeholders.

With Pointblank you can:

  • Validate data through a fluent, chainable API with 25+ validation methods
  • Set thresholds to define acceptable levels of data quality (warning, error, critical)
  • Take actions when thresholds are exceeded (notifications, logging, custom functions)
  • Generate reports that make data quality issues immediately understandable
  • Inspect data with built-in tools for previewing, summarizing, and finding missing values
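The threshold idea can be illustrated in plain Python. This is a simplified sketch of the concept behind `pb.Thresholds`, not Pointblank's actual implementation: each severity level fires once the fraction of failing test units meets or exceeds its cutoff.

```python
def threshold_status(n_failed, n_units, warning=0.10, error=0.25, critical=0.35):
    """Report which severity levels a failure fraction has crossed.

    Simplified illustration of threshold semantics: a level fires when
    the failing fraction meets or exceeds that level's cutoff.
    """
    frac = n_failed / n_units
    return {
        "warning": frac >= warning,
        "error": frac >= error,
        "critical": frac >= critical,
    }

# Step 2 of the report above: 582 of 2000 session_duration values failed (29%)
print(threshold_status(582, 2000))
# -> {'warning': True, 'error': True, 'critical': False}

# Step 3: 808 of 2000 item_revenue values failed (40%), so every level fires
print(threshold_status(808, 2000))
# -> {'warning': True, 'error': True, 'critical': True}
```

Fractional cutoffs like these scale with table size; Pointblank also accepts absolute counts in `pb.Thresholds` for cases where any fixed number of failures is significant.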

Why Pointblank?

Pointblank is designed for the entire data team, not just engineers:

  • 🎨 Beautiful Reports: Interactive validation reports that stakeholders actually want to read
  • 📊 Threshold Management: Define quality standards with warning, error, and critical levels
  • 🔍 Error Drill-Down: Inspect failing data to get to root causes quickly
  • 🔗 Universal Compatibility: Works with Polars, Pandas, DuckDB, MySQL, PostgreSQL, SQLite, and more
  • 🌍 Multilingual Support: Reports available in 40 languages for global teams
  • 📝 YAML Support: Write validations in YAML for version control and team collaboration
  • CLI Tools: Run validations from the command line for CI/CD pipelines or as quick checks
  • 📋 Rich Inspection: Preview data, analyze columns, and visualize missing values
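As a sketch of the YAML workflow, a subset of the validation above might be written like this. The key names mirror the Python API, but treat the exact schema as an assumption and consult the YAML documentation for the authoritative format:

```yaml
tbl: game_revenue                 # table to validate
tbl_name: game_revenue
label: Comprehensive validation of game revenue data
thresholds:
  warning: 0.10
  error: 0.25
  critical: 0.35
steps:
  - col_vals_gt:
      columns: session_duration
      value: 20
  - col_vals_in_set:
      columns: item_type
      set: [iap, ad]
  - rows_distinct:
      columns_subset: [player_id, session_id, time]
```

A file like this can live next to the data pipeline in version control, so reviewers see quality rules change in the same diffs as the code they govern.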