Data validation toolkit for assessing and monitoring data quality.
Pointblank is a data validation framework for Python that makes data quality checks beautiful, powerful, and stakeholder-friendly. Instead of cryptic error messages, get stunning interactive reports that turn data issues into conversations.
Here’s what a validation looks like, along with the code that produces it:
import pointblank as pb
import polars as pl

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="polars"),
        tbl_name="game_revenue",
        label="Comprehensive validation of game revenue data",
        thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35),
        brief=True
    )
    .col_vals_regex(columns="player_id", pattern=r"^[A-Z]{12}[0-9]{3}$")  # STEP 1
    .col_vals_gt(columns="session_duration", value=20)  # STEP 2
    .col_vals_ge(columns="item_revenue", value=0.20)  # STEP 3
    .col_vals_in_set(columns="item_type", set=["iap", "ad"])  # STEP 4
    .col_vals_in_set(  # STEP 5
        columns="acquisition",
        set=["google", "facebook", "organic", "crosspromo", "other_campaign"]
    )
    .col_vals_not_in_set(columns="country", set=["Mongolia", "Germany"])  # STEP 6
    .col_vals_between(  # STEP 7
        columns="session_duration",
        left=10,
        right=50,
        pre=lambda df: df.select(pl.median("session_duration")),
        brief="Expect that the median of `session_duration` should be between `10` and `50`."
    )
    .rows_distinct(columns_subset=["player_id", "session_id", "time"])  # STEP 8
    .row_count_match(count=2000)  # STEP 9
    .col_count_match(count=11)  # STEP 10
    .col_vals_not_null(columns="item_type")  # STEP 11
    .col_exists(columns="start_day")  # STEP 12
    .interrogate()
)

validation.get_tabular_report(title="Game Revenue Validation Report")
Game Revenue Validation Report
Comprehensive validation of game revenue data
Polars · game_revenue · WARNING 0.1 · ERROR 0.25 · CRITICAL 0.35

| # | STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | E | C | EXT |
|---|------|---------|--------|-----|------|-------|------|------|---|---|---|-----|
| 1 | col_vals_regex(): Expect that values in player_id should match the regular expression: ^[A-Z]{12}[0-9]{3}$. | player_id | ^[A-Z]{12}[0-9]{3}$ | | ✓ | 2000 | 2000 (1.00) | 0 (0.00) | ○ | ○ | ○ | — |
| 2 | col_vals_gt(): Expect that values in session_duration should be > 20. | session_duration | 20 | | ✓ | 2000 | 1418 (0.71) | 582 (0.29) | ● | ● | ○ | |
| 3 | col_vals_ge(): Expect that values in item_revenue should be >= 0.2. | item_revenue | 0.2 | | ✓ | 2000 | 1192 (0.60) | 808 (0.40) | ● | ● | ● | |
| 4 | col_vals_in_set(): Expect that values in item_type should be in the set of iap, ad. | item_type | iap, ad | | ✓ | 2000 | 2000 (1.00) | 0 (0.00) | ○ | ○ | ○ | — |
| 5 | col_vals_in_set(): Expect that values in acquisition should be in the set of google, facebook, organic, and 2 more. | acquisition | google, facebook, organic, crosspromo, other_campaign | | | | | | | | | |
| 6 | col_vals_not_in_set(): Expect that values in country should not be in the set of Mongolia, Germany. | country | Mongolia, Germany | | ✓ | 2000 | 1775 (0.89) | 225 (0.11) | ● | ○ | ○ | |
| 7 | col_vals_between(): Expect that the median of session_duration should be between 10 and 50. | session_duration | [10, 50] | | ✓ | 1 | 1 (1.00) | 0 (0.00) | ○ | ○ | ○ | — |
| 8 | rows_distinct(): Expect entirely distinct rows across player_id, session_id, time. | player_id, session_id, time | — | | ✓ | 2000 | 1978 (0.99) | 22 (0.01) | ○ | ○ | ○ | |
| 9 | row_count_match(): Expect that the row count is exactly 2000. | — | 2000 | | ✓ | 1 | 1 (1.00) | 0 (0.00) | ○ | ○ | ○ | — |
| 10 | col_count_match(): Expect that the column count is exactly 11. | — | 11 | | ✓ | 1 | 1 (1.00) | 0 (0.00) | ○ | ○ | ○ | — |
| 11 | col_vals_not_null(): Expect that all values in item_type should not be Null. | item_type | — | | ✓ | 2000 | 2000 (1.00) | 0 (0.00) | ○ | ○ | ○ | — |
| 12 | col_exists(): Expect that column start_day exists. | start_day | — | | ✓ | 1 | 1 (1.00) | 0 (0.00) | ○ | ○ | ○ | — |

Started: 2025-11-05 22:39:18 UTC · Duration: < 1 s · Ended: 2025-11-05 22:39:18 UTC
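To make the W, E, and C indicators in the report easier to read, here is a small standard-library sketch of how the thresholds from the example (warning=0.10, error=0.25, critical=0.35) relate to the failing fractions shown in the FAIL column. It assumes a level is flagged when the fraction of failing test units reaches or exceeds that level's threshold. The helper `flagged_levels` is a hypothetical name used for illustration; it is not part of Pointblank and is not a description of Pointblank's internals.

```python
# Thresholds taken from the example above, expressed as failing fractions.
THRESHOLDS = {"warning": 0.10, "error": 0.25, "critical": 0.35}


def flagged_levels(n_failed, n_units, thresholds=THRESHOLDS):
    """Return the levels whose threshold is reached by the failing fraction.

    Assumption for this sketch: a level is flagged when
    n_failed / n_units >= threshold.
    """
    fraction = n_failed / n_units
    return [level for level, limit in thresholds.items() if fraction >= limit]


# (failing units, total units) copied from the report for selected steps.
report_rows = {2: (582, 2000), 3: (808, 2000), 6: (225, 2000), 8: (22, 2000)}

for step, (failed, units) in report_rows.items():
    print(step, f"{failed / units:.2f}", flagged_levels(failed, units))
# 2 0.29 ['warning', 'error']
# 3 0.40 ['warning', 'error', 'critical']
# 6 0.11 ['warning']
# 8 0.01 []
```

These results line up with the filled and empty circles in the report: step 2 reaches the warning and error levels, step 3 reaches all three, step 6 reaches only warning, and step 8 stays below every threshold.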
That’s the kind of report you get from Pointblank: clear, interactive, and designed for everyone on your team. And if you need help getting started or want to work faster, Pointblank has built-in AI support through the assistant() function to guide you along the way. You can also use DraftValidation to quickly generate a validation plan from your existing data (great for getting started fast).
Ready to validate? Start with our Installation guide or jump straight to the User Guide.
Data validation ensures your data meets quality standards before it’s used in analysis, reports, or downstream systems. Pointblank provides a structured way to define validation rules, execute them, and communicate results to both technical and non-technical stakeholders.