Data validation toolkit for assessing and monitoring data quality.
Pointblank is a data validation framework for Python that makes data quality checks beautiful, powerful, and stakeholder-friendly. Instead of cryptic error messages, get stunning interactive reports that turn data issues into conversations.
Here’s what a validation looks like (click “Show the code” to see how it’s done):
Show the code
import pointblank as pbimport polars as plvalidation = ( pb.Validate( data=pb.load_dataset(dataset="game_revenue", tbl_type="polars"), tbl_name="game_revenue", label="Comprehensive validation of game revenue data", thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35), brief=True ) .col_vals_regex(columns="player_id", pattern=r"^[A-Z]{12}[0-9]{3}$") # STEP 1 .col_vals_gt(columns="session_duration", value=20) # STEP 2 .col_vals_ge(columns="item_revenue", value=0.20) # STEP 3 .col_vals_in_set(columns="item_type", set=["iap", "ad"]) # STEP 4 .col_vals_in_set( # STEP 5 columns="acquisition",set=["google", "facebook", "organic", "crosspromo", "other_campaign"] ) .col_vals_not_in_set(columns="country", set=["Mongolia", "Germany"]) # STEP 6 .col_vals_between( # STEP 7 columns="session_duration", left=10, right=50, pre =lambda df: df.select(pl.median("session_duration")), brief="Expect that the median of `session_duration` should be between `10` and `50`." ) .rows_distinct(columns_subset=["player_id", "session_id", "time"]) # STEP 8 .row_count_match(count=2000) # STEP 9 .col_count_match(count=11) # STEP 10 .col_vals_not_null(columns="item_type") # STEP 11 .col_exists(columns="start_day") # STEP 12 .interrogate())validation.get_tabular_report(title="Game Revenue Validation Report")
Game Revenue Validation Report
Comprehensive validation of game revenue data
Polarsgame_revenueWARNING0.1ERROR0.25CRITICAL0.35
STEP
COLUMNS
VALUES
TBL
EVAL
UNITS
PASS
FAIL
W
E
C
EXT
#4CA64C
1
col_vals_regex()
Expect that values in player_id should match the regular expression: ^[A-Z]{12}[0-9]{3}$.
player_id
^[A-Z]{12}[0-9]{3}$
✓
2000
2000 1.00
0 0.00
○
○
○
—
#EBBC14
2
col_vals_gt()
Expect that values in session_duration should be > 20.
session_duration
20
✓
2000
1418 0.71
582 0.29
●
●
○
#FF3300
3
col_vals_ge()
Expect that values in item_revenue should be >= 0.2.
item_revenue
0.2
✓
2000
1192 0.60
808 0.40
●
●
●
#4CA64C
4
col_vals_in_set()
Expect that values in item_type should be in the set of iap, ad.
item_type
iap, ad
✓
2000
2000 1.00
0 0.00
○
○
○
—
#4CA64C66
5
col_vals_in_set()
Expect that values in acquisition should be in the set of google, facebook, organic, and 2 more.
That’s the kind of report you get from Pointblank: clear, interactive, and designed for everyone on your team. And if you need help getting started or want to work faster, Pointblank has built-in AI support through the assistant() function to guide you along the way. You can also use DraftValidation to quickly generate a validation plan from your existing data (great for getting started fast).
Ready to validate? Start with our Installation guide or jump straight to the User Guide.
Data validation ensures your data meets quality standards before it’s used in analysis, reports, or downstream systems. Pointblank provides a structured way to define validation rules, execute them, and communicate results to both technical and non-technical stakeholders.