| Pointblank Validation | |||||||||||||
2025-10-29|23:17:24 Polars |
|||||||||||||
| STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | E | C | EXT | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| #4CA64C66 | 1 |
specially()
All values in column 'a' should be within 2 std devs of mean |
✓ | 2000 | 1947 0.97 |
53 0.03 |
— | — | — | — | |||
| #4CA64C | 2 |
specially()
All values in column 'c' should be within 3 std devs of mean |
✓ | 2000 | 2000 1.00 |
0 0.00 |
— | — | — | — | |||
2025-10-29 23:17:24 UTC< 1 s2025-10-29 23:17:24 UTC |
|||||||||||||
Custom Validation with specially()
Create bespoke validations using specially() to implement domain-specific business rules.
import pointblank as pb
import polars as pl
def within_std_deviations(df, column, n_std=2):
"""Check if all values are within n standard deviations of the mean"""
mean_val = df[column].mean()
std_val = df[column].std()
lower_bound = mean_val - (n_std * std_val)
upper_bound = mean_val + (n_std * std_val)
# Add a boolean column and return the modified DataFrame
return df.with_columns(
pl.col(column).is_between(lower_bound, upper_bound, closed="both").alias("validation_result")
)
validation = (
pb.Validate(
data=pb.load_dataset(dataset="game_revenue", tbl_type="polars")
)
.specially(
expr=lambda df: within_std_deviations(df, column="session_duration", n_std=2),
brief="All values in column 'a' should be within 2 std devs of mean"
)
.specially(
expr=lambda df: within_std_deviations(df, column="session_duration", n_std=3),
brief="All values in column 'c' should be within 3 std devs of mean"
)
.interrogate()
)
validationPreview of Input Table
PolarsRows2,000Columns11 |
|||||||||||