Pointblank Validation
2025-06-22|01:26:31 Polars

| | STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | E | C | EXT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | specially(): All values in column 'a' should be within 2 std devs of mean | | | | ✓ | 2000 | 1947 (0.97) | 53 (0.03) | — | — | — | — |
| 2 | specially(): All values in column 'c' should be within 3 std devs of mean | | | | ✓ | 2000 | 2000 (1.00) | 0 (0.00) | — | — | — | — |

2025-06-22 01:26:31 UTC | < 1 s | 2025-06-22 01:26:31 UTC
Custom Validation with specially()
Create bespoke validations using specially() to implement domain-specific business rules.
import pointblank as pb
import polars as pl

def within_std_deviations(df, column, n_std=2):
    """Check if all values are within n standard deviations of the mean"""
    mean_val = df[column].mean()
    std_val = df[column].std()
    lower_bound = mean_val - (n_std * std_val)
    upper_bound = mean_val + (n_std * std_val)

    # Add a boolean column and return the modified DataFrame
    return df.with_columns(
        pl.col(column).is_between(lower_bound, upper_bound, closed="both").alias("validation_result")
    )

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="polars")
    )
    .specially(
        expr=lambda df: within_std_deviations(df, column="session_duration", n_std=2),
        brief="All values in column 'a' should be within 2 std devs of mean"
    )
    .specially(
        expr=lambda df: within_std_deviations(df, column="session_duration", n_std=3),
        brief="All values in column 'c' should be within 3 std devs of mean"
    )
    .interrogate()
)

validation
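
To make the rule itself easier to follow, here is a minimal standard-library sketch of the same per-row check, without Polars or Pointblank. The helper name `within_std_flags` and the sample `durations` list are hypothetical and used only for illustration. The sketch assumes the sample standard deviation (ddof=1), which is believed to match the default of Polars' `std()`.

```python
import statistics


def within_std_flags(values, n_std=2):
    """Return one boolean per value: True if it lies within n_std
    sample standard deviations of the mean (bounds inclusive)."""
    mean_val = statistics.mean(values)
    # Sample standard deviation (ddof=1); assumed to mirror Polars' default
    std_val = statistics.stdev(values)
    lower_bound = mean_val - n_std * std_val
    upper_bound = mean_val + n_std * std_val
    return [lower_bound <= v <= upper_bound for v in values]


# Hypothetical session durations with one clear outlier at the end
durations = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 12.0, 11.0, 60.0]

flags = within_std_flags(durations, n_std=2)
print(sum(flags), "passing,", flags.count(False), "failing")
```

Each boolean corresponds to one test unit, which is how a single validation step can report both passing and failing rows. Widening the band to `n_std=3` can only move values from failing to passing, since the wider interval contains the narrower one.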
Preview of Input Table
Polars | Rows: 2,000 | Columns: 11