Pointblank Validation
2025-10-09|20:53:32 Polars

| STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | E | C | EXT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1: specially(). Recent data available (within 2 days of 2023-12-31) | | | | ✓ | 1 | 1 (1.00) | 0 (0.00) | — | — | — | — |
| 2: col_vals_ge(). All data points are from the last week | | | | ✓ | 4 | 4 (1.00) | 0 (0.00) | — | — | — | — |
| 3: specially(). Most recent data is from today | | | | ✓ | 1 | 1 (1.00) | 0 (0.00) | — | — | — | — |
| 4: col_vals_not_null(). No missing timestamps | | | | ✓ | 4 | 4 (1.00) | 0 (0.00) | — | — | — | — |

2025-10-09 20:53:32 UTC | < 1 s | 2025-10-09 20:53:32 UTC
Validating Data Freshness
Use date- and datetime-based validations to confirm that your data is recent. This matters for any application that depends on timely data updates.
import pointblank as pb
import polars as pl
from datetime import date, datetime, timedelta

# Create sample data with mixed freshness levels
freshness_data = pl.DataFrame({
    "data_timestamp": [
        datetime(2023, 12, 28, 10, 30),  # 3 days ago from Dec 31
        datetime(2023, 12, 29, 14, 15),  # 2 days ago
        datetime(2023, 12, 30, 9, 45),   # 1 day ago
        datetime(2023, 12, 31, 16, 20),  # Today
    ],
    "sensor_id": ["TEMP_01", "TEMP_02", "TEMP_01", "TEMP_03"],
    "reading": [22.5, 21.8, 23.1, 22.9],
    "quality_score": [0.95, 0.88, 0.92, 0.97]
})

# Assuming today is 2023-12-31, check for data freshness
current_date = date(2023, 12, 31)
freshness_cutoff = current_date - timedelta(days=2)  # Data should be within 2 days

validation = (
    pb.Validate(freshness_data)
    .specially(
        expr=lambda df: df.filter(
            pl.col("data_timestamp").dt.date() >= freshness_cutoff
        ).height > 0,
        brief=f"Recent data available (within 2 days of {current_date})"
    )
    .col_vals_ge(
        columns="data_timestamp",
        value=current_date - timedelta(days=7),  # Within last week
        brief="All data points are from the last week"
    )
    .specially(
        expr=lambda df: (
            df.select(pl.col("data_timestamp").max()).item().date() >= current_date
        ),
        brief="Most recent data is from today"
    )
    .col_vals_not_null(
        columns="data_timestamp",
        brief="No missing timestamps"
    )
    .interrogate()
)

validation
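To make the logic behind the four steps easier to follow, here is a minimal sketch of the same checks in plain Python, using only the standard library. It is not part of Pointblank: the function name `freshness_report`, its parameters, and the result keys are hypothetical names chosen for illustration, and the sketch compares calendar dates only, which may differ from how the library compares a datetime column against a date value.

```python
from datetime import date, datetime, timedelta
from typing import Optional, Sequence


def freshness_report(
    timestamps: Sequence[Optional[datetime]],
    today: date,
    max_age_days: int = 2,
    window_days: int = 7,
) -> dict[str, bool]:
    """Hypothetical helper mirroring the four validation steps above."""
    # Ignore missing values for the date comparisons; they are reported separately.
    present = [ts for ts in timestamps if ts is not None]
    cutoff = today - timedelta(days=max_age_days)
    window_start = today - timedelta(days=window_days)
    return {
        # Step 1: at least one timestamp falls on or after the cutoff date.
        "recent_data_available": any(ts.date() >= cutoff for ts in present),
        # Step 2: every timestamp falls inside the look-back window.
        "all_within_window": all(ts.date() >= window_start for ts in present),
        # Step 3: the newest timestamp is from the reference date or later.
        "latest_is_today": bool(present) and max(present).date() >= today,
        # Step 4: no timestamp is missing.
        "no_missing_timestamps": len(present) == len(timestamps),
    }


sample = [
    datetime(2023, 12, 28, 10, 30),
    datetime(2023, 12, 29, 14, 15),
    datetime(2023, 12, 30, 9, 45),
    datetime(2023, 12, 31, 16, 20),
]

report = freshness_report(sample, today=date(2023, 12, 31))
stale_report = freshness_report(sample, today=date(2024, 1, 10))
```

With the reference date fixed at 2023-12-31, every check in `report` passes, matching the validation report. Moving the reference date to 2024-01-10 makes the first three checks in `stale_report` fail while the null check still passes. In a scheduled job the reference date would more likely come from `date.today()` than from a hard-coded value.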
Preview of Input Table
Polars | Rows: 4 | Columns: 4