Pointblank Validation (Polars, 2025-10-29 23:16:15 UTC, duration < 1 s)

| STEP | VALIDATION | EVAL | UNITS | PASS | FAIL | W | E | C | EXT |
|---|---|---|---|---|---|---|---|---|---|
| 1 | specially(): Recent data available (within 2 days of 2023-12-31) | ✓ | 1 | 1 (1.00) | 0 (0.00) | — | — | — | — |
| 2 | col_vals_ge(): All data points are from the last week | ✓ | 4 | 4 (1.00) | 0 (0.00) | — | — | — | — |
| 3 | specially(): Most recent data is from today | ✓ | 1 | 1 (1.00) | 0 (0.00) | — | — | — | — |
| 4 | col_vals_not_null(): No missing timestamps | ✓ | 4 | 4 (1.00) | 0 (0.00) | — | — | — | — |
Validating Data Freshness
Use date- and datetime-based validations to confirm that your data is recent enough to rely on. This is critical for applications that depend on timely data updates.
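At its core, a freshness check compares each timestamp against a cutoff date. The sketch below shows that comparison in plain Python, using only the standard library. The helper name `is_fresh` and its parameters are hypothetical, chosen for illustration, and are not part of Pointblank. It reuses the four timestamps and the fixed reference date of 2023-12-31 from the example that follows.

```python
from datetime import date, datetime, timedelta


def is_fresh(ts: datetime, today: date, max_age_days: int) -> bool:
    """Return True if `ts` falls on or after `today - max_age_days`.

    Hypothetical helper for illustration only; not a Pointblank function.
    """
    cutoff = today - timedelta(days=max_age_days)
    return ts.date() >= cutoff


# Fixed reference date so the result is reproducible
today = date(2023, 12, 31)

timestamps = [
    datetime(2023, 12, 28, 10, 30),
    datetime(2023, 12, 29, 14, 15),
    datetime(2023, 12, 30, 9, 45),
    datetime(2023, 12, 31, 16, 20),
]

# With a 2-day window the cutoff is 2023-12-29, so only the first row is stale
fresh_flags = [is_fresh(ts, today, max_age_days=2) for ts in timestamps]
print(fresh_flags)  # [False, True, True, True]

# At least one recent row exists, and the newest row is from the reference date
print(any(fresh_flags))                    # True
print(max(timestamps).date() == today)     # True
```

In a real pipeline the reference date would more likely come from `date.today()` than from a hardcoded value. A fixed date is used here only to keep the output stable.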
import pointblank as pb
import polars as pl
from datetime import date, datetime, timedelta
# Create sample data with mixed freshness levels
freshness_data = pl.DataFrame({
    "data_timestamp": [
        datetime(2023, 12, 28, 10, 30),  # 3 days ago from Dec 31
        datetime(2023, 12, 29, 14, 15),  # 2 days ago
        datetime(2023, 12, 30, 9, 45),   # 1 day ago
        datetime(2023, 12, 31, 16, 20),  # Today
    ],
    "sensor_id": ["TEMP_01", "TEMP_02", "TEMP_01", "TEMP_03"],
    "reading": [22.5, 21.8, 23.1, 22.9],
    "quality_score": [0.95, 0.88, 0.92, 0.97]
})
# Assuming today is 2023-12-31, check for data freshness
current_date = date(2023, 12, 31)
freshness_cutoff = current_date - timedelta(days=2)  # Data should be within 2 days
validation = (
    pb.Validate(freshness_data)
    .specially(
        expr=lambda df: df.filter(
            pl.col("data_timestamp").dt.date() >= freshness_cutoff
        ).height > 0,
        brief=f"Recent data available (within 2 days of {current_date})"
    )
    .col_vals_ge(
        columns="data_timestamp",
        value=current_date - timedelta(days=7),  # Within last week
        brief="All data points are from the last week"
    )
    .specially(
        expr=lambda df: (
            df.select(pl.col("data_timestamp").max()).item().date() >= current_date
        ),
        brief="Most recent data is from today"
    )
    .col_vals_not_null(
        columns="data_timestamp",
        brief="No missing timestamps"
    )
    .interrogate()
)
validation
Preview of Input Table
| Polars · Rows: 4 · Columns: 4 | ||||