data_freshness() method

Validate that data in a datetime column is not older than a specified maximum age.

USAGE

Validate.data_freshness(
    column,
    max_age,
    reference_time=None,
    timezone=None,
    allow_tz_mismatch=False,
    pre=None,
    thresholds=None,
    actions=None,
    brief=None,
    active=True,
)

The data_freshness() validation method checks whether the most recent timestamp in the specified datetime column is within the allowed max_age= from the reference_time= (which defaults to the current time). This is useful for ensuring data pipelines are delivering fresh data and for enforcing data SLAs.

This method helps detect stale data by comparing the maximum (most recent) value in a datetime column against an expected freshness threshold.
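
The comparison described above can be sketched with the standard library alone. This is an illustration of the idea, not Pointblank's implementation: the `is_fresh()` helper is hypothetical, and treating an age exactly equal to the limit as passing is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(timestamps, max_age, reference_time):
    """Return (passed, age) for the newest timestamp relative to reference_time."""
    newest = max(timestamps)          # the most recent value in the column
    age = reference_time - newest     # how old the newest record is
    return age <= max_age, age        # assumption: the boundary counts as fresh


# A fixed reference time keeps the sketch reproducible.
reference = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
timestamps = [
    reference - timedelta(hours=1),
    reference - timedelta(hours=12),
    reference - timedelta(hours=20),
]

passed, age = is_fresh(timestamps, timedelta(hours=24), reference)
print(passed, age)  # True 1:00:00
```

Only the newest value matters: the 12-hour-old and 20-hour-old rows do not affect the result.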

Parameters

column : str: The name of the datetime column to check for freshness. This column should contain date or datetime values.
max_age : str | datetime.timedelta: The maximum allowed age of the data. Can be specified as: (1) a string with a human-readable duration like "24 hours", "1 day", "30 minutes", "2 weeks", etc. (supported units: seconds, minutes, hours, days, weeks), or (2) a datetime.timedelta object for precise control.
reference_time : datetime.datetime | str | None = None: The reference point in time to compare against. Defaults to None, which uses the current time (UTC if timezone= is not specified). Can be: (1) a datetime.datetime object (timezone-aware recommended), (2) a string in ISO 8601 format (e.g., "2024-01-15T10:30:00" or "2024-01-15T10:30:00+05:30"), or (3) None to use the current time.
timezone : str | None = None: The timezone to use for interpreting the data and reference time. Accepts IANA timezone names (e.g., "America/New_York"), hour offsets (e.g., "-7"), or ISO 8601 offsets (e.g., "-07:00"). When None (default), naive datetimes are treated as UTC. See the section "The timezone= Parameter" for details.
allow_tz_mismatch : bool = False: Whether to allow timezone mismatches between the column data and reference time. By default (False), a warning note is added when comparing timezone-naive with timezone-aware datetimes. Set to True to suppress these warnings.
pre : Callable | None = None: An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table.
thresholds : int | float | bool | tuple | dict | Thresholds | None = None: Set threshold failure levels for reporting and reacting to exceedances of the levels. The thresholds are set at the step level and will override any global thresholds set in Validate(thresholds=...). The default is None, which means that no thresholds will be set locally and global thresholds (if any) will take effect.
actions : Actions | None = None: Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the Actions class should be used to define the actions.
brief : str | bool | None = None: An optional brief description of the validation step that will be displayed in the reporting table. You can use templating elements like "{step}" to insert the step number, or "{auto}" to include an automatically generated brief. If True, the entire brief will be automatically generated. If None (the default), there won’t be a brief.
active : bool | Callable = True: A boolean value or callable that determines whether the validation step should be active. Using False will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). A callable can also be provided; it will receive the data table as its single argument and must return a boolean value. The callable is evaluated before any pre= processing. Inspection functions like has_columns() and has_rows() can be used here to conditionally activate a step based on properties of the target table.
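
To make the two accepted forms of max_age= concrete, here is a minimal sketch of how a duration string such as "24 hours" could map onto a datetime.timedelta. The `parse_max_age()` helper and the exact grammar it accepts are assumptions for illustration; Pointblank's own parser may accept more (or different) spellings.

```python
import re
from datetime import timedelta

# Singular unit name -> timedelta keyword (the five units listed for max_age=).
_UNIT_TO_KWARG = {
    "second": "seconds",
    "minute": "minutes",
    "hour": "hours",
    "day": "days",
    "week": "weeks",
}


def parse_max_age(text):
    """Convert a string like '30 minutes' into a timedelta (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse duration: {text!r}")
    amount = float(match.group(1))
    unit = match.group(2).lower().rstrip("s")  # 'hours' -> 'hour'
    if unit not in _UNIT_TO_KWARG:
        raise ValueError(f"unsupported unit in duration: {text!r}")
    return timedelta(**{_UNIT_TO_KWARG[unit]: amount})


print(parse_max_age("24 hours") == timedelta(days=1))  # True
print(parse_max_age("2 weeks"))                        # 14 days, 0:00:00
```

Passing a timedelta directly avoids any string parsing and is the safer choice when the limit is computed in code.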

Returns

Validate: The Validate object with the added validation step.

How Timezones Affect Freshness Checks

Freshness validation involves comparing two times: the data time (the most recent timestamp in your column) and the execution time (when and where the validation runs). Timezone confusion typically arises because these two times may originate from different contexts.

Consider these common scenarios:

your data timestamps are stored in UTC (common for databases), but you’re running validation on your laptop in New York (Eastern Time)
you develop and test validation locally, then deploy it to a cloud workflow that runs in UTC—suddenly your ‘same’ validation behaves differently
your data comes from servers in multiple regions, each recording timestamps in their local timezone

The timezone= parameter exists to solve this problem by establishing a single, explicit timezone context for the freshness comparison. When you specify a timezone, Pointblank interprets both the data timestamps (if naive) and the execution time in that timezone, ensuring consistent behavior whether you run validation on your laptop or in a cloud workflow.

Scenario 1: Data has timezone-aware datetimes

# Your data column has values like: 2024-01-15 10:30:00+00:00 (UTC)
# Comparison is straightforward as both sides have explicit timezones
.data_freshness(column="updated_at", max_age="24 hours")

Scenario 2: Data has naive datetimes (no timezone)

# Your data column has values like: 2024-01-15 10:30:00 (no timezone)
# Specify the timezone the data was recorded in:
.data_freshness(column="updated_at", max_age="24 hours", timezone="America/New_York")

Scenario 3: Ensuring consistent behavior across environments

# Pin the timezone to ensure identical results whether running locally or in the cloud
.data_freshness(
    column="updated_at",
    max_age="24 hours",
    timezone="UTC",  # Explicit timezone removes environment dependence
)
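
The effect of the assumed timezone on a naive timestamp can be shown without Pointblank. In this sketch the timestamp, reference time, and 24-hour limit are made-up values, and a fixed -05:00 offset stands in for Eastern Standard Time so that no timezone database is needed.

```python
from datetime import datetime, timedelta, timezone

naive_latest = datetime(2024, 1, 15, 10, 30)  # newest value in the column, no timezone
reference = datetime(2024, 1, 16, 12, 0, tzinfo=timezone.utc)
max_age = timedelta(hours=24)

utc = timezone.utc
eastern_standard = timezone(timedelta(hours=-5))  # fixed offset, ignores DST

# The same wall-clock value lands at different instants depending on the label.
age_if_utc = reference - naive_latest.replace(tzinfo=utc)
age_if_eastern = reference - naive_latest.replace(tzinfo=eastern_standard)

print(age_if_utc, age_if_utc <= max_age)          # 1 day, 1:30:00 False
print(age_if_eastern, age_if_eastern <= max_age)  # 20:30:00 True
```

The same data passes or fails depending only on which timezone the naive value is assumed to be in, which is why pinning timezone= matters.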

The `timezone=` Parameter

The timezone= parameter accepts several convenient formats, making it easy to specify timezones in whatever way is most natural for your use case. The following examples illustrate the three supported input styles.

IANA Timezone Names (recommended for regions with daylight saving time):

timezone="America/New_York"   # Eastern Time (handles DST automatically)
timezone="Europe/London"      # UK time
timezone="Asia/Tokyo"         # Japan Standard Time
timezone="Australia/Sydney"   # Australian Eastern Time
timezone="UTC"                # Coordinated Universal Time

Simple Hour Offsets (quick and easy):

timezone="-7"    # UTC-7 (e.g., Mountain Standard Time)
timezone="+5"    # UTC+5 (e.g., Pakistan Standard Time)
timezone="0"     # UTC
timezone="-12"   # UTC-12

ISO 8601 Offset Format (precise, including fractional hours):

timezone="-07:00"   # UTC-7
timezone="+05:30"   # UTC+5:30 (e.g., India Standard Time)
timezone="+00:00"   # UTC
timezone="-09:30"   # UTC-9:30

When a timezone is specified:

naive datetime values in the column are assumed to be in this timezone.
the reference time (if naive) is assumed to be in this timezone.
the validation report will show times in this timezone.

When None (default):

if your column has timezone-aware datetimes, those timezones are used
if your column has naive datetimes, they’re treated as UTC
the current time reference uses UTC

Note that IANA timezone names are preferred when daylight saving time transitions matter, as they automatically handle the offset changes. Fixed offsets like "-7" or "-07:00" do not account for DST.
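
As a rough illustration of how the three input styles relate, the sketch below maps each one to a standard-library tzinfo object. The `resolve_timezone()` helper is hypothetical and is not how Pointblank necessarily resolves the argument; in particular, treating an unsigned number such as "0" or "5" as a positive offset is an assumption.

```python
import re
from datetime import timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo

# Matches "-7", "+5", "0", "-07:00", "+05:30", and similar.
_OFFSET = re.compile(r"([+-]?)(\d{1,2})(?::(\d{2}))?")


def resolve_timezone(spec: str) -> tzinfo:
    """Turn an offset string or IANA name into a tzinfo (hypothetical helper)."""
    match = _OFFSET.fullmatch(spec.strip())
    if match:
        sign = -1 if match.group(1) == "-" else 1
        hours = int(match.group(2))
        minutes = int(match.group(3) or 0)
        # Fixed offsets never change, so they cannot follow DST transitions.
        return timezone(sign * timedelta(hours=hours, minutes=minutes))
    # Anything else is treated as an IANA name; this needs a timezone
    # database on the system (or the tzdata package).
    return ZoneInfo(spec)


print(resolve_timezone("-7").utcoffset(None))      # -1 day, 17:00:00
print(resolve_timezone("+05:30").utcoffset(None))  # 5:30:00
```

Note that Python prints negative timedeltas in normalized form, so UTC-7 appears as "-1 day, 17:00:00".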

Recommendations for Working with Timestamps

When working with datetime data, storing timestamps in UTC in your databases is strongly recommended since it provides a consistent reference point regardless of where your data originates or where it’s consumed. Using timezone-aware datetimes whenever possible helps avoid ambiguity—when a datetime has an explicit timezone, there’s no guessing about what time it actually represents.

If you’re working with naive datetimes (which lack timezone information), always specify the timezone= parameter so Pointblank knows how to interpret those values. When providing reference_time= as a string, use ISO 8601 format with the timezone offset included (e.g., "2024-01-15T10:30:00+00:00") to ensure unambiguous parsing. Finally, prefer IANA timezone names (like "America/New_York") over fixed offsets (like "-05:00") when daylight saving time transitions matter, since IANA names automatically handle the twice-yearly offset changes. To see all available IANA timezone names in Python, use zoneinfo.available_timezones() from the standard library’s zoneinfo module.
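
The following sketch shows zoneinfo.available_timezones() in use and, where the name is available, how an IANA zone reports different offsets in winter and summer. The `utc_offsets()` helper is hypothetical, and the set of names returned depends on the timezone database installed on your system.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo, available_timezones

available = available_timezones()  # set of IANA names known to this system


def utc_offsets(name, year=2024):
    """Return the (mid-January, mid-July) UTC offsets for an IANA zone."""
    zone = ZoneInfo(name)
    return (
        datetime(year, 1, 15, 12, tzinfo=zone).utcoffset(),
        datetime(year, 7, 15, 12, tzinfo=zone).utcoffset(),
    )


# Guarded so the sketch also runs where no timezone database is installed.
if "America/New_York" in available:
    winter, summer = utc_offsets("America/New_York")
    print(winter == timedelta(hours=-5), summer == timedelta(hours=-4))  # True True
```

A fixed offset such as "-05:00" would report the winter value all year, which is the difference the recommendation above is pointing at.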

Examples

The simplest use of data_freshness() requires just two arguments: the column= containing your timestamps and max_age= specifying how old the data can be. In this first example, we create sample data with an "updated_at" column containing timestamps from 1, 12, and 20 hours ago. By setting max_age="24 hours", we’re asserting that the most recent timestamp should be within 24 hours of the current time. Since the newest record is only 1 hour old, this validation passes.

import pointblank as pb
import polars as pl
from datetime import datetime, timedelta

# Create sample data with recent timestamps
recent_data = pl.DataFrame({
    "id": [1, 2, 3],
    "updated_at": [
        datetime.now() - timedelta(hours=1),
        datetime.now() - timedelta(hours=12),
        datetime.now() - timedelta(hours=20),
    ]
})

validation = (
    pb.Validate(data=recent_data)
    .data_freshness(column="updated_at", max_age="24 hours")
    .interrogate()
)

validation

     STEP              COLUMNS     VALUES  TBL  EVAL  UNITS  PASS    FAIL    W  E  C  EXT
1    data_freshness()  updated_at  1d           ✓     1      1 1.00  0 0.00  —  —  —  —

Notes
Step 1 (freshness_details) ✓ Most recent data: `2026-02-26 13:41:40` (age: 1.0h, max allowed: 1d)

The max_age= parameter accepts human-readable strings with various time units. You can chain multiple data_freshness() calls to check different freshness thresholds simultaneously—useful for tiered SLAs where you might want warnings at 30 minutes but errors at 2 days.

# Check data is fresh within different time windows
validation = (
    pb.Validate(data=recent_data)
    .data_freshness(column="updated_at", max_age="30 minutes")  # Very fresh
    .data_freshness(column="updated_at", max_age="2 days")      # Reasonably fresh
    .data_freshness(column="updated_at", max_age="1 week")      # Within a week
    .interrogate()
)

validation

     STEP              COLUMNS     VALUES  EVAL  UNITS  PASS    FAIL    W  E  C  EXT
1    data_freshness()  updated_at  30.0m   ✓     1      0 0.00  1 1.00  —  —  —  —
2    data_freshness()  updated_at  2d      ✓     1      1 1.00  0 0.00  —  —  —  —
3    data_freshness()  updated_at  1w      ✓     1      1 1.00  0 0.00  —  —  —  —

Notes
Step 1 (freshness_details) ✗ Most recent data: `2026-02-26 13:41:40` (age: 1.0h, max allowed: 30.0m)
Step 2 (freshness_details) ✓ Most recent data: `2026-02-26 13:41:40` (age: 1.0h, max allowed: 2d)
Step 3 (freshness_details) ✓ Most recent data: `2026-02-26 13:41:40` (age: 1.0h, max allowed: 1w)

When your data contains naive datetimes (timestamps without timezone information), use the timezone= parameter to specify what timezone those values represent. Here we have event data recorded in Eastern Time, so we set timezone="America/New_York" to ensure the freshness comparison is done correctly.

# Data with naive datetimes (assume they're in Eastern Time)
eastern_data = pl.DataFrame({
    "event_time": [
        datetime.now() - timedelta(hours=2),
        datetime.now() - timedelta(hours=5),
    ]
})

validation = (
    pb.Validate(data=eastern_data)
    .data_freshness(
        column="event_time",
        max_age="12 hours",
        timezone="America/New_York"  # Interpret times as Eastern
    )
    .interrogate()
)

validation

     STEP              COLUMNS     VALUES        TBL  EVAL  UNITS  PASS    FAIL    W  E  C  EXT
1    data_freshness()  event_time  12.0h -05:00       ✓     1      1 1.00  0 0.00  —  —  —  —

Notes
Step 1 (tz_warning) ⚠️ Column has naive datetimes but reference time is timezone-aware. Naive datetimes are being treated as if they're in the reference timezone.
Step 1 (freshness_details) ✓ Most recent data: `2026-02-26 12:41:40` (age: -10800.0s, max allowed: 12.0h)
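
The negative age in the note above (-10800.0s, or minus three hours) is worth a closer look. One plausible reading, assuming the example ran on a host whose local clock is UTC, is that datetime.now() produced naive UTC wall-clock values which were then labeled as Eastern Time, placing them in the future. The sketch below reproduces that arithmetic; the host time of 14:41:40 UTC is inferred from the outputs shown on this page, and a fixed -05:00 offset stands in for "America/New_York" in February.

```python
from datetime import datetime, timedelta, timezone

eastern_standard = timezone(timedelta(hours=-5))  # Eastern offset in February

# Assumed: the host clock reads 14:41:40 and the host's local zone is UTC.
host_now = datetime(2026, 2, 26, 14, 41, 40, tzinfo=timezone.utc)

# datetime.now() - 2 hours gives a naive 12:41:40 on such a host.
newest_naive = host_now.replace(tzinfo=None) - timedelta(hours=2)

# Labeling that naive value as Eastern moves it to 17:41:40 UTC.
newest_as_eastern = newest_naive.replace(tzinfo=eastern_standard)
age = host_now - newest_as_eastern
print(age.total_seconds())  # -10800.0

# Generating the naive values in Eastern wall-clock time gives the intended age.
eastern_wall_clock = host_now.astimezone(eastern_standard).replace(tzinfo=None)
fixed_naive = eastern_wall_clock - timedelta(hours=2)
fixed_age = host_now - fixed_naive.replace(tzinfo=eastern_standard)
print(fixed_age)  # 2:00:00
```

The step still passes because a negative age is below the 12-hour limit, but it signals that the naive values were not actually recorded in the timezone passed to timezone=.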

For reproducible validations or historical checks, you can use reference_time= to compare against a specific point in time instead of the current time. This is particularly useful for testing or when validating data snapshots. The reference time should include a timezone offset (like +00:00 for UTC) to avoid ambiguity.

validation = (
    pb.Validate(data=recent_data)
    .data_freshness(
        column="updated_at",
        max_age="24 hours",
        reference_time="2024-01-15T12:00:00+00:00"
    )
    .interrogate()
)

validation
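
To see why the offset in a reference_time= string matters, the sketch below parses ISO 8601 strings with the standard library's datetime.fromisoformat(). It only illustrates the difference between naive and aware values; how Pointblank itself parses the string is not shown here.

```python
from datetime import datetime, timedelta, timezone

naive = datetime.fromisoformat("2024-01-15T12:00:00")        # no offset given
aware = datetime.fromisoformat("2024-01-15T12:00:00+00:00")  # explicit UTC
india = datetime.fromisoformat("2024-01-15T10:30:00+05:30")  # explicit +05:30

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC

# An aware value identifies one instant and converts cleanly between zones.
print(india.astimezone(timezone.utc).isoformat())  # 2024-01-15T05:00:00+00:00

# Python refuses to order a naive value against an aware one.
try:
    naive < aware
except TypeError as error:
    print("cannot compare:", error)
```

Without the offset, the string describes a wall-clock reading rather than an instant, so something else (such as timezone=) has to supply the missing context.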