ref()

Reference a column from the reference data for aggregate comparisons.

Usage

ref(column_name)

This function is used with aggregate validation methods (such as col_sum_eq and col_avg_gt) to compare an aggregate of a column in the main data against the same aggregate of a column in the reference data.

To use this function, you must first set the reference data on the Validate object using the reference= parameter in the constructor.

Parameters

column_name: str
The name of the column in the reference data to compute the aggregate from.

Returns

ReferenceColumn
A reference column marker that indicates the value should be computed from the reference data.
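The idea of a marker that defers a value to the reference data can be sketched in plain Python. This is only an illustration of the concept: `RefMarker` and `resolve_sum_target` are hypothetical names invented here, not pointblank's ReferenceColumn class or its internals, and the tables are plain dictionaries rather than DataFrames.

```python
from dataclasses import dataclass


# Hypothetical stand-in for illustration only; not pointblank's actual class.
@dataclass(frozen=True)
class RefMarker:
    column_name: str


def resolve_sum_target(value, reference):
    """Return the number that a sum check should compare against.

    A literal number is returned unchanged. A RefMarker is resolved by
    summing the named column of the reference table.
    """
    if isinstance(value, RefMarker):
        if reference is None:
            raise ValueError("reference data is required to resolve a RefMarker")
        return sum(reference[value.column_name])
    return value


# Reference table modeled as a plain dict of column name -> values.
reference = {"sales": [100, 200, 300]}

literal_target = resolve_sum_target(600, reference)
deferred_target = resolve_sum_target(RefMarker("sales"), reference)
print(literal_target, deferred_target)
```

The marker holds only a column name. Nothing is computed until the check runs with the reference table available, which is why the reference data has to be supplied up front.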

Examples

Suppose we have two DataFrames: a current data table and a reference (historical) table. We want to validate that the sum of a column in the current data matches the sum of the same column in the reference data.

import pointblank as pb
import polars as pl

# Current data
current_data = pl.DataFrame({"sales": [100, 200, 300]})

# Reference (historical) data
reference_data = pl.DataFrame({"sales": [100, 200, 300]})

validation = (
    pb.Validate(data=current_data, reference=reference_data)
    .col_sum_eq("sales", pb.ref("sales"))
    .interrogate()
)

validation
STEP             COLUMNS  VALUES        UNITS  PASS      FAIL
1  col_sum_eq()  sales    ref('sales')  1      1 (1.00)  0 (0.00)

You can also compare columns with different names, or allow a tolerance with tol=:

current_data = pl.DataFrame({"revenue": [105, 205, 305]})
reference_data = pl.DataFrame({"sales": [100, 200, 300]})

# Check if revenue sum is within 10% of sales sum
validation = (
    pb.Validate(data=current_data, reference=reference_data)
    .col_sum_eq("revenue", pb.ref("sales"), tol=0.1)
    .interrogate()
)

validation
STEP             COLUMNS  VALUES                 UNITS  PASS      FAIL
1  col_sum_eq()  revenue  ref('sales'), tol=0.1  1      1 (1.00)  0 (0.00)
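The arithmetic behind the tolerance example can be checked by hand. The sketch below assumes tol is a relative tolerance measured against the reference sum, as the "within 10%" comment in the example suggests. The exact rule pointblank applies may differ, so treat this as a worked illustration, not a description of the library's implementation.

```python
# Same values as the example above, as plain Python lists.
revenue = [105, 205, 305]
sales = [100, 200, 300]

revenue_sum = sum(revenue)  # 615
sales_sum = sum(sales)      # 600

# Assumption: tol is a fraction of the reference sum.
tol = 0.1
relative_diff = abs(revenue_sum - sales_sum) / sales_sum  # 15 / 600 = 0.025

within_tolerance = relative_diff <= tol
print(revenue_sum, sales_sum, relative_diff, within_tolerance)
```

Under this reading the revenue sum differs from the sales sum by 2.5%, well inside the 10% allowance, which is consistent with the single passing test unit in the report.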

See Also

The