ref()

Reference a column from the reference data for aggregate comparisons.

Usage

ref(column_name)

This function is used with aggregate validation methods such as col_sum_eq and col_avg_gt. It lets a step compare the aggregate value of a column in the main data against the aggregate value of a column in the reference data.

To use this function, you must first set the reference data on the Validate object using the reference= parameter in the constructor.
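The comparison described above can be pictured in plain Python. This is only a conceptual sketch, not pointblank's implementation: the dictionaries stand in for the two tables, and `compare_aggregates` is a hypothetical helper introduced for illustration.

```python
import operator
from statistics import fmean

# Hypothetical stand-ins for the main and reference tables (column -> values).
current = {"sales": [100, 200, 300]}
reference = {"sales": [90, 210, 270]}


def compare_aggregates(main_values, ref_values, aggregate, comparison):
    """Apply the same aggregate to both columns, then compare the two results."""
    return comparison(aggregate(main_values), aggregate(ref_values))


# Roughly what a sum-equality check against a reference column asks:
# is sum(current) equal to sum(reference)?
sum_matches = compare_aggregates(
    current["sales"], reference["sales"], sum, operator.eq
)

# Roughly what an average-greater-than check asks:
# is mean(current) greater than mean(reference)?
avg_is_greater = compare_aggregates(
    current["sales"], reference["sales"], fmean, operator.gt
)
```

In both cases the right-hand side is not a fixed number but a value computed from the reference table, which is the role the marker returned by ref() plays in a validation step.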

Parameters

column_name: str: The name of the column in the reference data to compute the aggregate from.

Returns

ReferenceColumn: A reference column marker that indicates the value should be computed from the reference data.

Examples

Suppose we have two DataFrames: a current data table and a reference (historical) table. We want to validate that the sum of a column in the current data matches the sum of the same column in the reference data.

import pointblank as pb
import polars as pl

# Current data
current_data = pl.DataFrame({"sales": [100, 200, 300]})

# Reference (historical) data
reference_data = pl.DataFrame({"sales": [100, 200, 300]})

validation = (
    pb.Validate(data=current_data, reference=reference_data)
    .col_sum_eq("sales", pb.ref("sales"))
    .interrogate()
)

validation

     STEP           COLUMNS   VALUES         TBL   EVAL   UNITS   PASS     FAIL     W   E   C   EXT
1    col_sum_eq()   sales     ref('sales')         ✓      1       1 1.00   0 0.00   —   —   —   —

You can also compare columns with different names and allow a tolerance:

current_data = pl.DataFrame({"revenue": [105, 205, 305]})
reference_data = pl.DataFrame({"sales": [100, 200, 300]})

# Check if revenue sum is within 10% of sales sum
validation = (
    pb.Validate(data=current_data, reference=reference_data)
    .col_sum_eq("revenue", pb.ref("sales"), tol=0.1)
    .interrogate()
)

validation

     STEP           COLUMNS   VALUES                 TBL   EVAL   UNITS   PASS     FAIL     W   E   C   EXT
1    col_sum_eq()   revenue   ref('sales') tol=0.1         ✓      1       1 1.00   0 0.00   —   —   —   —
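The arithmetic behind this passing result can be checked by hand. The sketch below assumes tol is a relative tolerance applied to the reference sum, as the "within 10%" comment in the example suggests; pointblank's exact rule may differ, and `within_relative_tolerance` is a hypothetical helper, not part of the library.

```python
revenue = [105, 205, 305]
sales = [100, 200, 300]


def within_relative_tolerance(actual, target, tol):
    """Return True if actual lies within target +/- tol * |target|."""
    margin = tol * abs(target)
    return target - margin <= actual <= target + margin


revenue_sum = sum(revenue)  # 105 + 205 + 305 = 615
sales_sum = sum(sales)      # 100 + 200 + 300 = 600

# With tol=0.1 the accepted band around 600 is roughly 540 to 660,
# so a revenue sum of 615 falls inside it.
passes = within_relative_tolerance(revenue_sum, sales_sum, tol=0.1)
```

Under the same assumption, a revenue sum of 700 would fall outside the band and the step would fail.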
