This function is used with aggregate validation methods (like col_sum_eq, col_avg_gt, etc.) to compare the aggregate value of a column in the main data against the aggregate value of a column in the reference data.
To use this function, you must first set the reference data on the Validate object using the reference= parameter in the constructor.
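The idea of a "marker" that is resolved against the reference data can be pictured with a small, library-free sketch. Everything here is hypothetical and for illustration only: `RefMarker` and `resolve_sum` are invented names, plain dictionaries stand in for tables, and none of this is pointblank's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefMarker:
    """Hypothetical stand-in for a reference-column marker (not pointblank's class)."""
    column_name: str


def resolve_sum(value, reference):
    """Return the comparison target for a sum check.

    A marker is replaced by the sum of the named reference column;
    any other value is treated as a literal and returned unchanged.
    """
    if isinstance(value, RefMarker):
        return sum(reference[value.column_name])
    return value


# A plain dict plays the role of the reference table in this sketch.
reference = {"sales": [100, 200, 300]}

target = resolve_sum(RefMarker("sales"), reference)  # computed from reference data
literal = resolve_sum(600, reference)                # literal passes through
print(target, literal)
```

The point of the sketch is only that the marker carries a column name and defers the aggregate computation until the reference data is available.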
Parameters
column_name : str
The name of the column in the reference data to compute the aggregate from.
Returns
ReferenceColumn
A reference column marker that indicates the value should be computed from the reference data.
Examples
Suppose we have two DataFrames: a current data table and a reference (historical) table. We want to validate that the sum of a column in the current data matches the sum of the same column in the reference data.
import pointblank as pb
import polars as pl

# Current data
current_data = pl.DataFrame({"sales": [100, 200, 300]})

# Reference (historical) data
reference_data = pl.DataFrame({"sales": [100, 200, 300]})

validation = (
    pb.Validate(data=current_data, reference=reference_data)
    .col_sum_eq("sales", pb.ref("sales"))
    .interrogate()
)

validation
STEP             COLUMNS  VALUES        EVAL  UNITS  PASS      FAIL      W  E  C  EXT
1  col_sum_eq()  sales    ref('sales')  ✓     1      1 (1.00)  0 (0.00)  —  —  —  —
You can also compare columns with different names, or allow a tolerance:
current_data = pl.DataFrame({"revenue": [105, 205, 305]})
reference_data = pl.DataFrame({"sales": [100, 200, 300]})

# Check if revenue sum is within 10% of sales sum
validation = (
    pb.Validate(data=current_data, reference=reference_data)
    .col_sum_eq("revenue", pb.ref("sales"), tol=0.1)
    .interrogate()
)

validation
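To see why this check would be expected to pass, the arithmetic can be worked through in plain Python. This is a rough sketch that assumes the tolerance is relative to the reference sum, as the "within 10%" comment suggests; `sums_match` is a hypothetical helper, not a pointblank function, and the library's exact tolerance rule may differ.

```python
revenue = [105, 205, 305]  # current data, sums to 615
sales = [100, 200, 300]    # reference data, sums to 600


def sums_match(actual, reference, tol=0.0):
    """Hypothetical helper: compare two column sums with a relative tolerance.

    Assumes `tol` is a fraction of the reference sum (an assumption
    made for this sketch, not a statement about pointblank).
    """
    actual_sum = sum(actual)
    reference_sum = sum(reference)
    return abs(actual_sum - reference_sum) <= tol * abs(reference_sum)


# |615 - 600| = 15, and 10% of 600 is 60, so the sums agree within tolerance.
within_tolerance = sums_match(revenue, sales, tol=0.1)

# With no tolerance the sums must be exactly equal, so this comparison fails.
exact_match = sums_match(revenue, sales)

print(within_tolerance, exact_match)
```

Under that assumption, the difference of 15 is well inside the allowed 60, while the same comparison without a tolerance would not hold.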