While Pointblank offers many specialized validation functions for common data quality checks, sometimes you need more flexibility for complex validation requirements. This is where expression-based validation with col_vals_expr() comes in.
The col_vals_expr() method lets you:
combine multiple conditions in a single validation step
access row-wise values across multiple columns
Now let’s explore how to use these capabilities through a collection of examples!
Basic Usage
At its core, col_vals_expr() validates whether an expression evaluates to True for each row in your data. Here’s a simple example:
import pointblank as pb
import polars as pl

# Load small_table dataset as a Polars DataFrame
small_table_pl = pb.load_dataset(dataset="small_table", tbl_type="polars")

(
    pb.Validate(data=small_table_pl)
    .col_vals_expr(
        # Use Polars expression syntax
        expr=pl.col("d") > pl.col("a") * 50,
        brief="Column `d` should be at least 50 times larger than `a`."
    )
    .interrogate()
)
Pointblank Validation (Polars, 2025-05-19 17:22:27)

STEP               COLUMNS  VALUES       EVAL  UNITS  PASS       FAIL      W  E  C  EXT
1 col_vals_expr()  -        COLUMN EXPR  ✓     13     12 (0.92)  1 (0.08)  -  -  -  -

Brief: Column d should be at least 50 times larger than a.
In this example, we’re validating that, for each row, the value in column d is greater than 50 times the value in column a (the comparison is strict). Each row counts as one test unit: 12 of the 13 rows pass and 1 fails.
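To make the row-wise semantics concrete, the sketch below reproduces the arithmetic behind such a report in plain Python, without Pointblank. The rows are hypothetical values chosen for illustration, not the contents of small_table; only the 12-of-13 figures are taken from the report.

```python
# Hypothetical (a, d) pairs for illustration; not the actual small_table rows.
rows = [
    {"a": 2, "d": 3400.0},
    {"a": 3, "d": 100.0},   # 100 > 150 is False, so this row fails
    {"a": 6, "d": 2300.0},
    {"a": 8, "d": 400.0},   # 400 > 400 is False: the comparison is strict
]

# One test unit per row: the expression must be True for the row to pass.
results = [row["d"] > row["a"] * 50 for row in rows]

n_units = len(results)
n_pass = sum(results)
n_fail = n_units - n_pass
print(n_units, n_pass, n_fail)

# The PASS and FAIL fractions in a report are counts divided by units,
# e.g. 12 of 13 passing rows is shown as 0.92 and 1 of 13 as 0.08.
print(round(12 / 13, 2), round(1 / 13, 2))
```

Note the last hypothetical row: a value of exactly 50 times a does not pass, because the expression uses > rather than >=.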
Notes on Expression Syntax
The expression syntax depends on your table type:
Polars: uses Polars expression syntax with pl.col("column_name")
Pandas: uses standard Python/NumPy syntax
The expression should:
evaluate to a boolean result for each row
reference columns using the appropriate syntax for your table type
use standard operators (+, -, *, /, >, <, ==, etc.)
not include assignments
Complex Expressions
The real power of col_vals_expr() comes with complex expressions that would be difficult to represent using the standard validation functions:
# Load game_revenue dataset as a Polars DataFrame
game_revenue_pl = pb.load_dataset(dataset="game_revenue", tbl_type="polars")

(
    pb.Validate(data=game_revenue_pl)
    .col_vals_expr(
        # Use Polars expression syntax
        expr=(pl.col("session_duration") > 20) | (pl.col("item_revenue") > 10),
        brief="Sessions should be either long (>20 min) or high-value (>$10)."
    )
    .interrogate()
)
Pointblank Validation (Polars, 2025-05-19 17:22:27)

STEP               COLUMNS  VALUES       EVAL  UNITS  PASS         FAIL        W  E  C  EXT
1 col_vals_expr()  -        COLUMN EXPR  ✓     2000   1518 (0.76)  482 (0.24)  -  -  -  -

Brief: Sessions should be either long (>20 min) or high-value (>$10).
This validates that either the session duration is longer than 20 minutes OR the item revenue is greater than $10.
Example: Multiple Conditions
You can create sophisticated validations with multiple conditions:
COLUMNS  VALUES       EVAL  UNITS  PASS      FAIL      W  E  C  EXT
-        COLUMN EXPR  ✓     5      4 (0.80)  1 (0.20)  -  -  -  -

Brief: Adults should have reasonable income-to-experience ratios.
Example: Handling Null Values
When working with expressions, consider how to handle null/missing values:
(
    pb.Validate(data=small_table_pl)
    .col_vals_expr(
        # Check for nulls before division
        expr=(pl.col("c").is_not_null()) & ((pl.col("c") / pl.col("a")) > 1.5),
        brief="Ratio of `c`/`a` should exceed 1.5 (when `c` is not null)."
    )
    .interrogate()
)
Pointblank Validation (Polars, 2025-05-19 17:22:27)

STEP               COLUMNS  VALUES       EVAL  UNITS  PASS      FAIL      W  E  C  EXT
1 col_vals_expr()  -        COLUMN EXPR  ✓     13     5 (0.38)  8 (0.62)  -  -  -  -

Brief: Ratio of c/a should exceed 1.5 (when c is not null).
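One subtlety is worth spelling out: with the is_not_null() guard joined by &, a row where c is null evaluates to False, so it is counted as a failure rather than skipped. If null rows should pass instead, the guard can be inverted, for example pl.col("c").is_null() | (pl.col("c") / pl.col("a") > 1.5). The plain-Python sketch below contrasts the two policies on hypothetical rows, with None standing in for null.

```python
# Hypothetical rows; None stands in for a null value in column `c`.
rows = [
    {"a": 2, "c": 9.0},   # 9 / 2 = 4.5 -> exceeds 1.5
    {"a": 4, "c": 3.0},   # 3 / 4 = 0.75 -> does not exceed 1.5
    {"a": 3, "c": None},  # missing value
]


def null_fails(row):
    """Mirror `c.is_not_null() & (c / a > 1.5)`: a null `c` fails the row."""
    return row["c"] is not None and row["c"] / row["a"] > 1.5


def null_passes(row):
    """Mirror `c.is_null() | (c / a > 1.5)`: a null `c` passes the row."""
    return row["c"] is None or row["c"] / row["a"] > 1.5


strict = [null_fails(row) for row in rows]
lenient = [null_passes(row) for row in rows]
print(strict)
print(lenient)
```

Which policy is right depends on whether a missing value should count against data quality in your context; either way, it helps to make the choice explicit in the brief.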
Best Practices
Here are some tips and tricks for effectively using expression-based validation with col_vals_expr().
Document Your Expressions
Always provide clear documentation in the brief= parameter:
(
    pb.Validate(data=small_table_pl)
    .col_vals_expr(
        expr=pl.col("d") > pl.col("a") * 1.5,
        # Document which columns are being compared
        brief="Column `d` should be at least 1.5 times larger than column `a`."
    )
    .interrogate()
)
Pointblank Validation (Polars, 2025-05-19 17:22:27)

STEP               COLUMNS  VALUES       EVAL  UNITS  PASS       FAIL      W  E  C  EXT
1 col_vals_expr()  -        COLUMN EXPR  ✓     13     13 (1.00)  0 (0.00)  -  -  -  -

Brief: Column d should be at least 1.5 times larger than column a.
Handle Edge Cases
Consider potential edge cases like division by zero or nulls:
(
    pb.Validate(data=small_table_pl)
    .col_vals_expr(
        # Check denominator before division
        expr=(pl.col("a") != 0) & (pl.col("d") / pl.col("a") > 1.5),
        brief="Ratio of `d`/`a` should exceed 1.5 (avoiding division by zero)."
    )
    .interrogate()
)
Pointblank Validation (Polars, 2025-05-19 17:22:27)

STEP               COLUMNS  VALUES       EVAL  UNITS  PASS       FAIL      W  E  C  EXT
1 col_vals_expr()  -        COLUMN EXPR  ✓     13     13 (1.00)  0 (0.00)  -  -  -  -

Brief: Ratio of d/a should exceed 1.5 (avoiding division by zero).
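The guard matters for more than avoiding errors. Under floating-point rules, which dataframe libraries generally follow, dividing a nonzero value by zero tends to yield infinity rather than raise, and infinity would satisfy > 1.5. The sketch below imitates that behavior in plain Python (which raises ZeroDivisionError instead) on hypothetical values, to show how the guard changes the outcome; float_divide is a helper written for this illustration only.

```python
import math


def float_divide(numerator, denominator):
    """Imitate IEEE 754 division, where x / 0 gives +/-inf or NaN."""
    if denominator == 0:
        if numerator == 0:
            return math.nan
        return math.copysign(math.inf, numerator)
    return numerator / denominator


# Hypothetical (a, d) pairs; the second row has a zero denominator.
rows = [{"a": 2, "d": 10.0}, {"a": 0, "d": 10.0}, {"a": 4, "d": 2.0}]

# Without a guard, the zero-denominator row yields inf, and inf > 1.5 is True.
unguarded = [float_divide(r["d"], r["a"]) > 1.5 for r in rows]

# With the guard, the same row is marked as failing.
guarded = [r["a"] != 0 and float_divide(r["d"], r["a"]) > 1.5 for r in rows]

print(unguarded)
print(guarded)
```

Without the guard, the zero-denominator row would quietly pass; with it, the row is reported as a failure and shows up in the FAIL count.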
Test on Small Datasets First
When developing complex expressions, test on a small sample of your data first to ensure your logic is correct before applying it to large datasets.
Conclusion
The col_vals_expr() method provides a powerful way to implement complex validation logic in Pointblank when standard validation methods aren’t sufficient. By leveraging expressions, you can create sophisticated data quality checks tailored to your specific requirements, combining conditions across multiple columns and applying transformations as needed.
This flexibility makes expression-based validation an essential tool for addressing complex data quality scenarios in your validation workflows.