import pointblank as pb
import polars as pl
# Load small_table dataset as a Polars DataFrame
= pb.load_dataset(dataset="small_table", tbl_type="polars")
small_table_pl
(=small_table_pl)
pb.Validate(data
.col_vals_expr(
# Use Polars expression syntax ---
=pl.col("d") > pl.col("a") * 50,
expr="Column `d` should be at least 50 times larger than `a`."
brief
)
.interrogate() )
Expression-Based Validation
While Pointblank offers many specialized validation functions for common data quality checks, sometimes you need more flexibility for complex validation requirements. This is where expression-based validation with col_vals_expr()
comes in.
The col_vals_expr()
method allows you to:
- combine multiple conditions in a single validation step
- access row-wise values across multiple columns
Now let’s explore how to use these capabilities through a collection of examples!
Basic Usage
At its core, col_vals_expr()
validates whether an expression evaluates to True
for each row in your data. Here’s a simple example:
In this example, we’re validating that for each row, the value in column d
is at least 50 times larger than the value in column a
.
Notes on Expression Syntax
The expression syntax depends on your table type:
- Polars: uses Polars expression syntax with
pl.col("column_name")
- Pandas: uses standard Python/NumPy syntax
The expression should:
- evaluate to a boolean result for each row
- reference columns using the appropriate syntax for your table type
- use standard operators (
+
,-
,*
,/
,>
,<
,==
, etc.) - not include assignments
Complex Expressions
The real power of col_vals_expr()
comes with complex expressions that would be difficult to represent using the standard validation functions:
# Load game_revenue dataset as a Polars DataFrame
= pb.load_dataset(dataset="game_revenue", tbl_type="polars")
game_revenue_pl
(=game_revenue_pl)
pb.Validate(data
.col_vals_expr(
# Use Polars expression syntax ---
=(pl.col("session_duration") > 20) | (pl.col("item_revenue") > 10),
expr="Sessions should be either long (>20 min) or high-value (>$10)."
brief
)
.interrogate() )
This validates that either the session duration is longer than 20 minutes OR the item revenue is greater than $10.
Example: Multiple Conditions
You can create sophisticated validations with multiple conditions:
# Create a simple Polars DataFrame
= pl.DataFrame({
employee_df "age": [25, 30, 15, 40, 35],
"income": [50000, 75000, 0, 100000, 60000],
"years_experience": [3, 8, 0, 15, 7]
})
(=employee_df, tbl_name="employee_data")
pb.Validate(data
.col_vals_expr(
# Complex condition with multiple comparisons ---
=(
expr"age") >= 18) &
(pl.col("income") / (pl.col("years_experience") + 1) <= 25000)
(pl.col(
),="Adults should have reasonable income-to-experience ratios."
brief
)
.interrogate() )
Example: Handling Null Values
When working with expressions, consider how to handle null/missing values:
(=small_table_pl)
pb.Validate(data
.col_vals_expr(
# Check for nulls before division ---
=(pl.col("c").is_not_null()) & ((pl.col("c") / pl.col("a")) > 1.5),
expr="Ratio of `c`/`a` should exceed 1.5 (when `c` is not null)."
brief
)
.interrogate() )
Best Practices
Here are some tips and tricks for effectively using expression-based validation with col_vals_expr()
.
Document Your Expressions
Always provide clear documentation in the brief=
parameter:
(=small_table_pl)
pb.Validate(data
.col_vals_expr(=pl.col("d") > pl.col("a") * 1.5,
expr
# Document which columns are being compared ---
="Column `d` should be at least 1.5 times larger than column `a`."
brief
)
.interrogate() )
Pointblank Validation | |||||||||||||
2025-06-22|01:20:57 Polars |
|||||||||||||
STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | E | C | EXT | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
#4CA64C | 1 |
col_vals_expr()
Column |
✓ | 13 | 13 1.00 |
0 0.00 |
— | — | — | — |
Handle Edge Cases
Consider potential edge cases like division by zero or nulls:
(=small_table_pl)
pb.Validate(data
.col_vals_expr(
# Check denominator before division ---
=(pl.col("a") != 0) & (pl.col("d") / pl.col("a") > 1.5),
expr="Ratio of `d`/`a` should exceed 1.5 (avoiding division by zero)."
brief
)
.interrogate() )
Pointblank Validation | |||||||||||||
2025-06-22|01:20:57 Polars |
|||||||||||||
STEP | COLUMNS | VALUES | TBL | EVAL | UNITS | PASS | FAIL | W | E | C | EXT | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
#4CA64C | 1 |
col_vals_expr()
Ratio of |
✓ | 13 | 13 1.00 |
0 0.00 |
— | — | — | — |
Test on Small Datasets First
When developing complex expressions, test on a small sample of your data first to ensure your logic is correct before applying it to large datasets.
Conclusion
The col_vals_expr()
method provides a powerful way to implement complex validation logic in Pointblank when standard validation methods aren’t sufficient. By leveraging expressions, you can create sophisticated data quality checks tailored to your specific requirements, combining conditions across multiple columns and applying transformations as needed.
This flexibility makes expression-based validation an essential tool for addressing complex data quality scenarios in your validation workflows.