Assertions

In addition to validation steps that create reports, Pointblank provides assertions. This is a lightweight way to confirm data quality by raising exceptions when validation conditions aren’t met. Assertions are particularly useful in:

- data processing pipelines where you need to halt execution if data doesn’t meet expectations
- testing environments where you want to verify data properties programmatically
- scripts and functions where you need immediate notification of data problems

Basic Assertion Workflow

The assertion workflow uses your familiar validation steps with assertion methods to check that validations meet your requirements:

import pointblank as pb
import polars as pl

# Create sample data
sample_data = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [10.5, 8.3, -2.1, 15.7, 7.2]
})

# Create a validation plan and assert that all steps pass
(
    pb.Validate(data=sample_data)
    .col_vals_gt(columns="id", value=0, brief="IDs must be positive")
    .col_vals_gt(columns="value", value=-5, brief="Values should exceed -5")
    # Will automatically `interrogate()` and raise an AssertionError if any validation fails
    .assert_passing()
)

This simple pattern allows you to integrate data quality checks into your data pipelines. With it, you can create clear stopping points when data doesn’t meet specified criteria.
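Since assert_passing() raises a plain AssertionError, the same pattern also drops directly into a test suite. Here is a minimal sketch, assuming pytest as the test runner (the test function name and its inline data are illustrative, not part of Pointblank):

import pointblank as pb
import polars as pl

def test_sample_data_is_valid():
    # pytest reports the AssertionError raised by `assert_passing()` as a test failure
    data = pl.DataFrame({"id": [1, 2, 3], "value": [10.5, 8.3, 7.2]})
    (
        pb.Validate(data=data)
        .col_vals_gt(columns="id", value=0)
        .assert_passing()
    )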
Assertion Methods
Pointblank offers two types of assertions:
- Full Passing Assertions: using assert_passing() to verify that every single test unit passes
- Threshold-Based Assertions: using assert_below_threshold() to verify that failure rates stay within acceptable thresholds
assert_passing()
The assert_passing() method is the strictest form of assertion, requiring every single validation test unit to pass:
try:
    (
        pb.Validate(data=sample_data)
        .col_vals_gt(columns="value", value=0)
        # Direct assertion: automatically interrogates
        .assert_passing()
    )
except AssertionError as e:
    print("AssertionError:", str(e))
AssertionError: The following assertions failed:
- Step 1: Expect that values in `value` should be > `0`.
assert_below_threshold()
The assert_below_threshold() method is more flexible, as it allows some failures as long as they stay below specified threshold levels. Pointblank uses three severity thresholds that increase in order of seriousness:
- ‘warning’ (least severe): the first threshold that gets triggered when failures exceed this level
- ‘error’ (more severe): the middle threshold indicating more serious data quality issues
- ‘critical’ (most severe): the highest threshold indicating critical data quality problems
# Create a two-column DataFrame for this example
tbl_pl = pl.DataFrame({
    "a": [4, 6, 9, 7, 12, 8, 7, 12, 10, 7],
    "b": [9, 8, 10, 5, 10, 9, 14, 6, 6, 8],
})

# Set thresholds: warning=0.2 (20%), error=0.3 (30%), critical=0.4 (40%)
validation = (
    pb.Validate(data=tbl_pl, thresholds=(0.2, 0.3, 0.4))
    .col_vals_gt(columns="b", value=5)   # 1/10 failing (10% failure rate)
    .col_vals_lt(columns="a", value=11)  # 2/10 failing (20% failure rate)
    .col_vals_ge(columns="b", value=8)   # 3/10 failing (30% failure rate)
    .interrogate()
)

validation
The validation report above visually indicates threshold levels with colored circles:
- gray circles in the W column indicate the ‘warning’ threshold
- yellow circles in the E column indicate the ‘error’ threshold
- red circles in the C column indicate the ‘critical’ threshold
This won’t pass the assert_below_threshold() assertion for the ‘error’ level, because step 3’s 30% failure rate reaches the 0.3 error threshold:
try:
    validation.assert_below_threshold(level="error")
except AssertionError as e:
    print("AssertionError:", str(e))
AssertionError: The following steps exceeded the error threshold level:
Step 3: Expect that values in `b` should be >= `8`.
We can check against the ‘error’ threshold for specific steps with the i= parameter:

validation.assert_below_threshold(level="error", i=[1, 2])
This passes because the highest threshold exceeded in steps 1 and 2 is ‘warning’.
The assert_below_threshold() method takes these parameters:

- level=: threshold level to check against ("warning", "error", or "critical")
- i=: optional specific step number(s) to check
- message=: optional custom error message
This is particularly useful when (a short sketch follows this list):
- working with real-world data where some percentage of failures is acceptable
- implementing different severity levels for data quality rules
- gradually improving data quality with stepped thresholds
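For instance, to tolerate warning-level noise while refusing to continue past error-level failures, gate on just that level. A sketch reusing the validation object from the example above (the branch actions are illustrative):

# Allow warning-level failures, but halt at error level
try:
    validation.assert_below_threshold(level="error")
    print("Quality acceptable: continuing the pipeline")
except AssertionError:
    print("Error threshold reached: holding data for manual review")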
Assertion methods like assert_passing() and assert_below_threshold() will automatically call interrogate() if needed, so you don’t have to explicitly include this step when using assertions directly.
Using Status Check Methods
In addition to assertion methods that raise exceptions, Pointblank provides status check methods that return boolean values:
all_passed()
The all_passed() method will return True only if every single test unit in every validation step passed:
validation = (
    pb.Validate(data=sample_data)
    .col_vals_gt(columns="value", value=0)
    .interrogate()
)

if not validation.all_passed():
    print("Validation failed: some values are not positive")
Validation failed: some values are not positive
warning(), error(), and critical()

The methods (warning(), error(), and critical()) return information about whether validation steps exceeded that specific threshold level.
While assertion methods raise exceptions to halt execution when thresholds are exceeded, these status methods give you fine-grained control to implement custom logic based on different validation quality levels.
validation = (
    pb.Validate(data=sample_data, thresholds=(0.05, 0.10, 0.20))
    .col_vals_gt(columns="value", value=0)  # Some values are negative
    .interrogate()
)

validation
The warning() method returns a dictionary mapping step numbers to boolean values. A True value means that step exceeds the warning threshold:
# Get dictionary of warning status for each step
warning_status = validation.warning()
print(f"Warning status: {warning_status}")  # {1: True} means step 1 exceeds warning threshold
Warning status: {1: True}
You can check a specific step using the i= parameter, and get a single boolean with scalar=True:
# Check error threshold for specific step
has_errors = validation.error(i=1, scalar=True)

if has_errors:
    print("Step 1 exceeded the error threshold.")
Step 1 exceeded the error threshold.
Similarly, we can check if any steps exceed the ‘critical’ threshold:
# Check against critical threshold
critical_status = validation.critical()
print(f"Critical status: {critical_status}")
Critical status: {1: True}
These methods are particularly useful for (a short sketch follows this list):
- Conditional logic: taking different actions based on threshold severity
- Reporting: generating summary reports about validation quality
- Monitoring: tracking data quality trends over time
- Graceful degradation: implementing fallback logic when quality decreases
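As a sketch of the conditional-logic and graceful-degradation cases: each method returns a step-to-boolean dictionary, so any() over its values reveals whether any step tripped that level (the branch actions here are illustrative):

# Branch on the most severe threshold any step has exceeded
if any(validation.critical().values()):
    print("Critical failures: stopping the pipeline")
elif any(validation.error().values()):
    print("Errors: falling back to the last known-good dataset")
elif any(validation.warning().values()):
    print("Warnings only: proceeding, but logging for review")
else:
    print("All steps clean")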
Each method has these options:
- without parameters: returns a dictionary mapping step numbers to boolean status values
- with i=: check specific step(s)
- with scalar=True: return a single boolean instead of a dictionary (when checking a specific step)
Customizing Error Messages
You can provide custom error messages when assertions fail to make them more meaningful in your specific workflow context:
# Create a validation with potential failures
validation = (
    pb.Validate(data=sample_data, thresholds=(0.2, 0.3, 0.4))
    .col_vals_gt(columns="value", value=0)
    .interrogate()
)

# Display the validation results
validation
When you need to customize the error message that appears when an assertion fails, use the message= parameter:
try:
    # Custom message for threshold assertion
    validation.assert_below_threshold(
        level="warning",
        message="Data quality too low for processing!"
    )
except AssertionError as e:
    print(f"Custom handling of failure: {e}")
Custom handling of failure: Data quality too low for processing!
Descriptive error messages are essential in production systems where multiple team members might need to interpret validation failures. The custom message lets you provide context appropriate to your specific workflow or data pipeline stage.
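One practical pattern is to build the message from pipeline context, so a failure identifies exactly where it occurred. A small sketch (the stage name below is a hypothetical value, not part of Pointblank):

# Hypothetical stage identifier interpolated into the assertion message
stage = "daily-ingest"
try:
    validation.assert_below_threshold(
        level="warning",
        message=f"[{stage}] Data quality too low for processing!"
    )
except AssertionError as e:
    print(e)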
Combining Assertions with Actions
Actions and assertions serve complementary but distinct purposes in data validation workflows:
- Actions trigger during validation but shouldn’t raise errors (as this would halt report generation)
- Assertions are designed to raise errors based on specific conditions, making them ideal for flow control after validation completes
Here’s a simplified example showing how to use them together. The print statements simulate logging or monitoring that would be valuable in production data pipelines:
# Define a simple action function (won't raise errors)
def notify_quality_issue(message="Data quality issue detected"):
    print(f"ACTION TRIGGERED: {message}")

# Create data with known failures
problem_data = pl.DataFrame({
    "id": [1, 2, 3, -4, 5],                # One negative ID
    "value": [10.5, 8.3, -2.1, 15.7, 7.2]  # One negative value
})

# First use actions for automated responses during validation
print("Running validation with actions...")

validation = (
    pb.Validate(data=problem_data, thresholds=(0.1, 0.2, 0.3))
    .col_vals_gt(
        columns="id", value=0,
        brief="IDs must be positive",
        actions=pb.Actions(warning=notify_quality_issue)
    )
    # Actions trigger here but won't stop report generation
    .interrogate()
)

# Then use assertions after validation for workflow control
print("\nNow using assertion for flow control...")
try:
    validation.assert_below_threshold(level="warning")
    print("This line won't execute if the assertion fails")
except AssertionError as e:
    print(f"Validation failed threshold check: {e}")
    print("Implementing fallback process...")
Running validation with actions...
ACTION TRIGGERED: Data quality issue detected
Now using assertion for flow control...
Validation failed threshold check: The following steps exceeded the warning threshold level:
Step 1: Expect that values in `id` should be > `0`.
Implementing fallback process...
This approach gives you the best of both worlds:
- Actions provide immediate notification during validation without interrupting the process
- Assertions control workflow execution after validation when important thresholds are exceeded
This pattern works well in data pipelines where you want both: (1) automated responses during validation and (2) clear decision points after validation is complete.
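Here is a sketch of how the pattern might be packaged as a reusable stage, reusing notify_quality_issue and problem_data from above (the validated_stage() helper is illustrative, not a Pointblank API):

def validated_stage(data, threshold_level="warning"):
    """Validate a batch; return it if acceptable, otherwise None (hypothetical helper)."""
    validation = (
        pb.Validate(data=data, thresholds=(0.1, 0.2, 0.3))
        .col_vals_gt(
            columns="id", value=0,
            actions=pb.Actions(warning=notify_quality_issue)  # notify during validation
        )
        .interrogate()
    )
    try:
        # Gate after validation: raises if the warning threshold was reached
        validation.assert_below_threshold(level=threshold_level)
        return data
    except AssertionError:
        return None  # the caller decides on the fallback

result = validated_stage(problem_data)
print("Batch accepted" if result is not None else "Batch rejected")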
Best Practices for Assertions
When using assertions in your data workflows, consider these best practices:
Choose the right assertion type:
- use assert_passing() for critical validations where any failure is unacceptable
- use assert_below_threshold() for validations where some failure rate is acceptable
- use the status check methods (warning(), error(), and critical()) when you need conditional logic rather than exceptions
Set appropriate thresholds that match your data quality requirements:
# Example threshold strategy
validation = pb.Validate(
    data=sample_data,
    # warning at 1%, error at 5%, critical at 10%
    thresholds=pb.Thresholds(warning=0.01, error=0.05, critical=0.10)
)
Use a graduated approach to validation severity:
# Critical validations: must be perfect
validation_1.assert_passing()

# Important validations: must be below error threshold
validation_2.assert_below_threshold(level="error")

# Monitor-only validations: check warning status
warning_status = validation_3.warning()
Placement in pipelines: place assertions at critical points where data quality is essential
Error handling: wrap assertions in try-except blocks for better error handling in production systems
Combine with reporting: use both assertions and reporting approaches for comprehensive quality control (a short sketch follows)
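As a sketch of that last practice: interrogate once, keep the object for its report, and assert afterwards, so a failed assertion never costs you the report (reusing sample_data from earlier):

validation = (
    pb.Validate(data=sample_data, thresholds=(0.2, 0.3, 0.4))
    .col_vals_gt(columns="value", value=0)
    .interrogate()
)

validation  # the full report remains available for review

# Gate only on critical failures; the 20% failure rate here stays below 0.4
validation.assert_below_threshold(level="critical")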
Conclusion
Pointblank’s assertion methods give you flexible options for enforcing data quality requirements:
- assert_passing() for strict validation where every test unit must pass
- assert_below_threshold() for more flexible validation where some failures are tolerable
- Status methods (warning(), error(), and critical()) for programmatic threshold checking
By using these assertion methods appropriately, you can build robust data pipelines with different levels of quality enforcement (from strict validation of critical data properties to more lenient checks for less critical aspects). This graduated approach to data quality helps create systems that are both reliable and practical in real-world data environments.