import pointblank as pb
# Define a basic YAML validation workflow
yaml_config = '''
tbl: small_table
steps:
- rows_distinct
- col_exists:
    columns: [date, a, b]
'''
# Execute the validation workflow
result = pb.yaml_interrogate(yaml_config)
result
yaml_interrogate() function
Execute a YAML-based validation workflow.
USAGE
yaml_interrogate(yaml)
This is the main entry point for YAML-based validation workflows. It takes a YAML configuration (as a string or file path) and returns an interrogated Validate object containing the validation results.
The YAML configuration defines the data source, validation steps, and optional settings like thresholds and labels. This function automatically loads the data, builds the validation plan, executes all validation steps, and returns the interrogated results.
Parameters
yaml : Union[str, Path]
YAML configuration as string or file path. Can be: (1) a YAML string containing the validation configuration, or (2) a Path object or string path to a YAML file.
Returns
: Validate
An interrogated Validate object containing the results of the validation workflow.
Raises
: YAMLValidationError
If the YAML is invalid, malformed, or execution fails. This includes syntax errors, missing required fields, unknown validation methods, or data loading failures.
Examples
For the examples here, we’ll use YAML configurations to define validation workflows. Let’s start with a basic YAML workflow that validates the built-in small_table dataset (the code for it is shown at the top of this page).
The validation table shows the results of our YAML-defined workflow. We can see that the rows_distinct() validation failed (because there are duplicate rows in the table), while the column existence checks passed.
Now let’s create a more comprehensive validation workflow with thresholds and metadata:
# Advanced YAML configuration with thresholds and metadata
yaml_config = '''
tbl: small_table
tbl_name: small_table_demo
label: Comprehensive data validation
thresholds:
  warning: 0.1
  error: 0.25
  critical: 0.35
steps:
- col_vals_gt:
    columns: [d]
    value: 100
- col_vals_regex:
    columns: [b]
    pattern: '[0-9]-[a-z]{3}-[0-9]{3}'
- col_vals_not_null:
    columns: [date, a]
'''
# Execute the validation workflow
result = pb.yaml_interrogate(yaml_config)
print(f"Table name: {result.tbl_name}")
print(f"Label: {result.label}")
print(f"Total validation steps: {len(result.validation_info)}")
Table name: small_table_demo
Label: Comprehensive data validation
Total validation steps: 4
The validation results now include our custom table name and label. The thresholds we defined will determine when validation steps are marked as warnings, errors, or critical failures.
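As a rough illustration of how such thresholds can be read, the sketch below classifies a step by the fraction of test units that failed. It assumes the threshold values are fractions of failing test units and that a level is triggered when the failing fraction is at or above its threshold. The helper `triggered_levels()` is hypothetical and is not part of pointblank.

```python
from typing import Dict, List


def triggered_levels(fraction_failed: float, thresholds: Dict[str, float]) -> List[str]:
    """Return every severity level whose threshold is met (hypothetical helper)."""
    order = ["warning", "error", "critical"]
    # Assumption: a level triggers when the failing fraction is at or above it
    return [
        level
        for level in order
        if level in thresholds and fraction_failed >= thresholds[level]
    ]


# Same threshold values as in the YAML configuration above
thresholds = {"warning": 0.1, "error": 0.25, "critical": 0.35}

for fraction in (0.05, 0.15, 0.30, 0.50):
    print(fraction, triggered_levels(fraction, thresholds))
```

Under these assumptions, a step with 30% failing test units would reach both the warning and error levels but not the critical one.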
You can also load YAML configurations from files. Here’s how you would work with a YAML file:
from pathlib import Path
import tempfile
# Create a temporary YAML file for demonstration
yaml_content = '''
tbl: small_table
tbl_name: File-based Validation
steps:
- col_vals_between:
    columns: [c]
    left: 1
    right: 10
- col_vals_in_set:
    columns: [f]
    set: [low, mid, high]
'''
with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
    f.write(yaml_content)
    yaml_file_path = Path(f.name)
# Load and execute validation from file
result = pb.yaml_interrogate(yaml_file_path)
result
This approach is particularly useful for storing validation configurations as part of your data pipeline or version control system, allowing you to maintain validation rules alongside your code.
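One way to organize file-based configurations in a pipeline is to keep them together in a directory and discover them at run time. The sketch below uses only the standard library. The helper `find_validation_configs()`, the file names, and the directory layout are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def find_validation_configs(directory: Path) -> List[Path]:
    """Return the YAML files in `directory`, sorted for a stable run order."""
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix in {".yaml", ".yml"}
    )


# Demonstration with a temporary directory standing in for a config folder
with tempfile.TemporaryDirectory() as tmp:
    config_dir = Path(tmp)
    (config_dir / "orders.yaml").write_text("tbl: small_table\nsteps:\n- rows_distinct\n")
    (config_dir / "customers.yml").write_text("tbl: small_table\nsteps:\n- rows_distinct\n")
    (config_dir / "README.md").write_text("not a validation config\n")
    found = [path.name for path in find_validation_configs(config_dir)]

print(found)
```

Each returned Path could then be passed to pb.yaml_interrogate(), which accepts a Path object according to the parameter description above.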