Validate YAML configuration against the expected structure.
USAGE
validate_yaml(yaml)
This function checks that a YAML configuration conforms to the structure expected by validation workflows: required fields are present, values have the proper data types, and validation method names are recognized. It is useful for checking configurations before execution and for building configuration editors and validators.
The function performs comprehensive validation including:
required fields (‘tbl’ and ‘steps’)
proper data types for all fields
valid threshold configurations
known validation method names
proper step configuration structure
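To make the list above concrete, here is a minimal sketch of what structural checks of this kind could look like on an already-parsed configuration. It is not pointblank's implementation: `check_structure`, `KNOWN_METHODS`, and the threshold rule (non-negative numbers) are assumptions made for illustration, and the method set lists only the names used in the examples on this page.

```python
# Hypothetical illustration only; pointblank's real checks may differ.
KNOWN_METHODS = {
    "rows_distinct", "col_exists", "col_vals_gt",
    "col_vals_regex", "col_vals_between",
}
THRESHOLD_LEVELS = {"warning", "error", "critical"}


def check_structure(config):
    """Return a list of problems found in a parsed configuration dict."""
    if not isinstance(config, dict):
        return ["configuration must be a mapping"]

    problems = []

    # Required fields
    for field in ("tbl", "steps"):
        if field not in config:
            problems.append(f"missing required field '{field}'")

    # Threshold configuration (assumed: known levels, non-negative numbers)
    thresholds = config.get("thresholds", {})
    if not isinstance(thresholds, dict):
        problems.append("'thresholds' must be a mapping")
    else:
        for level, value in thresholds.items():
            if level not in THRESHOLD_LEVELS:
                problems.append(f"unknown threshold level '{level}'")
            elif (isinstance(value, bool)
                  or not isinstance(value, (int, float))
                  or value < 0):
                problems.append(f"threshold '{level}' must be a non-negative number")

    # Step structure and method names
    steps = config.get("steps", [])
    if not isinstance(steps, list):
        problems.append("'steps' must be a list")
        return problems
    for i, step in enumerate(steps):
        if isinstance(step, str):
            method = step                      # e.g. "rows_distinct"
        elif isinstance(step, dict) and len(step) == 1:
            method = next(iter(step))          # e.g. {"col_exists": {...}}
        else:
            problems.append(f"step {i} must be a name or a single-key mapping")
            continue
        if method not in KNOWN_METHODS:
            problems.append(f"step {i} uses unknown method '{method}'")

    return problems


good = {
    "tbl": "small_table",
    "steps": ["rows_distinct", {"col_exists": {"columns": ["a", "b"]}}],
}
bad = {
    "steps": [{"col_vals_magic": {"columns": ["a"]}}],
    "thresholds": {"warning": -1},
}

print(check_structure(good))
print(check_structure(bad))
```

Returning a list of problems instead of raising on the first one is a design choice for this sketch; `validate_yaml()` itself signals failure by raising an exception.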
Parameters
yaml : Union[str, Path]
YAML configuration as string or file path. Can be: (1) a YAML string containing the validation configuration, or (2) a Path object or string path to a YAML file.
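Because the argument may be either YAML text or a path, a caller sometimes needs to tell the two apart, for instance to report which file failed. The helper below is one possible heuristic, not pointblank's own logic: `read_config_text` is a hypothetical name, and treating any multi-line string as YAML text is an assumption.

```python
from pathlib import Path
import tempfile


def read_config_text(source):
    """Return YAML text from a YAML string, a path string, or a Path."""
    if isinstance(source, Path):
        return source.read_text(encoding="utf-8")
    # Assumption: multi-line strings are YAML content, not file names.
    if "\n" in source:
        return source
    candidate = Path(source)
    if candidate.is_file():
        return candidate.read_text(encoding="utf-8")
    # Fall back to treating a single-line string as YAML content.
    return source


yaml_text = "tbl: small_table\nsteps:\n- rows_distinct\n"

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.yaml"
    config_path.write_text(yaml_text, encoding="utf-8")

    # The same content, reached through a Path and through a path string
    from_path = read_config_text(config_path)
    from_path_string = read_config_text(str(config_path))

# The same content passed directly as YAML text
from_string = read_config_text(yaml_text)
print(from_path == from_path_string == from_string)
```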
Raises
:YAMLValidationError
If the YAML is invalid, malformed, or execution fails. This includes syntax errors, missing required fields, unknown validation methods, or data loading failures.
Examples
For the examples here, we’ll demonstrate how to validate YAML configurations before using them with validation workflows. This is particularly useful for building robust data validation systems where you want to catch configuration errors early.
Let’s start with validating a basic configuration:
import pointblank as pb

# Define a basic YAML validation configuration
yaml_config = '''
tbl: small_table
steps:
- rows_distinct
- col_exists:
    columns: [a, b]
'''

# Validate the configuration: no exception means it's valid
pb.validate_yaml(yaml_config)
print("Basic YAML configuration is valid")
Basic YAML configuration is valid
The function completed without raising an exception, which means our configuration is valid and follows the expected structure.
Now let’s validate a more complex configuration with thresholds and metadata:
# Complex YAML configuration with all optional fields
yaml_config = '''
tbl: small_table
tbl_name: My Dataset
label: Quality check
lang: en
locale: en
thresholds:
  warning: 0.1
  error: 0.25
  critical: 0.35
steps:
- rows_distinct
- col_vals_gt:
    columns: [d]
    value: 100
- col_vals_regex:
    columns: [b]
    pattern: '[0-9]-[a-z]{3}-[0-9]{3}'
'''

# Validate the configuration
pb.validate_yaml(yaml_config)
print("Complex YAML configuration is valid")

# Count the validation steps
import pointblank.yaml as pby
config = pby.load_yaml_config(yaml_config)
print(f"Configuration has {len(config['steps'])} validation steps")
Complex YAML configuration is valid
Configuration has 3 validation steps
This configuration includes all the optional metadata fields and complex validation steps, demonstrating that the validation handles the full range of supported options.
Let’s see what happens when we try to validate an invalid configuration:
Validation failed: Error loading YAML configuration: YAML must contain 'tbl' field
The validation correctly identifies that our configuration is missing the required 'tbl' field.
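An error like the one above can be caught and reported rather than left to stop the program. The sketch below collects messages for several configurations at once. To keep it runnable without any dependencies it takes the validator and the exception type as arguments; `find_invalid_configs` and the stand-in `toy_validate` are hypothetical, and the stand-in checks only for a top-level `tbl:` line.

```python
def find_invalid_configs(configs, validate, error_type):
    """Map each failing config name to the error message it produced."""
    failures = {}
    for name, yaml_text in configs.items():
        try:
            validate(yaml_text)
        except error_type as exc:
            failures[name] = str(exc)
    return failures


def toy_validate(yaml_text):
    """Stand-in validator (not pointblank): require a top-level 'tbl:' line."""
    if not any(line.startswith("tbl:") for line in yaml_text.splitlines()):
        raise ValueError("YAML must contain 'tbl' field")


configs = {
    "valid": "tbl: small_table\nsteps:\n- rows_distinct\n",
    "missing_tbl": "steps:\n- rows_distinct\n",
}

failures = find_invalid_configs(configs, toy_validate, ValueError)
for name, message in failures.items():
    print(f"{name}: {message}")
```

With pointblank installed, the same function could presumably be called as `find_invalid_configs(configs, pb.validate_yaml, pb.yaml.YAMLValidationError)`, using the exception type named in the Raises section.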
Here’s a practical example of using validation in a workflow builder:
def safe_yaml_interrogate(yaml_config):
    """Safely execute a YAML configuration after validation."""
    try:
        # Validate the YAML configuration first
        pb.validate_yaml(yaml_config)
        print("✓ YAML configuration is valid")

        # Then execute the workflow
        result = pb.yaml_interrogate(yaml_config)
        print(f"Validation completed with {len(result.validation_info)} steps")
        return result

    except pb.yaml.YAMLValidationError as e:
        print(f"Configuration error: {e}")
        return None

# Test with a valid YAML configuration
test_yaml = '''
tbl: small_table
steps:
- col_vals_between:
    columns: [c]
    left: 1
    right: 10
'''

result = safe_yaml_interrogate(test_yaml)
✓ YAML configuration is valid
Validation completed with 1 steps
This pattern of validating before executing helps build more reliable data validation pipelines by catching configuration errors early in the process.
Note that this function only validates the structure and does not check if the specified data source (‘tbl’) exists or is accessible. Data source validation occurs during execution with yaml_interrogate().