validate_yaml()function

Validate YAML configuration against the expected structure.

USAGE

validate_yaml(yaml)

This function validates that a YAML configuration conforms to the expected structure for validation workflows. It checks for required fields, proper data types, and valid validation method names. This is useful for validating configurations before execution or for building configuration editors and validators.

The function performs comprehensive validation including:

Parameters

yaml : Union[str, Path]

YAML configuration as string or file path. Can be: (1) a YAML string containing the validation configuration, or (2) a Path object or string path to a YAML file.

Raises

: YAMLValidationError

If the YAML is invalid, malformed, or execution fails. This includes syntax errors, missing required fields, unknown validation methods, or data loading failures.

Examples


For the examples here, we’ll demonstrate how to validate YAML configurations before using them with validation workflows. This is particularly useful for building robust data validation systems where you want to catch configuration errors early.

Let’s start with validating a basic configuration:

import pointblank as pb

# Define a basic YAML validation configuration
yaml_config = '''
tbl: small_table
steps:
- rows_distinct
- col_exists:
    columns: [a, b]
'''

# Validate the configuration: no exception means it's valid
pb.validate_yaml(yaml_config)
print("Basic YAML configuration is valid")
Basic YAML configuration is valid

The function completed without raising an exception, which means our configuration is valid and follows the expected structure.

Now let’s validate a more complex configuration with thresholds and metadata:

# Complex YAML configuration with all optional fields
yaml_config = '''
tbl: small_table
tbl_name: My Dataset
label: Quality check
lang: en
locale: en
thresholds:
  warning: 0.1
  error: 0.25
  critical: 0.35
steps:
- rows_distinct
- col_vals_gt:
    columns: [d]
    value: 100
- col_vals_regex:
    columns: [b]
    pattern: '[0-9]-[a-z]{3}-[0-9]{3}'
'''

# Validate the configuration
pb.validate_yaml(yaml_config)
print("Complex YAML configuration is valid")

# Count the validation steps
import pointblank.yaml as pby
config = pby.load_yaml_config(yaml_config)
print(f"Configuration has {len(config['steps'])} validation steps")
Complex YAML configuration is valid
Configuration has 3 validation steps

This configuration includes all the optional metadata fields and complex validation steps, demonstrating that the validation handles the full range of supported options.

Let’s see what happens when we try to validate an invalid configuration:

# Invalid YAML configuration: missing required 'tbl' field
invalid_yaml = '''
steps:
- rows_distinct
'''

try:
    pb.validate_yaml(invalid_yaml)
except pb.yaml.YAMLValidationError as e:
    print(f"Validation failed: {e}")
Validation failed: Error loading YAML configuration: YAML must contain 'tbl' field

The validation correctly identifies that our configuration is missing the required 'tbl' field.

Here’s a practical example of using validation in a workflow builder:

def safe_yaml_interrogate(yaml_config):
    """Safely execute a YAML configuration after validation."""
    try:
        # Validate the YAML configuration first
        pb.validate_yaml(yaml_config)
        print("✓ YAML configuration is valid")

        # Then execute the workflow
        result = pb.yaml_interrogate(yaml_config)
        print(f"Validation completed with {len(result.validation_info)} steps")
        return result

    except pb.yaml.YAMLValidationError as e:
        print(f"Configuration error: {e}")
        return None

# Test with a valid YAML configuration
test_yaml = '''
tbl: small_table
steps:
- col_vals_between:
    columns: [c]
    left: 1
    right: 10
'''

result = safe_yaml_interrogate(test_yaml)
✓ YAML configuration is valid
Validation completed with 1 steps

This pattern of validating before executing helps build more reliable data validation pipelines by catching configuration errors early in the process.

Note that this function only validates the structure and does not check if the specified data source (‘tbl’) exists or is accessible. Data source validation occurs during execution with yaml_interrogate().

See Also

yaml_interrogate : execute YAML-based validation workflows