yaml_interrogate() function

Execute a YAML-based validation workflow.

USAGE

yaml_interrogate(yaml)

This is the main entry point for YAML-based validation workflows. It takes a YAML configuration (as a string or a file path) and returns an interrogated Validate object containing the validation results.

The YAML configuration defines the data source, validation steps, and optional settings like thresholds and labels. This function automatically loads the data, builds the validation plan, executes all validation steps, and returns the interrogated results.

Parameters

yaml : Union[str, Path]

YAML configuration as string or file path. Can be: (1) a YAML string containing the validation configuration, or (2) a Path object or string path to a YAML file.

Returns

Validate

An instance of the Validate class that has been configured based on the YAML input. This object contains the results of the validation steps defined in the YAML configuration. It includes metadata like table name, label, language, and thresholds if specified.

Raises

YAMLValidationError

If the YAML is invalid, malformed, or execution fails. This includes syntax errors, missing required fields, unknown validation methods, or data loading failures.
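When a configuration comes from user input or an external file, it can be useful to catch this exception rather than let it propagate. The sketch below is a minimal illustration only. It assumes YAMLValidationError can be imported from the pointblank.yaml module (that import path is an assumption, not something stated on this page), and run_or_report is a hypothetical helper name.

```python
def run_or_report(yaml_config):
    """Run a YAML workflow; return None (and print the reason) if it fails."""
    # Imports live inside the function so the helper can be defined
    # even in an environment where pointblank is not installed.
    import pointblank as pb
    from pointblank.yaml import YAMLValidationError  # assumed import path

    try:
        return pb.yaml_interrogate(yaml_config)
    except YAMLValidationError as exc:
        # Covers the cases listed above: syntax errors, missing fields,
        # unknown validation methods, and data loading failures.
        print(f"YAML workflow could not be run: {exc}")
        return None
```

Going by the description above, a configuration that names an unknown validation method would be expected to take the except branch and return None.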

Examples


For the examples here, we’ll use YAML configurations to define validation workflows. Let’s start with a basic YAML workflow that validates the built-in small_table dataset.

import pointblank as pb

# Define a basic YAML validation workflow
yaml_config = '''
tbl: small_table
steps:
- rows_distinct
- col_exists:
    columns: [date, a, b]
'''

# Execute the validation workflow
result = pb.yaml_interrogate(yaml_config)
result
STEP                COLUMNS      VALUES  TBL  EVAL  UNITS  PASS       FAIL      W  E  C  EXT
1  rows_distinct()  ALL COLUMNS                     13     11 (0.85)  2 (0.15)
2  col_exists()     date                            1      1 (1.00)   0 (0.00)
3  col_exists()     a                               1      1 (1.00)   0 (0.00)
4  col_exists()     b                               1      1 (1.00)   0 (0.00)

The validation table shows the results of our YAML-defined workflow. The rows_distinct() step has 2 failing test units out of 13 (because there are duplicate rows in the table), while all three column existence checks passed. Note that the single col_exists entry in the YAML expands into one step per listed column.
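For comparison, the same workflow can be written with the Python API directly. This is a sketch of a roughly equivalent method chain, not a description of what yaml_interrogate() does internally. It assumes the small_table dataset is obtained through pb.load_dataset(), and the function name build_equivalent_validation is hypothetical.

```python
def build_equivalent_validation():
    """Build and interrogate the same two-method plan without YAML."""
    import pointblank as pb

    return (
        pb.Validate(data=pb.load_dataset(dataset="small_table"))
        .rows_distinct()                          # step 1
        .col_exists(columns=["date", "a", "b"])   # one step per column
        .interrogate()
    )
```

The YAML form keeps the plan as data that can be edited without touching Python code, whereas the method chain is easier to compose programmatically.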

Now let’s create a more comprehensive validation workflow with thresholds and metadata:

# Advanced YAML configuration with thresholds and metadata
yaml_config = '''
tbl: small_table
tbl_name: small_table_demo
label: Comprehensive data validation
thresholds:
  warning: 0.1
  error: 0.25
  critical: 0.35
steps:
- col_vals_gt:
    columns: [d]
    value: 100
- col_vals_regex:
    columns: [b]
    pattern: '[0-9]-[a-z]{3}-[0-9]{3}'
- col_vals_not_null:
    columns: [date, a]
'''

# Execute the validation workflow
result = pb.yaml_interrogate(yaml_config)
print(f"Table name: {result.tbl_name}")
print(f"Label: {result.label}")
print(f"Total validation steps: {len(result.validation_info)}")
Table name: small_table_demo
Label: Comprehensive data validation
Total validation steps: 4

The validation results now include our custom table name and label. The thresholds we defined will determine when validation steps are marked as warnings, errors, or critical failures.
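To make the threshold behaviour concrete, here is a small stand-alone sketch of how a step's failing fraction might be compared with the three levels from the configuration above. It does not call pointblank. The severity() helper is hypothetical, and it assumes a level is reached when the fraction of failing test units is at or above that level's threshold.

```python
def severity(n_failed, n_units, thresholds):
    """Return the highest threshold level reached, or None if none is."""
    fraction_failed = n_failed / n_units
    reached = None
    # Walk the levels from least to most severe and keep the last one reached.
    for level in ("warning", "error", "critical"):
        limit = thresholds.get(level)
        if limit is not None and fraction_failed >= limit:
            reached = level
    return reached


thresholds = {"warning": 0.1, "error": 0.25, "critical": 0.35}

print(severity(2, 13, thresholds))  # 2/13 is about 0.15
print(severity(4, 13, thresholds))  # 4/13 is about 0.31
print(severity(0, 13, thresholds))  # no failing test units
```

Under that assumption, 2 failing units out of 13 reaches only the warning level, 4 out of 13 reaches the error level, and a step with no failures reaches none.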

You can also load YAML configurations from files. Here’s how you would work with a YAML file:

from pathlib import Path
import tempfile

# Create a temporary YAML file for demonstration
yaml_content = '''
tbl: small_table
tbl_name: File-based Validation
steps:
- col_vals_between:
    columns: [c]
    left: 1
    right: 10
- col_vals_in_set:
    columns: [f]
    set: [low, mid, high]
'''

with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
    f.write(yaml_content)
    yaml_file_path = Path(f.name)

# Load and execute validation from file
result = pb.yaml_interrogate(yaml_file_path)
result
STEP                   COLUMNS  VALUES          TBL  EVAL  UNITS  PASS        FAIL      W  E  C  EXT
1  col_vals_between()  c        [1, 10]                    13     11 (0.85)   2 (0.15)
2  col_vals_in_set()   f        low, mid, high             13     13 (1.00)   0 (0.00)

This approach is particularly useful for storing validation configurations as part of your data pipeline or version control system, allowing you to maintain validation rules alongside your code.
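As one way to use file-based configurations in a pipeline, the sketch below runs every YAML file found in a directory. The helper name interrogate_directory and the runner parameter are hypothetical and exist only for this illustration. By default the helper calls pb.yaml_interrogate() on each file path.

```python
from pathlib import Path


def interrogate_directory(config_dir, runner=None):
    """Run every *.yaml file in config_dir; map file name to its result."""
    if runner is None:
        # Default to pointblank, imported lazily so that a stand-in
        # runner can be supplied where the library is not installed.
        import pointblank as pb
        runner = pb.yaml_interrogate

    results = {}
    # Sort the paths so the files are processed in a stable order.
    for path in sorted(Path(config_dir).glob("*.yaml")):
        results[path.name] = runner(path)
    return results
```

The returned Validate objects could then be inspected one by one, for example to decide whether a later pipeline stage should run.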