This is the main entry point for YAML-based validation workflows. It takes YAML configuration (as a string or file path) and returns a validated Validate object with interrogation results.
The YAML configuration defines the data source, validation steps, and optional settings like thresholds and labels. This function automatically loads the data, builds the validation plan, executes all validation steps, and returns the interrogated results.
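As a quick orientation, the top-level fields combine as follows. This is a minimal sketch assembled from the fields used in the examples later on this page; the values themselves are illustrative:

```yaml
tbl: small_table        # data source (required)
tbl_name: Demo          # optional display name for the report
label: My validation    # optional label
thresholds:             # optional failure thresholds
  warning: 0.10
steps:                  # validation steps, executed in order
- rows_distinct
- col_exists:
    columns: [date, a]
```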
Parameters
yaml : Union[str, Path]
YAML configuration as string or file path. Can be: (1) a YAML string containing the validation configuration, or (2) a Path object or string path to a YAML file.
set_tbl : Union[FrameT, Any, None] = None
An optional table to override the table specified in the YAML configuration. This allows you to apply a YAML-defined validation workflow to a different table than what’s specified in the configuration. If provided, this table will replace the table defined in the YAML’s tbl field before executing the validation workflow. This can be any supported table type including DataFrame objects, Ibis table objects, CSV file paths, Parquet file paths, GitHub URLs, or database connection strings.
namespaces : Union[dict[str, str], list[str], None] = None
Optional module namespaces to make available for Python code execution in YAML configurations. Can be a dictionary mapping aliases to module names, or a list of module names. See the “Using Namespaces” section below for detailed examples.
Returns
Validate
An instance of the Validate class that has been configured based on the YAML input. This object contains the results of the validation steps defined in the YAML configuration. It includes metadata such as the table name, label, language, and thresholds, if specified.
Raises
YAMLValidationError
If the YAML is invalid, malformed, or execution fails. This includes syntax errors, missing required fields, unknown validation methods, or data loading failures.
Using Namespaces
The namespaces= parameter makes custom Python modules and functions available to YAML configurations. This is particularly useful for custom action functions and advanced Python expressions.
Namespace formats:
- Dictionary format: {"alias": "module.name"} maps aliases to module names
- List format: ["module.name", "another.module"] imports modules under their own names
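The two formats differ only in the name that YAML code uses to refer to each module. A stdlib-only sketch of the resolution logic makes the distinction concrete (`resolve_namespaces` is a hypothetical helper for illustration only, not part of pointblank's API):

```python
import importlib

def resolve_namespaces(namespaces):
    """Illustrative helper: normalize either format to {yaml_name: module}."""
    if isinstance(namespaces, dict):
        # Dictionary format: YAML code refers to each module by its alias
        return {alias: importlib.import_module(mod) for alias, mod in namespaces.items()}
    # List format: YAML code refers to each module by its own name
    return {mod: importlib.import_module(mod) for mod in namespaces}

# Dictionary format: the alias "m" stands in for the "math" module
print(resolve_namespaces({"m": "math"})["m"].sqrt(9))    # 3.0

# List format: "json" is available under its own full name
print(resolve_namespaces(["json"])["json"].dumps([1]))   # [1]
```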
Option 1: Inline expressions (no namespaces needed)
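For simple actions, the expression can live directly in the YAML, so no namespaces= argument is needed. A sketch of such a configuration (the inline-lambda form under the python: key is an assumption here; only the external dotted-reference form is shown verbatim in Option 2):

```yaml
tbl: small_table
thresholds:
  warning: 0.01
actions:
  warning:
    python: |
      lambda: print("Warning: please check your data.")
steps:
- col_vals_gt:
    columns: [a]
    value: 1000
```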
Option 2: External functions with namespaces
# Define a custom action function
def my_custom_action():
    print("Data validation failed: please check your data.")

# Add to current module for demo
import sys
sys.modules[__name__].my_custom_action = my_custom_action

# YAML that references the external function
yaml_config = '''
tbl: small_table
thresholds:
  warning: 0.01
actions:
  warning:
    python: actions.my_custom_action
steps:
- col_vals_gt:
    columns: [a]
    value: 1000  # This will fail
'''

# Use namespaces to make the function available
result = pb.yaml_interrogate(yaml_config, namespaces={'actions': '__main__'})
result
Data validation failed: please check your data.
Pointblank Validation
2025-10-04 | 20:51:54
Polars | WARNING 0.01 | ERROR — | CRITICAL —

| STEP | COLUMNS | VALUES | EVAL | UNITS | PASS | FAIL | W | E | C |
|------|---------|--------|------|-------|------|------|---|---|---|
| 1 col_vals_gt() | a | 1000 | ✓ | 13 | 0 / 0.00 | 13 / 1.00 | ● | — | — |

2025-10-04 20:51:54 UTC | < 1 s | 2025-10-04 20:51:54 UTC
This approach enables modular, reusable validation workflows with custom business logic.
Examples
For the examples here, we’ll use YAML configurations to define validation workflows. Let’s start with a basic YAML workflow that validates the built-in small_table dataset.
import pointblank as pb

# Define a basic YAML validation workflow
yaml_config = '''
tbl: small_table
steps:
- rows_distinct
- col_exists:
    columns: [date, a, b]
'''

# Execute the validation workflow
result = pb.yaml_interrogate(yaml_config)
result
| STEP | COLUMNS | VALUES | EVAL | UNITS | PASS | FAIL | W | E | C |
|------|---------|--------|------|-------|------|------|---|---|---|
| 1 rows_distinct() | ALL COLUMNS | — | ✓ | 13 | 9 / 0.69 | 2 / 0.15 | — | — | — |
| 2 col_exists() | date | — | ✓ | 1 | 1 / 1.00 | 0 / 0.00 | — | — | — |
| 3 col_exists() | a | — | ✓ | 1 | 1 / 1.00 | 0 / 0.00 | — | — | — |
| 4 col_exists() | b | — | ✓ | 1 | 1 / 1.00 | 0 / 0.00 | — | — | — |
The validation table shows the results of our YAML-defined workflow. We can see that the rows_distinct() validation failed (because there are duplicate rows in the table), while the column existence checks passed.
Now let’s create a more comprehensive validation workflow with thresholds and metadata:
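A configuration along these lines produces the output shown next. The threshold values and the specific steps below are illustrative assumptions; the tbl_name and label fields match the report metadata that follows:

```yaml
tbl: small_table
tbl_name: small_table_demo
label: Comprehensive data validation
thresholds:
  warning: 0.10
  error: 0.25
  critical: 0.35
steps:
- col_vals_gt:
    columns: [d]
    value: 100
- col_vals_lt:
    columns: [c]
    value: 10
- col_vals_not_null:
    columns: [date]
- rows_distinct
```

Interrogating this with pb.yaml_interrogate() and printing the report metadata gives: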
Table name: small_table_demo
Label: Comprehensive data validation
Total validation steps: 4
The validation results now include our custom table name and label. The thresholds we defined will determine when validation steps are marked as warnings, errors, or critical failures.
You can also load YAML configurations from files. Here’s how you would work with a YAML file:
from pathlib import Path
import tempfile

# Create a temporary YAML file for demonstration
yaml_content = '''
tbl: small_table
tbl_name: File-based Validation
steps:
- col_vals_between:
    columns: [c]
    left: 1
    right: 10
- col_vals_in_set:
    columns: [f]
    set: [low, mid, high]
'''

with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f:
    f.write(yaml_content)
    yaml_file_path = Path(f.name)

# Load and execute validation from file
result = pb.yaml_interrogate(yaml_file_path)
result
| STEP | COLUMNS | VALUES | EVAL | UNITS | PASS | FAIL | W | E | C |
|------|---------|--------|------|-------|------|------|---|---|---|
| 1 col_vals_between() | c | [1, 10] | ✓ | 13 | 11 / 0.85 | 2 / 0.15 | — | — | — |
| 2 col_vals_in_set() | f | low, mid, high | ✓ | 13 | 13 / 1.00 | 0 / 0.00 | — | — | — |
This approach is particularly useful for storing validation configurations as part of your data pipeline or version control system, allowing you to maintain validation rules alongside your code.
Using set_tbl= to Override the Table
The set_tbl= parameter allows you to override the table specified in the YAML configuration. This is useful when you have a template validation workflow but want to apply it to different tables:
import polars as pl

# Create a test table with similar structure to small_table
test_table = pl.DataFrame({
    "date": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "a": [1, 2, 3],
    "b": ["1-abc-123", "2-def-456", "3-ghi-789"],
    "d": [150, 200, 250]
})

# Use the same YAML config but apply it to our test table
yaml_config = '''
tbl: small_table     # This will be overridden
tbl_name: Test Table # This name will be used
steps:
- col_exists:
    columns: [date, a, b, d]
- col_vals_gt:
    columns: [d]
    value: 100
'''

# Execute with table override
result = pb.yaml_interrogate(yaml_config, set_tbl=test_table)
print(f"Validation applied to: {result.tbl_name}")
result
Validation applied to: Test Table
| STEP | COLUMNS | VALUES | EVAL | UNITS | PASS | FAIL | W | E | C |
|------|---------|--------|------|-------|------|------|---|---|---|
| 1 col_exists() | date | — | ✓ | 1 | 1 / 1.00 | 0 / 0.00 | — | — | — |
| 2 col_exists() | a | — | ✓ | 1 | 1 / 1.00 | 0 / 0.00 | — | — | — |
| 3 col_exists() | b | — | ✓ | 1 | 1 / 1.00 | 0 / 0.00 | — | — | — |
| 4 col_exists() | d | — | ✓ | 1 | 1 / 1.00 | 0 / 0.00 | — | — | — |
| 5 col_vals_gt() | d | 100 | ✓ | 3 | 3 / 1.00 | 0 / 0.00 | — | — | — |
This feature makes YAML configurations more reusable and flexible, allowing you to define validation logic once and apply it to multiple similar tables.
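For instance, a team might keep a single template configuration under version control (the file name and the steps below are illustrative) and supply each environment's table at run time via set_tbl=:

```yaml
# validation_template.yaml (hypothetical file)
tbl: placeholder          # always overridden via set_tbl=
tbl_name: Standard checks
steps:
- col_exists:
    columns: [date, a, b, d]
- col_vals_gt:
    columns: [d]
    value: 100
```

Each caller then runs pb.yaml_interrogate(Path("validation_template.yaml"), set_tbl=its_table) against its own data.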