Writing a validation object to disk with write_file() can be useful for keeping data validation results close at hand for later retrieval (with read_file()). By default, any data table that the validation object holds will be removed before writing to disk (not applicable if no data table is present). This behavior can be changed by setting keep_tbl=True, but this only works when the table is not of a database type (e.g., DuckDB, PostgreSQL, etc.), as database connections cannot be serialized.
Extract data from failing validation steps can also be preserved by setting keep_extracts=True, which is useful for later analysis of data quality issues.
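The rules for what gets dropped before writing can be sketched in plain Python. This is not pointblank's implementation. ValidationState, its tbl_is_database flag, and prepare_for_writing() are hypothetical names chosen to mirror the behavior described above.

```python
import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ValidationState:
    """Hypothetical stand-in for the state a validation object carries."""

    label: str
    results: list
    tbl: Any = None
    tbl_is_database: bool = False  # e.g. an Ibis table on DuckDB or PostgreSQL
    extracts: dict = field(default_factory=dict)


def prepare_for_writing(state, keep_tbl=False, keep_extracts=False):
    """Return a copy of `state` with data dropped according to the flags."""
    out = copy.copy(state)  # shallow copy: the caller's object is untouched
    # Database-backed tables wrap live connections that pickle cannot store,
    # so they are dropped even when keep_tbl=True.
    if not keep_tbl or state.tbl_is_database:
        out.tbl = None
    if not keep_extracts:
        out.extracts = {}
    return out


state = ValidationState(
    label="My validation",
    results=[True, False],
    tbl=[{"d": 50}, {"d": 150}],
    extracts={1: [{"d": 50}]},
)
print(prepare_for_writing(state).tbl)                 # None
print(prepare_for_writing(state, keep_tbl=True).tbl)  # [{'d': 50}, {'d': 150}]
```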
The serialized file uses Python’s pickle format for storage of the validation object state, including all validation results, metadata, and optionally the source data.
Important note: if your validation uses custom preprocessing functions (via the pre= parameter), these functions must be defined at the module level of an importable Python file (not interactively or as lambda functions) so that they can be restored when the validation is loaded in a different Python session. See the Creating Serializable Validations section below for more information.
Parameters
validation: Validate
The validation object to write to disk.
filename: str
The filename to create on disk for the validation object. It should not include the file extension, as .pkl is added automatically.
path: str | None = None
An optional directory path where the file should be saved. If not provided, the file will be saved in the current working directory. The directory will be created if it doesn’t exist.
keep_tbl: bool = False
An option to keep the data table that is associated with the validation object. The default is False, in which case the data table is removed before writing to disk. For database tables (e.g., Ibis tables with database backends), the table is always removed even if keep_tbl=True, as database connections cannot be serialized.
keep_extracts: bool = False
An option to keep any collected extract data for failing rows from validation steps. By default, this is False (i.e., extract data is removed to save space).
quiet: bool = False
Should the function suppress its messages, including the confirmation that the file was written? By default, this is False, so a message is printed when the file is successfully written.
Returns
None
This function doesn’t return anything but saves the validation object to disk.
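As a rough picture of the file handling described above (the .pkl extension appended to filename, the directory created if missing, pickle's highest protocol), here is a stdlib-only sketch. write_pickle() is a hypothetical helper, not part of pointblank.

```python
import pickle
import tempfile
from pathlib import Path


def write_pickle(obj, filename, path=None, quiet=False):
    """Hypothetical helper: pickle `obj` to `<path>/<filename>.pkl`; returns None."""
    # Default to the current working directory when no path is given.
    directory = Path(path) if path is not None else Path.cwd()
    # Create the directory (and any missing parents) if it doesn't exist.
    directory.mkdir(parents=True, exist_ok=True)
    # The .pkl extension is appended, so `filename` should not include one.
    file_path = directory / f"{filename}.pkl"
    with open(file_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    if not quiet:
        print(f"Object written to: {file_path}")


out_dir = Path(tempfile.mkdtemp()) / "nested" / "validations"
write_pickle({"label": "My validation"}, "my_validation", path=out_dir)
```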
Creating Serializable Validations
To ensure your validations work reliably across different Python sessions, the recommended approach is to use module-level functions: define your preprocessing functions in a separate Python file (e.g., preprocessing_functions.py) and import them into your main script:
```python
# your_main_script.py
import pointblank as pb
from preprocessing_functions import multiply_by_100, add_computed_column

# `my_data` is your table (e.g., a Polars DataFrame with a "value" column)
validation = (
    pb.Validate(data=my_data)
    .col_vals_gt(columns="value", value=500, pre=multiply_by_100)
    .col_vals_between(columns="computed", left=50, right=1000, pre=add_computed_column)
    .interrogate()
)

# Save the validation; it can be loaded reliably in other sessions
pb.write_file(validation, "my_validation", keep_tbl=True)
```
Problematic Patterns to Avoid
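Don’t use lambda functions, as they cause an immediate error.
Don’t use functions defined interactively in a notebook or REPL, as they may fail when the validation is loaded in another session:
```python
import pointblank as pb
import polars as pl

# Defined in a notebook/REPL: may not be found when loading elsewhere
def my_function(df):
    return df.with_columns(pl.col("value") * 2)

# `data` is your table
validation = pb.Validate(data).col_vals_gt(
    columns="value", value=100, pre=my_function
)
```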
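Both rules follow from how pickle handles functions: it stores a reference (module name plus qualified name) rather than the function's code. A lambda or a function nested inside another function has no importable name, so pickling fails outright, while a function defined in a notebook or REPL is recorded as belonging to __main__ and can only be found again if the loading session defines it too. The stdlib-only sketch below demonstrates this; try_pickle() is a hypothetical helper, and pointblank's own checks may report these cases differently.

```python
import pickle


def my_function(x):
    # A named function defined in the running script or session
    return x * 2


def make_nested():
    def nested(x):
        return x * 2

    return nested


def try_pickle(obj):
    """Return the pickled bytes, or None if pickle refuses the object."""
    try:
        return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    except (pickle.PicklingError, AttributeError, TypeError):
        return None


lambda_bytes = try_pickle(lambda x: x * 2)  # no importable name: fails
nested_bytes = try_pickle(make_nested())    # "local" object: fails
func_bytes = try_pickle(my_function)        # succeeds, but only by reference

print(lambda_bytes is None, nested_bytes is None)  # True True
# The payload names the module and the function; the function body is not stored.
print(my_function.__module__.encode() in func_bytes, b"my_function" in func_bytes)
```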
Automatic Analysis and Guidance
When you call write_file(), it automatically analyzes your validation and provides:
confirmation when all functions will work reliably
warnings for functions that may cause cross-session issues
clear errors for unsupported patterns (lambda functions)
specific recommendations and code examples
loading instructions tailored to your validation
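A check of this kind could be built on the attributes every Python function carries (__name__, __qualname__, __module__). The sketch below is a guess at the general idea, not pointblank's actual analysis; classify_pre_function() and its three labels are hypothetical.

```python
import json
import types


def classify_pre_function(func):
    """Hypothetical check: "error", "warning", or "ok" for a pre= callable."""
    if not isinstance(func, types.FunctionType):
        return "ok"  # builtins, partials, etc. are not inspected in this sketch
    if func.__name__ == "<lambda>":
        return "error"  # lambdas cannot be pickled
    if "<locals>" in func.__qualname__:
        return "error"  # functions nested inside other functions cannot either
    if func.__module__ == "__main__":
        return "warning"  # pickles, but may not load in a fresh session
    return "ok"  # importable module-level function


def outer():
    def inner(df):
        return df

    return inner


def defined_here(df):
    return df


print(classify_pre_function(lambda df: df))  # error
print(classify_pre_function(outer()))        # error
print(classify_pre_function(json.dumps))     # ok
print(classify_pre_function(defined_here))   # warning when run as a script
```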
Loading Your Validation
To load a saved validation in a new Python session:
```python
# In a new Python session
import pointblank as pb

# Import the same preprocessing functions used when creating the validation
from preprocessing_functions import multiply_by_100, add_computed_column

# Upon loading the validation, functions will be automatically restored
validation = pb.read_file("my_validation.pkl")
```
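Loading works because unpickling looks each function up by its module and name, importing the module if needed, so what matters is that the module can be imported under the same name when loading. The stdlib-only sketch below simulates two sessions by writing a stand-in preprocessing_functions.py to a temporary directory (its list-based multiply_by_100() is hypothetical, replacing the Polars version) and forgetting the imported module between dumping and loading.

```python
import importlib
import pickle
import sys
import tempfile
import textwrap
from pathlib import Path

# Write a stand-in module to a temporary directory (hypothetical contents:
# plain lists instead of Polars, to keep the sketch dependency-free).
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "preprocessing_functions.py").write_text(
    textwrap.dedent(
        """
        def multiply_by_100(values):
            return [v * 100 for v in values]
        """
    )
)
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

from preprocessing_functions import multiply_by_100

# "Session 1": pickle an object holding a reference to the function.
saved = pickle.dumps({"pre": multiply_by_100}, protocol=pickle.HIGHEST_PROTOCOL)

# Simulate a fresh session by forgetting the imported module.
del sys.modules["preprocessing_functions"]

# "Session 2": unpickling re-imports the module and looks the function up by name.
restored = pickle.loads(saved)
print(restored["pre"]([1, 2, 3]))  # [100, 200, 300]
```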
Testing Your Validation
To verify your validation works across sessions:
save your validation in one Python session
start a fresh Python session (restart kernel/interpreter)
load it with read_file() and test that the preprocessing functions work as expected
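The fresh-session check can also be automated by loading the file in a new interpreter process. loads_in_fresh_session() below is a hypothetical helper built on subprocess and pickle; it checks only that the file unpickles, not that the validation results are correct.

```python
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Run by a brand-new interpreter, which only knows what it imports itself.
_LOADER = "import pickle, sys; pickle.load(open(sys.argv[1], 'rb'))"


def loads_in_fresh_session(pkl_path, extra_imports=""):
    """Return True if `pkl_path` unpickles in a new Python process.

    `extra_imports` is prepended to the loader code, e.g.
    "from preprocessing_functions import multiply_by_100; ".
    """
    result = subprocess.run(
        [sys.executable, "-c", extra_imports + _LOADER, str(pkl_path)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # The last traceback line names the missing module or attribute.
        lines = result.stderr.strip().splitlines()
        if lines:
            print(lines[-1])
    return result.returncode == 0


# Usage: save something in this session, then check it in a fresh one.
pkl_path = Path(tempfile.mkdtemp()) / "my_validation.pkl"
with open(pkl_path, "wb") as f:
    pickle.dump({"label": "My validation", "n_steps": 2}, f)
print(loads_in_fresh_session(pkl_path))  # True
```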
Performance and Storage
use keep_tbl=False (default) to reduce file size when you don’t need the original data
use keep_extracts=False (default) to save space by excluding extract data
set quiet=True to suppress guidance messages in automated scripts
files are saved using pickle’s highest protocol for optimal performance
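To get a feel for the first and last points, the sketch below compares pickle sizes with and without a table-like payload, and between protocol 0 and the highest protocol. The data are made-up stand-ins, and the exact byte counts depend on the Python version.

```python
import pickle

# Hypothetical stand-ins: a small results summary and a larger "table".
results = {"label": "My validation", "passed": [True, True, False]}
table = {"d": list(range(10_000))}
with_table = {**results, "tbl": table}

highest = pickle.HIGHEST_PROTOCOL
size_results = len(pickle.dumps(results, protocol=highest))
size_with_table = len(pickle.dumps(with_table, protocol=highest))
size_with_table_p0 = len(pickle.dumps(with_table, protocol=0))

print(f"results only (highest protocol):    {size_results:,} bytes")
print(f"results + table (highest protocol): {size_with_table:,} bytes")
print(f"results + table (protocol 0):       {size_with_table_p0:,} bytes")
```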
Examples
Let’s create a simple validation and save it to disk:
```python
import pointblank as pb

# Create a validation
validation = (
    pb.Validate(data=pb.load_dataset("small_table"), label="My validation")
    .col_vals_gt(columns="d", value=100)
    .col_vals_regex(columns="b", pattern=r"[0-9]-[a-z]{3}-[0-9]{3}")
    .interrogate()
)

# Save to disk (without the original table data)
pb.write_file(validation, "my_validation")
```
Serialization Analysis:
✓ No preprocessing functions detected
✓ This validation should serialize and load reliably across sessions
✅ Validation object written to: my_validation.pkl
📖 To load: validation = pb.read_file('my_validation.pkl')
To keep the original table data for later analysis:
```python
# Save with the original table data included
pb.write_file(validation, "my_validation_with_data", keep_tbl=True)
```
Serialization Analysis:
✓ No preprocessing functions detected
✓ This validation should serialize and load reliably across sessions
✅ Validation object written to: my_validation_with_data.pkl
📖 To load: validation = pb.read_file('my_validation_with_data.pkl')
You can also specify a custom directory and keep extract data:
For validations that use preprocessing functions to be portable across sessions, define your functions in a separate .py file:
```python
# In `preprocessing_functions.py`
import polars as pl

def multiply_by_100(df):
    return df.with_columns(pl.col("value") * 100)

def add_computed_column(df):
    return df.with_columns(computed=pl.col("value") * 2 + 10)
```
Then import and use them in your validation:
```python
# In your main script
import pointblank as pb
from preprocessing_functions import multiply_by_100, add_computed_column

validation = (
    pb.Validate(data=my_data)
    .col_vals_gt(columns="value", value=500, pre=multiply_by_100)
    .col_vals_between(columns="computed", left=50, right=1000, pre=add_computed_column)
    .interrogate()
)

# This validation can now be saved and loaded reliably
pb.write_file(validation, "my_validation", keep_tbl=True)
```
When you load this validation in a new session, simply import the preprocessing functions again and they will be automatically restored.
See Also
Use the read_file() function to load a validation object that was previously saved with write_file().