Many teams already have data schemas defined in other tools: JSON Schema files for API validation, Frictionless Table Schemas for open data, dbt schema.yml files for analytics pipelines, or Pandera/Pydantic models in application code. Rather than manually rewriting these specifications as Pointblank validation steps, you can import supported formats directly (currently JSON Schema and Frictionless Table Schema).
The import_contract() function reads an external schema definition and produces a ContractImport object containing everything Pointblank needs to validate data: column types, constraints, and mapped validation steps. From there you can create a Validate workflow, a Contract object, generate equivalent Python code, or produce a YAML definition, all with a single function call.
Quick Start
The fastest path from an external schema to running validation:
    import pointblank as pb
    import polars as pl

    # Define a JSON Schema (could also be loaded from a file)
    user_schema = {
        "type": "object",
        "properties": {
            "user_id": {"type": "integer"},
            "email": {"type": "string", "format": "email"},
            "age": {"type": "integer", "minimum": 0, "maximum": 150},
            "status": {"type": "string", "enum": ["active", "inactive", "pending"]},
        },
        "required": ["user_id", "email"],
    }

    # Import the schema
    result = pb.import_contract(user_schema, format="json_schema")

    # Create sample data and validate
    users = pl.DataFrame(
        {
            "user_id": [1, 2, 3, 4, 5],
            "email": [
                "alice@example.com",
                "bob@corp.io",
                "charlie@mail.org",
                "dave@startup.co",
                "eve@company.net",
            ],
            "age": [28, 34, 45, 22, 31],
            "status": ["active", "active", "inactive", "pending", "active"],
        }
    )

    result.to_validate(data=users).interrogate()
That’s it. The JSON Schema minimum, maximum, enum, format, and required keywords were automatically translated into the appropriate Pointblank validation steps. Each keyword becomes a dedicated validation check: minimum becomes col_vals_ge(), maximum becomes col_vals_le(), enum becomes col_vals_in_set(), and so on. The schema’s required array generates col_vals_not_null() steps for each listed field, ensuring that null values are caught at validation time.
How It Works
The import process has three stages:
Parse: the external schema is read (from a file path, a dict, or a Python object)
Map: each constraint in the source format is translated to a Pointblank validation method
Package: the results are stored in a ContractImport object with multiple output options
flowchart LR
A[External Schema] --> B[import_contract]
B --> C[ContractImport]
C --> D[.to_validate‹data›]
C --> E[.to_contract‹›]
C --> F[.to_python‹›]
C --> G[.to_yaml‹›]
The ContractImport object is your bridge between the external world and Pointblank. It doesn’t execute anything on its own. Rather, it holds the translated specification and lets you choose how to use it. This separation is intentional: you can inspect the translation results, check for any warnings, and decide how to proceed before committing to a particular output format.
If any constraints couldn’t be mapped, they appear in .warnings:
    # A schema with an unmappable format
    schema_with_date = {
        "type": "object",
        "properties": {
            "created_at": {"type": "string", "format": "date-time"},
        },
    }

    result = pb.import_contract(schema_with_date, format="json_schema")

    if result.warnings:
        for w in result.warnings:
            print(f"⚠ {w}")

    print(f"\nCoverage: {result.coverage:.0%}")
⚠ Column 'created_at': JSON Schema format 'date-time' has no Pointblank equivalent — skipped.
Coverage: 0%
The coverage metric tells you what fraction of the source constraints were successfully translated. A coverage of 100% means everything mapped cleanly; lower values mean some constraints were skipped (with details in warnings). This transparency is important because no translation between formats is perfect. By checking coverage and warnings before running validation, you can be confident about exactly which parts of your original schema are being enforced and which parts might need manual attention.
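To make the coverage arithmetic concrete, here is a minimal sketch. It assumes coverage is simply the number of mapped constraints divided by the total number of source constraints, and that every skipped constraint yields exactly one warning. The `ImportSummary` class and its fields are hypothetical stand-ins for illustration, not Pointblank's ContractImport implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ImportSummary:
    """Hypothetical stand-in for the bookkeeping behind .coverage and .warnings."""

    mapped: list = field(default_factory=list)    # constraints that were translated
    warnings: list = field(default_factory=list)  # one message per skipped constraint

    @property
    def coverage(self) -> float:
        total = len(self.mapped) + len(self.warnings)
        # Assumption: a schema with no constraints counts as fully covered.
        return 1.0 if total == 0 else len(self.mapped) / total


summary = ImportSummary(
    mapped=["age: minimum", "age: maximum", "status: enum"],
    warnings=["Column 'created_at': format 'date-time' skipped"],
)

print(f"Coverage: {summary.coverage:.0%}")  # 3 of 4 constraints mapped -> 75%
for message in summary.warnings:
    print(f"warning: {message}")
```

A gate such as "refuse to run if coverage is below 1.0" can then be a one-line check before validation starts.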
Supported Formats
Pointblank ships with adapters for two widely used tabular schema formats: JSON Schema and Frictionless Table Schema. Additional adapters (dbt, Pydantic, Pandera) are planned for future releases.
JSON Schema
JSON Schema is a widely used format for describing the structure of JSON data. Because tabular data (DataFrames) can be modeled as arrays of JSON objects, JSON Schema is a natural fit for defining column-level constraints.
Each constraint in the JSON Schema is translated into the corresponding Pointblank method call. For example, a pattern keyword on a sku field becomes a col_vals_regex() step, exclusiveMinimum becomes col_vals_gt() (note the strict inequality), and each required field generates a col_vals_not_null() step. You can iterate over result.constraints to verify the translation before running any validation.
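The keyword translation described in this section can be pictured as a lookup table. The sketch below is illustrative only and does not call Pointblank: `KEYWORD_TO_METHOD`, `map_schema()`, and `product_schema` are hypothetical names, and the `exclusiveMaximum` entry is an assumption made by symmetry with `exclusiveMinimum`.

```python
# Keyword-to-method table. Every entry except exclusiveMaximum is stated in
# the text; exclusiveMaximum -> col_vals_lt is an assumption by symmetry.
KEYWORD_TO_METHOD = {
    "minimum": "col_vals_ge",
    "maximum": "col_vals_le",
    "exclusiveMinimum": "col_vals_gt",
    "exclusiveMaximum": "col_vals_lt",  # assumed
    "enum": "col_vals_in_set",
    "pattern": "col_vals_regex",
}


def map_schema(schema: dict) -> list:
    """Return (method, column, value) triples for every recognized keyword."""
    steps = []
    for column, spec in schema.get("properties", {}).items():
        for keyword, method in KEYWORD_TO_METHOD.items():
            if keyword in spec:
                steps.append((method, column, spec[keyword]))
    # Each entry in "required" becomes a not-null check.
    for column in schema.get("required", []):
        steps.append(("col_vals_not_null", column, None))
    return steps


# Hypothetical product schema used only for this illustration.
product_schema = {
    "type": "object",
    "properties": {
        "sku": {"type": "string", "pattern": "^[A-Z]{3}-[0-9]{4}$"},
        "price": {"type": "number", "exclusiveMinimum": 0},
    },
    "required": ["sku", "price"],
}

for method, column, value in map_schema(product_schema):
    print(f"{method}(columns={column!r}) <- {value!r}")
```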
Frictionless Data Table Schema
Frictionless Data is a set of standards for describing and packaging data. The Table Schema format is particularly well-suited for tabular data validation, with explicit support for column types, constraints, and primary/foreign keys.
Notice how the primaryKey field generates both a not-null check and a uniqueness check for item_id. This is the correct semantic interpretation: a primary key must always be present and must uniquely identify each row. The field-level constraints.required and constraints.unique also contribute their own checks, so the adapter deduplicates where appropriate.
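The deduplication behavior can be sketched in plain Python. This is an assumption-laden illustration of the idea, not the adapter's code: `frictionless_key_checks()` and `inventory_schema` are hypothetical, and the check names `not_null` and `unique` are placeholders rather than Pointblank method names.

```python
def frictionless_key_checks(schema: dict) -> list:
    """Collect (check, columns) pairs from primaryKey and field constraints."""
    checks = []

    def add(check: str, columns: tuple) -> None:
        # Deduplicate: the same check on the same columns is recorded once.
        if (check, columns) not in checks:
            checks.append((check, columns))

    primary_key = schema.get("primaryKey", [])
    if isinstance(primary_key, str):  # Table Schema allows a string or a list
        primary_key = [primary_key]
    for column in primary_key:
        add("not_null", (column,))  # every key column must be present
    if primary_key:
        add("unique", tuple(primary_key))  # the key is unique as a combination

    for field in schema.get("fields", []):
        constraints = field.get("constraints", {})
        if constraints.get("required"):
            add("not_null", (field["name"],))
        if constraints.get("unique"):
            add("unique", (field["name"],))
    return checks


# Hypothetical Table Schema in which item_id is both the primary key and
# marked required/unique at the field level.
inventory_schema = {
    "fields": [
        {"name": "item_id", "type": "integer",
         "constraints": {"required": True, "unique": True}},
        {"name": "name", "type": "string", "constraints": {"required": True}},
    ],
    "primaryKey": "item_id",
}

print(frictionless_key_checks(inventory_schema))
```

Although `item_id` is declared required and unique twice (once through the key and once through its field constraints), each check appears only once in the result.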
Importing from a Data Package:
Data Packages bundle multiple resources (tables) together. You can select which resource to import by name or index:
The resource= parameter accepts either a string (the resource name) or an integer (the resource index). When omitted, the first resource in the package is used. This makes it straightforward to work with multi-table data packages where each table has its own schema definition.
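The selection rule (name, index, or first resource by default) can be sketched as follows. `select_resource()` and the sample `package` descriptor are hypothetical and stand in for whatever the adapter does internally; only the `resources`, `name`, and `schema` keys come from the Data Package format itself.

```python
from typing import Optional, Union


def select_resource(package: dict, resource: Optional[Union[str, int]] = None) -> dict:
    """Pick one resource from a Data Package descriptor by name or index."""
    resources = package.get("resources", [])
    if not resources:
        raise ValueError("data package contains no resources")
    if resource is None:
        return resources[0]  # default: the first resource
    if isinstance(resource, int):
        return resources[resource]
    for candidate in resources:
        if candidate.get("name") == resource:
            return candidate
    raise KeyError(f"no resource named {resource!r}")


# Hypothetical two-table package descriptor.
package = {
    "name": "field-survey",
    "resources": [
        {"name": "stations", "schema": {"fields": [{"name": "station_id", "type": "integer"}]}},
        {"name": "observations", "schema": {"fields": [{"name": "obs_id", "type": "integer"}]}},
    ],
}

print(select_resource(package)["name"])                  # first resource
print(select_resource(package, "observations")["name"])  # by name
print(select_resource(package, 1)["name"])               # by index
```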
Output Options
Once you have a ContractImport, you can use it in several ways depending on your workflow.
Direct Validation with .to_validate()
The most common path is to get a Validate object, pass your data, and run it:
The .to_validate() method returns a fully configured Validate object with all imported constraints already applied as validation steps. You get the familiar validation report showing pass/fail counts for each check. Because the Validate object is not yet interrogated when created, you can also add further validation steps before calling .interrogate().
You can also pass additional arguments to the Validate constructor:
Any keyword argument accepted by the Validate class can be passed through here, including tbl_name, label, thresholds, owner, and consumers. This gives you full control over how the validation is configured without needing to modify the import result.
Creating a Reusable Contract with .to_contract()
If you want to store the imported schema as a Pointblank Contract for use in pipelines or repeated validation:
The resulting Contract can be serialized to YAML, used in Pipeline, or shared with other teams. This is particularly valuable when you want to maintain a stable contract definition that outlives the original external schema file. The Contract object carries all the metadata (version, owner, description) that makes it suitable for team workflows and CI/CD pipelines.
Generating Python Code with .to_python()
When you want to see (or save) the equivalent Pointblank Python code that would be generated from the import:
The generated code is syntactically valid Python that you can copy directly into a script or notebook. It uses the standard Pointblank method-chaining style, making it easy to read and modify. This is especially useful for:
understanding exactly what validation steps an import produces
generating starter code that you can then customize
documentation and onboarding (show teams what their schema “means” in validation terms)
Once you have the generated code, you can paste it into your project and modify it freely. Add extra validation steps, remove checks that don’t apply, or adjust parameter values. The generated code has no dependency on the original schema file, so it serves as a clean handoff point between the schema world and your Python codebase.
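To give a feel for what generated method-chaining code looks like, here is a small sketch that renders a list of steps as Python source. `render_validation_code()` is a hypothetical helper written for this illustration; the actual output of .to_python() may be laid out differently. The rendered text is only parsed, never executed, so Pointblank does not need to be installed to try it.

```python
import ast


def render_validation_code(steps: list, data_name: str = "data") -> str:
    """Render (method, kwargs) pairs as a Pointblank-style method chain."""
    lines = [
        "import pointblank as pb",
        "",
        "validation = (",
        f"    pb.Validate(data={data_name})",
    ]
    for method, kwargs in steps:
        args = ", ".join(f"{key}={value!r}" for key, value in kwargs.items())
        lines.append(f"    .{method}({args})")
    lines += ["    .interrogate()", ")"]
    return "\n".join(lines)


code = render_validation_code(
    [
        ("col_vals_ge", {"columns": "age", "value": 0}),
        ("col_vals_le", {"columns": "age", "value": 150}),
        ("col_vals_in_set", {"columns": "status", "set": ["active", "inactive", "pending"]}),
        ("col_vals_not_null", {"columns": "user_id"}),
    ],
    data_name="users",
)

ast.parse(code)  # raises SyntaxError if the rendered text is not valid Python
print(code)
```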
Generating YAML with .to_yaml()
For workflows that use Pointblank’s YAML-based validation:
The YAML output follows Pointblank’s validation YAML format, with each constraint appearing as a separate step entry. You can save this output to a file and use it with pb.yaml_interrogate() or pb.validate_yaml() for configuration-driven workflows where validation rules are managed as YAML files rather than Python code.
Exporting Contracts
The reverse operation (taking a Pointblank Contract or Validate object and writing it out in an external format) is handled by export_contract():
Each format produces the output structure that is native to that standard. JSON Schema export creates a valid $schema-annotated document with properties, type, and required fields. Frictionless export creates a Table Schema with fields and constraints entries. Both formats can be fed directly into tools that consume those standards, such as form validators, data catalogs, or documentation generators.
When a destination path is provided, the output is written to that file (creating parent directories as needed) and also returned from the function. This makes it convenient to both persist the output and inspect it in the same call.
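The write-and-return behavior can be sketched with the standard library. `write_schema()` and its `path` parameter are hypothetical and do not reflect the real export_contract() signature; the `$schema` URI in the sample document is likewise an assumed draft.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_schema(schema: dict, path: Optional[str] = None) -> dict:
    """Optionally persist a schema dict as JSON, and always return it."""
    if path is not None:
        target = Path(path)
        target.parent.mkdir(parents=True, exist_ok=True)  # create parent directories
        target.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return schema


# Illustrative exported document; the "$schema" draft is an assumption.
exported_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {"score": {"type": "integer", "minimum": 0, "maximum": 100}},
    "required": ["score"],
}

with tempfile.TemporaryDirectory() as tmp:
    destination = Path(tmp) / "schemas" / "grades.schema.json"  # parent does not exist yet
    returned = write_schema(exported_schema, str(destination))
    on_disk = json.loads(destination.read_text(encoding="utf-8"))

print(on_disk == returned)  # the file content matches the returned value
```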
Round-Trip Fidelity
Importing a schema and then exporting it back should produce an equivalent result. This is important for workflows where you maintain schemas in an external format but want to validate with Pointblank:
    # Start with a JSON Schema
    original = {
        "type": "object",
        "properties": {
            "score": {"type": "integer", "minimum": 0, "maximum": 100},
            "grade": {"type": "string", "enum": ["A", "B", "C", "D", "F"]},
        },
        "required": ["score"],
    }

    # Import → Contract → Export
    imported = pb.import_contract(original, format="json_schema")
    contract = imported.to_contract(name="grades")
    exported = pb.export_contract(contract, format="json_schema")

    # The exported schema preserves the constraints
    print(
        f"Original constraints on 'score': minimum={original['properties']['score']['minimum']}, "
        f"maximum={original['properties']['score']['maximum']}"
    )
    print(
        f"Exported constraints on 'score': minimum={exported['properties']['score'].get('minimum')}, "
        f"maximum={exported['properties']['score'].get('maximum')}"
    )
Original constraints on 'score': minimum=0, maximum=100
Exported constraints on 'score': minimum=0, maximum=100
Round-trip fidelity is tested as part of Pointblank’s test suite. The general guarantee is that any constraint that can be expressed in both Pointblank and the target format will survive the round trip. Constraints that are unique to one format (like JSON Schema’s $ref or Pointblank’s pre= argument) may not survive, but the core numeric bounds, enum checks, null checks, and pattern constraints will always round-trip cleanly.
Auto-Detection
When the format is obvious from the source content, you can omit the format= parameter:
    # JSON Schema: detected by presence of "$schema" or "type" + "properties"
    result = pb.import_contract({"type": "object", "properties": {"x": {"type": "integer"}}})
    print(f"Detected: {result.source_format}")

    # Frictionless: detected by presence of "fields" list
    result = pb.import_contract({"fields": [{"name": "x", "type": "integer"}]})
    print(f"Detected: {result.source_format}")
Detected: json_schema
Detected: frictionless
For file-based imports, the extension is also used for detection (.schema.json maps to JSON Schema, .resource.json or .datapackage.json maps to Frictionless). Auto-detection is a convenience feature that works well for common cases. When working with ambiguous files or dict inputs that could match multiple formats, it is best to specify format= explicitly to avoid any possibility of misdetection.
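The detection rules listed above can be approximated in a few lines. This is a sketch of the stated heuristics, not Pointblank's detection code: `detect_format()` is a hypothetical helper, and treating a bare `datapackage.json` file name as Frictionless is an assumption.

```python
from typing import Optional, Union


def detect_format(source: Union[dict, str]) -> Optional[str]:
    """Guess the schema format from dict keys or a file name; None if unsure."""
    if isinstance(source, dict):
        if "$schema" in source or ("type" in source and "properties" in source):
            return "json_schema"
        if isinstance(source.get("fields"), list):
            return "frictionless"
        return None

    # File path: look only at the final path component.
    name = source.replace("\\", "/").rsplit("/", 1)[-1].lower()
    if name.endswith(".schema.json"):
        return "json_schema"
    # Assumption: a bare "datapackage.json" is treated like "*.datapackage.json".
    if name.endswith((".resource.json", ".datapackage.json")) or name == "datapackage.json":
        return "frictionless"
    return None  # ambiguous: the caller should pass format= explicitly


print(detect_format({"type": "object", "properties": {"x": {"type": "integer"}}}))
print(detect_format({"fields": [{"name": "x", "type": "integer"}]}))
print(detect_format("api/schemas/user.schema.json"))
print(detect_format("legacy_schema.json"))
```

The last call returns `None`, which is the situation where specifying format= explicitly is the safer choice.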
Combining Imports with Extra Checks
An imported schema gives you a baseline, but you can always add more Pointblank checks on top:
    imported = pb.import_contract(user_schema, format="json_schema")

    # Start from the import but add custom checks
    validation = (
        imported
        .to_validate(data=users, tbl_name="enriched_check")
        .col_vals_regex(columns="email", pattern=r".*\.(com|io|org|net|co)$")
        .rows_distinct(columns_subset="user_id")
        .interrogate()
    )

    validation
This pattern works well when the external schema covers structural and type constraints, but your team has additional business rules that only make sense in the Pointblank context. The imported constraints form the foundation, and your additional .col_vals_*() or .rows_*() calls layer on top. Because .to_validate() returns a standard Validate object, you have full access to the entire Pointblank API for adding checks, setting thresholds, or attaching actions.
Migration from Other Tools
A key use case for import_contract() is migration: bringing existing validation definitions from other tools into Pointblank without manual rewriting.
Coming from JSON Schema
If your team uses JSON Schema for API validation and you want the same rules applied to DataFrames:
    # Your existing JSON Schema (maybe generated by your API framework)
    result = pb.import_contract("api/schemas/user.schema.json")

    # Now use it for DataFrame validation in your data pipeline
    validation = result.to_validate(data=raw_users_df).interrogate()
This approach is particularly powerful when your API team already maintains JSON Schema definitions for request/response validation. Those same schemas can now serve double duty: validating API payloads at the service boundary and validating the resulting DataFrames in your analytics pipeline. You get consistent enforcement across both layers without writing the rules twice.
Coming from Frictionless
If you have data packages from open data sources or research datasets:
    # Import from an existing data package descriptor
    result = pb.import_contract("data/datapackage.json", resource="observations")

    # Validate the actual CSV data against the declared schema
    validation = result.to_validate(data=observations_df).interrogate()
Frictionless Data Packages are common in open data portals, government datasets, and academic research repositories. By importing their Table Schemas directly, you can validate downloaded data against its declared structure without needing to manually inspect the descriptor file and rewrite each constraint. This is especially valuable when working with unfamiliar datasets where the schema descriptor is your primary documentation of what the data should contain.
Generating a Starting Point
Even if you don’t plan to keep using the external format, importing is a great way to bootstrap a Pointblank contract:
    # Import from your existing schema
    imported = pb.import_contract("legacy_schema.json", format="json_schema")

    # Save as a YAML contract you'll maintain going forward
    contract = imported.to_contract(name="my_table", version="1.0.0")
    contract.to_yaml("contracts/my_table.yaml")
Now you have a Pointblank-native contract that you can extend and evolve independently of the original source. You can add new validation steps, adjust thresholds, or incorporate business rules that go beyond what the original schema format could express.
Conclusion
The contract import/export system lets you bridge the gap between external schema definitions and Pointblank’s validation engine. Rather than maintaining duplicate specifications across tools, you can keep your source of truth in whichever format suits your team and import it into Pointblank whenever you need runtime validation. The key points to remember:
Use pb.import_contract() to read external schemas and translate them into Pointblank checks
The ContractImport object gives you multiple output options: direct validation, reusable contracts, generated Python code, or YAML
Check .coverage and .warnings to understand how completely the translation covered your original schema
Use pb.export_contract() to write Pointblank contracts back to external formats for sharing with other tools
Combine imports with additional Pointblank-specific checks for the most thorough validation coverage
Whether you are migrating from another validation tool, bootstrapping contracts from existing schemas, or maintaining interoperability with external systems, the adapter framework gives you a clean path between external specifications and Pointblank’s validation engine. As new adapters are added in future releases, the same import_contract() interface will continue to work, so any code you write today will gain new format support automatically.