Create a Schema from an existing table with inferred Field constraints.
schema_from_tbl(
    tbl,
    *,
    infer_constraints=True,
    categorical_threshold=20,
    detect_presets=True,
    sample_size=None,
)
This is the functional form of Schema.from_table(). It inspects the actual values in the table to infer rich constraints (min/max, uniqueness, null rates, allowed values, presets) suitable for synthetic data generation via schema.generate() or generate_dataset().
Parameters
tbl: Any
-
A Polars DataFrame, Pandas DataFrame, or Ibis table (DuckDB, SQLite, etc.).
infer_constraints: bool = True
-
When True (the default), inspect values to infer min/max, uniqueness, null rates, etc. When False, behave like Schema(tbl=df) and record dtypes only.
categorical_threshold: int | float = 20
-
If a column's number of unique values is at most this threshold, treat it as categorical and populate allowed=. An int is read as an absolute count of unique values; a float between 0 and 1 is read as a fraction of the total row count. Default is 20.
detect_presets: bool = True
-
Attempt to match string columns to known generation presets (e.g., email, url, phone_number) based on column name heuristics and value validation. Default is True.
sample_size: int | None = None
-
If set, sample this many rows before analysis (useful for very large tables). None means use all rows.
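The int-versus-float reading of categorical_threshold can be sketched in plain Python. This is only an illustration of the rule as described above, not the library's implementation; the helper name is_categorical and its arguments are hypothetical.

```python
from __future__ import annotations


def is_categorical(n_unique: int, n_rows: int, threshold: int | float = 20) -> bool:
    """Hypothetical helper: decide whether a column counts as categorical."""
    if isinstance(threshold, float):
        # Float: read the threshold as a fraction of the total row count.
        limit = threshold * n_rows
    else:
        # Int: read the threshold as an absolute number of unique values.
        limit = threshold
    return n_unique <= limit


# 3 unique values in 50 rows, default threshold of 20 unique values.
print(is_categorical(3, 50))                  # True
# 50 unique values in 50 rows exceeds the default threshold.
print(is_categorical(50, 50))                 # False
# Fractional threshold: 10% of 50 rows allows about 5 unique values.
print(is_categorical(3, 50, threshold=0.1))   # True
print(is_categorical(10, 50, threshold=0.1))  # False
```

Under this reading, a low-cardinality column would get allowed= populated, while identifier-like columns with one value per row would not.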
Returns
Schema
-
A Schema populated with Field objects containing inferred constraints, ready for use with schema.generate() or generate_dataset().
Examples
import pointblank as pb
import polars as pl
df = pl.DataFrame({
    "user_id": list(range(1, 51)),
    "email": [f"user{i}@example.com" for i in range(50)],
    "age": [20 + i % 50 for i in range(50)],
    "status": ["active", "pending", "inactive"] * 16 + ["active", "pending"],
})
schema = pb.schema_from_tbl(df)
print(schema)
Pointblank Schema
user_id: IntField(dtype='Int64', nullable=False, null_probability=0.0, unique=True, min_val=1, max_val=50, allowed=None)
email: StringField(dtype='String', nullable=False, null_probability=0.0, unique=True, min_length=None, max_length=None, pattern=None, preset='email', allowed=None)
age: IntField(dtype='Int64', nullable=False, null_probability=0.0, unique=True, min_val=20, max_val=69, allowed=None)
status: StringField(dtype='String', nullable=False, null_probability=0.0, unique=False, min_length=None, max_length=None, pattern=None, preset=None, allowed=['active', 'inactive', 'pending'])
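To make the integer fields in this output easier to read, here is a rough sketch of how such values could be derived from a single column. It uses only the standard library; infer_int_constraints is a hypothetical helper written for this illustration and is not how pointblank computes them.

```python
from __future__ import annotations


def infer_int_constraints(values: list[int | None]) -> dict:
    """Hypothetical helper: summarize one integer column."""
    non_null = [v for v in values if v is not None]
    n_null = len(values) - len(non_null)
    return {
        "nullable": n_null > 0,
        # Share of missing values among all rows.
        "null_probability": n_null / len(values) if values else 0.0,
        # Unique when no non-null value repeats.
        "unique": len(set(non_null)) == len(non_null),
        "min_val": min(non_null) if non_null else None,
        "max_val": max(non_null) if non_null else None,
    }


# Same values as the `age` column in the example DataFrame.
age = [20 + i % 50 for i in range(50)]
print(infer_int_constraints(age))
# {'nullable': False, 'null_probability': 0.0, 'unique': True, 'min_val': 20, 'max_val': 69}
```

The result agrees with the age line printed above: 50 distinct values spanning 20 to 69 with no nulls.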
Generate synthetic data matching the original’s characteristics:
pb.preview(schema.generate(n=10, seed=23))
|    | user_id | email                     | age | status   |
|----|---------|---------------------------|-----|----------|
| 1  | 50      | doris.martin@yandex.com   | 69  | inactive |
| 2  | 19      | ngonzalez74@yahoo.com     | 38  | active   |
| 3  | 6       | jessica379@protonmail.com | 25  | active   |
| 4  | 2       | george_evans@yahoo.com    | 21  | pending  |
| 5  | 38      | p_williams@outlook.com    | 57  | inactive |
| 6  | 20      | andreamitchell@mail.com   | 39  | inactive |
| 7  | 28      | maria.valentine@mail.com  | 47  | inactive |
| 8  | 25      | vwalker@gmail.com         | 44  | pending  |
| 9  | 34      | brenda.lopez@zoho.com     | 53  | inactive |
| 10 | 23      | laurendavis@aol.com       | 42  | active   |
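As a sanity check, the ten generated rows can be compared against the constraints printed in the schema. The sketch below copies the rows from the preview output and tests them in plain Python; check_rows is a hypothetical helper for this illustration, and the "@" test is only a crude stand-in for the email preset.

```python
from __future__ import annotations

# Rows copied from the preview output: (user_id, email, age, status).
rows = [
    (50, "doris.martin@yandex.com", 69, "inactive"),
    (19, "ngonzalez74@yahoo.com", 38, "active"),
    (6, "jessica379@protonmail.com", 25, "active"),
    (2, "george_evans@yahoo.com", 21, "pending"),
    (38, "p_williams@outlook.com", 57, "inactive"),
    (20, "andreamitchell@mail.com", 39, "inactive"),
    (28, "maria.valentine@mail.com", 47, "inactive"),
    (25, "vwalker@gmail.com", 44, "pending"),
    (34, "brenda.lopez@zoho.com", 53, "inactive"),
    (23, "laurendavis@aol.com", 42, "active"),
]

ALLOWED_STATUS = {"active", "inactive", "pending"}


def check_rows(rows: list[tuple]) -> list[str]:
    """Hypothetical checker: return a list of constraint violations."""
    problems = []
    # unique=True was inferred for user_id, email, and age.
    for idx, name in ((0, "user_id"), (1, "email"), (2, "age")):
        column = [r[idx] for r in rows]
        if len(set(column)) != len(column):
            problems.append(f"{name} is not unique")
    for user_id, email, age, status in rows:
        if not 1 <= user_id <= 50:
            problems.append(f"user_id {user_id} outside [1, 50]")
        if "@" not in email:
            problems.append(f"email {email!r} has no '@'")
        if not 20 <= age <= 69:
            problems.append(f"age {age} outside [20, 69]")
        if status not in ALLOWED_STATUS:
            problems.append(f"status {status!r} not allowed")
    return problems


print(check_rows(rows))  # []
```

An empty list means every row falls inside the inferred ranges, uniqueness flags, and allowed values.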