This function generates random data that conforms to a schema’s column definitions. When the schema is defined using Field objects with constraints (e.g., min_val, max_val, pattern, preset), the generated data will respect those constraints.
This is a convenience function that wraps Schema.generate() to support a more functional style of usage, in the same way that load_dataset() loads built-in datasets.
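To make "respecting constraints" concrete, here is a minimal, self-contained sketch in plain Python. It is not pointblank's implementation: `gen_int`, `gen_choice`, and `generate` are hypothetical helpers whose arguments only borrow the constraint names used by Field objects (`min_val`, `max_val`, `unique`, `allowed`). The sketch assumes that a fixed seed should make the output reproducible.

```python
import random


def gen_int(rng: random.Random, n: int, min_val: int, max_val: int,
            unique: bool = False) -> list[int]:
    """Draw n integers in [min_val, max_val] (hypothetical helper)."""
    if min_val > max_val:
        raise ValueError("min_val must not exceed max_val")
    if unique:
        if max_val - min_val + 1 < n:
            raise ValueError("range too small for n unique values")
        # Sample without replacement so every value is distinct
        return rng.sample(range(min_val, max_val + 1), n)
    return [rng.randint(min_val, max_val) for _ in range(n)]


def gen_choice(rng: random.Random, n: int, allowed: list[str]) -> list[str]:
    """Draw n values, each taken from the allowed set (hypothetical helper)."""
    if not allowed:
        raise ValueError("allowed must be non-empty")
    return [rng.choice(allowed) for _ in range(n)]


def generate(n: int, seed: int) -> dict[str, list]:
    # A fresh Random seeded per call, so the same seed yields the same data
    rng = random.Random(seed)
    return {
        "age": gen_int(rng, n, min_val=18, max_val=100),
        "status": gen_choice(rng, n, ["active", "pending", "inactive"]),
    }


data = generate(n=100, seed=23)
print(min(data["age"]) >= 18 and max(data["age"]) <= 100)  # True
```

Sampling without replacement is the simplest way to honor `unique=True` for integers, and it also shows why some constraint combinations cannot be satisfied at all.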
Output format for the generated data. Options are: (1) "polars" (default) returns a Polars DataFrame, (2) "pandas" returns a Pandas DataFrame, and (3) "dict" returns a dictionary of lists.
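The three output formats differ only in the container that holds the same column-oriented data. The sketch below is a hypothetical dispatcher, `convert_output`, and is not pointblank's internal code. It shows the "dict of lists" shape and defers the Polars or pandas import until that format is actually requested, which is also where a missing optional dependency would surface as an ImportError.

```python
def convert_output(columns: dict[str, list], output: str = "polars"):
    """Hypothetical dispatcher: turn column-oriented data into the requested format."""
    if output == "dict":
        # "dict" output: one key per column, each mapped to a list of values
        return {name: list(values) for name, values in columns.items()}
    if output == "polars":
        import polars as pl  # raises ImportError if Polars is not installed
        return pl.DataFrame(columns)
    if output == "pandas":
        import pandas as pd  # raises ImportError if pandas is not installed
        return pd.DataFrame(columns)
    raise ValueError(f"unknown output format: {output!r}")


cols = {"age": [55, 28, 20], "status": ["pending", "active", "active"]}
print(convert_output(cols, output="dict"))
# {'age': [55, 28, 20], 'status': ['pending', 'active', 'active']}
```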
country:str='US'
Country code for realistic data generation when using presets (e.g., preset="email", preset="address"). Accepts ISO 3166-1 alpha-2 codes (e.g., "US", "DE", "FR") or alpha-3 codes (e.g., "USA", "DEU", "FRA"). Default is "US".
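Because both alpha-2 and alpha-3 codes are accepted, an implementation has to normalize them to a single form at some point. The following sketch assumes normalization to alpha-2. `normalize_country` and the five-entry lookup table are hypothetical and cover only a handful of the supported countries. The code pairs themselves (USA/US, DEU/DE, FRA/FR, GBR/GB, JPN/JP) come from ISO 3166-1.

```python
# Hypothetical lookup covering only a few of the supported countries
ALPHA3_TO_ALPHA2 = {"USA": "US", "DEU": "DE", "FRA": "FR", "GBR": "GB", "JPN": "JP"}
SUPPORTED_ALPHA2 = set(ALPHA3_TO_ALPHA2.values())


def normalize_country(code: str) -> str:
    """Map an ISO 3166-1 alpha-2 or alpha-3 code to alpha-2 (hypothetical helper)."""
    if code in SUPPORTED_ALPHA2:
        return code  # already alpha-2
    if code in ALPHA3_TO_ALPHA2:
        return ALPHA3_TO_ALPHA2[code]
    raise ValueError(f"unsupported country code: {code!r}")


print(normalize_country("DEU"))  # DE
print(normalize_country("FR"))   # FR
```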
Returns
DataFrame or dict
Generated data in the requested format.
Raises
ValueError
If the schema has no columns or if constraints cannot be satisfied.
ImportError
If required optional dependencies are not installed.
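Unsatisfiable constraints are easiest to see with integer ranges. A column marked unique whose range holds fewer distinct values than the requested number of rows can never be filled. The sketch below is a hypothetical pre-flight check (`check_schema`), not pointblank's code. It raises ValueError for that case, for an inverted range, for an empty allowed list, and for an empty schema. The column specs are plain dicts standing in for Field objects.

```python
def check_schema(columns: dict[str, dict], n: int) -> None:
    """Hypothetical pre-flight check mirroring the documented ValueError cases."""
    if not columns:
        raise ValueError("schema has no columns")
    for name, spec in columns.items():
        lo, hi = spec.get("min_val"), spec.get("max_val")
        if lo is not None and hi is not None:
            if lo > hi:
                raise ValueError(f"{name}: min_val > max_val")
            # An integer range [lo, hi] holds only hi - lo + 1 distinct values
            if spec.get("unique") and hi - lo + 1 < n:
                raise ValueError(f"{name}: cannot draw {n} unique values")
        if "allowed" in spec and not spec["allowed"]:
            raise ValueError(f"{name}: allowed is empty")


# Passes: no max_val, so the unique range is unbounded above
check_schema({"user_id": {"min_val": 1, "unique": True}}, n=100)
```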
Supported Countries
The country= parameter controls the country used for generating realistic data with presets (e.g., preset="email", preset="address"). This affects location-specific formats like addresses, phone numbers, and postal codes. Currently, 50 countries are supported with full locale data:
Europe (32 countries): Austria ("AT"), Belgium ("BE"), Bulgaria ("BG"), Croatia ("HR"), Cyprus ("CY"), Czech Republic ("CZ"), Denmark ("DK"), Estonia ("EE"), Finland ("FI"), France ("FR"), Germany ("DE"), Greece ("GR"), Hungary ("HU"), Iceland ("IS"), Ireland ("IE"), Italy ("IT"), Latvia ("LV"), Lithuania ("LT"), Luxembourg ("LU"), Malta ("MT"), Netherlands ("NL"), Norway ("NO"), Poland ("PL"), Portugal ("PT"), Romania ("RO"), Russia ("RU"), Slovakia ("SK"), Slovenia ("SI"), Spain ("ES"), Sweden ("SE"), Switzerland ("CH"), United Kingdom ("GB")
Americas (7 countries): Argentina ("AR"), Brazil ("BR"), Canada ("CA"), Chile ("CL"), Colombia ("CO"), Mexico ("MX"), United States ("US")
Asia-Pacific (10 countries): Australia ("AU"), China ("CN"), Hong Kong ("HK"), India ("IN"), Indonesia ("ID"), Japan ("JP"), New Zealand ("NZ"), Philippines ("PH"), South Korea ("KR"), Taiwan ("TW")
Middle East (1 country): Turkey ("TR")
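As a quick check of the stated total, the four regional lists can be written down as data and counted: 32 + 7 + 10 + 1 = 50. The `REGIONS` dict is only a transcription of the lists above. It is not an object exported by pointblank.

```python
# Supported countries from the lists above, grouped by region (alpha-2 codes)
REGIONS = {
    "Europe": ["AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE",
               "GR", "HU", "IS", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "NO",
               "PL", "PT", "RO", "RU", "SK", "SI", "ES", "SE", "CH", "GB"],
    "Americas": ["AR", "BR", "CA", "CL", "CO", "MX", "US"],
    "Asia-Pacific": ["AU", "CN", "HK", "IN", "ID", "JP", "NZ", "PH", "KR", "TW"],
    "Middle East": ["TR"],
}

counts = {region: len(codes) for region, codes in REGIONS.items()}
total = sum(counts.values())
print(counts)  # {'Europe': 32, 'Americas': 7, 'Asia-Pacific': 10, 'Middle East': 1}
print(total)   # 50
```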
Examples
Generate test data from a schema with field constraints:
import pointblank as pb

schema = pb.Schema(
    user_id=pb.int_field(min_val=1, unique=True),
    email=pb.string_field(preset="email"),
    age=pb.int_field(min_val=18, max_val=100),
    status=pb.string_field(allowed=["active", "pending", "inactive"]),
)

# Generate 100 rows of test data
pb.preview(pb.generate_dataset(schema, n=100, seed=23))
Polars · Rows: 100 · Columns: 4

     user_id              email                         age    status
     Int64                String                        Int64  String
1    7188536481533917197  vivienne.rios@gmail.com       55     pending
2    2674009078779859984  williamschaefer@aol.com       28     active
3    7652102777077138151  lilyhansen@hotmail.com        20     active
4    157503859921753049   shirley.mays27@aol.com        93     inactive
5    2829213282471975080  sean.dawson29@aol.com         57     pending
96   7027508096731143831  kathryn.green@hotmail.com     68     active
97   6055996548456656575  dmorris@yahoo.com             20     inactive
98   3822709996092631588  williamcooper@protonmail.com  38     inactive
99   1522653102058131295  l_sawyer@zoho.com             46     active
100  5690877051669225499  paisley_sandoval@gmail.com    19     pending
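The preview shows only the first and last five rows, but those rows are enough to confirm the schema's constraints by hand. The sketch below copies them into a plain dictionary of lists, which is the same shape as the "dict" output format, and checks each Field constraint in pure Python. The email regex is an assumption and only a loose shape check. It is not the rule that pointblank's email preset uses.

```python
import re

# The ten rows shown in the preview above, as a dictionary of lists
preview = {
    "user_id": [7188536481533917197, 2674009078779859984, 7652102777077138151,
                157503859921753049, 2829213282471975080, 7027508096731143831,
                6055996548456656575, 3822709996092631588, 1522653102058131295,
                5690877051669225499],
    "email": ["vivienne.rios@gmail.com", "williamschaefer@aol.com",
              "lilyhansen@hotmail.com", "shirley.mays27@aol.com",
              "sean.dawson29@aol.com", "kathryn.green@hotmail.com",
              "dmorris@yahoo.com", "williamcooper@protonmail.com",
              "l_sawyer@zoho.com", "paisley_sandoval@gmail.com"],
    "age": [55, 28, 20, 93, 57, 68, 20, 38, 46, 19],
    "status": ["pending", "active", "active", "inactive", "pending",
               "active", "inactive", "inactive", "active", "pending"],
}

ids = preview["user_id"]
checks = {
    # int_field(min_val=1, unique=True)
    "user_id >= 1 and unique": all(u >= 1 for u in ids) and len(set(ids)) == len(ids),
    # int_field(min_val=18, max_val=100)
    "age in [18, 100]": all(18 <= a <= 100 for a in preview["age"]),
    # string_field(allowed=[...])
    "status in allowed": set(preview["status"]) <= {"active", "pending", "inactive"},
    # string_field(preset="email"): loose shape check only (assumed pattern)
    "email shape": all(re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]+", e) for e in preview["email"]),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```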
Generate data from a simple dtype-only schema as a Pandas DataFrame: