This function returns the file path of a dataset bundled with Pointblank. The paths are useful in examples and documentation that demonstrate file-based data loading, since users do not need to supply data files of their own. A returned path can be passed to Validate(data=path) to demonstrate CSV and Parquet file loading.
file_type
The file format to get the path for. Options are "csv", "parquet", or "duckdb".
Returns
str
The file path to the requested dataset file.
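Because the return value is a plain string, it can be handed to any reader that accepts a file path, not only Validate. The sketch below peeks at a CSV header using only the standard library. Note the assumptions: peek_header is a hypothetical helper, and the temporary file with placeholder rows merely stands in for the path that pb.get_data_path("small_table", "csv") would return, so the block runs without Pointblank installed.

```python
import csv
import os
import tempfile


def peek_header(path: str) -> list[str]:
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


# Stand-in for the packaged file: placeholder rows, NOT the real small_table data.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    csv.writer(tmp).writerows([["a", "b", "c", "d"], [1, "x", 2, 3.5]])
    csv_path = tmp.name  # with Pointblank: pb.get_data_path("small_table", "csv")

header = peek_header(csv_path)
print(header)

# Clean up the stand-in file.
os.remove(csv_path)
```

With Pointblank installed, replacing the stand-in path with the string returned by get_data_path() should work the same way, since the helper only needs a readable CSV path.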
Included Datasets
The available datasets are the same as those in load_dataset():
"small_table": A small dataset with 13 rows and 8 columns. Ideal for testing and examples.
"game_revenue": A dataset with 2000 rows and 11 columns. Revenue data for a game company.
"nycflights": A dataset with 336,776 rows and 18 columns. Flight data from NYC airports.
"global_sales": A dataset with 50,000 rows and 20 columns. Global sales data across regions.
File Types
Each dataset is available in multiple formats:
"csv": Comma-separated values file (.csv)
"parquet": Parquet file (.parquet)
"duckdb": DuckDB database file (.ddb)
Examples
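The format names and extensions listed under File Types can be written down as a small lookup, which is handy for sanity-checking a returned path. This is only a sketch: EXTENSIONS and matches_file_type are hypothetical names, not part of Pointblank, and the sample string is a temporary-file path of the kind printed in the CSV example on this page.

```python
from pathlib import Path

# File-type names mapped to the extensions listed under "File Types".
EXTENSIONS = {"csv": ".csv", "parquet": ".parquet", "duckdb": ".ddb"}


def matches_file_type(path: str, file_type: str) -> bool:
    """Check whether a path's suffix matches the requested file type."""
    if file_type not in EXTENSIONS:
        raise ValueError(f"file_type must be one of {sorted(EXTENSIONS)}")
    return Path(path).suffix == EXTENSIONS[file_type]


sample = "/tmp/tmp22p14vgo.csv"
print(matches_file_type(sample, "csv"))      # True
print(matches_file_type(sample, "parquet"))  # False
```

The check looks only at the file name, so it never opens the file and works on any path string.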
Get the path to a CSV file and use it with Validate:
import pointblank as pb

# Get path to the small_table CSV file
csv_path = pb.get_data_path("small_table", "csv")
print(csv_path)

# Use the path directly with Validate
validation = (
    pb.Validate(data=csv_path)
    .col_exists(["a", "b", "c"])
    .col_vals_gt(columns="d", value=0)
    .interrogate()
)

validation
/tmp/tmp22p14vgo.csv
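Pointblank Validation
2025-06-22|01:24:06
Polars

|   | STEP          | COLUMNS | VALUES | EVAL | UNITS | PASS    | FAIL   | W | E | C | EXT |
|---|---------------|---------|--------|------|-------|---------|--------|---|---|---|-----|
| 1 | col_exists()  | a       | —      | ✓    | 1     | 1 1.00  | 0 0.00 | — | — | — | —   |
| 2 | col_exists()  | b       | —      | ✓    | 1     | 1 1.00  | 0 0.00 | — | — | — | —   |
| 3 | col_exists()  | c       | —      | ✓    | 1     | 1 1.00  | 0 0.00 | — | — | — | —   |
| 4 | col_vals_gt() | d       | 0      | ✓    | 13    | 13 1.00 | 0 0.00 | — | — | — | —   |

2025-06-22 01:24:06 UTC · < 1 s · 2025-06-22 01:24:06 UTC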
Get a Parquet file path for validation examples:
# Get path to the game_revenue Parquet file
parquet_path = pb.get_data_path(dataset="game_revenue", file_type="parquet")

# Validate the Parquet file directly
validation = (
    pb.Validate(data=parquet_path, label="Game Revenue Data Validation")
    .col_vals_not_null(columns=["player_id", "session_id"])
    .col_vals_gt(columns="item_revenue", value=0)
    .interrogate()
)

validation
Pointblank Validation
Game Revenue Data Validation
Polars

|   | STEP                | COLUMNS      | VALUES | EVAL | UNITS | PASS      | FAIL   | W | E | C | EXT |
|---|---------------------|--------------|--------|------|-------|-----------|--------|---|---|---|-----|
| 1 | col_vals_not_null() | player_id    | —      | ✓    | 2000  | 2000 1.00 | 0 0.00 | — | — | — | —   |
| 2 | col_vals_not_null() | session_id   | —      | ✓    | 2000  | 2000 1.00 | 0 0.00 | — | — | — | —   |
| 3 | col_vals_gt()       | item_revenue | 0      | ✓    | 2000  | 2000 1.00 | 0 0.00 | — | — | — | —   |

2025-06-22 01:24:06 UTC · < 1 s · 2025-06-22 01:24:06 UTC
This is particularly useful for documentation examples where you want to demonstrate file-based workflows without requiring users to have specific data files: