Data Inspection and Exploration
Pointblank’s CLI makes it easy to inspect, preview, and explore your data before running validations. This is useful for understanding your data’s structure, checking for obvious issues, and confirming that your data source is being read correctly.
Supported Data Sources
You can inspect a wide variety of data sources using the CLI:
- CSV files: as single files or glob patterns
- Parquet files: as single files, directories, or partitioned datasets
- GitHub URLs: for CSV/Parquet files as standard or raw URLs
- database tables: via connection strings
- built-in datasets: these are provided by Pointblank
Quick Reference for the Data Inspection Commands
Command | Purpose |
---|---|
pb info | Show table type, dimensions, columns, types |
pb preview | Preview head/tail rows, select columns |
pb scan | Full column summary/profile (stats, NA, etc.) |
pb missing | Visualize missing value patterns |
pb info: Inspecting Data Structure
Use pb info to display basic information about your data source:
pb info data.csv
pb info "data/*.parquet"
pb info "duckdb:///warehouse/analytics.ddb::customer_metrics"
pb info small_table
This command shows the
- table type (e.g., pandas, polars, etc.)
- number of rows and columns
- data source path or identifier
pb preview: Previewing Data
Use pb preview to view the first and last rows of your data, with flexible column selection:
pb preview data.csv
pb preview "data/*.parquet"
pb preview "https://github.com/user/repo/blob/main/data.csv"
pb preview "duckdb:///path/to/db.ddb::table_name"
pb preview small_table
Here are some useful options:
- --rows N: show N rows from the top (default: 5)
- --columns "col1,col2": show only specified columns
- --col-range "1:10": show columns by position
- --col-first N: show first N columns
- --col-last N: show last N columns
- --no-row-numbers: hide row numbers
- --output-html file.html: save preview as an HTML file
Here’s an example where only the name, age, and email columns from data.csv are shown (and we limit this to the top 10 rows):
pb preview data.csv --columns "name,age,email" --rows 10
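When the same preview is needed for several sources, the call can be wrapped in a small shell function. The following is a minimal sketch, not part of Pointblank: preview_cols is a hypothetical helper name, and the DRY_RUN variable is an assumption of this sketch that prints the command instead of running it. Only the --columns and --rows options listed above are used.

```shell
# Hypothetical helper (not part of Pointblank): preview selected columns of a source.
# Usage: preview_cols SOURCE "col1,col2" N
preview_cols() {
  local src="$1"    # file path, glob pattern, URL, or connection string
  local cols="$2"   # comma-separated column names
  local rows="$3"   # number of rows to show from the top
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # DRY_RUN is an assumption of this sketch: print the command, do not run it
    echo "pb preview $src --columns $cols --rows $rows"
  else
    pb preview "$src" --columns "$cols" --rows "$rows"
  fi
}
```

Calling preview_cols data.csv "name,age,email" 10 runs the same command as the example above. With DRY_RUN=1 set, the command is only printed, which helps when checking arguments before touching real data.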
pb scan: Column Summary and Profiling
Use pb scan for a comprehensive column summary, including:
- data types
- missing value counts
- unique value counts
- summary statistics (mean, standard deviation, min, max, quartiles)
pb scan data.csv
pb scan "data/*.parquet"
pb scan "duckdb:///warehouse/analytics.ddb::customer_metrics"
pb scan small_table
Here are the options:
- --columns "col1,col2" (scan only specified columns)
- --output-html file.html (save scan as HTML report)
pb missing: Missing Value Patterns
Use pb missing to generate a missing values report, visualizing missingness across columns and row sectors:
pb missing data.csv
pb missing "data/*.parquet"
pb missing "duckdb:///warehouse/analytics.ddb::customer_metrics"
pb missing small_table
There’s an option here as well:
- --output-html file.html (save missing values report as HTML)
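Because both pb scan and pb missing accept --output-html, a column profile and a missing values report can be saved side by side for later review. The sketch below is illustrative only: save_reports is a hypothetical helper name, the scan.html and missing.html file names are an arbitrary naming scheme, and DRY_RUN is an assumption of the sketch that prints the commands rather than running them. It also assumes pb exits with a non-zero status on failure.

```shell
# Hypothetical helper (not part of Pointblank): save scan and missing reports as HTML.
# Usage: save_reports SOURCE OUTPUT_DIR
save_reports() {
  local src="$1" outdir="$2" sub
  for sub in scan missing; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # DRY_RUN is an assumption of this sketch: print the command, do not run it
      echo "pb $sub $src --output-html $outdir/$sub.html"
    else
      mkdir -p "$outdir"
      # Stop at the first failure (assumes a non-zero exit status on error)
      pb "$sub" "$src" --output-html "$outdir/$sub.html" || return 1
    fi
  done
}
```

For instance, save_reports data.csv reports would write reports/scan.html and reports/missing.html, assuming both commands succeed.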
Some Useful Tips on When and How to Use
- use pb info before running validations to confirm your data source can be loaded.
- use pb preview to quickly understand what the data looks like.
- use pb missing to visualize and diagnose missing data patterns.
- use pb scan for a quick data profile and to spot outliers or data quality issues.
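The tips above suggest a natural order for a first look at a new data source. As a rough sketch, the four commands can be chained in a shell function; inspect_source is a hypothetical name, DRY_RUN is an assumption of this sketch used to print the commands without running them, and stopping at the first failure assumes that pb returns a non-zero exit status when a source cannot be loaded.

```shell
# Hypothetical helper (not part of Pointblank): run the inspection commands in order.
# Usage: inspect_source SOURCE
inspect_source() {
  local src="$1" sub
  for sub in info preview missing scan; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # DRY_RUN is an assumption of this sketch: print the command, do not run it
      echo "pb $sub $src"
    else
      # Stop early if a step fails (assumes a non-zero exit status on error)
      pb "$sub" "$src" || return 1
    fi
  done
}
```

Running pb info first means a source that cannot be loaded is caught before the heavier scan step is attempted.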