CLI Reference

This page provides a complete reference for all Pointblank CLI commands. Each section shows the full help text as it appears in the terminal, giving you quick access to all available options and examples.

For practical usage examples and workflows, see the CLI Data Validation and CLI Data Inspection guides.

pb - Main Command

The main entry point for all Pointblank CLI operations:

Usage: pb [OPTIONS] COMMAND [ARGS]...

  Pointblank CLI: Data validation and quality tools for data engineers.

  Use this CLI to validate data quality, explore datasets, and generate
  comprehensive reports for CSV, Parquet, and database sources. Suitable for
  data pipelines, ETL validation, and exploratory data analysis from the
  command line.

  Quick Examples:

    pb preview data.csv              Preview your data
    pb scan data.csv                 Generate data profile
    pb validate data.csv             Run basic validation

  Use pb COMMAND --help for detailed help on any command.

Options:
  -v, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  info           Display information about a data source.
  preview        Preview a data table showing head and tail rows.
  scan           Generate a data scan profile report.
  missing        Generate a missing values report for a data table.
  validate       Perform single or multiple data validations.
  run            Run a Pointblank validation script or YAML configuration.
  make-template  Create a validation script or YAML configuration template.
  pl             Execute Polars expressions and display results.
  datasets       List available built-in datasets.
  requirements   Check installed dependencies and their availability.

pb info - Data Source Information

Display basic information about a data source:

Usage: pb info [OPTIONS] [DATA_SOURCE]

  Display information about a data source.

  Shows table type, dimensions, column names, and data types.

  DATA_SOURCE can be:

  - CSV file path (e.g., data.csv)
  - Parquet file path or pattern (e.g., data.parquet, data/*.parquet)
  - GitHub URL to CSV/Parquet (e.g., https://github.com/user/repo/blob/main/data.csv)
  - Database connection string (e.g., duckdb:///path/to/db.ddb::table_name)
  - Dataset name from pointblank (small_table, game_revenue, nycflights, global_sales)

Options:
  --help  Show this message and exit.
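
For example, you can point pb info at a built-in dataset or a database table (the database path and table name below are illustrative):

  pb info small_table
  pb info "duckdb:///warehouse.ddb::orders"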

pb preview - Data Table Preview

Preview data showing head and tail rows:

Usage: pb preview [OPTIONS] [DATA_SOURCE]

  Preview a data table showing head and tail rows.

  DATA_SOURCE can be:

  - CSV file path (e.g., data.csv)
  - Parquet file path or pattern (e.g., data.parquet, data/*.parquet)
  - GitHub URL to CSV/Parquet (e.g., https://github.com/user/repo/blob/main/data.csv)
  - Database connection string (e.g., duckdb:///path/to/db.ddb::table_name)
  - Dataset name from pointblank (small_table, game_revenue, nycflights, global_sales)
  - Piped data from pb pl command

  COLUMN SELECTION OPTIONS:

  For tables with many columns, use these options to control which columns are
  displayed:

  - --columns: Specify exact columns (e.g., --columns "name,age,email")
  - --col-range: Select column range (e.g., --col-range "1:10", --col-range "5:", --col-range ":15")
  - --col-first: Show first N columns (e.g., --col-first 5)
  - --col-last: Show last N columns (e.g., --col-last 3)

  Tables with >15 columns automatically show first 7 and last 7 columns with
  indicators.

Options:
  --columns TEXT             Comma-separated list of columns to display
  --col-range TEXT           Column range like '1:10' or '5:' or ':15'
                             (1-based indexing)
  --col-first INTEGER        Show first N columns
  --col-last INTEGER         Show last N columns
  --head INTEGER             Number of rows from the top (default: 5)
  --tail INTEGER             Number of rows from the bottom (default: 5)
  --limit INTEGER            Maximum total rows to display (default: 50)
  --no-row-numbers           Hide row numbers
  --max-col-width INTEGER    Maximum column width in pixels (default: 250)
  --min-table-width INTEGER  Minimum table width in pixels (default: 500)
  --no-header                Hide table header
  --output-html PATH         Save HTML output to file
  --help                     Show this message and exit.
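
For a wide table, a typical invocation might combine row and column options; the file and column names here are illustrative:

  pb preview data.csv --col-first 5 --head 10
  pb preview data.csv --columns "name,age,email" --output-html preview.html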

pb scan - Data Profile Reports

Generate comprehensive data profiles:

Usage: pb scan [OPTIONS] [DATA_SOURCE]

  Generate a data scan profile report.

  Produces a comprehensive data profile including:

  - Column types and distributions
  - Missing value patterns
  - Basic statistics
  - Data quality indicators

  DATA_SOURCE can be:

  - CSV file path (e.g., data.csv)
  - Parquet file path or pattern (e.g., data.parquet, data/*.parquet)
  - GitHub URL to CSV/Parquet (e.g., https://github.com/user/repo/blob/main/data.csv)
  - Database connection string (e.g., duckdb:///path/to/db.ddb::table_name)
  - Dataset name from pointblank (small_table, game_revenue, nycflights, global_sales)
  - Piped data from pb pl command

Options:
  --output-html PATH  Save HTML scan report to file
  -c, --columns TEXT  Comma-separated list of columns to scan
  --help              Show this message and exit.
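
For instance, a scan might be limited to a few columns and saved as an HTML report (file and column names are illustrative):

  pb scan data.csv --columns "price,quantity" --output-html scan_report.html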

pb missing - Missing Values Reports

Generate reports focused on missing values:

Usage: pb missing [OPTIONS] [DATA_SOURCE]

  Generate a missing values report for a data table.

  DATA_SOURCE can be:

  - CSV file path (e.g., data.csv)
  - Parquet file path or pattern (e.g., data.parquet, data/*.parquet)
  - GitHub URL to CSV/Parquet (e.g., https://github.com/user/repo/blob/main/data.csv)
  - Database connection string (e.g., duckdb:///path/to/db.ddb::table_name)
  - Dataset name from pointblank (small_table, game_revenue, nycflights, global_sales)
  - Piped data from pb pl command

Options:
  --output-html PATH  Save HTML output to file
  --help              Show this message and exit.
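
For example, to save the report as an HTML file (the file names are illustrative):

  pb missing data.csv --output-html missing_report.html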

pb validate - Quick Data Validations

Perform single or multiple data validations:

Usage: pb validate [OPTIONS] [DATA_SOURCE]

  Perform single or multiple data validations.

  Run one or more validation checks on your data in a single command. Use
  multiple --check options to perform multiple validations.

  DATA_SOURCE can be:

  - CSV file path (e.g., data.csv)
  - Parquet file path or pattern (e.g., data.parquet, data/*.parquet)
  - GitHub URL to CSV/Parquet (e.g., https://github.com/user/repo/blob/main/data.csv)
  - Database connection string (e.g., duckdb:///path/to/db.ddb::table_name)
  - Dataset name from pointblank (small_table, game_revenue, nycflights, global_sales)

  AVAILABLE CHECK_TYPES:

  Require no additional options:

  - rows-distinct: Check if all rows in the dataset are unique (no duplicates)
  - rows-complete: Check if all rows are complete (no missing values in any column)

  Require --column:

  - col-exists: Check if a specific column exists in the dataset
  - col-vals-not-null: Check if all values in a column are not null/missing

  Require --column and --value:

  - col-vals-gt: Check if column values are greater than a fixed value
  - col-vals-ge: Check if column values are greater than or equal to a fixed value
  - col-vals-lt: Check if column values are less than a fixed value
  - col-vals-le: Check if column values are less than or equal to a fixed value

  Require --column and --set:

  - col-vals-in-set: Check if column values are in an allowed set

  Use --list-checks to see all available validation methods with examples. The
  default CHECK_TYPE is 'rows-distinct' which checks for duplicate rows.

  Examples:

  pb validate data.csv                               # Uses default validation (rows-distinct)
  pb validate data.csv --list-checks                 # Show all available checks
  pb validate data.csv --check rows-distinct
  pb validate data.csv --check rows-distinct --show-extract
  pb validate data.csv --check rows-distinct --write-extract failing_rows_folder
  pb validate data.csv --check rows-distinct --exit-code
  pb validate data.csv --check col-exists --column price
  pb validate data.csv --check col-vals-not-null --column email
  pb validate data.csv --check col-vals-gt --column score --value 50
  pb validate data.csv --check col-vals-in-set --column status --set "active,inactive,pending"

  Multiple validations in one command:

  pb validate data.csv --check rows-distinct --check rows-complete

Options:
  --list-checks         List available validation checks and exit
  --check CHECK_TYPE    Type of validation check to perform. Can be used
                        multiple times for multiple checks.
  --column TEXT         Column name or integer position as #N (1-based index)
                        for validation.
  --set TEXT            Comma-separated allowed values for col-vals-in-set
                        checks.
  --value FLOAT         Numeric value for comparison checks.
  --show-extract        Show extract of failing rows if validation fails
  --write-extract TEXT  Save failing rows to folder. Provide base name for
                        folder.
  --limit INTEGER       Maximum number of failing rows to save to CSV
                        (default: 500)
  --exit-code           Exit with non-zero code if validation fails
  --help                Show this message and exit.
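
Columns can also be addressed by 1-based position using the #N syntax of the --column option; for example (the file name is illustrative):

  pb validate data.csv --check col-vals-not-null --column "#2"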

pb run - Validation Scripts and YAML

Run Python validation scripts or YAML configurations:

Usage: pb run [OPTIONS] [VALIDATION_FILE]

  Run a Pointblank validation script or YAML configuration.

  VALIDATION_FILE can be:

  - A Python file (.py) that defines validation logic
  - A YAML configuration file (.yaml, .yml) that defines validation steps

  Python scripts should load their own data and create validation objects.
  YAML configurations define data sources and validation steps declaratively.

  If --data is provided, it will automatically replace the data source in your
  validation objects (Python scripts) or override the 'tbl' field (YAML
  configs).

  To get started quickly, use 'pb make-template' to create templates.

  DATA can be:

  - CSV file path (e.g., data.csv)
  - Parquet file path or pattern (e.g., data.parquet, data/*.parquet)
  - GitHub URL to CSV/Parquet (e.g., https://github.com/user/repo/blob/main/data.csv)
  - Database connection string (e.g., duckdb:///path/to/db.ddb::table_name)
  - Dataset name from pointblank (small_table, game_revenue, nycflights, global_sales)

  Examples:

  pb make-template my_validation.py  # Create a Python template
  pb run validation_script.py
  pb run validation_config.yaml
  pb run validation_script.py --data data.csv
  pb run validation_config.yaml --data small_table --output-html report.html
  pb run validation_script.py --show-extract --fail-on error
  pb run validation_config.yaml --write-extract extracts_folder --fail-on critical

Options:
  --data TEXT                     Data source to replace in validation objects
                                  (Python scripts and YAML configs)
  --output-html PATH              Save HTML validation report to file
  --output-json PATH              Save JSON validation summary to file
  --show-extract                  Show extract of failing rows if validation
                                  fails
  --write-extract TEXT            Save failing rows to folders (one CSV per
                                  step). Provide base name for folder.
  --limit INTEGER                 Maximum number of failing rows to save to
                                  CSV (default: 500)
  --fail-on [critical|error|warning|any]
                                  Exit with non-zero code when validation
                                  reaches this threshold level
  --help                          Show this message and exit.
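
As a rough sketch, a Python script for pb run might look like the following; it assumes Pointblank's Python API (pb.Validate, pb.load_dataset) and uses placeholder validation rules:

  import pointblank as pb

  # Load a data source and build a validation plan; pb run picks up
  # the validation object defined by the script
  validation = (
      pb.Validate(data=pb.load_dataset("small_table"))
      .col_vals_not_null(columns="a")
      .rows_distinct()
      .interrogate()
  )

Running pb run my_validation.py --data data.csv would then swap data.csv in as the data source, as described above.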

pb make-template - Template Generation

Create validation script or YAML configuration templates:

Usage: pb make-template [OPTIONS] [OUTPUT_FILE]

  Create a validation script or YAML configuration template.

  Creates a sample Python script or YAML configuration with examples showing
  how to use Pointblank for data validation. The template type is determined
  by the file extension:

  - .py files create Python script templates
  - .yaml/.yml files create YAML configuration templates

  Edit the template to add your own data loading and validation rules, then
  run it with 'pb run'.

  OUTPUT_FILE is the path where the template will be created.

  Examples:

  pb make-template my_validation.py        # Creates Python script template
  pb make-template my_validation.yaml      # Creates YAML config template
  pb make-template validation_template.yml # Creates YAML config template

Options:
  --help  Show this message and exit.
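
A typical workflow pairs this command with pb run (file names are illustrative):

  pb make-template checks.yaml        # generate a starter YAML config
  # edit checks.yaml: point the 'tbl' field at your data, add steps
  pb run checks.yaml --data data.csv  # or override 'tbl' at run time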

pb pl - Polars Expression Execution

Execute Polars expressions and display results:

Usage: pb pl [OPTIONS] [POLARS_EXPRESSION]

  Execute Polars expressions and display results.

  Execute Polars DataFrame operations from the command line and display the
  results using Pointblank's visualization tools.

  POLARS_EXPRESSION should be a valid Polars expression that returns a
  DataFrame. The 'pl' module is automatically imported and available.

  Examples:

  # Direct expression
  pb pl "pl.read_csv('data.csv')"
  pb pl "pl.read_csv('data.csv').select(['name', 'age'])"
  pb pl "pl.read_csv('data.csv').filter(pl.col('age') > 25)"

  # Multi-line with editor (supports multiple statements)
  pb pl --edit

  # Multi-statement code example in editor:
  # csv = pl.read_csv('data.csv')
  # result = csv.select(['name', 'age']).filter(pl.col('age') > 25)

  # Multi-line with a specific editor
  pb pl --edit --editor nano
  pb pl --edit --editor code
  pb pl --edit --editor micro

  # From file
  pb pl --file query.py

  # Piping to other pb commands
  pb pl "pl.read_csv('data.csv').head(20)" --pipe | pb validate --check rows-distinct
  pb pl --edit --pipe | pb preview --head 10
  pb pl --edit --pipe | pb scan --output-html report.html
  pb pl --edit --pipe | pb missing --output-html missing_report.html

  Use --output-format to change how results are displayed:
  pb pl "pl.read_csv('data.csv')" --output-format scan
  pb pl "pl.read_csv('data.csv')" --output-format missing
  pb pl "pl.read_csv('data.csv')" --output-format info

  Note: For multi-statement code, assign your final result to a variable like
  'result', 'df', 'data', or ensure it's the last expression.

Options:
  -e, --edit                      Open editor for multi-line input
  -f, --file PATH                 Read query from file
  --editor TEXT                   Editor to use for --edit mode (overrides
                                  $EDITOR and auto-detection)
  -o, --output-format [preview|scan|missing|info]
                                  Output format for the result
  --preview-head INTEGER          Number of head rows for preview
  --preview-tail INTEGER          Number of tail rows for preview
  --output-html PATH              Save HTML output to file
  --pipe                          Output data in a format suitable for piping
                                  to other pb commands
  --pipe-format [parquet|csv]     Format for piped output (default: parquet)
  --help                          Show this message and exit.
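
As a sketch of a query file for --file (the file name, input, and filter are all illustrative), note that the final result should be assigned to a variable such as result, df, or data, per the note above:

  # query.py: 'pl' is pre-imported by pb pl, but an explicit import
  # keeps the file runnable on its own as well
  import polars as pl

  df = pl.read_csv("data.csv")
  result = df.filter(pl.col("age") > 25)  # picked up as the final result

This could then be run with, for example, pb pl --file query.py --output-format scan.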

pb datasets - Built-in Datasets

List available built-in datasets:

Usage: pb datasets [OPTIONS]

  List available built-in datasets.

Options:
  --help  Show this message and exit.

pb requirements - Dependency Check

Check installed dependencies and their availability:

Usage: pb requirements [OPTIONS]

  Check installed dependencies and their availability.

Options:
  --help  Show this message and exit.

Common Data Source Types

All commands that accept a DATA_SOURCE parameter support these formats:

  • CSV files: data.csv, path/to/data.csv
  • Parquet files: data.parquet, data/*.parquet (patterns supported)
  • GitHub URLs: https://github.com/user/repo/blob/main/data.csv
  • Database connections: duckdb:///path/to/db.ddb::table_name
  • Built-in datasets: small_table, game_revenue, nycflights, global_sales
  • Piped data: Output from pb pl command (where supported)
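
Because the data source syntax is uniform, the same command works across formats; the paths and table names below are illustrative:

  pb preview data.csv
  pb preview "data/*.parquet"
  pb preview "duckdb:///warehouse.ddb::orders"
  pb preview small_table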

Exit Codes and Automation

Many commands support options useful for automation and CI/CD:

  • --exit-code: Exit with non-zero code on validation failure
  • --fail-on [critical|error|warning|any]: Control failure thresholds
  • --output-html, --output-json: Save reports for external consumption
  • --write-extract: Save failing rows for investigation

These features make Pointblank CLI commands suitable for integration into data pipelines, quality gates, and automated workflows.
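
As a minimal sketch of a shell-based quality gate (file and folder names are illustrative):

  # Stop the pipeline on duplicate or incomplete rows, keeping the
  # failing rows on disk for inspection
  pb validate orders.csv --check rows-distinct --check rows-complete \
      --write-extract failing_rows --exit-code

  # Or gate a scripted validation on its threshold level
  pb run checks.yaml --fail-on error --output-json report.json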