CLI Interactive Demos

These CLI demos showcase practical data quality workflows that you can adapt to your own projects!

🎬 Workflow-Based Demonstrations
  • Essential validations for everyday data quality checks
  • Data exploration tools that require no Python knowledge
  • CI/CD integration patterns for automated data quality
  • Complete pipelines from exploration to production validation

Prerequisites

To follow along with these demonstrations:

pip install pointblank
pb --help  # Verify installation

Getting Started with the CLI

Learn the basics of Pointblank’s CLI and run your first validation:

Getting Started: CLI overview and your first data quality validation
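
For a quick taste, a first validation against the built-in small_table dataset looks like this (both commands reappear in the steps below):

# Preview the data, then run a first check
pb preview small_table
pb validate small_table --check rows-distinct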

Essential Data Quality Validations

See the most commonly used validation checks that catch critical data issues:

Essential Validations: Duplicate detection, null checks, and data extract debugging
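
The checks covered there map to commands like these (small_table and column a come from the built-in example data used throughout this guide):

# Duplicate detection
pb validate small_table --check rows-distinct

# Null checks
pb validate small_table --check col-vals-not-null --column a

# Data extract debugging: pull the failing rows
pb validate small_table --check col-vals-gt --column a --value 5 --show-extract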

Data Exploration Tools

Discover how to profile and explore data using quick, easy-to-use CLI tools:

Data Exploration: Preview data, find missing values, and generate column summaries
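
As a sketch, the exploration commands from that demo run against the built-in dataset like this (the same commands also accept files and URLs, as shown later):

# Preview the first rows
pb preview small_table

# Find missing values
pb missing small_table

# Generate column summaries
pb scan small_table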

Using Polars

You can use Polars in the CLI to load and transform data, then pass the results to other CLI tools:

Using Polars: Load and transform data with Polars, then pass the results to other CLI tools
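
The demo shows the exact integration; as a rough sketch of the pattern, you can use Polars (here via a Python one-liner) to transform data into a file that the CLI tools then consume. The amount column and clean.parquet output below are illustrative placeholders:

# Transform with Polars, then validate the result with pb
python -c "import polars as pl; pl.read_csv('sales_data.csv').filter(pl.col('amount') > 0).write_parquet('clean.parquet')"
pb validate clean.parquet --check rows-distinct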

CI/CD Integration & Automation

Learn how to integrate data quality checks into automated pipelines:

CI/CD Integration: Exit codes, pipeline integration, and automated quality gates
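
In a CI script, the validation's exit code serves as the quality gate. A minimal sketch (the file pattern reuses the example below; the failure message is illustrative):

pb validate "data/*.parquet" --check col-vals-not-null --column customer_id --exit-code
if [ $? -ne 0 ]; then
  echo "Data quality gate failed" >&2
  exit 1
fi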

Complete Data Quality Workflow

Follow an end-to-end data quality pipeline combining exploration, validation, and profiling:

Complete Workflow: Full pipeline (explore → validate → automate)
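
Condensed to its skeleton, that pipeline chains commands you will see in the steps below (sales_data.csv is the example file used there):

# Explore
pb preview sales_data.csv
pb missing sales_data.csv

# Validate
pb validate sales_data.csv --check rows-distinct

# Automate: fail the pipeline on bad data
pb validate sales_data.csv --check rows-distinct --exit-code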

Getting Started

Ready to implement data quality workflows? Here’s how to get started:

1. Install and Verify

pip install pointblank
pb --help

2. Explore Various Data Sources

# Try previewing a built-in dataset
pb preview small_table

# Access local files (even use patterns to combine multiple Parquet files)
pb preview sales_data.csv
pb scan "data/*.parquet"

# Inspect datasets in GitHub repositories (no need to download the data!)
pb preview "https://github.com/user/repo/blob/main/data.csv"
pb missing "https://raw.githubusercontent.com/user/repo/main/sales.parquet"

# Work with database tables through connection strings
pb info "duckdb:///warehouse/analytics.ddb::customers"

3. Run Essential Validations

# Check for duplicate rows
pb validate small_table --check rows-distinct

# Validate data from multiple sources
pb validate "data/*.parquet" --check col-vals-not-null --column customer_id
pb validate "https://github.com/user/repo/blob/main/sales.csv" --check rows-distinct

# Extract failing data for debugging
pb validate small_table --check col-vals-gt --column a --value 5 --show-extract

4. Integrate with CI/CD

# Use exit codes for automation (0 = pass, 1 = fail)
pb validate small_table --check rows-distinct --exit-code
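
In a shell script or CI job step, that exit code can short-circuit the run (the echo message is illustrative):

# Stop immediately if the check fails
pb validate small_table --check rows-distinct --exit-code || exit 1
echo "All data quality checks passed"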