---------------------------------------------------------------------- This is the API documentation for the Pointblank library. ---------------------------------------------------------------------- ## The Validate family When performing data validation, you'll need the `Validate` class to get the process started. It's given the target table and you can optionally provide some metadata and/or failure thresholds (using the `Thresholds` class or through shorthands for this task). The `Validate` class has numerous methods for defining validation steps and for obtaining post-interrogation metrics and data. Validate(data: 'FrameT | Any', tbl_name: 'str | None' = None, label: 'str | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds | None' = None, actions: 'Actions | None' = None, final_actions: 'FinalActions | None' = None, brief: 'str | bool | None' = None, lang: 'str | None' = None, locale: 'str | None' = None) -> None Workflow for defining a set of validations on a table and interrogating for results. The `Validate` class is used for defining a set of validation steps on a table and interrogating the table with the *validation plan*. This class is the main entry point for the *data quality reporting* workflow. The overall aim of this workflow is to generate comprehensive reporting information to assess the level of data quality for a target table. We can supply as many validation steps as needed, and having a large number of them should increase the validation coverage for a given table. The validation methods (e.g., [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`), [`col_vals_between()`](`pointblank.Validate.col_vals_between`), etc.) translate to discrete validation steps, where each step will be sequentially numbered (useful when viewing the reporting data). This process of calling validation methods is known as developing a *validation plan*. The validation methods, when called, are merely instructions up to the point the concluding [`interrogate()`](`pointblank.Validate.interrogate`) method is called. That kicks off the process of acting on the *validation plan* by querying the target table and getting reporting results for each step. Once the interrogation process is complete, we can say that the workflow now has reporting information. We can then extract useful information from the reporting data to understand the quality of the table. Printing the `Validate` object (or using the [`get_tabular_report()`](`pointblank.Validate.get_tabular_report`) method) will return a table with the results of the interrogation, and [`get_sundered_data()`](`pointblank.Validate.get_sundered_data`) allows for the splitting of the table based on passing and failing rows. Parameters ---------- data The table to validate, which could be a DataFrame object, an Ibis table object, a CSV file path, a Parquet file path, a GitHub URL pointing to a CSV or Parquet file, or a database connection string. When providing a CSV or Parquet file path (as a string or `pathlib.Path` object), the file will be automatically loaded using an available DataFrame library (Polars or Pandas). Parquet input also supports glob patterns, directories containing `.parquet` files, and Spark-style partitioned datasets. GitHub URLs are automatically transformed to raw content URLs and downloaded. Connection strings enable direct database access via Ibis with optional table specification using the `::table_name` suffix. Read the *Supported Input Table Types* section for details on the supported table types.
tbl_name An optional name to assign to the input table object. If no value is provided, a name will be generated based on whatever information is available. This table name will be displayed in the header area of the tabular report. label An optional label for the validation plan. If no value is provided, a label will be generated based on the current system date and time. Markdown can be used here to make the label more visually appealing (it will appear in the header area of the tabular report). thresholds Generate threshold failure levels so that all validation steps can report and react accordingly when exceeding the set levels. The thresholds are set at the global level and can be overridden at the validation step level (each validation step has its own `thresholds=` parameter). The default is `None`, which means that no thresholds will be set. Look at the *Thresholds* section for information on how to set threshold levels. actions The actions to take when validation steps meet or exceed any set threshold levels. These actions are paired with the threshold levels and are executed during the interrogation process when there are exceedances. The actions are executed right after each step is evaluated. Such actions should be provided in the form of an `Actions` object. If `None` then no global actions will be set. View the *Actions* section for information on how to set actions. final_actions The actions to take when the validation process is complete and the final results are available. This is useful for sending notifications or reporting the overall status of the validation process. The final actions are executed after all validation steps have been processed and the results have been collected. The final actions are not tied to any threshold levels; they are executed regardless of the validation results. Such actions should be provided in the form of a `FinalActions` object. If `None` then no finalizing actions will be set. Please see the *Actions* section for information on how to set final actions. brief A global setting for briefs, which are optional brief descriptions for validation steps (they can be displayed in the reporting table). For such a global setting, templating elements like `"{step}"` (to insert the step number) or `"{auto}"` (to include an automatically generated brief) are useful. If `True` then each brief will be automatically generated. If `None` (the default) then briefs aren't globally set. lang The language to use for various reporting elements. By default, `None` will select English (`"en"`) as the language, but other options include French (`"fr"`), German (`"de"`), Italian (`"it"`), Spanish (`"es"`), and several more. Have a look at the *Reporting Languages* section for the full list of supported languages and information on how the language setting is utilized. locale An optional locale ID to use for formatting values in the reporting table according to the locale's rules. Examples include `"en-US"` for English (United States) and `"fr-FR"` for French (France). More simply, this can be a language identifier without a designation of territory, like `"es"` for Spanish. Returns ------- Validate A `Validate` object with the table and validations to be performed.
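Before getting into the details below, here is a minimal sketch of how these constructor arguments can fit together. The dataset, the `date` column, and the specific threshold values are used only for illustration:

```python
import pointblank as pb

# A minimal sketch of the constructor parameters described above: shorthand
# thresholds (a warning/error/critical tuple), auto-generated briefs, and
# French report text with French-style value formatting.
validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars"),
        tbl_name="small_table",
        label="**Nightly** data checks",   # Markdown is allowed in the label
        thresholds=(0.1, 0.2, 0.3),        # warning, error, critical
        brief=True,                        # auto-generate a brief for each step
        lang="fr",                         # language for reporting elements
        locale="fr-FR",                    # formatting rules for values
    )
    .col_exists(columns="date")
    .interrogate()
)
```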
Supported Input Table Types --------------------------- The `data=` parameter can be given any of the following table types: - Polars DataFrame (`"polars"`) - Pandas DataFrame (`"pandas"`) - PySpark table (`"pyspark"`) - DuckDB table (`"duckdb"`)* - MySQL table (`"mysql"`)* - PostgreSQL table (`"postgresql"`)* - SQLite table (`"sqlite"`)* - Microsoft SQL Server table (`"mssql"`)* - Snowflake table (`"snowflake"`)* - Databricks table (`"databricks"`)* - BigQuery table (`"bigquery"`)* - Parquet table (`"parquet"`)* - CSV files (string path or `pathlib.Path` object with `.csv` extension) - Parquet files (string path, `pathlib.Path` object, glob pattern, directory with `.parquet` extension, or partitioned dataset) - Database connection strings (URI format with optional table specification) The table types marked with an asterisk need to be prepared as Ibis tables (with type of `ibis.expr.types.relations.Table`). Furthermore, the use of `Validate` with such tables requires the Ibis library v9.5.0 and above to be installed. If the input table is a Polars or Pandas DataFrame, the Ibis library is not required. To use a CSV file, ensure that a string or `pathlib.Path` object with a `.csv` extension is provided. The file will be automatically detected and loaded using the best available DataFrame library. The loading preference is Polars first, then Pandas as a fallback. Connection strings follow database URL formats and must also specify a table using the `::table_name` suffix. Examples include: ``` "duckdb:///path/to/database.ddb::table_name" "sqlite:///path/to/database.db::table_name" "postgresql://user:password@localhost:5432/database::table_name" "mysql://user:password@localhost:3306/database::table_name" "bigquery://project/dataset::table_name" "snowflake://user:password@account/database/schema::table_name" ``` When using connection strings, the Ibis library with the appropriate backend driver is required. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for all validation steps. They are set here at the global level but can be overridden at the validation step level (each validation step has its own local `thresholds=` parameter). There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` and `1`) or as the absolute number of failing test units (as an integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units for a validation step exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be set; you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Actions ------- The `actions=` and `final_actions=` parameters provide mechanisms to respond to validation results.
These actions can be used to notify users of validation failures, log issues, or trigger other processes when problems are detected. *Step Actions* The `actions=` parameter allows you to define actions that are triggered when validation steps exceed specific threshold levels (warning, error, or critical). These actions are executed during the interrogation process, right after each step is evaluated. Step actions should be provided using the [`Actions`](`pointblank.Actions`) class, which lets you specify different actions for different severity levels: ```python # Define an action that logs a message when warning threshold is exceeded def log_warning(): metadata = pb.get_action_metadata() print(f"WARNING: Step {metadata['step']} failed with type {metadata['type']}") # Define actions for different threshold levels actions = pb.Actions( warning = log_warning, error = lambda: send_email("Error in validation"), critical = "CRITICAL FAILURE DETECTED" ) # Use in Validate validation = pb.Validate( data=my_data, actions=actions # Global actions for all steps ) ``` You can also provide step-specific actions in individual validation methods: ```python validation.col_vals_gt( columns="revenue", value=0, actions=pb.Actions(warning=log_warning) # Only applies to this step ) ``` Step actions have access to step-specific context through the [`get_action_metadata()`](`pointblank.get_action_metadata`) function, which provides details about the current validation step that triggered the action. *Final Actions* The `final_actions=` parameter lets you define actions that execute after all validation steps have completed. These are useful for providing summaries, sending notifications based on overall validation status, or performing cleanup operations. Final actions should be provided using the [`FinalActions`](`pointblank.FinalActions`) class: ```python def send_report(): summary = pb.get_validation_summary() if summary["status"] == "CRITICAL": send_alert_email( subject=f"CRITICAL validation failures in {summary['tbl_name']}", body=f"{summary['critical_steps']} steps failed with critical severity." ) validation = pb.Validate( data=my_data, final_actions=pb.FinalActions(send_report) ) ``` Final actions have access to validation-wide summary information through the [`get_validation_summary()`](`pointblank.get_validation_summary`) function, which provides a comprehensive overview of the entire validation process. The combination of step actions and final actions provides a flexible system for responding to data quality issues at both the individual step level and the overall validation level. Reporting Languages ------------------- Various pieces of reporting in Pointblank can be localized to a specific language. This is done by setting the `lang=` parameter in `Validate`. 
Any of the following languages can be used (just provide the language code): - English (`"en"`) - French (`"fr"`) - German (`"de"`) - Italian (`"it"`) - Spanish (`"es"`) - Portuguese (`"pt"`) - Dutch (`"nl"`) - Swedish (`"sv"`) - Danish (`"da"`) - Norwegian Bokmål (`"nb"`) - Icelandic (`"is"`) - Finnish (`"fi"`) - Polish (`"pl"`) - Czech (`"cs"`) - Romanian (`"ro"`) - Greek (`"el"`) - Russian (`"ru"`) - Turkish (`"tr"`) - Arabic (`"ar"`) - Hindi (`"hi"`) - Simplified Chinese (`"zh-Hans"`) - Traditional Chinese (`"zh-Hant"`) - Japanese (`"ja"`) - Korean (`"ko"`) - Vietnamese (`"vi"`) - Indonesian (`"id"`) - Ukrainian (`"uk"`) - Hebrew (`"he"`) - Thai (`"th"`) - Persian (`"fa"`) Automatically generated briefs (produced by using `brief=True` or `brief="...{auto}..."`) will be written in the selected language. The language setting will also be used when generating the validation report table through [`get_tabular_report()`](`pointblank.Validate.get_tabular_report`) (or printing the `Validate` object in a notebook environment). Examples -------- ### Creating a validation plan and interrogating Let's walk through a data quality analysis of an extremely small table. It's actually called `"small_table"` and it's accessible through the [`load_dataset()`](`pointblank.load_dataset`) function. ```python import pointblank as pb # Load the `small_table` dataset small_table = pb.load_dataset(dataset="small_table", tbl_type="polars") # Preview the table pb.preview(small_table) ``` We ought to think about what's tolerable in terms of data quality, so let's designate proportional failure thresholds to the 'warning', 'error', and 'critical' states. This can be done by using the [`Thresholds`](`pointblank.Thresholds`) class. ```python thresholds = pb.Thresholds(warning=0.10, error=0.25, critical=0.35) ``` Now, we use the `Validate` class and give it the `thresholds` object (which serves as a default for all validation steps but can be overridden). The static thresholds provided in `thresholds=` will make the reporting a bit more useful. We also need to provide a target table and we'll use `small_table` for this. ```python validation = ( pb.Validate( data=small_table, tbl_name="small_table", label="`Validate` example.", thresholds=thresholds ) ) ``` Then, as with any `Validate` object, we can add steps to the validation plan by using as many validation methods as we want. To conclude the process (and actually query the data table), we use the [`interrogate()`](`pointblank.Validate.interrogate`) method. ```python validation = ( validation .col_vals_gt(columns="d", value=100) .col_vals_le(columns="c", value=5) .col_vals_between(columns="c", left=3, right=10, na_pass=True) .col_vals_regex(columns="b", pattern=r"[0-9]-[a-z]{3}-[0-9]{3}") .col_exists(columns=["date", "date_time"]) .interrogate() ) ``` The `validation` object can be printed as a reporting table. ```python validation ``` The report could be further customized by using the [`get_tabular_report()`](`pointblank.Validate.get_tabular_report`) method, which contains options for modifying the display of the table. ### Adding briefs Briefs are short descriptions of the validation steps. While they can be set for each step individually, they can also be set globally. The global setting is done by using the `brief=` argument in `Validate`. The global setting can be as simple as `True` to have automatically-generated briefs for each step.
Alternatively, we can use templating elements like `"{step}"` (to insert the step number) or `"{auto}"` (to include an automatically generated brief). Here's an example of a global setting for briefs: ```python validation_2 = ( pb.Validate( data=pb.load_dataset(), tbl_name="small_table", label="Validation example with briefs", brief="Step {step}: {auto}", ) .col_vals_gt(columns="d", value=100) .col_vals_between(columns="c", left=3, right=10, na_pass=True) .col_vals_regex( columns="b", pattern=r"[0-9]-[a-z]{3}-[0-9]{3}", brief="Regex check for column {col}" ) .interrogate() ) validation_2 ``` We see the text of the briefs appear in the `STEP` column of the reporting table. Furthermore, the global brief's template (`"Step {step}: {auto}"`) is applied to all steps except for the final step, where the step-level `brief=` argument provided an override. If you should want to cancel the globally-defined brief for one or more validation steps, you can set `brief=False` in those particular steps. ### Post-interrogation methods The `Validate` class has a number of post-interrogation methods that can be used to extract useful information from the validation results. For example, the [`get_data_extracts()`](`pointblank.Validate.get_data_extracts`) method can be used to get the data extracts for each validation step. ```python validation_2.get_data_extracts() ``` We can also view step reports for each validation step using the [`get_step_report()`](`pointblank.Validate.get_step_report`) method. This method adapts to the type of validation step and shows the relevant information for a step's validation. ```python validation_2.get_step_report(i=2) ``` The `Validate` class also has a method for getting the sundered data, which is the data that passed or failed the validation steps. This can be done using the [`get_sundered_data()`](`pointblank.Validate.get_sundered_data`) method. ```python pb.preview(validation_2.get_sundered_data()) ``` The sundered data is a DataFrame that contains the rows that passed or failed the validation. The default behavior is to return the rows that failed the validation, as shown above. ### Working with CSV Files The `Validate` class can directly accept CSV file paths, making it easy to validate data stored in CSV files without manual loading: ```python # Get a path to a CSV file from the package data csv_path = pb.get_data_path("global_sales", "csv") validation_3 = ( pb.Validate( data=csv_path, label="CSV validation example" ) .col_exists(["customer_id", "product_id", "revenue"]) .col_vals_not_null(["customer_id", "product_id"]) .col_vals_gt(columns="revenue", value=0) .interrogate() ) validation_3 ``` You can also use a Path object to specify the CSV file. Here's an example of how to do that: ```python from pathlib import Path csv_file = Path(pb.get_data_path("game_revenue", "csv")) validation_4 = ( pb.Validate(data=csv_file, label="Game Revenue Validation") .col_exists(["player_id", "session_id", "item_name"]) .col_vals_regex( columns="session_id", pattern=r"[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}" ) .col_vals_gt(columns="item_revenue", value=0, na_pass=True) .interrogate() ) validation_4 ``` The CSV loading is automatic, so when a string or Path with a `.csv` extension is provided, Pointblank will automatically load the file using the best available DataFrame library (Polars preferred, Pandas as fallback). The loaded data can then be used with all validation methods just like any other supported table type. 
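For orientation, the Polars-then-Pandas preference described above can be pictured with a rough sketch like the following. This is only an illustration of the documented fallback order, not Pointblank's actual internal code, and the file path is hypothetical:

```python
from pathlib import Path

csv_file = Path("data.csv")  # hypothetical CSV path

# Rough illustration of the documented loading preference: Polars first,
# then Pandas as a fallback (Pointblank does this for you automatically).
try:
    import polars as pl
    df = pl.read_csv(csv_file)
except ImportError:
    import pandas as pd
    df = pd.read_csv(csv_file)
```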
### Working with Parquet Files The `Validate` class can directly accept Parquet files and datasets in various formats. The following examples illustrate how to validate Parquet files: ```python # Single Parquet file from package data parquet_path = pb.get_data_path("nycflights", "parquet") validation_5 = ( pb.Validate( data=parquet_path, tbl_name="NYC Flights Data" ) .col_vals_not_null(["carrier", "origin", "dest"]) .col_vals_gt(columns="distance", value=0) .interrogate() ) validation_5 ``` You can also use glob patterns and directories. Here are some examples for how to: 1. load multiple Parquet files 2. load a Parquet-containing directory 3. load a partitioned Parquet dataset ```python # Multiple Parquet files with glob patterns validation_6 = pb.Validate(data="data/sales_*.parquet") # Directory containing Parquet files validation_7 = pb.Validate(data="parquet_data/") # Partitioned Parquet dataset validation_8 = ( pb.Validate(data="sales_data/") # Contains year=2023/quarter=Q1/region=US/sales.parquet .col_exists(["transaction_id", "amount", "year", "quarter", "region"]) .interrogate() ) ``` When you point to a directory that contains a partitioned Parquet dataset (with subdirectories like `year=2023/quarter=Q1/region=US/`), Pointblank will automatically: - discover all Parquet files recursively - extract partition column values from directory paths - add partition columns to the final DataFrame - combine all partitions into a single table for validation Both Polars and Pandas handle partitioned datasets natively, so this works seamlessly with either DataFrame library. The loading preference is Polars first, then Pandas as a fallback. ### Working with Database Connection Strings The `Validate` class supports database connection strings for direct validation of database tables. Connection strings must specify a table using the `::table_name` suffix: ```python # Get path to a DuckDB database file from package data duckdb_path = pb.get_data_path("game_revenue", "duckdb") validation_9 = ( pb.Validate( data=f"duckdb:///{duckdb_path}::game_revenue", label="DuckDB Game Revenue Validation" ) .col_exists(["player_id", "session_id", "item_revenue"]) .col_vals_gt(columns="item_revenue", value=0) .interrogate() ) validation_9 ``` For comprehensive documentation on supported connection string formats, error handling, and installation requirements, see the [`connect_to_table()`](`pointblank.connect_to_table`) function. This function handles all the connection logic and provides helpful error messages when table specifications are missing or backend dependencies are not installed. Thresholds(warning: 'int | float | bool | None' = None, error: 'int | float | bool | None' = None, critical: 'int | float | bool | None' = None) -> None Definition of threshold values. Thresholds are used to set limits on the number of failing test units at different levels. The levels are 'warning', 'error', and 'critical'. These levels correspond to different levels of severity when a threshold is reached. The threshold values can be set as absolute counts or as fractions of the total number of test units. When a threshold is reached, an action can be taken (e.g., displaying a message or calling a function) if there is an associated action defined for that level (defined through the [`Actions`](`pointblank.Actions`) class). Parameters ---------- warning The threshold for the 'warning' level. This can be an absolute count or a fraction of the total. Using `True` will set this threshold value to `1`. 
error The threshold for the 'error' level. This can be an absolute count or a fraction of the total. Using `True` will set this threshold value to `1`. critical The threshold for the 'critical' level. This can be an absolute count or a fraction of the total. Using `True` will set this threshold value to `1`. Returns ------- Thresholds A `Thresholds` object. This can be used when using the [`Validate`](`pointblank.Validate`) class (to set thresholds globally) or when defining validation steps like [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) (so that threshold values are scoped to individual validation steps, overriding any global thresholds). Examples -------- In a data validation workflow, you can set thresholds for the number of failing test units at different levels. For example, you can set a threshold for the 'warning' level when the number of failing test units exceeds 10% of the total number of test units: ```python thresholds_1 = pb.Thresholds(warning=0.1) ``` You can also set thresholds for the 'error' and 'critical' levels: ```python thresholds_2 = pb.Thresholds(warning=0.1, error=0.2, critical=0.05) ``` Thresholds can also be set as absolute counts. Here's an example where the 'warning' level is set to `5` failing test units: ```python thresholds_3 = pb.Thresholds(warning=5) ``` The `thresholds` object can be used to set global thresholds for all validation steps. Or, you can set thresholds for individual validation steps, which will override the global thresholds. Here's a data validation workflow example where we set global thresholds and then override with different thresholds at the [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) step: ```python validation = ( pb.Validate( data=pb.load_dataset(dataset="small_table"), label="Example Validation", thresholds=pb.Thresholds(warning=0.1, error=0.2, critical=0.3) ) .col_vals_not_null(columns=["c", "d"]) .col_vals_gt(columns="a", value=3, thresholds=pb.Thresholds(warning=5)) .interrogate() ) validation ``` As can be seen, the last step ([`col_vals_gt()`](`pointblank.Validate.col_vals_gt`)) has its own thresholds, which override the global thresholds set at the beginning of the validation workflow (in the [`Validate`](`pointblank.Validate`) class). Actions(warning: 'str | Callable | list[str | Callable] | None' = None, error: 'str | Callable | list[str | Callable] | None' = None, critical: 'str | Callable | list[str | Callable] | None' = None, default: 'str | Callable | list[str | Callable] | None' = None, highest_only: 'bool' = True) -> None Definition of action values. Actions complement threshold values by defining what action should be taken when a threshold level is reached. The action can be a string or a `Callable`. When a string is used, it is interpreted as a message to be displayed. When a `Callable` is used, it will be invoked at interrogation time if the threshold level is met or exceeded. There are three threshold levels: 'warning', 'error', and 'critical'. These levels correspond to different levels of severity when a threshold is reached. Those thresholds can be defined using the [`Thresholds`](`pointblank.Thresholds`) class or various shorthand forms. Actions don't have to be defined for all threshold levels; if an action is not defined for a level in exceedance, no action will be taken. 
Likewise, there is no negative consequence (other than a no-op) for defining actions for thresholds that don't exist (e.g., setting an action for the 'critical' level when no corresponding 'critical' threshold has been set). Parameters ---------- warning A string, `Callable`, or list of `Callable`/string values for the 'warning' level. Using `None` means no action should be performed at the 'warning' level. error A string, `Callable`, or list of `Callable`/string values for the 'error' level. Using `None` means no action should be performed at the 'error' level. critical A string, `Callable`, or list of `Callable`/string values for the 'critical' level. Using `None` means no action should be performed at the 'critical' level. default A string, `Callable`, or list of `Callable`/string values for all threshold levels. This parameter can be used to set the same action for all threshold levels. If an action is defined for a specific threshold level, it will override the action set for all levels. highest_only A boolean value that, when set to `True` (the default), results in executing only the action for the highest threshold level that is exceeded. Useful when you want to ensure that only the most severe action is taken when multiple threshold levels are exceeded. Returns ------- Actions An `Actions` object. This can be used when using the [`Validate`](`pointblank.Validate`) class (to set actions for meeting different threshold levels globally) or when defining validation steps like [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) (so that actions are scoped to individual validation steps, overriding any globally set actions). Types of Actions ---------------- Actions can be defined in different ways: 1. **String**: A message to be displayed when the threshold level is met or exceeded. 2. **Callable**: A function that is called when the threshold level is met or exceeded. 3. **List of Strings/Callables**: Multiple messages or functions to be called when the threshold level is met or exceeded. The actions are executed at interrogation time when the threshold level assigned to the action is exceeded by the number or proportion of failing test units. When providing a string, it will simply be printed to the console. A callable will also be executed at the time of interrogation. If providing a list of strings or callables, each item in the list will be executed in order. Such a list can contain a mix of strings and callables. String Templating ----------------- When using a string as an action, you can include placeholders for the following variables: - `{type}`: The validation step type where the action is executed (e.g., 'col_vals_gt', 'col_vals_lt', etc.) - `{level}`: The threshold level where the action is executed ('warning', 'error', or 'critical') - `{step}` or `{i}`: The step number in the validation workflow where the action is executed - `{col}` or `{column}`: The column name where the action is executed - `{val}` or `{value}`: An associated value for the validation method (e.g., the value to compare against in a 'col_vals_gt' validation step) - `{time}`: A datetime value for when the action was executed The first two placeholders can also be used in uppercase (e.g., `{TYPE}` or `{LEVEL}`) and the corresponding values will be displayed in uppercase. The placeholders are replaced with the actual values during interrogation. 
For example, the string `"{LEVEL}: '{type}' threshold exceeded for column {col}."` will be displayed as `"WARNING: 'col_vals_gt' threshold exceeded for column a."` when the 'warning' threshold is exceeded in a 'col_vals_gt' validation step involving column `a`. Crafting Callables with `get_action_metadata()` ----------------------------------------------- When creating a callable function to be used as an action, you can use the [`get_action_metadata()`](`pointblank.get_action_metadata`) function to retrieve metadata about the step where the action is executed. This metadata contains information about the validation step, including the step type, level, step number, column name, and associated value. You can use this information to craft your action message or to take specific actions based on the metadata provided. Examples -------- Let's define both threshold values and actions for a data validation workflow. We'll set these thresholds and actions globally for all validation steps. In this specific example, the only actions we'll define are for the 'critical' level: ```python import pointblank as pb validation = ( pb.Validate( data=pb.load_dataset(dataset="game_revenue", tbl_type="duckdb"), thresholds=pb.Thresholds(warning=0.05, error=0.10, critical=0.15), actions=pb.Actions(critical="Major data quality issue found in step {step}."), ) .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}[0-9]{3}") .col_vals_gt(columns="item_revenue", value=0.05) .col_vals_gt(columns="session_duration", value=15) .interrogate() ) validation ``` Because we set the 'critical' action to display `"Major data quality issue found."` in the console, this message will be displayed if the number of failing test units exceeds the 'critical' threshold (set to 15% of the total number of test units). In step 3 of the validation workflow, the 'critical' threshold is exceeded, so the message is displayed in the console. Actions can be defined locally for individual validation steps, which will override any global actions set at the beginning of the validation workflow. Here's a variation of the above example where we set global threshold values but assign an action only for an individual validation step: ```python def dq_issue(): from datetime import datetime print(f"Data quality issue found ({datetime.now()}).") validation = ( pb.Validate( data=pb.load_dataset(dataset="game_revenue", tbl_type="duckdb"), thresholds=pb.Thresholds(warning=0.05, error=0.10, critical=0.15), ) .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}[0-9]{3}") .col_vals_gt(columns="item_revenue", value=0.05) .col_vals_gt( columns="session_duration", value=15, actions=pb.Actions(warning=dq_issue), ) .interrogate() ) validation ``` In this case, the 'warning' action is set to call the `dq_issue()` function. This action is only executed when the 'warning' threshold is exceeded in the 'session_duration' column. Because all three thresholds are exceeded in step 3, the 'warning' action of executing the function occurs (resulting in a message being printed to the console). If actions were set for the other two threshold levels, they would also be executed. See Also -------- The [`get_action_metadata()`](`pointblank.get_action_metadata`) function, which can be used to retrieve metadata about the step where the action is executed. FinalActions(*args) Define actions to be taken after validation is complete. Final actions are executed after all validation steps have been completed. 
They provide a mechanism to respond to the overall validation results, such as sending alerts when critical failures are detected or generating summary reports. Parameters ---------- *actions One or more actions to execute after validation. An action can be (1) a callable function that will be executed with no arguments, or (2) a string message that will be printed to the console. Returns ------- FinalActions A `FinalActions` object. This can be used when using the [`Validate`](`pointblank.Validate`) class (to set final actions for the validation workflow). Types of Actions ---------------- Final actions can be defined in two different ways: 1. **String**: A message to be displayed when the validation is complete. 2. **Callable**: A function that is called when the validation is complete. The actions are executed at the end of the validation workflow. When providing a string, it will simply be printed to the console. A callable will also be executed at the time of validation completion. Several strings and callables can be provided to the `FinalActions` class, and they will be executed in the order they are provided. Crafting Callables with `get_validation_summary()` ------------------------------------------------- When creating a callable function to be used as a final action, you can use the [`get_validation_summary()`](`pointblank.get_validation_summary`) function to retrieve the summary of the validation results. This summary contains information about the validation workflow, including the number of test units, the number of failing test units, and the threshold levels that were exceeded. You can use this information to craft your final action message or to take specific actions based on the validation results. Examples -------- Final actions provide a powerful way to respond to the overall results of a validation workflow. They're especially useful for sending notifications, generating reports, or taking corrective actions based on the complete validation outcome. The following example shows how to create a final action that checks for critical failures and sends an alert: ```python import pointblank as pb def send_alert(): summary = pb.get_validation_summary() if summary["highest_severity"] == "critical": print(f"ALERT: Critical validation failures found in {summary['tbl_name']}") validation = ( pb.Validate( data=my_data, final_actions=pb.FinalActions(send_alert) ) .col_vals_gt(columns="revenue", value=0) .interrogate() ) ``` In this example, the `send_alert()` function is defined to check the validation summary for critical failures. If any are found, an alert message is printed to the console. The function is passed to the `FinalActions` class, which ensures it will be executed after all validation steps are complete. Note that we used the [`get_validation_summary()`](`pointblank.get_validation_summary`) function to retrieve the summary of the validation results to help craft the alert message. Multiple final actions can be provided in a sequence. They will be executed in the order they are specified after all validation steps have completed: ```python validation = ( pb.Validate( data=my_data, final_actions=pb.FinalActions( "Validation complete.", # a string message send_alert, # a callable function generate_report # another callable function ) ) .col_vals_gt(columns="revenue", value=0) .interrogate() ) ``` See Also -------- The [`get_validation_summary()`](`pointblank.get_validation_summary`) function, which can be used to retrieve the summary of the validation results.
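The `generate_report` callable in the example above isn't defined in this documentation. A minimal sketch of such a final action, using only summary fields that appear in the examples here (`tbl_name` and `highest_severity`), might look like this:

```python
import pointblank as pb

def generate_report():
    # A sketch of a final action: print a one-line summary once all
    # validation steps have completed.
    summary = pb.get_validation_summary()
    print(
        f"Validation of {summary['tbl_name']} finished; "
        f"highest severity reached: {summary['highest_severity']}"
    )

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table"),
        final_actions=pb.FinalActions("Validation complete.", generate_report),
    )
    .col_vals_not_null(columns=["c", "d"])
    .interrogate()
)
```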
Schema(columns: 'str | list[str] | list[tuple[str, str]] | list[tuple[str]] | dict[str, str] | None' = None, tbl: 'any | None' = None, **kwargs) Definition of a schema object. The schema object defines the structure of a table. Once it is defined, the object can be used in a validation workflow, using `Validate` and its methods, to ensure that the structure of a table matches the expected schema. The validation method that works with the schema object is called [`col_schema_match()`](`pointblank.Validate.col_schema_match`). A schema for a table can be constructed with the `Schema` class in a number of ways: 1. providing a list of column names to `columns=` (to check only the column names) 2. using a list of one- or two-element tuples in `columns=` (to check both column names and optionally dtypes, should be in the form of `[(column_name, dtype), ...]`) 3. providing a dictionary to `columns=`, where the keys are column names and the values are dtypes 4. providing individual column arguments in the form of keyword arguments (constructed as `column_name=dtype`) The schema object can also be constructed by providing a DataFrame or Ibis table object (using the `tbl=` parameter) and the schema will be collected from either type of object. The schema object can be printed to display the column names and dtypes. Note that if `tbl=` is provided then there shouldn't be any other inputs provided through either `columns=` or `**kwargs`. Parameters ---------- columns A list of strings (representing column names), a list of tuples (for column names and column dtypes), or a dictionary containing column and dtype information. If any of these inputs are provided here, it will take precedence over any column arguments provided via `**kwargs`. tbl A DataFrame (Polars or Pandas) or an Ibis table object from which the schema will be collected. Read the *Supported Input Table Types* section for details on the supported table types. **kwargs Individual column arguments that are in the form of `column=dtype` or `column=[dtype1, dtype2, ...]`. These will be ignored if the `columns=` parameter is not `None`. Returns ------- Schema A schema object. Supported Input Table Types --------------------------- The `tbl=` parameter, if used, can be given any of the following table types: - Polars DataFrame (`"polars"`) - Pandas DataFrame (`"pandas"`) - PySpark table (`"pyspark"`) - DuckDB table (`"duckdb"`)* - MySQL table (`"mysql"`)* - PostgreSQL table (`"postgresql"`)* - SQLite table (`"sqlite"`)* - Microsoft SQL Server table (`"mssql"`)* - Snowflake table (`"snowflake"`)* - Databricks table (`"databricks"`)* - BigQuery table (`"bigquery"`)* - Parquet table (`"parquet"`)* The table types marked with an asterisk need to be prepared as Ibis tables (with type of `ibis.expr.types.relations.Table`). Furthermore, using `Schema(tbl=)` with these types of tables requires the Ibis library (`v9.5.0` or above) to be installed. If the input table is a Polars or Pandas DataFrame, the availability of Ibis is not needed. Additional Notes on Schema Construction --------------------------------------- While there is flexibility in how a schema can be constructed, there is the potential for some confusion. So let's go through each of the methods of constructing a schema in more detail and single out some important points. When providing a list of column names to `columns=`, a [`col_schema_match()`](`pointblank.Validate.col_schema_match`) validation step will only check the column names. Any arguments pertaining to dtypes will be ignored. 
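For instance, a names-only schema could be as simple as the following sketch (the column names are placeholders); a fuller walk-through of this form appears in the Examples below:

```python
import pointblank as pb

# Only the column names are checked; no dtype information is supplied.
schema_names_only = pb.Schema(columns=["name", "age", "height"])
```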
When using a list of tuples in `columns=`, the tuples could contain the column name and dtype or just the column name. This construction allows for more flexibility in constructing the schema as some columns will be checked for dtypes and others will not. This method is the only way to have mixed checks of column names and dtypes in [`col_schema_match()`](`pointblank.Validate.col_schema_match`). When providing a dictionary to `columns=`, the keys are the column names and the values are the dtypes. This method of input is useful in those cases where you might already have a dictionary of column names and dtypes that you want to use as the schema. If using individual column arguments in the form of keyword arguments, the column names are the keyword arguments and the dtypes are the values. This method emphasizes readability and is perhaps more convenient when manually constructing a schema with a small number of columns. Finally, multiple dtypes can be provided for a single column by providing a list or tuple of dtypes in place of a scalar string value. Having multiple dtypes for a column allows for the dtype check via [`col_schema_match()`](`pointblank.Validate.col_schema_match`) to make multiple attempts at matching the column dtype. Should any of the dtypes match the column dtype, that part of the schema check will pass. Here are some examples of how you could provide single and multiple dtypes for a column: ```python # list of tuples schema_1 = pb.Schema(columns=[("name", "String"), ("age", ["Float64", "Int64"])]) # dictionary schema_2 = pb.Schema(columns={"name": "String", "age": ["Float64", "Int64"]}) # keyword arguments schema_3 = pb.Schema(name="String", age=["Float64", "Int64"]) ``` All of the above examples will construct the same schema object. Examples -------- A schema can be constructed via the `Schema` class in multiple ways. Let's use the following Polars DataFrame as a basis for constructing a schema: ```python import pointblank as pb import polars as pl df = pl.DataFrame({ "name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35], "height": [5.6, 6.0, 5.8] }) ``` You could provide `Schema(columns=)` a list of tuples containing column names and data types: ```python schema = pb.Schema(columns=[("name", "String"), ("age", "Int64"), ("height", "Float64")]) ``` Alternatively, a dictionary containing column names and dtypes also works: ```python schema = pb.Schema(columns={"name": "String", "age": "Int64", "height": "Float64"}) ``` Another input method involves using individual column arguments in the form of keyword arguments: ```python schema = pb.Schema(name="String", age="Int64", height="Float64") ``` Finally, you could also provide a DataFrame (Polars or Pandas) or an Ibis table object to `tbl=` and the schema will be collected: ```python schema = pb.Schema(tbl=df) ``` Whichever method you choose, you can verify the schema inputs by printing the `schema` object: ```python print(schema) ``` The `Schema` object can be used to validate the structure of a table against the schema. The relevant `Validate` method for this is [`col_schema_match()`](`pointblank.Validate.col_schema_match`). In a validation workflow, you'll have a target table (defined at the beginning of the workflow) and you might want to ensure that your expectations of the table structure are met. The [`col_schema_match()`](`pointblank.Validate.col_schema_match`) method works with a `Schema` object to validate the structure of the table.
Here's an example of how you could use [`col_schema_match()`](`pointblank.Validate.col_schema_match`) in a validation workflow: ```python # Define the schema schema = pb.Schema(name="String", age="Int64", height="Float64") # Define a validation that checks the schema against the table (`df`) validation = ( pb.Validate(data=df) .col_schema_match(schema=schema) .interrogate() ) # Display the validation results validation ``` The [`col_schema_match()`](`pointblank.Validate.col_schema_match`) validation method will validate the structure of the table against the schema during interrogation. If the structure of the table does not match the schema, the single test unit will fail. In this case, the defined schema matched the structure of the table, so the validation passed. We can also choose to check only the column names of the target table. This can be done by providing a simplified `Schema` object, which is given a list of column names: ```python schema = pb.Schema(columns=["name", "age", "height"]) validation = ( pb.Validate(data=df) .col_schema_match(schema=schema) .interrogate() ) validation ``` In this case, the schema only checks the column names of the table against the schema during interrogation. If the column names of the table do not match the schema, the single test unit will fail. In this case, the defined schema matched the column names of the table, so the validation passed. If you wanted to check column names and dtypes only for a subset of columns (and just the column names for the rest), you could use a list of mixed one- or two-item tuples in `columns=`: ```python schema = pb.Schema(columns=[("name", "String"), ("age", ), ("height", )]) validation = ( pb.Validate(data=df) .col_schema_match(schema=schema) .interrogate() ) validation ``` Not specifying a dtype for a column (as is the case for the `age` and `height` columns in the above example) will only check the column name. There may also be the case where you want to check the column names and specify multiple dtypes for a column to have several attempts at matching the dtype. This can be done by providing a list of dtypes where there would normally be a single dtype: ```python schema = pb.Schema( columns=[("name", "String"), ("age", ["Float64", "Int64"]), ("height", "Float64")] ) validation = ( pb.Validate(data=df) .col_schema_match(schema=schema) .interrogate() ) validation ``` For the `age` column, the schema will check for both `Float64` and `Int64` dtypes. If either of these dtypes is found in the column, the portion of the schema check will succeed. See Also -------- The [`col_schema_match()`](`pointblank.Validate.col_schema_match`) validation method, where a `Schema` object is used in a validation workflow. DraftValidation(data: 'FrameT | Any', model: 'str', api_key: 'str | None' = None, verify_ssl: 'bool' = True) -> None Draft a validation plan for a given table using an LLM. By using a large language model (LLM) to draft a validation plan, you can quickly generate a starting point for validating a table. This can be useful when you have a new table and you want to get a sense of how to validate it (and adjustments could always be made later). The `DraftValidation` class uses the `chatlas` package to draft a validation plan for a given table using an LLM from either the `"anthropic"`, `"openai"`, `"ollama"` or `"bedrock"` provider. You can install all requirements for the class through an optional 'generate' install of Pointblank via `pip install pointblank[generate]`. 
:::{.callout-warning} The `DraftValidation` class is still experimental. Please report any issues you encounter in the [Pointblank issue tracker](https://github.com/posit-dev/pointblank/issues). ::: Parameters ---------- data The data to be used for drafting a validation plan. model The model to be used. This should be in the form of `provider:model` (e.g., `"anthropic:claude-sonnet-4-5"`). Supported providers are `"anthropic"`, `"openai"`, `"ollama"`, and `"bedrock"`. api_key The API key to be used for the model. verify_ssl Whether to verify SSL certificates when making requests to the LLM provider. Set to `False` to disable SSL verification (e.g., when behind a corporate firewall with self-signed certificates). Defaults to `True`. Use with caution as disabling SSL verification can pose security risks. Returns ------- str The drafted validation plan. Constructing the `model` Argument --------------------------------- The `model=` argument should be constructed using the provider and model name separated by a colon (`provider:model`). The provider text can be any of: - `"anthropic"` (Anthropic) - `"openai"` (OpenAI) - `"ollama"` (Ollama) - `"bedrock"` (Amazon Bedrock) The model name should be the specific model to be used from the provider. Model names are subject to change, so consult the provider's documentation for the most up-to-date model names. Notes on Authentication ----------------------- Providing a valid API key as a string in the `api_key` argument is adequate for getting started, but you should consider using a more secure method for handling API keys. One way to do this is to load the API key from an environment variable and retrieve it using the `os` module (specifically the `os.getenv()` function). Places to store the API key might include `.bashrc`, `.bash_profile`, `.zshrc`, or `.zsh_profile`. Another solution is to store one or more model provider API keys in an `.env` file (in the root of your project). If the API keys have correct names (e.g., `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`) then DraftValidation will automatically load the API key from the `.env` file and there's no need to provide the `api_key` argument. An `.env` file might look like this: ```plaintext ANTHROPIC_API_KEY="your_anthropic_api_key_here" OPENAI_API_KEY="your_openai_api_key_here" ``` There's no need to have the `python-dotenv` package installed when using `.env` files in this way. Notes on SSL Certificate Verification -------------------------------------- By default, SSL certificate verification is enabled for all requests to LLM providers. However, in certain network environments (such as corporate networks with self-signed certificates or firewall proxies), you may encounter SSL certificate verification errors. To disable SSL verification, set the `verify_ssl` parameter to `False`: ```python import pointblank as pb data = pb.load_dataset(dataset="nycflights", tbl_type="duckdb") # Disable SSL verification for networks with self-signed certificates pb.DraftValidation( data=data, model="anthropic:claude-sonnet-4-5", verify_ssl=False ) ``` :::{.callout-warning} Disabling SSL verification (through `verify_ssl=False`) can expose your API keys and data to man-in-the-middle attacks. Only use this option in trusted network environments and when absolutely necessary. ::: Notes on Data Sent to the Model Provider ---------------------------------------- The data sent to the model provider is a JSON summary of the table. This data summary is generated internally by `DraftValidation` using the `DataScan` class.
The summary includes the following information: - the number of rows and columns in the table - the type of dataset (e.g., Polars, DuckDB, Pandas, etc.) - the column names and their types - column level statistics such as the number of missing values, min, max, mean, and median, etc. - a short list of data values in each column The JSON summary is used to provide the model with the necessary information to draft a validation plan. As such, even very large tables can be used with the `DraftValidation` class since the contents of the table are not sent to the model provider. The Amazon Bedrock is a special case since it is a self-hosted model and security controls are in place to ensure that data is kept within the user's AWS environment. If using an Ollama model all data is handled locally, though only a few models are capable enough to perform the task of drafting a validation plan. Examples -------- Let's look at how the `DraftValidation` class can be used to draft a validation plan for a table. The table to be used is `"nycflights"`, which is available here via the [`load_dataset()`](`pointblank.load_dataset`) function. The model to be used is `"anthropic:claude-sonnet-4-5"` (which performs very well compared to other LLMs). The example assumes that the API key is stored in an `.env` file as `ANTHROPIC_API_KEY`. ```python import pointblank as pb # Load the "nycflights" dataset as a DuckDB table data = pb.load_dataset(dataset="nycflights", tbl_type="duckdb") # Draft a validation plan for the "nycflights" table pb.DraftValidation(data=data, model="anthropic:claude-sonnet-4-5") ``` The output will be a drafted validation plan for the `"nycflights"` table and this will appear in the console. ````plaintext ```python import pointblank as pb # Define schema based on column names and dtypes schema = pb.Schema(columns=[ ("year", "int64"), ("month", "int64"), ("day", "int64"), ("dep_time", "int64"), ("sched_dep_time", "int64"), ("dep_delay", "int64"), ("arr_time", "int64"), ("sched_arr_time", "int64"), ("arr_delay", "int64"), ("carrier", "string"), ("flight", "int64"), ("tailnum", "string"), ("origin", "string"), ("dest", "string"), ("air_time", "int64"), ("distance", "int64"), ("hour", "int64"), ("minute", "int64") ]) # The validation plan validation = ( pb.Validate( data=your_data, # Replace your_data with the actual data variable label="Draft Validation", thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35) ) .col_schema_match(schema=schema) .col_vals_not_null(columns=[ "year", "month", "day", "sched_dep_time", "carrier", "flight", "origin", "dest", "distance", "hour", "minute" ]) .col_vals_between(columns="month", left=1, right=12) .col_vals_between(columns="day", left=1, right=31) .col_vals_between(columns="sched_dep_time", left=106, right=2359) .col_vals_between(columns="dep_delay", left=-43, right=1301, na_pass=True) .col_vals_between(columns="air_time", left=20, right=695, na_pass=True) .col_vals_between(columns="distance", left=17, right=4983) .col_vals_between(columns="hour", left=1, right=23) .col_vals_between(columns="minute", left=0, right=59) .col_vals_in_set(columns="origin", set=["EWR", "LGA", "JFK"]) .col_count_match(count=18) .row_count_match(count=336776) .rows_distinct() .interrogate() ) validation ``` ```` The drafted validation plan can be copied and pasted into a Python script or notebook for further use. In other words, the generated plan can be adjusted as needed to suit the specific requirements of the table being validated. 
Note that the output does not know how the data was obtained, so it uses the placeholder `your_data` in the `data=` argument of the `Validate` class. When adapted for use, this should be replaced with the actual data variable. ## The Validation Steps family Validation steps can be thought of as sequential validations on the target data. We call `Validate`'s validation methods to build up a validation plan: a collection of steps that, in the aggregate, provides good validation coverage. col_vals_gt(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', value: 'float | int | Column', na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Are column data greater than a fixed value or data in another column? The `col_vals_gt()` validation method checks whether column values in a table are *greater than* a specified `value=` (the exact comparison used in this function is `col_val > value`). The `value=` can be specified as a single, literal value or as a column name given in [`col()`](`pointblank.col`). This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. value The value to compare against. This can be a single value or a single column name given in [`col()`](`pointblank.col`). The latter option allows for a column-to-column comparison. For more information on which types of values are allowed, see the *What Can Be Used in `value=`?* section. na_pass Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedances of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table.
You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. What Can Be Used in `value=`? ----------------------------- The `value=` argument allows for a variety of input types. The most common are: - a single numeric value - a single date or datetime value - A [`col()`](`pointblank.col`) object that represents a column name When supplying a number as the basis of comparison, keep in mind that all resolved columns must also be numeric. Should you have columns that are of the date or datetime types, you can supply a date or datetime value as the `value=` argument. There is flexibility in how you provide the date or datetime value, as it can be: - a string-based date or datetime (e.g., `"2023-10-01"`, `"2023-10-01 13:45:30"`, etc.) - a date or datetime object using the `datetime` module (e.g., `datetime.date(2023, 10, 1)`, `datetime.datetime(2023, 10, 1, 13, 45, 30)`, etc.) Finally, when supplying a column name in the `value=` argument, it must be specified within [`col()`](`pointblank.col`). This is a column-to-column comparison and, crucially, the columns being compared must be of the same type (e.g., both numeric, both date, etc.). Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to columns via `columns=` and `value=col(...)` that are expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). 
A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with three numeric columns (`a`, `b`, and `c`). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 6, 5, 7, 6, 5], "b": [1, 2, 1, 2, 2, 2], "c": [2, 1, 2, 2, 3, 4], } ) pb.preview(tbl) ``` Let's validate that values in column `a` are all greater than the value of `4`. We'll determine if this validation had any failing test units (there are six test units, one for each row). ```python validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=4) .interrogate() ) validation ``` Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_gt()`. All test units passed, and there are no failing test units. 
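Step-level thresholds (using any of the input schemes listed above) can be attached directly to this kind of step. Here is a minimal sketch that reuses the same `tbl` but raises the comparison value to `5` so that some test units fail; the tuple `(1, 2)` sets an absolute 'warning' level of one failing test unit and an 'error' level of two.

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_gt(
        columns="a",
        value=5,            # rows where `a` is exactly 5 will now fail
        thresholds=(1, 2),  # 'warning' at 1 failing test unit, 'error' at 2
    )
    .interrogate()
)

validation
```

With three failing test units in column `a`, both the 'warning' and 'error' levels would be reached, and the validation table would flag the step accordingly.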
Aside from checking a column against a literal value, we can also use a column name in the `value=` argument (with the helper function [`col()`](`pointblank.col`) to perform a column-to-column comparison. For the next example, we'll use `col_vals_gt()` to check whether the values in column `c` are greater than values in column `b`. ```python validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="c", value=pb.col("b")) .interrogate() ) validation ``` The validation table reports two failing test units. The specific failing cases are: - Row 1: `c` is `1` and `b` is `2`. - Row 3: `c` is `2` and `b` is `2`. col_vals_lt(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', value: 'float | int | Column', na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Are column data less than a fixed value or data in another column? The `col_vals_lt()` validation method checks whether column values in a table are *less than* a specified `value=` (the exact comparison used in this function is `col_val < value`). The `value=` can be specified as a single, literal value or as a column name given in [`col()`](`pointblank.col`). This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. value The value to compare against. This can be a single value or a single column name given in [`col()`](`pointblank.col`). The latter option allows for a column-to-column comparison. For more information on which types of values are allowed, see the *What Can Be Used in `value=`?* section. na_pass Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. 
brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. What Can Be Used in `value=`? ----------------------------- The `value=` argument allows for a variety of input types. The most common are: - a single numeric value - a single date or datetime value - A [`col()`](`pointblank.col`) object that represents a column name When supplying a number as the basis of comparison, keep in mind that all resolved columns must also be numeric. Should you have columns that are of the date or datetime types, you can supply a date or datetime value as the `value=` argument. There is flexibility in how you provide the date or datetime value, as it can be: - a string-based date or datetime (e.g., `"2023-10-01"`, `"2023-10-01 13:45:30"`, etc.) - a date or datetime object using the `datetime` module (e.g., `datetime.date(2023, 10, 1)`, `datetime.datetime(2023, 10, 1, 13, 45, 30)`, etc.) Finally, when supplying a column name in the `value=` argument, it must be specified within [`col()`](`pointblank.col`). This is a column-to-column comparison and, crucially, the columns being compared must be of the same type (e.g., both numeric, both date, etc.). Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to columns via `columns=` and `value=col(...)` that are expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. 
Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with three numeric columns (`a`, `b`, and `c`). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 6, 5, 9, 7, 5], "b": [1, 2, 1, 2, 2, 2], "c": [2, 1, 1, 4, 3, 4], } ) pb.preview(tbl) ``` Let's validate that values in column `a` are all less than the value of `10`. We'll determine if this validation had any failing test units (there are six test units, one for each row). ```python validation = ( pb.Validate(data=tbl) .col_vals_lt(columns="a", value=10) .interrogate() ) validation ``` Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_lt()`. All test units passed, and there are no failing test units. 
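The `na_pass=` argument can also be illustrated here. As a small sketch, suppose a variant of the table (called `tbl_na` below, defined purely for illustration) has a missing value in column `a`; setting `na_pass=True` treats that missing value as a passing test unit rather than a failing one.

```python
import polars as pl

import pointblank as pb

# A variant of the table with a missing value in column `a` (for illustration only)
tbl_na = pl.DataFrame({"a": [5, 6, None, 9, 7, 5]})

validation = (
    pb.Validate(data=tbl_na)
    .col_vals_lt(columns="a", value=10, na_pass=True)  # the missing value counts as passing
    .interrogate()
)

validation
```

With the default of `na_pass=False`, the missing value would instead be counted as a failing test unit.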
Aside from checking a column against a literal value, we can also use a column name in the `value=` argument (with the helper function [`col()`](`pointblank.col`) to perform a column-to-column comparison. For the next example, we'll use `col_vals_lt()` to check whether the values in column `b` are less than values in column `c`. ```python validation = ( pb.Validate(data=tbl) .col_vals_lt(columns="b", value=pb.col("c")) .interrogate() ) validation ``` The validation table reports two failing test units. The specific failing cases are: - Row 1: `b` is `2` and `c` is `1`. - Row 2: `b` is `1` and `c` is `1`. col_vals_ge(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', value: 'float | int | Column', na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Are column data greater than or equal to a fixed value or data in another column? The `col_vals_ge()` validation method checks whether column values in a table are *greater than or equal to* a specified `value=` (the exact comparison used in this function is `col_val >= value`). The `value=` can be specified as a single, literal value or as a column name given in [`col()`](`pointblank.col`). This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. value The value to compare against. This can be a single value or a single column name given in [`col()`](`pointblank.col`). The latter option allows for a column-to-column comparison. For more information on which types of values are allowed, see the *What Can Be Used in `value=`?* section. na_pass Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. 
brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. What Can Be Used in `value=`? ----------------------------- The `value=` argument allows for a variety of input types. The most common are: - a single numeric value - a single date or datetime value - A [`col()`](`pointblank.col`) object that represents a column name When supplying a number as the basis of comparison, keep in mind that all resolved columns must also be numeric. Should you have columns that are of the date or datetime types, you can supply a date or datetime value as the `value=` argument. There is flexibility in how you provide the date or datetime value, as it can be: - a string-based date or datetime (e.g., `"2023-10-01"`, `"2023-10-01 13:45:30"`, etc.) - a date or datetime object using the `datetime` module (e.g., `datetime.date(2023, 10, 1)`, `datetime.datetime(2023, 10, 1, 13, 45, 30)`, etc.) Finally, when supplying a column name in the `value=` argument, it must be specified within [`col()`](`pointblank.col`). This is a column-to-column comparison and, crucially, the columns being compared must be of the same type (e.g., both numeric, both date, etc.). Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to columns via `columns=` and `value=col(...)` that are expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. 
Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with three numeric columns (`a`, `b`, and `c`). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 6, 5, 9, 7, 5], "b": [5, 3, 1, 8, 2, 3], "c": [2, 3, 1, 4, 3, 4], } ) pb.preview(tbl) ``` Let's validate that values in column `a` are all greater than or equal to the value of `5`. We'll determine if this validation had any failing test units (there are six test units, one for each row). ```python validation = ( pb.Validate(data=tbl) .col_vals_ge(columns="a", value=5) .interrogate() ) validation ``` Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_ge()`. All test units passed, and there are no failing test units. 
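Because `columns=` accepts a list, the same comparison can be applied to several columns at once, with each resolved column becoming its own validation step. A quick sketch using the same `tbl`:

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_ge(columns=["a", "b"], value=1)  # one validation step per column
    .interrogate()
)

validation
```

This produces two sequentially numbered steps (one for column `a`, one for column `b`) and, in this case, every test unit in both columns passes.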
Aside from checking a column against a literal value, we can also use a column name in the `value=` argument (with the helper function [`col()`](`pointblank.col`)) to perform a column-to-column comparison. For the next example, we'll use `col_vals_ge()` to check whether the values in column `b` are greater than or equal to the values in column `c`.

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_ge(columns="b", value=pb.col("c"))
    .interrogate()
)

validation
```

The validation table reports two failing test units. The specific failing cases are:

- Row 4: `b` is `2` and `c` is `3`.
- Row 5: `b` is `3` and `c` is `4`.

col_vals_le(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', value: 'float | int | Column', na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Are column data less than or equal to a fixed value or data in another column? The `col_vals_le()` validation method checks whether column values in a table are *less than or equal to* a specified `value=` (the exact comparison used in this function is `col_val <= value`). The `value=` can be specified as a single, literal value or as a column name given in [`col()`](`pointblank.col`). This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. value The value to compare against. This can be a single value or a single column name given in [`col()`](`pointblank.col`). The latter option allows for a column-to-column comparison. For more information on which types of values are allowed, see the *What Can Be Used in `value=`?* section. na_pass Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedances of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions.
brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. What Can Be Used in `value=`? ----------------------------- The `value=` argument allows for a variety of input types. The most common are: - a single numeric value - a single date or datetime value - A [`col()`](`pointblank.col`) object that represents a column name When supplying a number as the basis of comparison, keep in mind that all resolved columns must also be numeric. Should you have columns that are of the date or datetime types, you can supply a date or datetime value as the `value=` argument. There is flexibility in how you provide the date or datetime value, as it can be: - a string-based date or datetime (e.g., `"2023-10-01"`, `"2023-10-01 13:45:30"`, etc.) - a date or datetime object using the `datetime` module (e.g., `datetime.date(2023, 10, 1)`, `datetime.datetime(2023, 10, 1, 13, 45, 30)`, etc.) Finally, when supplying a column name in the `value=` argument, it must be specified within [`col()`](`pointblank.col`). This is a column-to-column comparison and, crucially, the columns being compared must be of the same type (e.g., both numeric, both date, etc.). Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to columns via `columns=` and `value=col(...)` that are expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. 
Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with three numeric columns (`a`, `b`, and `c`). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 6, 5, 9, 7, 5], "b": [1, 3, 1, 5, 2, 5], "c": [2, 1, 1, 4, 3, 4], } ) pb.preview(tbl) ``` Let's validate that values in column `a` are all less than or equal to the value of `9`. We'll determine if this validation had any failing test units (there are six test units, one for each row). ```python validation = ( pb.Validate(data=tbl) .col_vals_le(columns="a", value=9) .interrogate() ) validation ``` Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_le()`. All test units passed, and there are no failing test units. 
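The `pre=` argument can be sketched with the same table. In the example below, a derived column (named `b_plus_c` here purely for illustration) is created during interrogation and then validated; the derived column only needs to exist in the transformed table, not in `tbl` itself.

```python
import polars as pl

validation = (
    pb.Validate(data=tbl)
    .col_vals_le(
        columns="b_plus_c",  # column created by `pre=`, not present in the original table
        value=9,
        pre=lambda df: df.with_columns((pl.col("b") + pl.col("c")).alias("b_plus_c")),
    )
    .interrogate()
)

validation
```

The transformed table exists only for the duration of this step. Since the sums of `b` and `c` never exceed `9` here, all test units pass.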
Aside from checking a column against a literal value, we can also use a column name in the `value=` argument (with the helper function [`col()`](`pointblank.col`) to perform a column-to-column comparison. For the next example, we'll use `col_vals_le()` to check whether the values in column `c` are less than values in column `b`. ```python validation = ( pb.Validate(data=tbl) .col_vals_le(columns="c", value=pb.col("b")) .interrogate() ) validation ``` The validation table reports two failing test units. The specific failing cases are: - Row 0: `c` is `2` and `b` is `1`. - Row 4: `c` is `3` and `b` is `2`. col_vals_eq(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', value: 'float | int | Column', na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Are column data equal to a fixed value or data in another column? The `col_vals_eq()` validation method checks whether column values in a table are *equal to* a specified `value=` (the exact comparison used in this function is `col_val == value`). The `value=` can be specified as a single, literal value or as a column name given in [`col()`](`pointblank.col`). This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. value The value to compare against. This can be a single value or a single column name given in [`col()`](`pointblank.col`). The latter option allows for a column-to-column comparison. For more information on which types of values are allowed, see the *What Can Be Used in `value=`?* section. na_pass Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. 
brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. What Can Be Used in `value=`? ----------------------------- The `value=` argument allows for a variety of input types. The most common are: - a single numeric value - a single date or datetime value - A [`col()`](`pointblank.col`) object that represents a column name When supplying a number as the basis of comparison, keep in mind that all resolved columns must also be numeric. Should you have columns that are of the date or datetime types, you can supply a date or datetime value as the `value=` argument. There is flexibility in how you provide the date or datetime value, as it can be: - a string-based date or datetime (e.g., `"2023-10-01"`, `"2023-10-01 13:45:30"`, etc.) - a date or datetime object using the `datetime` module (e.g., `datetime.date(2023, 10, 1)`, `datetime.datetime(2023, 10, 1, 13, 45, 30)`, etc.) Finally, when supplying a column name in the `value=` argument, it must be specified within [`col()`](`pointblank.col`). This is a column-to-column comparison and, crucially, the columns being compared must be of the same type (e.g., both numeric, both date, etc.). Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to columns via `columns=` and `value=col(...)` that are expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. 
Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with two numeric columns (`a` and `b`). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 5, 5, 5, 5, 5], "b": [5, 5, 5, 6, 5, 4], } ) pb.preview(tbl) ``` Let's validate that values in column `a` are all equal to the value of `5`. We'll determine if this validation had any failing test units (there are six test units, one for each row). ```python validation = ( pb.Validate(data=tbl) .col_vals_eq(columns="a", value=5) .interrogate() ) validation ``` Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_eq()`. All test units passed, and there are no failing test units. 
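A step-level brief can also be attached here. The sketch below uses the `"{step}"` templating element described above so that the step number is inserted into the description shown in the reporting table.

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_eq(
        columns="a",
        value=5,
        brief="Step {step}: all values in `a` must equal 5",  # illustrative brief text
    )
    .interrogate()
)

validation
```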
Aside from checking a column against a literal value, we can also use a column name in the `value=` argument (with the helper function [`col()`](`pointblank.col`) to perform a column-to-column comparison. For the next example, we'll use `col_vals_eq()` to check whether the values in column `a` are equal to the values in column `b`. ```python validation = ( pb.Validate(data=tbl) .col_vals_eq(columns="a", value=pb.col("b")) .interrogate() ) validation ``` The validation table reports two failing test units. The specific failing cases are: - Row 3: `a` is `5` and `b` is `6`. - Row 5: `a` is `5` and `b` is `4`. col_vals_ne(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', value: 'float | int | Column', na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Are column data not equal to a fixed value or data in another column? The `col_vals_ne()` validation method checks whether column values in a table are *not equal to* a specified `value=` (the exact comparison used in this function is `col_val != value`). The `value=` can be specified as a single, literal value or as a column name given in [`col()`](`pointblank.col`). This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. value The value to compare against. This can be a single value or a single column name given in [`col()`](`pointblank.col`). The latter option allows for a column-to-column comparison. For more information on which types of values are allowed, see the *What Can Be Used in `value=`?* section. na_pass Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. 
brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. What Can Be Used in `value=`? ----------------------------- The `value=` argument allows for a variety of input types. The most common are: - a single numeric value - a single date or datetime value - A [`col()`](`pointblank.col`) object that represents a column name When supplying a number as the basis of comparison, keep in mind that all resolved columns must also be numeric. Should you have columns that are of the date or datetime types, you can supply a date or datetime value as the `value=` argument. There is flexibility in how you provide the date or datetime value, as it can be: - a string-based date or datetime (e.g., `"2023-10-01"`, `"2023-10-01 13:45:30"`, etc.) - a date or datetime object using the `datetime` module (e.g., `datetime.date(2023, 10, 1)`, `datetime.datetime(2023, 10, 1, 13, 45, 30)`, etc.) Finally, when supplying a column name in the `value=` argument, it must be specified within [`col()`](`pointblank.col`). This is a column-to-column comparison and, crucially, the columns being compared must be of the same type (e.g., both numeric, both date, etc.). Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to columns via `columns=` and `value=col(...)` that are expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. 
Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with two numeric columns (`a` and `b`). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 5, 5, 5, 5, 5], "b": [5, 6, 3, 6, 5, 8], } ) pb.preview(tbl) ``` Let's validate that values in column `a` are not equal to the value of `3`. We'll determine if this validation had any failing test units (there are six test units, one for each row). ```python validation = ( pb.Validate(data=tbl) .col_vals_ne(columns="a", value=3) .interrogate() ) validation ``` Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_ne()`. All test units passed, and there are no failing test units. 
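The `segments=` argument can be sketched with a variant of the table that carries a grouping column (`tbl_seg` below is defined purely for illustration). Segmenting on `"region"` yields one validation step per unique region value.

```python
import polars as pl

import pointblank as pb

# A variant of the table with a grouping column (for illustration only)
tbl_seg = pl.DataFrame(
    {
        "region": ["North", "North", "South", "South", "East", "East"],
        "a": [5, 5, 5, 5, 5, 5],
    }
)

validation = (
    pb.Validate(data=tbl_seg)
    .col_vals_ne(columns="a", value=3, segments="region")  # one step per region
    .interrogate()
)

validation
```

Here the plan yields three steps (one each for `"North"`, `"South"`, and `"East"`), each with two test units, and all of them pass.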
Aside from checking a column against a literal value, we can also use a column name in the `value=` argument (with the helper function [`col()`](`pointblank.col`)) to perform a column-to-column comparison. For the next example, we'll use `col_vals_ne()` to check whether the values in column `a` aren't equal to the values in column `b`.

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_ne(columns="a", value=pb.col("b"))
    .interrogate()
)

validation
```

The validation table reports two failing test units. The specific failing cases are in rows 0 and 4, where `a` is `5` and `b` is `5` in both cases (i.e., they are equal to each other).

col_vals_between(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', left: 'float | int | Column', right: 'float | int | Column', inclusive: 'tuple[bool, bool]' = (True, True), na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate'

Do column data lie between two specified values or data in other columns?

The `col_vals_between()` validation method checks whether column values in a table fall within a range. The range is specified with three arguments: `left=`, `right=`, and `inclusive=`. The `left=` and `right=` values specify the lower and upper bounds. These bounds can be specified as literal values or as column names provided within [`col()`](`pointblank.col`). The validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied).

Parameters
----------
columns
A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column.
left
The lower bound of the range. This can be a single value or a single column name given in [`col()`](`pointblank.col`). The latter option allows for a column-to-column comparison for this bound. See the *What Can Be Used in `left=` and `right=`?* section for details on this.
right
The upper bound of the range. This can be a single value or a single column name given in [`col()`](`pointblank.col`). The latter option allows for a column-to-column comparison for this bound. See the *What Can Be Used in `left=` and `right=`?* section for details on this.
inclusive
A tuple of two boolean values indicating whether the comparison should be inclusive. The positions of the boolean values correspond to the `left=` and `right=` values, respectively. By default, both values are `True`.
na_pass
Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values.
pre
An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument.
segments
An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list).
Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. What Can Be Used in `left=` and `right=`? ----------------------------------------- The `left=` and `right=` arguments both allow for a variety of input types. The most common are: - a single numeric value - a single date or datetime value - A [`col()`](`pointblank.col`) object that represents a column in the target table When supplying a number as the basis of comparison, keep in mind that all resolved columns must also be numeric. Should you have columns that are of the date or datetime types, you can supply a date or datetime value within `left=` and `right=`. There is flexibility in how you provide the date or datetime values for the bounds; they can be: - string-based dates or datetimes (e.g., `"2023-10-01"`, `"2023-10-01 13:45:30"`, etc.) - date or datetime objects using the `datetime` module (e.g., `datetime.date(2023, 10, 1)`, `datetime.datetime(2023, 10, 1, 13, 45, 30)`, etc.) Finally, when supplying a column name in either `left=` or `right=` (or both), it must be specified within [`col()`](`pointblank.col`). This facilitates column-to-column comparisons and, crucially, the columns being compared to either/both of the bounds must be of the same type as the column data (e.g., all numeric, all dates, etc.). Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to columns via `columns=` and `left=col(...)`/`right=col(...)` that are expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. 
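To make the `pre=` behavior concrete, here is a minimal sketch (not from the original examples) of a preprocessing lambda used with `col_vals_between()`. The `temp_f` column and the Celsius conversion are hypothetical; the point is that the `columns=` value may refer to a column that only exists in the transformed table.

```python
import pointblank as pb
import polars as pl

# Hypothetical data: temperatures recorded in Fahrenheit
tbl_temps = pl.DataFrame({"temp_f": [68.0, 72.5, 75.2, 80.6]})

validation = (
    pb.Validate(data=tbl_temps)
    .col_vals_between(
        columns="temp_c",  # exists only after preprocessing
        left=20,
        right=30,
        # Derive a Celsius column during interrogation; the transformed
        # table is used for this step only and is not stored.
        pre=lambda df: df.with_columns(
            ((pl.col("temp_f") - 32) * 5 / 9).alias("temp_c")
        ),
    )
    .interrogate()
)

validation
```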
Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). 
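To illustrate the four input schemes listed above, here is a brief sketch; the particular levels are arbitrary and chosen only for demonstration.

```python
import pointblank as pb

# 1. The Thresholds class (the most direct way to create thresholds)
thresholds = pb.Thresholds(warning=0.05, error=0.10, critical=0.15)

# 2. A tuple of 1-3 values: positions are (warning, error, critical)
thresholds = (0.05, 0.10, 0.15)

# 3. A dictionary with any of the keys 'warning', 'error', and 'critical'
thresholds = {"warning": 0.05, "critical": 0.15}

# 4. A single value sets the 'warning' level only (here, 5 failing test units)
thresholds = 5
```

Any of these forms can then be passed to `thresholds=` at the step level (or to `Validate(thresholds=...)` at the global level).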
Examples
--------
For the examples here, we'll use a simple Polars DataFrame with three numeric columns (`a`, `b`, and `c`). The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [2, 3, 2, 4, 3, 4],
        "b": [5, 6, 1, 6, 8, 5],
        "c": [9, 8, 8, 7, 7, 8],
    }
)

pb.preview(tbl)
```

Let's validate that values in column `a` are all between the fixed boundary values of `1` and `5`. We'll determine if this validation had any failing test units (there are six test units, one for each row).

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_between(columns="a", left=1, right=5)
    .interrogate()
)

validation
```

Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_between()`. All test units passed, and there are no failing test units.

Aside from checking a column against two literal values representing the lower and upper bounds, we can also provide column names to the `left=` and/or `right=` arguments (by using the helper function [`col()`](`pointblank.col`)). In this way, we can perform three additional comparison types:

1. `left=column`, `right=column`
2. `left=literal`, `right=column`
3. `left=column`, `right=literal`

For the next example, we'll use `col_vals_between()` to check whether the values in column `b` are between the corresponding values in columns `a` (lower bound) and `c` (upper bound).

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_between(columns="b", left=pb.col("a"), right=pb.col("c"))
    .interrogate()
)

validation
```

The validation table reports two failing test units. The specific failing cases are:

- Row 2: `b` is `1` but the bounds are `2` (`a`) and `8` (`c`).
- Row 4: `b` is `8` but the bounds are `3` (`a`) and `7` (`c`).

col_vals_outside(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', left: 'float | int | Column', right: 'float | int | Column', inclusive: 'tuple[bool, bool]' = (True, True), na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate'

Do column data lie outside of two specified values or data in other columns?

The `col_vals_outside()` validation method checks whether column values in a table *do not* fall within a certain range. The range is specified with three arguments: `left=`, `right=`, and `inclusive=`. The `left=` and `right=` values specify the lower and upper bounds. These bounds can be specified as literal values or as column names provided within [`col()`](`pointblank.col`). The validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied).

Parameters
----------
columns
A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column.
left
The lower bound of the range. This can be a single value or a single column name given in [`col()`](`pointblank.col`). The latter option allows for a column-to-column comparison for this bound. See the *What Can Be Used in `left=` and `right=`?* section for details on this.
right The upper bound of the range. This can be a single value or a single column name given in [`col()`](`pointblank.col`). The latter option allows for a column-to-column comparison for this bound. See the *What Can Be Used in `left=` and `right=`?* section for details on this. inclusive A tuple of two boolean values indicating whether the comparison should be inclusive. The position of the boolean values correspond to the `left=` and `right=` values, respectively. By default, both values are `True`. na_pass Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. What Can Be Used in `left=` and `right=`? ----------------------------------------- The `left=` and `right=` arguments both allow for a variety of input types. The most common are: - a single numeric value - a single date or datetime value - A [`col()`](`pointblank.col`) object that represents a column in the target table When supplying a number as the basis of comparison, keep in mind that all resolved columns must also be numeric. Should you have columns that are of the date or datetime types, you can supply a date or datetime value within `left=` and `right=`. There is flexibility in how you provide the date or datetime values for the bounds; they can be: - string-based dates or datetimes (e.g., `"2023-10-01"`, `"2023-10-01 13:45:30"`, etc.) - date or datetime objects using the `datetime` module (e.g., `datetime.date(2023, 10, 1)`, `datetime.datetime(2023, 10, 1, 13, 45, 30)`, etc.) 
Finally, when supplying a column name in either `left=` or `right=` (or both), it must be specified within [`col()`](`pointblank.col`). This facilitates column-to-column comparisons and, crucially, the columns being compared to either/both of the bounds must be of the same type as the column data (e.g., all numeric, all dates, etc.). Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to columns via `columns=` and `left=col(...)`/`right=col(...)` that are expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. 
If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`.

There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion of all test units failing (a value between `0` and `1`) or as the absolute number of failing test units (an integer that's `1` or greater).

Thresholds can be defined using one of these input schemes:

1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds)
2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level
3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical'
4. a single integer/float value denoting the absolute number or fraction of failing test units for the 'warning' level only

If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be set; you're free to set any combination of them.

Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter).

Examples
--------
For the examples here, we'll use a simple Polars DataFrame with three numeric columns (`a`, `b`, and `c`). The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [5, 6, 5, 7, 5, 5],
        "b": [2, 3, 6, 4, 3, 6],
        "c": [9, 8, 8, 9, 9, 7],
    }
)

pb.preview(tbl)
```

Let's validate that values in column `a` are all outside the fixed boundary values of `1` and `4`. We'll determine if this validation had any failing test units (there are six test units, one for each row).

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_outside(columns="a", left=1, right=4)
    .interrogate()
)

validation
```

Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_outside()`. All test units passed, and there are no failing test units.

Aside from checking a column against two literal values representing the lower and upper bounds, we can also provide column names to the `left=` and/or `right=` arguments (by using the helper function [`col()`](`pointblank.col`)). In this way, we can perform three additional comparison types:

1. `left=column`, `right=column`
2. `left=literal`, `right=column`
3. `left=column`, `right=literal`

For the next example, we'll use `col_vals_outside()` to check whether the values in column `b` are outside of the range formed by the corresponding values in columns `a` (lower bound) and `c` (upper bound).

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_outside(columns="b", left=pb.col("a"), right=pb.col("c"))
    .interrogate()
)

validation
```

The validation table reports two failing test units. The specific failing cases are:

- Row 2: `b` is `6` and the bounds are `5` (`a`) and `8` (`c`).
- Row 5: `b` is `6` and the bounds are `5` (`a`) and `7` (`c`).
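One further sketch (not part of the original examples): the `na_pass=` argument controls how missing values are treated by this check. The small table below is hypothetical.

```python
import pointblank as pb
import polars as pl

# Hypothetical data containing a missing value
tbl_na = pl.DataFrame({"a": [8, 9, None, 10]})

# By default a Null value is a failing test unit; with na_pass=True it is
# counted as passing, so all four test units pass here (8, 9, and 10 are
# outside the range [1, 4], and the Null passes by way of na_pass=True).
validation = (
    pb.Validate(data=tbl_na)
    .col_vals_outside(columns="a", left=1, right=4, na_pass=True)
    .interrogate()
)

validation
```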
col_vals_in_set(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', set: 'Collection[Any]', pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether column values are in a set of values. The `col_vals_in_set()` validation method checks whether column values in a table are part of a specified `set=` of values. This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. set A collection of values to compare against. Can be a list of values, a Python Enum class, or a collection containing Enum instances. When an Enum class is provided, all enum values will be used. When a collection contains Enum instances, their values will be extracted automatically. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. 
The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to a column via `columns=` that is expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. 
create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with two numeric columns (`a` and `b`). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 2, 4, 6, 2, 5], "b": [5, 8, 2, 6, 5, 1], } ) pb.preview(tbl) ``` Let's validate that values in column `a` are all in the set of `[2, 3, 4, 5, 6]`. We'll determine if this validation had any failing test units (there are six test units, one for each row). ```python validation = ( pb.Validate(data=tbl) .col_vals_in_set(columns="a", set=[2, 3, 4, 5, 6]) .interrogate() ) validation ``` Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_in_set()`. All test units passed, and there are no failing test units. Now, let's use that same set of values for a validation on column `b`. ```python validation = ( pb.Validate(data=tbl) .col_vals_in_set(columns="b", set=[2, 3, 4, 5, 6]) .interrogate() ) validation ``` The validation table reports two failing test units. The specific failing cases are for the column `b` values of `8` and `1`, which are not in the set of `[2, 3, 4, 5, 6]`. **Using Python Enums** The `col_vals_in_set()` method also supports Python Enum classes and instances, which can make validations more readable and maintainable: ```python from enum import Enum class Color(Enum): RED = "red" GREEN = "green" BLUE = "blue" # Create a table with color data tbl_colors = pl.DataFrame({ "product": ["shirt", "pants", "hat", "shoes"], "color": ["red", "blue", "green", "yellow"] }) # Validate using an Enum class (all enum values are allowed) validation = ( pb.Validate(data=tbl_colors) .col_vals_in_set(columns="color", set=Color) .interrogate() ) validation ``` This validation will fail for the `"yellow"` value since it's not in the `Color` enum. You can also use specific Enum instances or mix them with regular values: ```python # Validate using specific Enum instances validation = ( pb.Validate(data=tbl_colors) .col_vals_in_set(columns="color", set=[Color.RED, Color.BLUE]) .interrogate() ) # Mix Enum instances with regular values validation = ( pb.Validate(data=tbl_colors) .col_vals_in_set(columns="color", set=[Color.RED, Color.BLUE, "yellow"]) .interrogate() ) validation ``` In this case, the `"green"` value will cause a failing test unit since it's not part of the specified set. col_vals_not_in_set(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', set: 'Collection[Any]', pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether column values are not in a set of values. 
The `col_vals_not_in_set()` validation method checks whether column values in a table are *not* part of a specified `set=` of values. This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. set A collection of values to compare against. Can be a list of values, a Python Enum class, or a collection containing Enum instances. When an Enum class is provided, all enum values will be used. When a collection contains Enum instances, their values will be extracted automatically. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to a column via `columns=` that is expected to be present in the transformed table, but may not exist in the table before preprocessing. 
Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. 
Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with two numeric columns (`a` and `b`). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [7, 8, 1, 9, 1, 7], "b": [1, 8, 2, 6, 9, 1], } ) pb.preview(tbl) ``` Let's validate that none of the values in column `a` are in the set of `[2, 3, 4, 5, 6]`. We'll determine if this validation had any failing test units (there are six test units, one for each row). ```python validation = ( pb.Validate(data=tbl) .col_vals_not_in_set(columns="a", set=[2, 3, 4, 5, 6]) .interrogate() ) validation ``` Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_not_in_set()`. All test units passed, and there are no failing test units. Now, let's use that same set of values for a validation on column `b`. ```python validation = ( pb.Validate(data=tbl) .col_vals_not_in_set(columns="b", set=[2, 3, 4, 5, 6]) .interrogate() ) validation ``` The validation table reports two failing test units. The specific failing cases are for the column `b` values of `2` and `6`, both of which are in the set of `[2, 3, 4, 5, 6]`. **Using Python Enums** Like `col_vals_in_set()`, this method also supports Python Enum classes and instances: ```python from enum import Enum class InvalidStatus(Enum): DELETED = "deleted" ARCHIVED = "archived" # Create a table with status data status_table = pl.DataFrame({ "product": ["widget", "gadget", "tool", "device"], "status": ["active", "pending", "deleted", "active"] }) # Validate that no values are in the invalid status set validation = ( pb.Validate(data=status_table) .col_vals_not_in_set(columns="status", set=InvalidStatus) .interrogate() ) validation ``` This `"deleted"` value in the `status` column will fail since it matches one of the invalid statuses in the `InvalidStatus` enum. col_vals_increasing(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', allow_stationary: 'bool' = False, decreasing_tol: 'float | None' = None, na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Are column data increasing by row? The `col_vals_increasing()` validation method checks whether column values in a table are increasing when moving down a table. There are options for allowing missing values in the target column, allowing stationary phases (where consecutive values don't change), and even one for allowing decreasing movements up to a certain threshold. This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. allow_stationary An option to allow pauses in increasing values. 
For example, if the values for the test units are `[80, 82, 82, 85, 88]` then the third unit (`82`, appearing a second time) would be marked as failing when `allow_stationary` is `False`. Using `allow_stationary=True` will result in all the test units in `[80, 82, 82, 85, 88]` being marked as passing.
decreasing_tol
An optional threshold value that allows for movement of numerical values in the negative direction. By default this is `None` but using a numerical value will set the absolute threshold of negative travel allowed across numerical test units. Note that setting a value here also has the effect of setting `allow_stationary` to `True`.
na_pass
Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values.
pre
An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument.
segments
An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information.
thresholds
Set threshold failure levels for reporting and reacting to exceedances of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels.
actions
Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions.
brief
An optional brief description of the validation step that will be displayed in the reporting table. You can use templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief.
active
A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged).

Returns
-------
Validate
The `Validate` object with the added validation step.

Examples
--------
For the examples here, we'll use a simple Polars DataFrame with three numeric columns (`a`, `b`, and `c`). The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": [1, 2, 2, 3, 4, 5],
        "c": [1, 2, 1, 3, 4, 5],
    }
)

pb.preview(tbl)
```

Let's validate that values in column `a` are increasing. We'll determine if this validation had any failing test units (there are six test units, one for each row).

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_increasing(columns="a")
    .interrogate()
)

validation
```

The validation passed as all values in column `a` are increasing.
Now let's check column `b` which has a stationary value: ```python validation = ( pb.Validate(data=tbl) .col_vals_increasing(columns="b") .interrogate() ) validation ``` This validation fails at the third row because the value `2` is repeated. If we want to allow stationary values, we can use `allow_stationary=True`: ```python validation = ( pb.Validate(data=tbl) .col_vals_increasing(columns="b", allow_stationary=True) .interrogate() ) validation ``` col_vals_decreasing(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', allow_stationary: 'bool' = False, increasing_tol: 'float | None' = None, na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Are column data decreasing by row? The `col_vals_decreasing()` validation method checks whether column values in a table are decreasing when moving down a table. There are options for allowing missing values in the target column, allowing stationary phases (where consecutive values don't change), and even one for allowing increasing movements up to a certain threshold. This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. allow_stationary An option to allow pauses in decreasing values. For example, if the values for the test units are `[88, 85, 85, 82, 80]` then the third unit (`85`, appearing a second time) would be marked as failing when `allow_stationary` is `False`. Using `allow_stationary=True` will result in all the test units in `[88, 85, 85, 82, 80]` to be marked as passing. increasing_tol An optional threshold value that allows for movement of numerical values in the positive direction. By default this is `None` but using a numerical value will set the absolute threshold of positive travel allowed across numerical test units. Note that setting a value here also has the effect of setting `allow_stationary` to `True`. na_pass Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. 
The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels.
actions
Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions.
brief
An optional brief description of the validation step that will be displayed in the reporting table. You can use templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief.
active
A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged).

Returns
-------
Validate
The `Validate` object with the added validation step.

Examples
--------
For the examples here, we'll use a simple Polars DataFrame with three numeric columns (`a`, `b`, and `c`). The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [6, 5, 4, 3, 2, 1],
        "b": [5, 4, 4, 3, 2, 1],
        "c": [5, 4, 5, 3, 2, 1],
    }
)

pb.preview(tbl)
```

Let's validate that values in column `a` are decreasing. We'll determine if this validation had any failing test units (there are six test units, one for each row).

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_decreasing(columns="a")
    .interrogate()
)

validation
```

The validation passed as all values in column `a` are decreasing.

Now let's check column `b`, which has a stationary value:

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_decreasing(columns="b")
    .interrogate()
)

validation
```

This validation fails at the third row because the value `4` is repeated. If we want to allow stationary values, we can use `allow_stationary=True`:

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_decreasing(columns="b", allow_stationary=True)
    .interrogate()
)

validation
```

col_vals_null(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate'

Validate whether values in a column are Null.

The `col_vals_null()` validation method checks whether column values in a table are Null. This validation will operate over the number of test units that is equal to the number of rows in the table.

Parameters
----------
columns
A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column.
pre
An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument.
segments
An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment).
Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to a column via `columns=` that is expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. 
The following inputs are both valid:

```
# Segments from all unique values in the `region` column
# and specific dates in the `date` column
segments=["region", ("date", ["2023-01-01", "2023-01-02"])]

# Segments from all unique values in the `region` and `date` columns
segments=["region", "date"]
```

The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` and `1`) or the absolute number of failing test units (as an integer that's `1` or greater). Thresholds can be defined using one of these input schemes:

1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds)
2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level
3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical'
4. provide a single integer/float value denoting the absolute number or fraction of failing test units for the 'warning' level only

If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be set; you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with two numeric columns (`a` and `b`). The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [None, None, None, None],
        "b": [None, 2, None, 9],
    }
).with_columns(pl.col("a").cast(pl.Int64))

pb.preview(tbl)
```

Let's validate that values in column `a` are all Null values. We'll determine if this validation had any failing test units (there are four test units, one for each row).

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_null(columns="a")
    .interrogate()
)

validation
```

Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_null()`. All test units passed, and there are no failing test units. Now, let's use that same set of values for a validation on column `b`.

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_null(columns="b")
    .interrogate()
)

validation
```

The validation table reports two failing test units.
The specific failing cases are for the two non-Null values in column `b`. col_vals_not_null(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether values in a column are not Null. The `col_vals_not_null()` validation method checks whether column values in a table are not Null. This validation will operate over the number of test units that is equal to the number of rows in the table. Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. 
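As a minimal sketch of a `pre=` transformation (the table and column names here are purely illustrative, and a recent Polars version is assumed for `str.strip_chars()`), a column derived during preprocessing could be validated like this:

```python
import polars as pl
import pointblank as pb

# Hypothetical table: `email_raw` holds untrimmed strings
tbl_raw = pl.DataFrame({"email_raw": [" a@b.com", "c@d.com ", None, "e@f.com"]})

validation = (
    pb.Validate(data=tbl_raw)
    .col_vals_not_null(
        columns="email",  # `email` only exists after preprocessing
        # the callable receives the table and returns a modified table
        pre=lambda df: df.with_columns(pl.col("email_raw").str.strip_chars().alias("email")),
    )
    .interrogate()
)
```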
Note that you can refer to a column via `columns=` that is expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid:

```
# Segments from all unique values in the `region` column
# and specific dates in the `date` column
segments=["region", ("date", ["2023-01-01", "2023-01-02"])]

# Segments from all unique values in the `region` and `date` columns
segments=["region", "date"]
```

The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` and `1`) or the absolute number of failing test units (as an integer that's `1` or greater). Thresholds can be defined using one of these input schemes:

1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds)
2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level
3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical'
4. provide a single integer/float value denoting the absolute number or fraction of failing test units for the 'warning' level only

If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be set; you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with two numeric columns (`a` and `b`). The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [4, 7, 2, 8],
        "b": [5, None, 1, None],
    }
)

pb.preview(tbl)
```

Let's validate that none of the values in column `a` are Null values. We'll determine if this validation had any failing test units (there are four test units, one for each row).

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_not_null(columns="a")
    .interrogate()
)

validation
```

Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_not_null()`. All test units passed, and there are no failing test units. Now, let's use that same set of values for a validation on column `b`.

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_not_null(columns="b")
    .interrogate()
)

validation
```

The validation table reports two failing test units. The specific failing cases are for the two Null values in column `b`. col_vals_regex(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', pattern: 'str', na_pass: 'bool' = False, inverse: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether column values match a regular expression pattern. The `col_vals_regex()` validation method checks whether column values in a table correspond to a `pattern=` matching expression. This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. pattern A regular expression pattern to compare against. na_pass Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values. inverse Should the validation step be inverted? If `True`, then the expectation is that column values should *not* match the specified `pattern=` regex. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment).
Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to a column via `columns=` that is expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. 
The following inputs are both valid:

```
# Segments from all unique values in the `region` column
# and specific dates in the `date` column
segments=["region", ("date", ["2023-01-01", "2023-01-02"])]

# Segments from all unique values in the `region` and `date` columns
segments=["region", "date"]
```

The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` and `1`) or the absolute number of failing test units (as an integer that's `1` or greater). Thresholds can be defined using one of these input schemes:

1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds)
2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level
3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical'
4. provide a single integer/float value denoting the absolute number or fraction of failing test units for the 'warning' level only

If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be set; you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with two string columns (`a` and `b`). The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": ["rb-0343", "ra-0232", "ry-0954", "rc-1343"],
        "b": ["ra-0628", "ra-583", "rya-0826", "rb-0735"],
    }
)

pb.preview(tbl)
```

Let's validate that all of the values in column `a` match a particular regex pattern. We'll determine if this validation had any failing test units (there are four test units, one for each row).

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_regex(columns="a", pattern=r"r[a-z]-[0-9]{4}")
    .interrogate()
)

validation
```

Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_regex()`. All test units passed, and there are no failing test units. Now, let's use the same regex for a validation on column `b`.
```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_regex(columns="b", pattern=r"r[a-z]-[0-9]{4}")
    .interrogate()
)

validation
```

The validation table reports two failing test units. The specific failing cases are for the string values of rows 2 and 3 in column `b`. col_vals_within_spec(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', spec: 'str', na_pass: 'bool' = False, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether column values fit within a specification. The `col_vals_within_spec()` validation method checks whether column values in a table correspond to a specification (`spec=`) type (details of which are available in the *Specifications* section). Specifications include common data types like email addresses, URLs, postal codes, vehicle identification numbers (VINs), International Bank Account Numbers (IBANs), and more. This validation will operate over the number of test units that is equal to the number of rows in the table. Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. spec A specification string for defining the specification type. Examples are `"email"`, `"url"`, and `"postal_code[USA]"`. See the *Specifications* section for all available options. na_pass Should any encountered None, NA, or Null values be considered as passing test units? By default, this is `False`. Set to `True` to pass test units with missing values. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedances of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meet or exceed any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active.
Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Specifications -------------- A specification type must be used with the `spec=` argument. This is a string-based keyword that corresponds to the type of data in the specified columns. The following keywords can be used: - `"isbn"`: The International Standard Book Number (ISBN) is a unique numerical identifier for books. This keyword validates both 10-digit and 13-digit ISBNs. - `"vin"`: A vehicle identification number (VIN) is a unique code used by the automotive industry to identify individual motor vehicles. - `"postal_code[]"`: A postal code (also known as postcodes, PIN, or ZIP codes) is a series of letters, digits, or both included in a postal address. Because the coding varies by country, a country code in either the 2-letter (ISO 3166-1 alpha-2) or 3-letter (ISO 3166-1 alpha-3) format needs to be supplied (e.g., `"postal_code[US]"` or `"postal_code[USA]"`). The keyword alias `"zip"` can be used for US ZIP codes. - `"credit_card"`: A credit card number can be validated across a variety of issuers. The validation uses the Luhn algorithm. - `"iban[]"`: The International Bank Account Number (IBAN) is a system of identifying bank accounts across countries. Because the length and coding varies by country, a country code needs to be supplied (e.g., `"iban[DE]"` or `"iban[DEU]"`). - `"swift"`: Business Identifier Codes (also known as SWIFT-BIC, BIC, or SWIFT code) are unique identifiers for financial and non-financial institutions. - `"phone"`, `"email"`, `"url"`, `"ipv4"`, `"ipv6"`, `"mac"`: Phone numbers, email addresses, Internet URLs, IPv4 or IPv6 addresses, and MAC addresses can be validated with their respective keywords. Only a single `spec=` value should be provided per function call. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to a column via `columns=` that is expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. 
For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid:

```
# Segments from all unique values in the `region` column
# and specific dates in the `date` column
segments=["region", ("date", ["2023-01-01", "2023-01-02"])]

# Segments from all unique values in the `region` and `date` columns
segments=["region", "date"]
```

The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` and `1`) or the absolute number of failing test units (as an integer that's `1` or greater). Thresholds can be defined using one of these input schemes:

1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds)
2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level
3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical'
4. provide a single integer/float value denoting the absolute number or fraction of failing test units for the 'warning' level only

If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be set; you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with an email column. The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "email": [
            "user@example.com",
            "admin@test.org",
            "invalid-email",
            "contact@company.co.uk",
        ],
    }
)

pb.preview(tbl)
```

Let's validate that all of the values in the `email` column are valid email addresses. We'll determine if this validation had any failing test units (there are four test units, one for each row).

```python
validation = (
    pb.Validate(data=tbl)
    .col_vals_within_spec(columns="email", spec="email")
    .interrogate()
)

validation
```

The validation table shows that one test unit failed (the invalid email address in row 3).
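Other specification keywords work the same way. As a further sketch (with made-up values), US ZIP codes could be checked with the `"postal_code[US]"` keyword (or its `"zip"` alias):

```python
import pointblank as pb
import polars as pl

# Hypothetical table of ZIP codes; "ABCDE" is deliberately malformed
tbl_zip = pl.DataFrame({"zip": ["10001", "94107", "ABCDE", "60601"]})

validation = (
    pb.Validate(data=tbl_zip)
    .col_vals_within_spec(columns="zip", spec="postal_code[US]")
    .interrogate()
)

validation
```

We'd expect the malformed value in row 3 to be the single failing test unit here.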
col_vals_expr(self, expr: 'any', pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate column values using a custom expression. The `col_vals_expr()` validation method checks whether column values in a table satisfy a custom `expr=` expression. This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- expr A column expression that will evaluate each row in the table, returning a boolean value per table row. If the target table is a Polars DataFrame, the expression should either be a Polars column expression or a Narwhals one. For a Pandas DataFrame, the expression should either be a lambda expression or a Narwhals column expression. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. 
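For instance, as a hypothetical sketch (assuming a Polars table with a numeric column `a` and a `status` column, neither of which appears in the examples below), rows could be filtered out before the expression is evaluated:

```python
import polars as pl
import pointblank as pb

# Hypothetical table with a numeric column and a status column
tbl_status = pl.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "status": ["active", "active", "closed", "active"],
    }
)

validation = (
    pb.Validate(data=tbl_status)
    .col_vals_expr(
        expr=pl.col("a") > 0,
        # only rows where `status` is "active" are interrogated by this step
        pre=lambda df: df.filter(pl.col("status") == "active"),
    )
    .interrogate()
)
```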
Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid:

```
# Segments from all unique values in the `region` column
# and specific dates in the `date` column
segments=["region", ("date", ["2023-01-01", "2023-01-02"])]

# Segments from all unique values in the `region` and `date` columns
segments=["region", "date"]
```

The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` and `1`) or the absolute number of failing test units (as an integer that's `1` or greater). Thresholds can be defined using one of these input schemes:

1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds)
2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level
3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical'
4. provide a single integer/float value denoting the absolute number or fraction of failing test units for the 'warning' level only

If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be set; you're free to set any combination of them.
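As a quick sketch of these input schemes (the keyword arguments shown for the `Thresholds` class are assumed to mirror the dictionary keys), the following all set a 'warning' level at 10% of test units failing and an 'error' level at 25%:

```python
import pointblank as pb

thresholds = (0.1, 0.25)                              # tuple scheme: (warning, error[, critical])
thresholds = {"warning": 0.1, "error": 0.25}          # dictionary scheme
thresholds = pb.Thresholds(warning=0.1, error=0.25)   # Thresholds class (assumed keyword names)
```

Any of these can be passed to a step's `thresholds=` argument or to `Validate(thresholds=...)`.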
Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [1, 2, 1, 7, 8, 6], "b": [0, 0, 0, 1, 1, 1], "c": [0.5, 0.3, 0.8, 1.4, 1.9, 1.2], } ) pb.preview(tbl) ``` Let's validate that the values in column `a` are all integers. We'll determine if this validation had any failing test units (there are six test units, one for each row). ```python validation = ( pb.Validate(data=tbl) .col_vals_expr(expr=pl.col("a") % 1 == 0) .interrogate() ) validation ``` Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows the single entry that corresponds to the validation step created by using `col_vals_expr()`. All test units passed, with no failing test units. rows_distinct(self, columns_subset: 'str | list[str] | None' = None, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether rows in the table are distinct. The `rows_distinct()` method checks whether rows in the table are distinct. This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- columns_subset A single column or a list of columns to use as a subset for the distinct comparison. If `None`, then all columns in the table will be used for the comparison. If multiple columns are supplied, the distinct comparison will be made over the combination of values in those columns. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. 
active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to columns via `columns_subset=` that are expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid: ``` # Segments from all unique values in the `region` column # and specific dates in the `date` column segments=["region", ("date", ["2023-01-01", "2023-01-02"])] # Segments from all unique values in the `region` and `date` columns segments=["region", "date"] ``` The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. 
There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` and `1`) or the absolute number of failing test units (as an integer that's `1` or greater). Thresholds can be defined using one of these input schemes:

1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds)
2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level
3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical'
4. provide a single integer/float value denoting the absolute number or fraction of failing test units for the 'warning' level only

If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be set; you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with three string columns (`col_1`, `col_2`, and `col_3`). The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "col_1": ["a", "b", "c", "d"],
        "col_2": ["a", "a", "c", "d"],
        "col_3": ["a", "a", "d", "e"],
    }
)

pb.preview(tbl)
```

Let's validate that the rows in the table are distinct with `rows_distinct()`. We'll determine if this validation had any failing test units (there are four test units, one for each row). A failing test unit means that a given row is not distinct from every other row.

```python
validation = (
    pb.Validate(data=tbl)
    .rows_distinct()
    .interrogate()
)

validation
```

From this validation table we see that there are no failing test units. All rows in the table are distinct from one another. We can also use a subset of columns to determine distinctness. Let's specify the subset using columns `col_2` and `col_3` for the next validation.

```python
validation = (
    pb.Validate(data=tbl)
    .rows_distinct(columns_subset=["col_2", "col_3"])
    .interrogate()
)

validation
```

The validation table reports two failing test units. The first and second rows are duplicated when considering only the values in columns `col_2` and `col_3`. There's only one set of duplicates but there are two failing test units since each row is compared to all others. rows_complete(self, columns_subset: 'str | list[str] | None' = None, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether row data are complete by having no missing values. The `rows_complete()` method checks whether rows in the table are complete. Completeness of a row means that there are no missing values within the row. This validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). A subset of columns can be specified for the completeness check. If no subset is provided, all columns in the table will be used. Parameters ---------- columns_subset A single column or a list of columns to use as a subset for the completeness check.
If `None` (the default), then all columns in the table will be used. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). Read the *Segmentation* section for usage information. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that you can refer to columns via `columns_subset=` that are expected to be present in the transformed table, but may not exist in the table before preprocessing. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Segmentation ------------ The `segments=` argument allows for the segmentation of a validation step into multiple segments. This is useful for applying the same validation step to different subsets of the data. The segmentation can be done based on a single column or specific fields within a column. Providing a single column name will result in a separate validation step for each unique value in that column. For example, if you have a column called `"region"` with values `"North"`, `"South"`, and `"East"`, the validation step will be applied separately to each region. Alternatively, you can provide a tuple that specifies a column name and its corresponding values to segment on. 
For example, if you have a column called `"date"` and you want to segment on only specific dates, you can provide a tuple like `("date", ["2023-01-01", "2023-01-02"])`. Any other values in the column will be disregarded (i.e., no validation steps will be created for them). A list with a combination of column names and tuples can be provided as well. This allows for more complex segmentation scenarios. The following inputs are both valid:

```
# Segments from all unique values in the `region` column
# and specific dates in the `date` column
segments=["region", ("date", ["2023-01-01", "2023-01-02"])]

# Segments from all unique values in the `region` and `date` columns
segments=["region", "date"]
```

The segmentation is performed during interrogation, and the resulting validation steps will be numbered sequentially. Each segment will have its own validation step, and the results will be reported separately. This allows for a more granular analysis of the data and helps identify issues within specific segments. Importantly, the segmentation process will be performed after any preprocessing of the data table. Because of this, one can conceivably use the `pre=` argument to generate a column that can be used for segmentation. For example, you could create a new column called `"segment"` through use of `pre=` and then use that column for segmentation. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` and `1`) or the absolute number of failing test units (as an integer that's `1` or greater). Thresholds can be defined using one of these input schemes:

1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds)
2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level
3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical'
4. provide a single integer/float value denoting the absolute number or fraction of failing test units for the 'warning' level only

If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all of the threshold levels need to be set; you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with three string columns (`col_1`, `col_2`, and `col_3`). The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "col_1": ["a", None, "c", "d"],
        "col_2": ["a", "a", "c", None],
        "col_3": ["a", "a", "d", None],
    }
)

pb.preview(tbl)
```

Let's validate that the rows in the table are complete with `rows_complete()`. We'll determine if this validation had any failing test units (there are four test units, one for each row). A failing test unit means that a given row is not complete (i.e., has at least one missing value).
```python
validation = (
    pb.Validate(data=tbl)
    .rows_complete()
    .interrogate()
)

validation
```

From this validation table we see that there are two failing test units. This is because two rows in the table have at least one missing value (the second row and the last row). We can also use a subset of columns to determine completeness. Let's specify the subset using columns `col_2` and `col_3` for the next validation.

```python
validation = (
    pb.Validate(data=tbl)
    .rows_complete(columns_subset=["col_2", "col_3"])
    .interrogate()
)

validation
```

The validation table reports a single failing test unit. The last row contains missing values in both the `col_2` and `col_3` columns. col_exists(self, columns: 'str | list[str] | Column | ColumnSelector | ColumnSelectorNarwhals', thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether one or more columns exist in the table. The `col_exists()` method checks whether one or more columns exist in the target table. The only requirement is specification of the column names. Each validation step or expectation will operate over a single test unit, which is whether the column exists or not. Parameters ---------- columns A single column or a list of columns to validate. Can also use [`col()`](`pointblank.col`) with column selectors to specify one or more columns. If multiple columns are supplied or resolved, there will be a separate validation step generated for each column. thresholds Set threshold failure levels for reporting and reacting to exceedances of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step(s) meet or exceed any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` and `1`) or the absolute number of failing test units (as an integer that's `1` or greater). Thresholds can be defined using one of these input schemes:

1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds)
provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical' 4. a single integer/float value denoting the absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all threshold levels need to be set; you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with a string column (`a`) and a numeric column (`b`). The table is shown below:

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": ["apple", "banana", "cherry", "date"],
        "b": [1, 6, 3, 5],
    }
)

pb.preview(tbl)
```

Let's validate that the columns `a` and `b` actually exist in the table. We'll determine if this validation had any failing test units (each validation will have a single test unit).

```python
validation = (
    pb.Validate(data=tbl)
    .col_exists(columns=["a", "b"])
    .interrogate()
)

validation
```

Printing the `validation` object shows the validation table in an HTML viewing environment. The validation table shows two entries (one check per column) generated by the `col_exists()` validation step. Both steps passed since both columns provided in `columns=` are present in the table. Now, let's check for the existence of a different set of columns.

```python
validation = (
    pb.Validate(data=tbl)
    .col_exists(columns=["b", "c"])
    .interrogate()
)

validation
```

The validation table reports one passing validation step (the check for column `b`) and one failing validation step (the check for column `c`, which doesn't exist). col_schema_match(self, schema: 'Schema', complete: 'bool' = True, in_order: 'bool' = True, case_sensitive_colnames: 'bool' = True, case_sensitive_dtypes: 'bool' = True, full_match_dtypes: 'bool' = True, pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Do columns in the table (and their types) match a predefined schema? The `col_schema_match()` method works in conjunction with an object generated by the [`Schema`](`pointblank.Schema`) class. That class object is the expectation for the actual schema of the target table. The validation step operates over a single test unit, which is whether the schema matches that of the table (within the constraints enforced by the `complete=` and `in_order=` options). Parameters ---------- schema A `Schema` object that represents the expected schema of the table. This object is generated by the [`Schema`](`pointblank.Schema`) class. complete Should the schema match be complete? If `True`, then the target table must have all columns specified in the schema. If `False`, then the table can have additional columns not in the schema (i.e., the schema is a subset of the target table's columns). in_order Should the schema match be in order? If `True`, then the columns in the schema must appear in the same order as they do in the target table. If `False`, then the order of columns in the schema and the target table can differ.
case_sensitive_colnames Should the schema match be case-sensitive with regard to column names? If `True`, then the column names in the schema and the target table must match exactly. If `False`, then the column names are compared in a case-insensitive manner. case_sensitive_dtypes Should the schema match be case-sensitive with regard to column data types? If `True`, then the column data types in the schema and the target table must match exactly. If `False`, then the column data types are compared in a case-insensitive manner. full_match_dtypes Should the schema match require a full match of data types? If `True`, then the column data types in the schema and the target table must match exactly. If `False` then substring matches are allowed, so a schema data type of `Int` would match a target table data type of `Int64`. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). 
Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with three columns (string, integer, and float). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": ["apple", "banana", "cherry", "date"], "b": [1, 6, 3, 5], "c": [1.1, 2.2, 3.3, 4.4], } ) pb.preview(tbl) ``` Let's validate that the columns in the table match a predefined schema. A schema can be defined using the [`Schema`](`pointblank.Schema`) class. ```python schema = pb.Schema( columns=[("a", "String"), ("b", "Int64"), ("c", "Float64")] ) ``` You can print the schema object to verify that the expected schema is as intended. ```python print(schema) ``` Now, we'll use the `col_schema_match()` method to validate the table against the expected `schema` object. There is a single test unit for this validation step (whether the schema matches the table or not). ```python validation = ( pb.Validate(data=tbl) .col_schema_match(schema=schema) .interrogate() ) validation ``` The validation table shows that the schema matches the table. The single test unit passed since the table columns and their types match the schema. row_count_match(self, count: 'int | FrameT | Any', tol: 'Tolerance' = 0, inverse: 'bool' = False, pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether the row count of the table matches a specified count. The `row_count_match()` method checks whether the row count of the target table matches a specified count. This validation will operate over a single test unit, which is whether the row count matches the specified count. We also have the option to invert the validation step by setting `inverse=True`. This will make the expectation that the row count of the target table *does not* match the specified count. Parameters ---------- count The expected row count of the table. This can be an integer value, a Polars or Pandas DataFrame object, or an Ibis backend table. If a DataFrame/table is provided, the row count of that object will be used as the expected count. tol The tolerance allowable for the row count match. This can be specified as a single numeric value (integer or float) or as a tuple of two integers representing the lower and upper bounds of the tolerance range. If a single integer value (greater than 1) is provided, it represents the absolute bounds of the tolerance, ie. plus or minus the value. 
If a float value (between 0-1) is provided, it represents the relative tolerance, ie. plus or minus the relative percentage of the target. If a tuple is provided, it represents the lower and upper absolute bounds of the tolerance range. See the examples for more. inverse Should the validation step be inverted? If `True`, then the expectation is that the row count of the target table should not match the specified `count=` value. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. 
create a dictionary of 1-3 value entries; the valid keys are 'warning', 'error', and 'critical' 4. a single integer/float value denoting the absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. Not all threshold levels need to be set; you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use the built-in dataset `"small_table"`. The table can be obtained by calling `load_dataset("small_table")`. Let's validate that the number of rows in the table matches a fixed value. In this case, we will use the value `13` as the expected row count.

```python
import pointblank as pb

small_table = pb.load_dataset("small_table")

validation = (
    pb.Validate(data=small_table)
    .row_count_match(count=13)
    .interrogate()
)

validation
```

The validation table shows that the expectation value of `13` matches the actual count of rows in the target table. So, the single test unit passed. Let's modify our example to show the different ways we can allow some tolerance in our validation by using the `tol=` argument.

```python
smaller_small_table = small_table.sample(n=12)  # within the lower bound

validation = (
    pb.Validate(data=smaller_small_table)
    .row_count_match(count=13, tol=(2, 0))  # minus 2 but plus 0, i.e., 11-13
    .interrogate()
)

validation

validation = (
    pb.Validate(data=smaller_small_table)
    .row_count_match(count=13, tol=0.05)  # 5% relative tolerance of 13
    .interrogate()
)

even_smaller_table = small_table.sample(n=2)

validation = (
    pb.Validate(data=even_smaller_table)
    .row_count_match(count=13, tol=5)  # plus or minus 5; this test will fail
    .interrogate()
)

validation
```

col_count_match(self, count: 'int | FrameT | Any', inverse: 'bool' = False, pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether the column count of the table matches a specified count. The `col_count_match()` method checks whether the column count of the target table matches a specified count. This validation will operate over a single test unit, which is whether the column count matches the specified count. We also have the option to invert the validation step by setting `inverse=True`. This will make the expectation that the column count of the target table *does not* match the specified count. Parameters ---------- count The expected column count of the table. This can be an integer value, a Polars or Pandas DataFrame object, or an Ibis backend table. If a DataFrame/table is provided, the column count of that object will be used as the expected count. inverse Should the validation step be inverted? If `True`, then the expectation is that the column count of the target table should not match the specified `count=` value. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. thresholds Set threshold failure levels for reporting and reacting to exceedances of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`.
The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use the built in dataset `"game_revenue"`. The table can be obtained by calling `load_dataset("game_revenue")`. Let's validate that the number of columns in the table matches a fixed value. In this case, we will use the value `11` as the expected column count. 
```python
import pointblank as pb

game_revenue = pb.load_dataset("game_revenue")

validation = (
    pb.Validate(data=game_revenue)
    .col_count_match(count=11)
    .interrogate()
)

validation
```

The validation table shows that the expectation value of `11` matches the actual count of columns in the target table. So, the single test unit passed. tbl_match(self, tbl_compare: 'FrameT | Any', pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate whether the target table matches a comparison table. The `tbl_match()` method checks whether the target table's composition matches that of a comparison table. The validation performs a comprehensive comparison using progressively stricter checks (from least to most stringent): 1. **Column count match**: both tables must have the same number of columns 2. **Row count match**: both tables must have the same number of rows 3. **Schema match (loose)**: column names and dtypes match (case-insensitive, any order) 4. **Schema match (order)**: columns in the correct order (case-insensitive names) 5. **Schema match (exact)**: column names match exactly (case-sensitive, correct order) 6. **Data match**: values in corresponding cells must be identical This progressive approach helps identify exactly where tables differ. The validation will fail at the first check that doesn't pass, making it easier to diagnose mismatches. This validation operates over a single test unit (pass/fail for complete table match). Parameters ---------- tbl_compare The comparison table to validate against. This can be a DataFrame object (Polars or Pandas), an Ibis table object, or a callable that returns a table. If a callable is provided, it will be executed during interrogation to obtain the comparison table. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. thresholds Set threshold failure levels for reporting and reacting to exceedances of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation.
This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Note that the same preprocessing is **not** applied to the comparison table; only the target table is preprocessed. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Cross-Backend Validation ------------------------ The `tbl_match()` method supports **automatic backend coercion** when comparing tables from different backends (e.g., comparing a Polars DataFrame against a Pandas DataFrame, or comparing database tables from DuckDB/SQLite against in-memory DataFrames). When tables with different backends are detected, the comparison table is automatically converted to match the data table's backend before validation proceeds. **Certified Backend Combinations:** All combinations of the following backends have been tested and certified to work (in both directions): - Pandas DataFrame - Polars DataFrame - DuckDB (native) - DuckDB (as Ibis table) - SQLite (via Ibis) Note that database backends (DuckDB, SQLite, PostgreSQL, MySQL, Snowflake, BigQuery) are automatically materialized during validation: - if comparing **against Polars**: materialized to Polars - if comparing **against Pandas**: materialized to Pandas - if **both tables are database backends**: both materialized to Polars This ensures optimal performance and type consistency. 
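As a minimal sketch of a cross-backend comparison (assuming both Polars and Pandas are installed; the table contents here are illustrative), the target table can be a Polars DataFrame while the comparison table is a Pandas DataFrame:

```python
import pandas as pd
import polars as pl
import pointblank as pb

# Target table in Polars; comparison table in Pandas with the same contents
tbl_pl = pl.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
tbl_pd = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# The comparison table is coerced to the target table's backend before the checks run
validation = (
    pb.Validate(data=tbl_pl)
    .tbl_match(tbl_compare=tbl_pd)
    .interrogate()
)

validation
```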
**Data Types That Work Best in Cross-Backend Validation:** - numeric types: int, float columns (including proper NaN handling) - string types: text columns with consistent encodings - boolean types: True/False values - null values: `None` and `NaN` are treated as equivalent across backends - list columns: nested list structures (with basic types) **Known Limitations:** While many data types work well in cross-backend validation, there are some known limitations to be aware of: - date/datetime types: When converting between Polars and Pandas, date objects may be represented differently. For example, `datetime.date` objects in Pandas may become `pd.Timestamp` objects when converted from Polars, leading to false mismatches. To work around this, ensure both tables use the same datetime representation before comparison. - custom types: User-defined types or complex nested structures may not convert cleanly between backends and could cause unexpected comparison failures. - categorical types: Categorical/factor columns may have different internal representations across backends. - timezone-aware datetimes: Timezone handling differs between backends and may cause comparison issues. Here are some ideas to overcome such limitations: - for date/datetime columns, consider using `pre=` preprocessing to normalize representations before comparison. - when working with custom types, manually convert tables to the same backend before using `tbl_match()`. - use the same datetime precision (e.g., milliseconds vs microseconds) in both tables. Examples -------- For the examples here, we'll create two simple tables to demonstrate the `tbl_match()` validation. ```python import pointblank as pb import polars as pl # Create the first table tbl_1 = pl.DataFrame({ "a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"], "c": [4.0, 5.0, 6.0, 7.0] }) # Create an identical table tbl_2 = pl.DataFrame({ "a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"], "c": [4.0, 5.0, 6.0, 7.0] }) pb.preview(tbl_1) ``` Let's validate that `tbl_1` matches `tbl_2`. Since these tables are identical, the validation should pass. ```python validation = ( pb.Validate(data=tbl_1) .tbl_match(tbl_compare=tbl_2) .interrogate() ) validation ``` The validation table shows that the single test unit passed, indicating that the two tables match completely. Now, let's create a table with a slight difference and see what happens. ```python # Create a table with one different value tbl_3 = pl.DataFrame({ "a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"], "c": [4.0, 5.5, 6.0, 7.0] # Changed 5.0 to 5.5 }) validation = ( pb.Validate(data=tbl_1) .tbl_match(tbl_compare=tbl_3) .interrogate() ) validation ``` The validation table shows that the single test unit failed because the tables don't match (one value is different in column `c`). conjointly(self, *exprs: 'Callable', pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Perform multiple row-wise validations for joint validity. The `conjointly()` validation method checks whether each row in the table passes multiple validation conditions simultaneously. This enables compound validation logic where a test unit (typically a row) must satisfy all specified conditions to pass the validation. This method accepts multiple validation expressions as callables, which should return boolean expressions when applied to the data. 
You can use lambdas that incorporate Polars/Pandas/Ibis expressions (based on the target table type) or create more complex validation functions. The validation will operate over the number of test units that is equal to the number of rows in the table (determined after any `pre=` mutation has been applied). Parameters ---------- *exprs Multiple validation expressions provided as callable functions. Each callable should accept a table as its single argument and return a boolean expression or Series/Column that evaluates to boolean values for each row. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. 
provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- For the examples here, we'll use a simple Polars DataFrame with three numeric columns (`a`, `b`, and `c`). The table is shown below: ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 7, 1, 3, 9, 4], "b": [6, 3, 0, 5, 8, 2], "c": [10, 4, 8, 9, 10, 5], } ) pb.preview(tbl) ``` Let's validate that the values in each row satisfy multiple conditions simultaneously: 1. Column `a` should be greater than 2 2. Column `b` should be less than 7 3. The sum of `a` and `b` should be less than the value in column `c` We'll use `conjointly()` to check all these conditions together: ```python validation = ( pb.Validate(data=tbl) .conjointly( lambda df: pl.col("a") > 2, lambda df: pl.col("b") < 7, lambda df: pl.col("a") + pl.col("b") < pl.col("c") ) .interrogate() ) validation ``` The validation table shows that not all rows satisfy all three conditions together. For a row to pass the conjoint validation, all three conditions must be true for that row. We can also use preprocessing to filter the data before applying the conjoint validation: ```python # Define preprocessing function for serialization compatibility def filter_by_c_gt_5(df): return df.filter(pl.col("c") > 5) validation = ( pb.Validate(data=tbl) .conjointly( lambda df: pl.col("a") > 2, lambda df: pl.col("b") < 7, lambda df: pl.col("a") + pl.col("b") < pl.col("c"), pre=filter_by_c_gt_5 ) .interrogate() ) validation ``` This allows for more complex validation scenarios where the data is first prepared and then validated against multiple conditions simultaneously. Or, you can use the backend-agnostic column expression helper [`expr_col()`](`pointblank.expr_col`) to write expressions that work across different table backends: ```python tbl = pl.DataFrame( { "a": [5, 7, 1, 3, 9, 4], "b": [6, 3, 0, 5, 8, 2], "c": [10, 4, 8, 9, 10, 5], } ) # Using backend-agnostic syntax with expr_col() validation = ( pb.Validate(data=tbl) .conjointly( lambda df: pb.expr_col("a") > 2, lambda df: pb.expr_col("b") < 7, lambda df: pb.expr_col("a") + pb.expr_col("b") < pb.expr_col("c") ) .interrogate() ) validation ``` Using [`expr_col()`](`pointblank.expr_col`) allows your validation code to work consistently across Pandas, Polars, and Ibis table backends without changes, making your validation pipelines more portable. See Also -------- Look at the documentation of the [`expr_col()`](`pointblank.expr_col`) function for more information on how to use it with different table backends. specially(self, expr: 'Callable', pre: 'Callable | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Perform a specialized validation with customized logic. 
The `specially()` validation method allows for the creation of specialized validation expressions that can be used to validate specific conditions or logic in the data. This method provides maximum flexibility by accepting a custom callable that encapsulates your validation logic. The callable function can have one of two signatures: - a function accepting a single parameter (the data table): `def validate(data): ...` - a function with no parameters: `def validate(): ...` The second form is particularly useful for environment validations that don't need to inspect the data table. The callable function must ultimately return one of: 1. a single boolean value or boolean list 2. a table where the final column contains boolean values (column name is unimportant) The validation will operate over the number of test units that is equal to the number of rows in the data table (if returning a table with boolean values). If returning a scalar boolean value, the validation will operate over a single test unit. For a return of a list of boolean values, the length of the list constitutes the number of test units. Parameters ---------- expr A callable function that defines the specialized validation logic. This function should: (1) accept the target data table as its single argument (though it may ignore it), or (2) take no parameters at all (for environment validations). The function must ultimately return boolean values representing validation results. Design your function to incorporate any custom parameters directly within the function itself using closure variables or default parameters. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. Have a look at the *Preprocessing* section for more information on how to use this argument. thresholds Set threshold failure levels for reporting and reacting to exceedences of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. Look at the *Thresholds* section for information on how to set threshold levels. actions Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Preprocessing ------------- The `pre=` argument allows for a preprocessing function or lambda to be applied to the data table during interrogation. This function should take a table as input and return a modified table. This is useful for performing any necessary transformations or filtering on the data before the validation step is applied. 
The preprocessing function can be any callable that takes a table as input and returns a modified table. For example, you could use a lambda function to filter the table based on certain criteria or to apply a transformation to the data. Regarding the lifetime of the transformed table, it only exists during the validation step and is not stored in the `Validate` object or used in subsequent validation steps. Thresholds ---------- The `thresholds=` parameter is used to set the failure-condition levels for the validation step. If they are set here at the step level, these thresholds will override any thresholds set at the global level in `Validate(thresholds=...)`. There are three threshold levels: 'warning', 'error', and 'critical'. The threshold values can either be set as a proportion failing of all test units (a value between `0` to `1`), or, the absolute number of failing test units (as integer that's `1` or greater). Thresholds can be defined using one of these input schemes: 1. use the [`Thresholds`](`pointblank.Thresholds`) class (the most direct way to create thresholds) 2. provide a tuple of 1-3 values, where position `0` is the 'warning' level, position `1` is the 'error' level, and position `2` is the 'critical' level 3. create a dictionary of 1-3 value entries; the valid keys: are 'warning', 'error', and 'critical' 4. a single integer/float value denoting absolute number or fraction of failing test units for the 'warning' level only If the number of failing test units exceeds set thresholds, the validation step will be marked as 'warning', 'error', or 'critical'. All of the threshold levels don't need to be set, you're free to set any combination of them. Aside from reporting failure conditions, thresholds can be used to determine the actions to take for each level of failure (using the `actions=` parameter). Examples -------- The `specially()` method offers maximum flexibility for validation, allowing you to create custom validation logic that fits your specific needs. The following examples demonstrate different patterns and use cases for this powerful validation approach. ### Simple validation with direct table access This example shows the most straightforward use case where we create a function that directly checks if the sum of two columns is positive. ```python import pointblank as pb import polars as pl simple_tbl = pl.DataFrame({ "a": [5, 7, 1, 3, 9, 4], "b": [6, 3, 0, 5, 8, 2] }) # Simple function that validates directly on the table def validate_sum_positive(data): return data.select(pl.col("a") + pl.col("b") > 0) ( pb.Validate(data=simple_tbl) .specially(expr=validate_sum_positive) .interrogate() ) ``` The function returns a Polars DataFrame with a single boolean column indicating whether the sum of columns `a` and `b` is positive for each row. Each row in the resulting DataFrame is a distinct test unit. This pattern works well for simple validations where you don't need configurable parameters. 
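As a small extension of the example above (a sketch that reuses `simple_tbl` and `validate_sum_positive` from the previous code block; the filter function name is hypothetical), the `pre=` argument can narrow the table before the custom check runs:

```python
# Drop rows where `b` is 0 before the custom check runs; the number of test
# units then reflects the filtered table, not the original one
def drop_zero_b(data):
    return data.filter(pl.col("b") > 0)

(
    pb.Validate(data=simple_tbl)
    .specially(
        expr=validate_sum_positive,  # defined in the previous example
        pre=drop_zero_b,
    )
    .interrogate()
)
```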
### Advanced validation with closure variables for parameters When you need to make your validation configurable, you can use the function factory pattern (also known as closures) to create parameterized validations: ```python # Create a parameterized validation function using closures def make_column_ratio_validator(col1, col2, min_ratio): def validate_column_ratio(data): return data.select((pl.col(col1) / pl.col(col2)) > min_ratio) return validate_column_ratio ( pb.Validate(data=simple_tbl) .specially( expr=make_column_ratio_validator(col1="a", col2="b", min_ratio=0.5) ) .interrogate() ) ``` This approach allows you to create reusable validation functions that can be configured with different parameters without modifying the function itself. ### Validation function returning a list of booleans This example demonstrates how to create a validation function that returns a list of boolean values, where each element represents a separate test unit: ```python import pointblank as pb import polars as pl import random # Create sample data transaction_tbl = pl.DataFrame({ "transaction_id": [f"TX{i:04d}" for i in range(1, 11)], "amount": [120.50, 85.25, 50.00, 240.75, 35.20, 150.00, 85.25, 65.00, 210.75, 90.50], "category": ["food", "shopping", "entertainment", "travel", "utilities", "food", "shopping", "entertainment", "travel", "utilities"] }) # Define a validation function that returns a list of booleans def validate_transaction_rules(data): # Create a list to store individual test results test_results = [] # Check each row individually against multiple business rules for row in data.iter_rows(named=True): # Rule: transaction IDs must start with "TX" and be 6 chars long valid_id = row["transaction_id"].startswith("TX") and len(row["transaction_id"]) == 6 # Rule: Amounts must be appropriate for their category valid_amount = True if row["category"] == "food" and (row["amount"] < 10 or row["amount"] > 200): valid_amount = False elif row["category"] == "utilities" and (row["amount"] < 20 or row["amount"] > 300): valid_amount = False elif row["category"] == "entertainment" and row["amount"] > 100: valid_amount = False # A transaction passes if it satisfies both rules test_results.append(valid_id and valid_amount) return test_results ( pb.Validate(data=transaction_tbl) .specially( expr=validate_transaction_rules, brief="Validate transaction IDs and amounts by category." ) .interrogate() ) ``` This example shows how to create a validation function that applies multiple business rules to each row and returns a list of boolean results. Each boolean in the list represents a separate test unit, and a test unit passes only if all rules are satisfied for a given row. The function iterates through each row in the data table, checking: 1. if transaction IDs follow the required format 2. if transaction amounts are appropriate for their respective categories This approach is powerful when you need to apply complex, conditional logic that can't be easily expressed using the built-in validation functions. ### Table-level validation returning a single boolean Sometimes you need to validate properties of the entire table rather than row-by-row. 
In these cases, your function can return a single boolean value: ```python def validate_table_properties(data): # Check if table has at least one row with column 'a' > 10 has_large_values = data.filter(pl.col("a") > 10).height > 0 # Check if mean of column 'b' is positive has_positive_mean = data.select(pl.mean("b")).item() > 0 # Return a single boolean for the entire table return has_large_values and has_positive_mean ( pb.Validate(data=simple_tbl) .specially(expr=validate_table_properties) .interrogate() ) ``` This example demonstrates how to perform multiple checks on the table as a whole and combine them into a single validation result. ### Environment validation that doesn't use the data table The `specially()` validation method can even be used to validate aspects of your environment that are completely independent of the data: ```python def validate_pointblank_version(): try: import importlib.metadata version = importlib.metadata.version("pointblank") version_parts = version.split(".") # Get major and minor components regardless of how many parts there are major = int(version_parts[0]) minor = int(version_parts[1]) # Check both major and minor components for version `0.9+` return (major > 0) or (major == 0 and minor >= 9) except Exception as e: # More specific error handling could be added here print(f"Version check failed: {e}") return False ( pb.Validate(data=simple_tbl) .specially( expr=validate_pointblank_version, brief="Check Pointblank version `>=0.9.0`." ) .interrogate() ) ``` This pattern shows how to validate external dependencies or environment conditions as part of your validation workflow. Notice that the function doesn't take any parameters at all, which makes it cleaner when the validation doesn't need to access the data table. By combining these patterns, you can create sophisticated validation workflows that address virtually any data quality requirement in your organization. prompt(self, prompt: 'str', model: 'str', columns_subset: 'str | list[str] | None' = None, batch_size: 'int' = 1000, max_concurrent: 'int' = 3, pre: 'Callable | None' = None, segments: 'SegmentSpec | None' = None, thresholds: 'int | float | bool | tuple | dict | Thresholds' = None, actions: 'Actions | None' = None, brief: 'str | bool | None' = None, active: 'bool' = True) -> 'Validate' Validate rows using AI/LLM-powered analysis. The `prompt()` validation method uses Large Language Models (LLMs) to validate rows of data based on natural language criteria. Similar to other Pointblank validation methods, this generates binary test results (pass/fail) that integrate seamlessly with the standard reporting framework. Like `col_vals_*()` methods, `prompt()` evaluates data against specific criteria, but instead of using programmatic rules, it uses natural language prompts interpreted by an LLM. Like `rows_distinct()` and `rows_complete()`, it operates at the row level and allows you to specify a subset of columns for evaluation using `columns_subset=`. The system automatically combines your validation criteria from the `prompt=` parameter with the necessary technical context, data formatting instructions, and response structure requirements. This is all so you only need to focus on describing your validation logic in plain language. Each row becomes a test unit that either passes or fails the validation criteria, producing the familiar True/False results that appear in Pointblank validation reports. 
This method is particularly useful for complex validation rules that are difficult to express with traditional validation methods, such as semantic checks, context-dependent validation, or subjective quality assessments. Parameters ---------- prompt A natural language description of the validation criteria. This prompt should clearly describe what constitutes valid vs. invalid rows. Some examples: `"Each row should contain a valid email address and a realistic person name"`, `"Values should indicate positive sentiment"`, `"The description should mention a country name"`. columns_subset A single column or list of columns to include in the validation. If `None`, all columns will be included. Specifying fewer columns can improve performance and reduce API costs, so try to include only the columns necessary for the validation. model The model to be used. This should be in the form of `provider:model` (e.g., `"anthropic:claude-sonnet-4-5"`). Supported providers are `"anthropic"`, `"openai"`, `"ollama"`, and `"bedrock"`. The model name should be the specific model to be used from the provider. Model names are subject to change so consult the provider's documentation for the most up-to-date model names. batch_size Number of rows to process in each batch. Larger batches are more efficient but may hit API limits. Default is `1000`. max_concurrent Maximum number of concurrent API requests. Higher values speed up processing but may hit rate limits. Default is `3`. pre An optional preprocessing function or lambda to apply to the data table during interrogation. This function should take a table as input and return a modified table. segments An optional directive on segmentation, which serves to split a validation step into multiple (one step per segment). Can be a single column name, a tuple that specifies a column name and its corresponding values to segment on, or a combination of both (provided as a list). thresholds Set threshold failure levels for reporting and reacting to exceedances of the levels. The thresholds are set at the step level and will override any global thresholds set in `Validate(thresholds=...)`. The default is `None`, which means that no thresholds will be set locally and global thresholds (if any) will take effect. actions Optional actions to take when the validation step meets or exceeds any set threshold levels. If provided, the [`Actions`](`pointblank.Actions`) class should be used to define the actions. brief An optional brief description of the validation step that will be displayed in the reporting table. You can use the templating elements like `"{step}"` to insert the step number, or `"{auto}"` to include an automatically generated brief. If `True` the entire brief will be automatically generated. If `None` (the default) then there won't be a brief. active A boolean value indicating whether the validation step should be active. Using `False` will make the validation step inactive (still reporting its presence and keeping indexes for the steps unchanged). Returns ------- Validate The `Validate` object with the added validation step. Constructing the `model` Argument --------------------------------- The `model=` argument should be constructed using the provider and model name separated by a colon (`provider:model`). The provider text can be any of: - `"anthropic"` (Anthropic) - `"openai"` (OpenAI) - `"ollama"` (Ollama) - `"bedrock"` (Amazon Bedrock) The model name should be the specific model to be used from the provider.
Model names are subject to change so consult the provider's documentation for the most up-to-date model names. Notes on Authentication ----------------------- API keys are automatically loaded from environment variables or `.env` files and are **not** stored in the validation object for security reasons. You should consider using a secure method for handling API keys. One way to do this is to load the API key from an environment variable and retrieve it using the `os` module (specifically the `os.getenv()` function). Places to store the API key might include `.bashrc`, `.bash_profile`, `.zshrc`, or `.zsh_profile`. Another solution is to store one or more model provider API keys in an `.env` file (in the root of your project). If the API keys have correct names (e.g., `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`) then the AI validation will automatically load the API key from the `.env` file. An `.env` file might look like this: ```plaintext ANTHROPIC_API_KEY="your_anthropic_api_key_here" OPENAI_API_KEY="your_openai_api_key_here" ``` There's no need to have the `python-dotenv` package installed when using `.env` files in this way. **Provider-specific setup**: - **OpenAI**: set `OPENAI_API_KEY` environment variable or create `.env` file - **Anthropic**: set `ANTHROPIC_API_KEY` environment variable or create `.env` file - **Ollama**: no API key required, just ensure Ollama is running locally - **Bedrock**: configure AWS credentials through standard AWS methods AI Validation Process --------------------- The AI validation process works as follows: 1. data batching: the data is split into batches of the specified size 2. row deduplication: duplicate rows (based on selected columns) are identified and only unique combinations are sent to the LLM for analysis 3. json conversion: each batch of unique rows is converted to JSON format for the LLM 4. prompt construction: the user prompt is embedded in a structured system prompt 5. llm processing: each batch is sent to the LLM for analysis 6. response parsing: LLM responses are parsed to extract validation results 7. result projection: results are mapped back to all original rows using row signatures 8. result aggregation: results from all batches are combined **Performance Optimization**: the process uses row signature memoization to avoid redundant LLM calls. When multiple rows have identical values in the selected columns, only one representative row is validated, and the result is applied to all matching rows. This can dramatically reduce API costs and processing time for datasets with repetitive patterns. The LLM receives data in this JSON format: ```json { "columns": ["col1", "col2", "col3"], "rows": [ {"col1": "value1", "col2": "value2", "col3": "value3", "_pb_row_index": 0}, {"col1": "value4", "col2": "value5", "col3": "value6", "_pb_row_index": 1} ] } ``` The LLM returns validation results in this format: ```json [ {"index": 0, "result": true}, {"index": 1, "result": false} ] ``` Prompt Design Tips ------------------ For best results, design prompts that are: - boolean-oriented: frame validation criteria to elicit clear valid/invalid responses - specific: clearly define what makes a row valid/invalid - unambiguous: avoid subjective language that could be interpreted differently - context-aware: include relevant business rules or domain knowledge - example-driven: consider providing examples in the prompt when helpful **Critical**: Prompts must be designed so the LLM can determine whether each row passes or fails the validation criteria. 
The system expects binary validation responses, so avoid open-ended questions or prompts that might generate explanatory text instead of clear pass/fail judgments. Good prompt examples: - "Each row should contain a valid email address in the 'email' column and a non-empty name in the 'name' column" - "The 'sentiment' column should contain positive sentiment words (happy, good, excellent, etc.)" - "Product descriptions should mention at least one technical specification" Poor prompt examples (avoid these): - "What do you think about this data?" (too open-ended) - "Describe the quality of each row" (asks for description, not validation) - "How would you improve this data?" (asks for suggestions, not pass/fail) Performance Considerations -------------------------- AI validation is significantly slower than traditional validation methods due to API calls to LLM providers. However, performance varies dramatically based on data characteristics: **High Memoization Scenarios** (seconds to minutes): - data with many duplicate rows in the selected columns - low cardinality data (repeated patterns) - small number of unique row combinations **Low Memoization Scenarios** (minutes to hours): - high cardinality data with mostly unique rows - large datasets with few repeated patterns - all or most rows requiring individual LLM evaluation The row signature memoization optimization can reduce processing time significantly when data has repetitive patterns. For datasets where every row is unique, expect longer processing times similar to validating each row individually. **Strategies to Reduce Processing Time**: - test on data slices: define a sampling function like `def sample_1000(df): return df.head(1000)` and use `pre=sample_1000` to validate on smaller samples - filter relevant data: define filter functions like `def active_only(df): return df.filter(df["status"] == "active")` and use `pre=active_only` to focus on a specific subset - optimize column selection: use `columns_subset=` to include only the columns necessary for validation - start with smaller batches: begin with `batch_size=100` for testing, then increase gradually - reduce concurrency: lower `max_concurrent=1` if hitting rate limits - use faster/cheaper models: consider using smaller or more efficient models for initial testing before switching to more capable models Examples -------- The following examples demonstrate how to use AI validation for different types of data quality checks. These examples show both basic usage and more advanced configurations with custom thresholds and actions. **Basic AI validation example:** This first example shows a simple validation scenario where we want to check that customer records have both valid email addresses and non-empty names. Notice how we use `columns_subset=` to focus only on the relevant columns, which improves both performance and cost-effectiveness. 
```python import pointblank as pb import polars as pl # Sample data with email and name columns tbl = pl.DataFrame({ "email": ["john@example.com", "invalid-email", "jane@test.org"], "name": ["John Doe", "", "Jane Smith"], "age": [25, 30, 35] }) # Validate using AI validation = ( pb.Validate(data=tbl) .prompt( prompt="Each row should have a valid email address and a non-empty name", columns_subset=["email", "name"], # Only check these columns model="openai:gpt-4o-mini", ) .interrogate() ) validation ``` In this example, the AI will identify that the second row fails validation because it has both an invalid email format (`"invalid-email"`) and an empty name field. The validation results will show 1 out of 3 rows failing the criteria. **Advanced example with custom thresholds:** This more sophisticated example demonstrates how to use AI validation with custom thresholds and actions. Here we're validating phone number formats to ensure they include area codes, which is a common data quality requirement for customer contact information. ```python customer_data = pl.DataFrame({ "customer_id": [1, 2, 3, 4, 5], "name": ["John Doe", "Jane Smith", "Bob Johnson", "Alice Brown", "Charlie Davis"], "phone_number": [ "(555) 123-4567", # Valid with area code "555-987-6543", # Valid with area code "123-4567", # Missing area code "(800) 555-1234", # Valid with area code "987-6543" # Missing area code ] }) validation = ( pb.Validate(data=customer_data) .prompt( prompt="Do all the phone numbers include an area code?", columns_subset="phone_number", # Only check the `phone_number` column model="openai:gpt-4o", batch_size=500, max_concurrent=5, thresholds=pb.Thresholds(warning=0.1, error=0.2, critical=0.3), actions=pb.Actions(error="Too many phone numbers missing area codes.") ) .interrogate() ) ``` This validation will identify that 2 out of 5 phone numbers (40%) are missing area codes, which exceeds all threshold levels. The validation will trigger the specified error action since the failure rate (40%) is above the error threshold (20%). The AI can recognize various phone number formats and determine whether they include area codes. ## The Column Selection family A flexible way to select columns for validation is to use the `col()` function along with column selection helper functions. A combination of `col()` + `starts_with()`, `matches()`, etc., allows for the selection of multiple target columns (mapping a validation across many steps). Furthermore, the `col()` function can be used to declare a comparison column (e.g., for the `value=` argument in many `col_vals_*()` methods) when you can't use a fixed value for comparison. col(exprs: 'str | ColumnSelector | ColumnSelectorNarwhals') -> 'Column | ColumnLiteral | ColumnSelectorNarwhals' Helper function for referencing a column in the input table. Many of the validation methods (i.e., `col_vals_*()` methods) in Pointblank have a `value=` argument. These validations are comparisons between column values and a literal value, or, between column values and adjacent values in another column. The `col()` helper function is used to specify that it is a column being referenced, not a literal value. The `col()` function doesn't check that the column exists in the input table. It acts to signal that the value being compared is a column value. During validation (i.e., when [`interrogate()`](`pointblank.Validate.interrogate`) is called), Pointblank will then check that the column exists in the input table.
For creating expressions to use with the `conjointly()` validation method, use the [`expr_col()`](`pointblank.expr_col`) function instead. Parameters ---------- exprs Either the name of a single column in the target table, provided as a string, or, an expression involving column selector functions (e.g., `starts_with("a")`, `ends_with("e") | starts_with("a")`, etc.). Returns ------- Column | ColumnLiteral | ColumnSelectorNarwhals: A column object or expression representing the column reference. Usage with the `columns=` Argument ----------------------------------- The `col()` function can be used in the `columns=` argument of the following validation methods: - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) - [`col_vals_in_set()`](`pointblank.Validate.col_vals_in_set`) - [`col_vals_not_in_set()`](`pointblank.Validate.col_vals_not_in_set`) - [`col_vals_increasing()`](`pointblank.Validate.col_vals_increasing`) - [`col_vals_decreasing()`](`pointblank.Validate.col_vals_decreasing`) - [`col_vals_null()`](`pointblank.Validate.col_vals_null`) - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`) - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`) - [`col_vals_within_spec()`](`pointblank.Validate.col_vals_within_spec`) - [`col_exists()`](`pointblank.Validate.col_exists`) If specifying a single column with certainty (you have the exact name), `col()` is not necessary since you can just pass the column name as a string (though it is still valid to use `col("column_name")`, if preferred). However, if you want to select columns based on complex logic involving multiple column selector functions (e.g., columns that start with `"a"` but don't end with `"e"`), you need to use `col()` to wrap expressions involving column selector functions and logical operators such as `&`, `|`, `-`, and `~`. Here is an example of such usage with the [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) validation method: ```python col_vals_gt(columns=col(starts_with("a") & ~ends_with("e")), value=10) ``` If using only a single column selector function, you can pass the function directly to the `columns=` argument of the validation method, or, you can use `col()` to wrap the function (either is valid though the first is more concise). 
Here is an example of that simpler usage: ```python col_vals_gt(columns=starts_with("a"), value=10) ``` Usage with the `value=`, `left=`, and `right=` Arguments -------------------------------------------------------- The `col()` function can be used in the `value=` argument of the following validation methods - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) and in the `left=` and `right=` arguments (either or both) of these two validation methods - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) You cannot use column selector functions such as [`starts_with()`](`pointblank.starts_with`) in either of the `value=`, `left=`, or `right=` arguments since there would be no guarantee that a single column will be resolved from the target table with this approach. The `col()` function is used to signal that the value being compared is a column value and not a literal value. Available Selectors ------------------- There is a collection of selectors available in pointblank, allowing you to select columns based on attributes of column names and positions. The selectors are: - [`starts_with()`](`pointblank.starts_with`) - [`ends_with()`](`pointblank.ends_with`) - [`contains()`](`pointblank.contains`) - [`matches()`](`pointblank.matches`) - [`everything()`](`pointblank.everything`) - [`first_n()`](`pointblank.first_n`) - [`last_n()`](`pointblank.last_n`) Alternatively, we support selectors from the Narwhals library! Those selectors can additionally take advantage of the data types of the columns. The selectors are: - `boolean()` - `by_dtype()` - `categorical()` - `matches()` - `numeric()` - `string()` Have a look at the [Narwhals API documentation on selectors](https://narwhals-dev.github.io/narwhals/api-reference/selectors/) for more information. Examples -------- Suppose we have a table with columns `a` and `b` and we'd like to validate that the values in column `a` are greater than the values in column `b`. We can use the `col()` helper function to reference the comparison column when creating the validation step. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 6, 5, 7, 6, 5], "b": [4, 2, 3, 3, 4, 3], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=pb.col("b")) .interrogate() ) validation ``` From results of the validation table it can be seen that values in `a` were greater than values in `b` for every row (or test unit). Using `value=pb.col("b")` specified that the greater-than comparison is across columns, not with a fixed literal value. If you want to select an arbitrary set of columns upon which to base a validation, you can use column selector functions (e.g., [`starts_with()`](`pointblank.starts_with`), [`ends_with()`](`pointblank.ends_with`), etc.) to specify columns in the `columns=` argument of a validation method. Let's use the [`starts_with()`](`pointblank.starts_with`) column selector function to select columns that start with `"paid"` and validate that the values in those columns are greater than `10`. 
```python tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "paid_2021": [16.32, 16.25, 15.75], "paid_2022": [18.62, 16.95, 18.25], "person_id": ["A123", "B456", "C789"], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns=pb.col(pb.starts_with("paid")), value=10) .interrogate() ) validation ``` In the above example the `col()` function contains the invocation of the [`starts_with()`](`pointblank.starts_with`) column selector function. This is not strictly necessary when using a single column selector function, so `columns=pb.starts_with("paid")` would be equivalent usage here. However, the use of `col()` is required when using multiple column selector functions with logical operators. Here is an example of that more complex usage: ```python tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "hours_2022": [160, 180, 160], "hours_2023": [182, 168, 175], "hours_2024": [200, 165, 190], "paid_2022": [18.62, 16.95, 18.25], "paid_2023": [19.29, 17.75, 18.35], "paid_2024": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt( columns=pb.col(pb.starts_with("paid") & pb.matches("2023|2024")), value=10 ) .interrogate() ) validation ``` In the above example the `col()` function contains the invocation of the [`starts_with()`](`pointblank.starts_with`) and [`matches()`](`pointblank.matches`) column selector functions, combined with the `&` operator. This is necessary to specify the set of columns that start with `"paid"` *and* match the text `"2023"` or `"2024"`. If you'd like to take advantage of Narwhals selectors, that's also possible. Here is an example of using the `numeric()` column selector function to select all numeric columns for validation, checking that their values are greater than `0`. ```python import narwhals.selectors as ncs tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "hours_2022": [160, 180, 160], "hours_2023": [182, 168, 175], "hours_2024": [200, 165, 190], "paid_2022": [18.62, 16.95, 18.25], "paid_2023": [19.29, 17.75, 18.35], "paid_2024": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_ge(columns=pb.col(ncs.numeric()), value=0) .interrogate() ) validation ``` In the above example the `col()` function contains the invocation of the `numeric()` column selector function from Narwhals. As with the other selectors, this is not strictly necessary when using a single column selector, so `columns=ncs.numeric()` would also be fine here. Narwhals selectors can also use operators to combine multiple selectors. Here is an example of using the `numeric()` and [`matches()`](`pointblank.matches`) selectors together to select all numeric columns that fit a specific pattern. ```python tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "2022_status": ["ft", "ft", "pt"], "2023_status": ["ft", "pt", "ft"], "2024_status": ["ft", "pt", "ft"], "2022_pay_total": [18.62, 16.95, 18.25], "2023_pay_total": [19.29, 17.75, 18.35], "2024_pay_total": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_lt(columns=pb.col(ncs.numeric() & ncs.matches("2023|2024")), value=30) .interrogate() ) validation ``` In the above example the `col()` function contains the invocation of the `numeric()` and [`matches()`](`pointblank.matches`) column selector functions from Narwhals, combined with the `&` operator. This is necessary to specify the set of columns that are numeric *and* match the text `"2023"` or `"2024"`. 
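As a complement to the examples above, `col()` works the same way in the `left=` and `right=` arguments of [`col_vals_between()`](`pointblank.Validate.col_vals_between`). Here is a minimal sketch of a row-wise range check where the bounds live in other columns (the column names `lo` and `hi` are illustrative and not taken from any earlier example):

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [5, 6, 5, 7, 6, 5],
        "lo": [4, 4, 3, 5, 4, 3],
        "hi": [8, 8, 7, 9, 8, 7],
    }
)

validation = (
    pb.Validate(data=tbl)
    # Each value in `a` is compared against that row's own `lo` and `hi` values
    .col_vals_between(columns="a", left=pb.col("lo"), right=pb.col("hi"))
    .interrogate()
)

validation
```

Every row should pass here since each value in `a` falls within that row's `lo` to `hi` range. Because `col()` can be used in either or both of `left=` and `right=`, one side could instead be a literal value (e.g., `left=0`) while the other references a column.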
See Also -------- Create a column expression for use in `conjointly()` validation with the [`expr_col()`](`pointblank.expr_col`) function. starts_with(text: 'str', case_sensitive: 'bool' = False) -> 'StartsWith' Select columns that start with specified text. Many validation methods have a `columns=` argument that can be used to specify the columns for validation (e.g., [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`), [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`), etc.). The `starts_with()` selector function can be used to select one or more columns that start with some specified text. So if the set of table columns consists of `[name_first, name_last, age, address]` and you want to validate columns that start with `"name"`, you can use `columns=starts_with("name")`. This will select the `name_first` and `name_last` columns. There will be a validation step created for every resolved column. Note that if there aren't any columns resolved from using `starts_with()` (or any other expression using selector functions), the validation step will fail to be evaluated during the interrogation process. Such a failure to evaluate will be reported in the validation results but it won't affect the interrogation process overall (i.e., the process won't be halted). Parameters ---------- text The text that the column name should start with. case_sensitive Whether column names should be treated as case-sensitive. The default is `False`. Returns ------- StartsWith A `StartsWith` object, which can be used to select columns that start with the specified text. Relevant Validation Methods where `starts_with()` can be Used ------------------------------------------------------------- This selector function can be used in the `columns=` argument of the following validation methods: - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) - [`col_vals_in_set()`](`pointblank.Validate.col_vals_in_set`) - [`col_vals_not_in_set()`](`pointblank.Validate.col_vals_not_in_set`) - [`col_vals_increasing()`](`pointblank.Validate.col_vals_increasing`) - [`col_vals_decreasing()`](`pointblank.Validate.col_vals_decreasing`) - [`col_vals_null()`](`pointblank.Validate.col_vals_null`) - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`) - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`) - [`col_vals_within_spec()`](`pointblank.Validate.col_vals_within_spec`) - [`col_exists()`](`pointblank.Validate.col_exists`) The `starts_with()` selector function doesn't need to be used in isolation. Read the next section for information on how to compose it with other column selectors for more refined ways to select columns. Additional Flexibilty through Composition with Other Column Selectors --------------------------------------------------------------------- The `starts_with()` function can be composed with other column selectors to create fine-grained column selections. For example, to select columns that start with `"a"` and end with `"e"`, you can use the `starts_with()` and [`ends_with()`](`pointblank.ends_with`) functions together. 
The only condition is that the expressions are wrapped in the [`col()`](`pointblank.col`) function, like this: ```python col(starts_with("a") & ends_with("e")) ``` There are four operators that can be used to compose column selectors: - `&` (*and*) - `|` (*or*) - `-` (*difference*) - `~` (*not*) The `&` operator is used to select columns that satisfy both conditions. The `|` operator is used to select columns that satisfy either condition. The `-` operator is used to select columns that satisfy the first condition but not the second. The `~` operator is used to select columns that don't satisfy the condition. As many selector functions can be used as needed and the operators can be combined to create complex column selection criteria (parentheses can be used to group conditions and control the order of evaluation). Examples -------- Suppose we have a table with columns `name`, `paid_2021`, `paid_2022`, and `person_id` and we'd like to validate that the values in columns that start with `"paid"` are greater than `10`. We can use the `starts_with()` column selector function to specify the columns that start with `"paid"` as the columns to validate. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "paid_2021": [16.32, 16.25, 15.75], "paid_2022": [18.62, 16.95, 18.25], "person_id": ["A123", "B456", "C789"], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns=pb.starts_with("paid"), value=10) .interrogate() ) validation ``` From the results of the validation table we get two validation steps, one for `paid_2021` and one for `paid_2022`. The values in both columns were all greater than `10`. We can also use the `starts_with()` function in combination with other column selectors (within [`col()`](`pointblank.col`)) to create more complex column selection criteria (i.e., to select columns that satisfy multiple conditions). For example, to select columns that start with `"paid"` and match the text `"2023"` or `"2024"`, we can use the `&` operator to combine column selectors. ```python tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "hours_2022": [160, 180, 160], "hours_2023": [182, 168, 175], "hours_2024": [200, 165, 190], "paid_2022": [18.62, 16.95, 18.25], "paid_2023": [19.29, 17.75, 18.35], "paid_2024": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt( columns=pb.col(pb.starts_with("paid") & pb.matches("23|24")), value=10 ) .interrogate() ) validation ``` From the results of the validation table we get two validation steps, one for `paid_2023` and one for `paid_2024`. ends_with(text: 'str', case_sensitive: 'bool' = False) -> 'EndsWith' Select columns that end with specified text. Many validation methods have a `columns=` argument that can be used to specify the columns for validation (e.g., [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`), [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`), etc.). The `ends_with()` selector function can be used to select one or more columns that end with some specified text. So if the set of table columns consists of `[first_name, last_name, age, address]` and you want to validate columns that end with `"name"`, you can use `columns=ends_with("name")`. This will select the `first_name` and `last_name` columns. There will be a validation step created for every resolved column. 
Note that if there aren't any columns resolved from using `ends_with()` (or any other expression using selector functions), the validation step will fail to be evaluated during the interrogation process. Such a failure to evaluate will be reported in the validation results but it won't affect the interrogation process overall (i.e., the process won't be halted). Parameters ---------- text The text that the column name should end with. case_sensitive Whether column names should be treated as case-sensitive. The default is `False`. Returns ------- EndsWith An `EndsWith` object, which can be used to select columns that end with the specified text. Relevant Validation Methods where `ends_with()` can be Used ----------------------------------------------------------- This selector function can be used in the `columns=` argument of the following validation methods: - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) - [`col_vals_in_set()`](`pointblank.Validate.col_vals_in_set`) - [`col_vals_not_in_set()`](`pointblank.Validate.col_vals_not_in_set`) - [`col_vals_increasing()`](`pointblank.Validate.col_vals_increasing`) - [`col_vals_decreasing()`](`pointblank.Validate.col_vals_decreasing`) - [`col_vals_null()`](`pointblank.Validate.col_vals_null`) - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`) - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`) - [`col_vals_within_spec()`](`pointblank.Validate.col_vals_within_spec`) - [`col_exists()`](`pointblank.Validate.col_exists`) The `ends_with()` selector function doesn't need to be used in isolation. Read the next section for information on how to compose it with other column selectors for more refined ways to select columns. Additional Flexibilty through Composition with Other Column Selectors --------------------------------------------------------------------- The `ends_with()` function can be composed with other column selectors to create fine-grained column selections. For example, to select columns that end with `"e"` and start with `"a"`, you can use the `ends_with()` and [`starts_with()`](`pointblank.starts_with`) functions together. The only condition is that the expressions are wrapped in the [`col()`](`pointblank.col`) function, like this: ```python col(ends_with("e") & starts_with("a")) ``` There are four operators that can be used to compose column selectors: - `&` (*and*) - `|` (*or*) - `-` (*difference*) - `~` (*not*) The `&` operator is used to select columns that satisfy both conditions. The `|` operator is used to select columns that satisfy either condition. The `-` operator is used to select columns that satisfy the first condition but not the second. The `~` operator is used to select columns that don't satisfy the condition. As many selector functions can be used as needed and the operators can be combined to create complex column selection criteria (parentheses can be used to group conditions and control the order of evaluation). 
Examples -------- Suppose we have a table with columns `name`, `2021_pay`, `2022_pay`, and `person_id` and we'd like to validate that the values in columns that end with `"pay"` are greater than `10`. We can use the `ends_with()` column selector function to specify the columns that end with `"pay"` as the columns to validate. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "2021_pay": [16.32, 16.25, 15.75], "2022_pay": [18.62, 16.95, 18.25], "person_id": ["A123", "B456", "C789"], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns=pb.ends_with("pay"), value=10) .interrogate() ) validation ``` From the results of the validation table we get two validation steps, one for `2021_pay` and one for `2022_pay`. The values in both columns were all greater than `10`. We can also use the `ends_with()` function in combination with other column selectors (within [`col()`](`pointblank.col`)) to create more complex column selection criteria (i.e., to select columns that satisfy multiple conditions). For example, to select columns that end with `"pay"` and match the text `"2023"` or `"2024"`, we can use the `&` operator to combine column selectors. ```python tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "2022_hours": [160, 180, 160], "2023_hours": [182, 168, 175], "2024_hours": [200, 165, 190], "2022_pay": [18.62, 16.95, 18.25], "2023_pay": [19.29, 17.75, 18.35], "2024_pay": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt( columns=pb.col(pb.ends_with("pay") & pb.matches("2023|2024")), value=10 ) .interrogate() ) validation ``` From the results of the validation table we get two validation steps, one for `2023_pay` and one for `2024_pay`. contains(text: 'str', case_sensitive: 'bool' = False) -> 'Contains' Select columns that contain specified text. Many validation methods have a `columns=` argument that can be used to specify the columns for validation (e.g., [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`), [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`), etc.). The `contains()` selector function can be used to select one or more columns that contain some specified text. So if the set of table columns consists of `[profit, conv_first, conv_last, highest_conv, age]` and you want to validate columns that have `"conv"` in the name, you can use `columns=contains("conv")`. This will select the `conv_first`, `conv_last`, and `highest_conv` columns. There will be a validation step created for every resolved column. Note that if there aren't any columns resolved from using `contains()` (or any other expression using selector functions), the validation step will fail to be evaluated during the interrogation process. Such a failure to evaluate will be reported in the validation results but it won't affect the interrogation process overall (i.e., the process won't be halted). Parameters ---------- text The text that the column name should contain. case_sensitive Whether column names should be treated as case-sensitive. The default is `False`. Returns ------- Contains A `Contains` object, which can be used to select columns that contain the specified text. 
Relevant Validation Methods where `contains()` can be Used ---------------------------------------------------------- This selector function can be used in the `columns=` argument of the following validation methods: - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) - [`col_vals_in_set()`](`pointblank.Validate.col_vals_in_set`) - [`col_vals_not_in_set()`](`pointblank.Validate.col_vals_not_in_set`) - [`col_vals_increasing()`](`pointblank.Validate.col_vals_increasing`) - [`col_vals_decreasing()`](`pointblank.Validate.col_vals_decreasing`) - [`col_vals_null()`](`pointblank.Validate.col_vals_null`) - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`) - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`) - [`col_vals_within_spec()`](`pointblank.Validate.col_vals_within_spec`) - [`col_exists()`](`pointblank.Validate.col_exists`) The `contains()` selector function doesn't need to be used in isolation. Read the next section for information on how to compose it with other column selectors for more refined ways to select columns. Additional Flexibilty through Composition with Other Column Selectors --------------------------------------------------------------------- The `contains()` function can be composed with other column selectors to create fine-grained column selections. For example, to select columns that have the text `"_n"` and start with `"item"`, you can use the `contains()` and [`starts_with()`](`pointblank.starts_with`) functions together. The only condition is that the expressions are wrapped in the [`col()`](`pointblank.col`) function, like this: ```python col(contains("_n") & starts_with("item")) ``` There are four operators that can be used to compose column selectors: - `&` (*and*) - `|` (*or*) - `-` (*difference*) - `~` (*not*) The `&` operator is used to select columns that satisfy both conditions. The `|` operator is used to select columns that satisfy either condition. The `-` operator is used to select columns that satisfy the first condition but not the second. The `~` operator is used to select columns that don't satisfy the condition. As many selector functions can be used as needed and the operators can be combined to create complex column selection criteria (parentheses can be used to group conditions and control the order of evaluation). Examples -------- Suppose we have a table with columns `name`, `2021_pay_total`, `2022_pay_total`, and `person_id` and we'd like to validate that the values in columns having `"pay"` in the name are greater than `10`. We can use the `contains()` column selector function to specify the column names that contain `"pay"` as the columns to validate. 
```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "2021_pay_total": [16.32, 16.25, 15.75], "2022_pay_total": [18.62, 16.95, 18.25], "person_id": ["A123", "B456", "C789"], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns=pb.contains("pay"), value=10) .interrogate() ) validation ``` From the results of the validation table we get two validation steps, one for `2021_pay_total` and one for `2022_pay_total`. The values in both columns were all greater than `10`. We can also use the `contains()` function in combination with other column selectors (within [`col()`](`pointblank.col`)) to create more complex column selection criteria (i.e., to select columns that satisfy multiple conditions). For example, to select columns that contain `"pay"` and match the text `"2023"` or `"2024"`, we can use the `&` operator to combine column selectors. ```python tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "2022_hours": [160, 180, 160], "2023_hours": [182, 168, 175], "2024_hours": [200, 165, 190], "2022_pay_total": [18.62, 16.95, 18.25], "2023_pay_total": [19.29, 17.75, 18.35], "2024_pay_total": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt( columns=pb.col(pb.contains("pay") & pb.matches("2023|2024")), value=10 ) .interrogate() ) validation ``` From the results of the validation table we get two validation steps, one for `2023_pay_total` and one for `2024_pay_total`. matches(pattern: 'str', case_sensitive: 'bool' = False) -> 'Matches' Select columns that match a specified regular expression pattern. Many validation methods have a `columns=` argument that can be used to specify the columns for validation (e.g., [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`), [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`), etc.). The `matches()` selector function can be used to select one or more columns matching a provided regular expression pattern. So if the set of table columns consists of `[rev_01, rev_02, profit_01, profit_02, age]` and you want to validate columns that have two digits at the end of the name, you can use `columns=matches(r"[0-9]{2}$")`. This will select the `rev_01`, `rev_02`, `profit_01`, and `profit_02` columns. There will be a validation step created for every resolved column. Note that if there aren't any columns resolved from using `matches()` (or any other expression using selector functions), the validation step will fail to be evaluated during the interrogation process. Such a failure to evaluate will be reported in the validation results but it won't affect the interrogation process overall (i.e., the process won't be halted). Parameters ---------- pattern The regular expression pattern that the column name should match. case_sensitive Whether column names should be treated as case-sensitive. The default is `False`. Returns ------- Matches A `Matches` object, which can be used to select columns that match the specified pattern. 
Relevant Validation Methods where `matches()` can be Used --------------------------------------------------------- This selector function can be used in the `columns=` argument of the following validation methods: - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) - [`col_vals_in_set()`](`pointblank.Validate.col_vals_in_set`) - [`col_vals_not_in_set()`](`pointblank.Validate.col_vals_not_in_set`) - [`col_vals_increasing()`](`pointblank.Validate.col_vals_increasing`) - [`col_vals_decreasing()`](`pointblank.Validate.col_vals_decreasing`) - [`col_vals_null()`](`pointblank.Validate.col_vals_null`) - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`) - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`) - [`col_vals_within_spec()`](`pointblank.Validate.col_vals_within_spec`) - [`col_exists()`](`pointblank.Validate.col_exists`) The `matches()` selector function doesn't need to be used in isolation. Read the next section for information on how to compose it with other column selectors for more refined ways to select columns. Additional Flexibilty through Composition with Other Column Selectors --------------------------------------------------------------------- The `matches()` function can be composed with other column selectors to create fine-grained column selections. For example, to select columns that have the text starting with five digits and end with `"_id"`, you can use the `matches()` and [`ends_with()`](`pointblank.ends_with`) functions together. The only condition is that the expressions are wrapped in the [`col()`](`pointblank.col`) function, like this: ```python col(matches(r"^[0-9]{5}") & ends_with("_id")) ``` There are four operators that can be used to compose column selectors: - `&` (*and*) - `|` (*or*) - `-` (*difference*) - `~` (*not*) The `&` operator is used to select columns that satisfy both conditions. The `|` operator is used to select columns that satisfy either condition. The `-` operator is used to select columns that satisfy the first condition but not the second. The `~` operator is used to select columns that don't satisfy the condition. As many selector functions can be used as needed and the operators can be combined to create complex column selection criteria (parentheses can be used to group conditions and control the order of evaluation). Examples -------- Suppose we have a table with columns `name`, `id_old`, `new_identifier`, and `pay_2021` and we'd like to validate that text values in columns having `"id"` or `"identifier"` in the name have a specific syntax. We can use the `matches()` column selector function to specify the columns that match the pattern. 
```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "id_old": ["ID0021", "ID0032", "ID0043"], "new_identifier": ["ID9054", "ID9065", "ID9076"], "pay_2021": [16.32, 16.25, 15.75], } ) validation = ( pb.Validate(data=tbl) .col_vals_regex(columns=pb.matches("id|identifier"), pattern=r"ID[0-9]{4}") .interrogate() ) validation ``` From the results of the validation table we get two validation steps, one for `id_old` and one for `new_identifier`. The values in both columns all match the pattern `"ID[0-9]{4}"`. We can also use the `matches()` function in combination with other column selectors (within [`col()`](`pointblank.col`)) to create more complex column selection criteria (i.e., to select columns that satisfy multiple conditions). For example, to select columns that contain `"pay"` and match the text `"2023"` or `"2024"`, we can use the `&` operator to combine column selectors. ```python tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "2022_hours": [160, 180, 160], "2023_hours": [182, 168, 175], "2024_hours": [200, 165, 190], "2022_pay_total": [18.62, 16.95, 18.25], "2023_pay_total": [19.29, 17.75, 18.35], "2024_pay_total": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt( columns=pb.col(pb.contains("pay") & pb.matches("2023|2024")), value=10 ) .interrogate() ) validation ``` From the results of the validation table we get two validation steps, one for `2023_pay_total` and one for `2024_pay_total`. everything() -> 'Everything' Select all columns. Many validation methods have a `columns=` argument that can be used to specify the columns for validation (e.g., [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`), [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`), etc.). The `everything()` selector function can be used to select every column in the table. If you have a table with six columns and they're all suitable for a specific type of validation, you can use `columns=everything()` and all six columns will be selected for validation. Returns ------- Everything An `Everything` object, which can be used to select all columns. Relevant Validation Methods where `everything()` can be Used ------------------------------------------------------------ This selector function can be used in the `columns=` argument of the following validation methods: - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) - [`col_vals_in_set()`](`pointblank.Validate.col_vals_in_set`) - [`col_vals_not_in_set()`](`pointblank.Validate.col_vals_not_in_set`) - [`col_vals_increasing()`](`pointblank.Validate.col_vals_increasing`) - [`col_vals_decreasing()`](`pointblank.Validate.col_vals_decreasing`) - [`col_vals_null()`](`pointblank.Validate.col_vals_null`) - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`) - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`) - [`col_vals_within_spec()`](`pointblank.Validate.col_vals_within_spec`) - [`col_exists()`](`pointblank.Validate.col_exists`) The `everything()` selector function doesn't need to be used in isolation.
Read the next section for information on how to compose it with other column selectors for more refined ways to select columns. Additional Flexibility through Composition with Other Column Selectors --------------------------------------------------------------------- The `everything()` function can be composed with other column selectors to create fine-grained column selections. For example, to select all column names except those starting with `"id_"`, you can use the `everything()` and [`starts_with()`](`pointblank.starts_with`) functions together. The only condition is that the expressions are wrapped in the [`col()`](`pointblank.col`) function, like this: ```python col(everything() - starts_with("id_")) ``` There are four operators that can be used to compose column selectors: - `&` (*and*) - `|` (*or*) - `-` (*difference*) - `~` (*not*) The `&` operator is used to select columns that satisfy both conditions. The `|` operator is used to select columns that satisfy either condition. The `-` operator is used to select columns that satisfy the first condition but not the second. The `~` operator is used to select columns that don't satisfy the condition. As many selector functions can be used as needed and the operators can be combined to create complex column selection criteria (parentheses can be used to group conditions and control the order of evaluation). Examples -------- Suppose we have a table with several numeric columns and we'd like to validate that all of these columns have values less than `1000`. We can use the `everything()` column selector function to select all columns for validation. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "2023_hours": [182, 168, 175], "2024_hours": [200, 165, 190], "2023_pay_total": [19.29, 17.75, 18.35], "2024_pay_total": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_lt(columns=pb.everything(), value=1000) .interrogate() ) validation ``` From the results of the validation table we get four validation steps, one for each column in the table. The values in every column were all lower than `1000`. We can also use the `everything()` function in combination with other column selectors (within [`col()`](`pointblank.col`)) to create more complex column selection criteria (i.e., to select columns that satisfy multiple conditions). For example, to select every column except those that begin with `"2023"`, we can use the `-` operator to combine column selectors. ```python tbl = pl.DataFrame( { "2023_hours": [182, 168, 175], "2024_hours": [200, 165, 190], "2023_pay_total": [19.29, 17.75, 18.35], "2024_pay_total": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_lt(columns=pb.col(pb.everything() - pb.starts_with("2023")), value=1000) .interrogate() ) validation ``` From the results of the validation table we get two validation steps, one for `2024_hours` and one for `2024_pay_total`. first_n(n: 'int', offset: 'int' = 0) -> 'FirstN' Select the first `n` columns in the column list. Many validation methods have a `columns=` argument that can be used to specify the columns for validation (e.g., [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`), [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`), etc.). The `first_n()` selector function can be used to select *n* columns positioned at the start of the column list.
So if the set of table columns consists of `[rev_01, rev_02, profit_01, profit_02, age]` and you want to validate the first two columns, you can use `columns=first_n(2)`. This will select the `rev_01` and `rev_02` columns and a validation step will be created for each. The `offset=` parameter can be used to skip a certain number of columns from the start of the column list. So if you want to select the third and fourth columns, you can use `columns=first_n(2, offset=2)`. Parameters ---------- n The number of columns to select from the start of the column list. Should be a positive integer value. If `n` is greater than the number of columns in the table, all columns will be selected. offset The offset from the start of the column list. The default is `0`. If `offset` is greater than the number of columns in the table, no columns will be selected. Returns ------- FirstN A `FirstN` object, which can be used to select the first `n` columns. Relevant Validation Methods where `first_n()` can be Used --------------------------------------------------------- This selector function can be used in the `columns=` argument of the following validation methods: - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) - [`col_vals_in_set()`](`pointblank.Validate.col_vals_in_set`) - [`col_vals_not_in_set()`](`pointblank.Validate.col_vals_not_in_set`) - [`col_vals_increasing()`](`pointblank.Validate.col_vals_increasing`) - [`col_vals_decreasing()`](`pointblank.Validate.col_vals_decreasing`) - [`col_vals_null()`](`pointblank.Validate.col_vals_null`) - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`) - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`) - [`col_vals_within_spec()`](`pointblank.Validate.col_vals_within_spec`) - [`col_exists()`](`pointblank.Validate.col_exists`) The `first_n()` selector function doesn't need to be used in isolation. Read the next section for information on how to compose it with other column selectors for more refined ways to select columns. Additional Flexibilty through Composition with Other Column Selectors --------------------------------------------------------------------- The `first_n()` function can be composed with other column selectors to create fine-grained column selections. For example, to select all column names starting with "rev" along with the first two columns, you can use the `first_n()` and [`starts_with()`](`pointblank.starts_with`) functions together. The only condition is that the expressions are wrapped in the [`col()`](`pointblank.col`) function, like this: ```python col(first_n(2) | starts_with("rev")) ``` There are four operators that can be used to compose column selectors: - `&` (*and*) - `|` (*or*) - `-` (*difference*) - `~` (*not*) The `&` operator is used to select columns that satisfy both conditions. The `|` operator is used to select columns that satisfy either condition. The `-` operator is used to select columns that satisfy the first condition but not the second. The `~` operator is used to select columns that don't satisfy the condition. 
As many selector functions can be used as needed and the operators can be combined to create complex column selection criteria (parentheses can be used to group conditions and control the order of evaluation). Examples -------- Suppose we have a table with columns `paid_2021`, `paid_2022`, `paid_2023`, `paid_2024`, and `name` and we'd like to validate that the values in the first four columns are greater than `10`. We can use the `first_n()` column selector function to specify that the first four columns in the table are the columns to validate. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "paid_2021": [17.94, 16.55, 17.85], "paid_2022": [18.62, 16.95, 18.25], "paid_2023": [19.29, 17.75, 18.35], "paid_2024": [20.73, 18.35, 20.10], "name": ["Alice", "Bob", "Charlie"], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns=pb.first_n(4), value=10) .interrogate() ) validation ``` From the results of the validation table we get four validation steps. The values in all those columns were all greater than `10`. We can also use the `first_n()` function in combination with other column selectors (within [`col()`](`pointblank.col`)) to create more complex column selection criteria (i.e., to select columns that satisfy multiple conditions). For example, to select the first four columns but also omit those columns that end with `"2023"`, we can use the `-` operator to combine column selectors. ```python tbl = pl.DataFrame( { "paid_2021": [17.94, 16.55, 17.85], "paid_2022": [18.62, 16.95, 18.25], "paid_2023": [19.29, 17.75, 18.35], "paid_2024": [20.73, 18.35, 20.10], "name": ["Alice", "Bob", "Charlie"], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns=pb.col(pb.first_n(4) - pb.ends_with("2023")), value=10) .interrogate() ) validation ``` From the results of the validation table we get three validation steps, one for `paid_2021`, `paid_2022`, and `paid_2024`. last_n(n: 'int', offset: 'int' = 0) -> 'LastN' Select the last `n` columns in the column list. Many validation methods have a `columns=` argument that can be used to specify the columns for validation (e.g., [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`), [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`), etc.). The `last_n()` selector function can be used to select *n* columns positioned at the end of the column list. So if the set of table columns consists of `[age, rev_01, rev_02, profit_01, profit_02]` and you want to validate the last two columns, you can use `columns=last_n(2)`. This will select the `profit_01` and `profit_02` columns and a validation step will be created for each. The `offset=` parameter can be used to skip a certain number of columns from the end of the column list. So if you want to select the third and fourth columns from the end, you can use `columns=last_n(2, offset=2)`. Parameters ---------- n The number of columns to select from the end of the column list. Should be a positive integer value. If `n` is greater than the number of columns in the table, all columns will be selected. offset The offset from the end of the column list. The default is `0`. If `offset` is greater than the number of columns in the table, no columns will be selected. Returns ------- LastN A `LastN` object, which can be used to select the last `n` columns. 
Relevant Validation Methods where `last_n()` can be Used -------------------------------------------------------- This selector function can be used in the `columns=` argument of the following validation methods: - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) - [`col_vals_in_set()`](`pointblank.Validate.col_vals_in_set`) - [`col_vals_not_in_set()`](`pointblank.Validate.col_vals_not_in_set`) - [`col_vals_increasing()`](`pointblank.Validate.col_vals_increasing`) - [`col_vals_decreasing()`](`pointblank.Validate.col_vals_decreasing`) - [`col_vals_null()`](`pointblank.Validate.col_vals_null`) - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`) - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`) - [`col_vals_within_spec()`](`pointblank.Validate.col_vals_within_spec`) - [`col_exists()`](`pointblank.Validate.col_exists`) The `last_n()` selector function doesn't need to be used in isolation. Read the next section for information on how to compose it with other column selectors for more refined ways to select columns. Additional Flexibilty through Composition with Other Column Selectors --------------------------------------------------------------------- The `last_n()` function can be composed with other column selectors to create fine-grained column selections. For example, to select all column names starting with "rev" along with the last two columns, you can use the `last_n()` and [`starts_with()`](`pointblank.starts_with`) functions together. The only condition is that the expressions are wrapped in the [`col()`](`pointblank.col`) function, like this: ```python col(last_n(2) | starts_with("rev")) ``` There are four operators that can be used to compose column selectors: - `&` (*and*) - `|` (*or*) - `-` (*difference*) - `~` (*not*) The `&` operator is used to select columns that satisfy both conditions. The `|` operator is used to select columns that satisfy either condition. The `-` operator is used to select columns that satisfy the first condition but not the second. The `~` operator is used to select columns that don't satisfy the condition. As many selector functions can be used as needed and the operators can be combined to create complex column selection criteria (parentheses can be used to group conditions and control the order of evaluation). Examples -------- Suppose we have a table with columns `name`, `paid_2021`, `paid_2022`, `paid_2023`, and `paid_2024` and we'd like to validate that the values in the last four columns are greater than `10`. We can use the `last_n()` column selector function to specify that the last four columns in the table are the columns to validate. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "paid_2021": [17.94, 16.55, 17.85], "paid_2022": [18.62, 16.95, 18.25], "paid_2023": [19.29, 17.75, 18.35], "paid_2024": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns=pb.last_n(4), value=10) .interrogate() ) validation ``` From the results of the validation table we get four validation steps. 
The values in all those columns were all greater than `10`. We can also use the `last_n()` function in combination with other column selectors (within [`col()`](`pointblank.col`)) to create more complex column selection criteria (i.e., to select columns that satisfy multiple conditions). For example, to select the last four columns but also omit those columns that end with `"2023"`, we can use the `-` operator to combine column selectors. ```python tbl = pl.DataFrame( { "name": ["Alice", "Bob", "Charlie"], "paid_2021": [17.94, 16.55, 17.85], "paid_2022": [18.62, 16.95, 18.25], "paid_2023": [19.29, 17.75, 18.35], "paid_2024": [20.73, 18.35, 20.10], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns=pb.col(pb.last_n(4) - pb.ends_with("2023")), value=10) .interrogate() ) validation ``` From the results of the validation table we get three validation steps, one for `paid_2021`, `paid_2022`, and `paid_2024`. expr_col(column_name: 'str') -> 'ColumnExpression' Create a column expression for use in `conjointly()` validation. This function returns a ColumnExpression object that supports operations like `>`, `<`, `+`, etc. for use in [`conjointly()`](`pointblank.Validate.conjointly`) validation expressions. Parameters ---------- column_name The name of the column to reference. Returns ------- ColumnExpression A column expression that can be used in comparisons and operations. Examples -------- Let's say we have a table with three columns: `a`, `b`, and `c`. We want to validate that: - The values in column `a` are greater than `2`. - The values in column `b` are less than `7`. - The sum of columns `a` and `b` is less than the values in column `c`. We can use the `expr_col()` function to create a column expression for each of these conditions. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 7, 1, 3, 9, 4], "b": [6, 3, 0, 5, 8, 2], "c": [10, 4, 8, 9, 10, 5], } ) # Using expr_col() to create backend-agnostic validation expressions validation = ( pb.Validate(data=tbl) .conjointly( lambda df: pb.expr_col("a") > 2, lambda df: pb.expr_col("b") < 7, lambda df: pb.expr_col("a") + pb.expr_col("b") < pb.expr_col("c") ) .interrogate() ) validation ``` The above code creates a validation object that checks the specified conditions using the `expr_col()` function. The resulting validation table will show whether each condition was satisfied for each row in the table. See Also -------- The [`conjointly()`](`pointblank.Validate.conjointly`) validation method, which is where this function should be used. ## The Segments family Combine multiple values into a single segment using `seg_*()` helper functions. seg_group(values: 'list[Any]') -> 'Segment' Group together values for segmentation. Many validation methods have a `segments=` argument that can be used to specify one or more columns, or certain values within a column, to create segments for validation (e.g., [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`), [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`), etc.). When passing in a column, or a tuple with a column and certain values, a segment will be created for each individual value within the column or given values. The `seg_group()` selector enables values to be grouped together into a segment. 
For example, if you were to create a segment for a column "region", investigating just "North" and "South" regions, a typical segment would look like: `segments=("region", ["North", "South"])` This would create two validation steps, one for each of the regions. If you wanted to group these two regions into a single segment, you could use the `seg_group()` function like this: `segments=("region", pb.seg_group(["North", "South"]))` You could create a second segment for "East" and "West" regions like this: `segments=("region", pb.seg_group([["North", "South"], ["East", "West"]]))` There will be a validation step created for every segment. Note that if there aren't any segments created using `seg_group()` (or any other segment expression), the validation step will fail to be evaluated during the interrogation process. Such a failure to evaluate will be reported in the validation results but it won't affect the interrogation process overall (i.e., the process won't be halted). Parameters ---------- values A list of values to be grouped into a segment. This can be a single list or a list of lists. Returns ------- Segment A `Segment` object, which can be used to combine values into a segment. Examples -------- Let's say we're analyzing sales from our local bookstore, and want to check that the number of books sold for the month exceeds a certain threshold. We could pass in the argument `segments="genre"`, which would return a segment for each unique genre in the dataset. We could also pass in `segments=("genre", ["Fantasy", "Science Fiction"])`, to only create segments for those two genres. However, if we wanted to group these two genres into a single segment, we could use the `seg_group()` function. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "title": [ "The Hobbit", "Harry Potter and the Sorcerer's Stone", "The Lord of the Rings", "A Game of Thrones", "The Name of the Wind", "The Girl with the Dragon Tattoo", "The Da Vinci Code", "The Hitchhiker's Guide to the Galaxy", "The Martian", "Brave New World" ], "genre": [ "Fantasy", "Fantasy", "Fantasy", "Fantasy", "Fantasy", "Mystery", "Mystery", "Science Fiction", "Science Fiction", "Science Fiction", ], "units_sold": [875, 932, 756, 623, 445, 389, 678, 534, 712, 598], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt( columns="units_sold", value=500, segments=("genre", pb.seg_group(["Fantasy", "Science Fiction"])) ) .interrogate() ) validation ``` What's more, we can create multiple segments, combining the genres in different ways. ```python validation = ( pb.Validate(data=tbl) .col_vals_gt( columns="units_sold", value=500, segments=("genre", pb.seg_group([ ["Fantasy", "Science Fiction"], ["Fantasy", "Mystery"], ["Mystery", "Science Fiction"] ])) ) .interrogate() ) validation ``` ## The Interrogation and Reporting family The validation plan is put into action when `interrogate()` is called. The workflow for performing a comprehensive validation is then: (1) `Validate()`, (2) adding validation steps, (3) `interrogate()`. After interrogation of the data, we can view a validation report table (by printing the object or using `get_tabular_report()`), extract key metrics, or we can split the data based on the validation results (with `get_sundered_data()`).
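A minimal sketch of that workflow, using the built-in `small_table` dataset (the specific steps and columns chosen here are only for illustration):

```python
import pointblank as pb

# (1) create the Validate object, (2) add validation steps, (3) interrogate
validation = (
    pb.Validate(data=pb.load_dataset(dataset="small_table"))
    .col_vals_not_null(columns="a")
    .col_vals_gt(columns="d", value=0)
    .interrogate()
)

validation.get_tabular_report()  # the validation report table (or just print `validation`)
validation.f_failed()            # per-step metric: fraction of failing test units
validation.get_sundered_data()   # rows that passed every column-value validation step
```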
interrogate(self, collect_extracts: 'bool' = True, collect_tbl_checked: 'bool' = True, get_first_n: 'int | None' = None, sample_n: 'int | None' = None, sample_frac: 'int | float | None' = None, extract_limit: 'int' = 500) -> 'Validate' Execute each validation step against the table and store the results. When a validation plan has been set with a series of validation steps, the interrogation process through `interrogate()` should then be invoked. Interrogation will evaluate each validation step against the table and store the results. The interrogation process will collect extracts of failing rows if the `collect_extracts=` option is set to `True` (the default). We can control the number of rows collected using the `get_first_n=`, `sample_n=`, and `sample_frac=` options. The `extract_limit=` option will enforce a hard limit on the number of rows collected when `collect_extracts=True`. After interrogation is complete, the `Validate` object will have gathered information, and we can use methods like [`n_passed()`](`pointblank.Validate.n_passed`), [`f_failed()`](`pointblank.Validate.f_failed`), etc., to understand how the table performed against the validation plan. A visual representation of the validation results can be viewed by printing the `Validate` object; this will display the validation table in an HTML viewing environment. Parameters ---------- collect_extracts An option to collect rows of the input table that didn't pass a particular validation step. The default is `True` and further options (i.e., `get_first_n=`, `sample_*=`) allow for fine control of how these rows are collected. collect_tbl_checked The processed data frames produced by executing the validation steps are collected and stored in the `Validate` object if `collect_tbl_checked=True`. This information is necessary for some methods (e.g., [`get_sundered_data()`](`pointblank.Validate.get_sundered_data`)), but it can potentially make the object grow to a large size. To opt out of attaching this data, set this to `False`. get_first_n If the option to collect rows where test units failed is chosen, there is the option here to collect the first `n` rows. Supply an integer number of rows to extract from the top of the subset table containing non-passing rows (the ordering of data from the original table is retained). sample_n If the option to collect non-passing rows is chosen, this option allows for the sampling of `n` rows. Supply an integer number of rows to sample from the subset table. If `n` happens to be greater than the number of non-passing rows, then all such rows will be returned. sample_frac If the option to collect non-passing rows is chosen, this option allows for the sampling of a fraction of those rows. Provide a number in the range of `0` to `1`. The number of rows to return could be very large; however, the `extract_limit=` option will apply a hard limit to the returned rows. extract_limit A value that limits the possible number of rows returned when extracting non-passing rows. The default is `500` rows. This limit is applied after any sampling or limiting options are applied. If the number of rows to be returned is greater than this limit, then the number of rows returned will be limited to this value. This is useful for preventing the collection of too many rows when the number of non-passing rows is very large. Returns ------- Validate The `Validate` object with the results of the interrogation.
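As a quick sketch of how the extract-collection options can be combined (assuming a `validation` plan like the ones shown throughout this page has already been defined):

```python
# Sample half of the failing rows, but keep at most 100 extracted rows
# (`extract_limit=` is applied after the sampling option)
validation.interrogate(sample_frac=0.5, extract_limit=100)
```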
Examples -------- Let's use a built-in dataset (`"game_revenue"`) to demonstrate some of the options of the interrogation process. A series of validation steps will populate our validation plan. After setting up the plan, the next step is to interrogate the table and see how well it aligns with our expectations. We'll use the `get_first_n=` option so that any extracts of failing rows are limited to the first `n` rows. ```python import pointblank as pb import polars as pl validation = ( pb.Validate(data=pb.load_dataset(dataset="game_revenue")) .col_vals_lt(columns="item_revenue", value=200) .col_vals_gt(columns="item_revenue", value=0) .col_vals_gt(columns="session_duration", value=5) .col_vals_in_set(columns="item_type", set=["iap", "ad"]) .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}[0-9]{3}") ) validation.interrogate(get_first_n=10) ``` The validation table shows that step 3 (checking for `session_duration` greater than `5`) has 18 failing test units. This means that 18 rows in the table are problematic. We'd like to see the rows that failed this validation step and we can do that with the [`get_data_extracts()`](`pointblank.Validate.get_data_extracts`) method. ```python pb.preview(validation.get_data_extracts(i=3, frame=True)) ``` The [`get_data_extracts()`](`pointblank.Validate.get_data_extracts`) method will return a Polars DataFrame here with the first 10 rows that failed the validation step (we passed that into the [`preview()`](`pointblank.preview`) function for a better display). There are actually 18 rows that failed but we limited the collection of extracts with `get_first_n=10`. set_tbl(self, tbl: 'FrameT | Any', tbl_name: 'str | None' = None, label: 'str | None' = None) -> 'Validate' Set or replace the table associated with the Validate object. This method allows you to replace the table associated with a Validate object with a different (but presumably similar) table. This is useful when you want to apply the same validation plan to multiple tables or when you have a validation workflow defined but want to swap in a different data source. Parameters ---------- tbl The table to replace the existing table with. This can be any supported table type including DataFrame objects, Ibis table objects, CSV file paths, Parquet file paths, GitHub URLs, or database connection strings. The same table type constraints apply as in the `Validate` constructor. tbl_name An optional name to assign to the new input table object. If no value is provided, the existing table name will be retained. label An optional label for the validation plan. If no value is provided, the existing label will be retained. Returns ------- Validate A new `Validate` object with the replacement table. When to Use ----------- The `set_tbl()` method is particularly useful in scenarios where you have: - multiple similar tables that need the same validation checks - a template validation workflow that should be applied to different data sources - YAML-defined validations where you want to override the table specified in the YAML The `set_tbl()` method creates a copy of the validation object with the new table, so the original validation object remains unchanged. This allows you to reuse validation plans across multiple tables without interference. Examples -------- We will first create two similar tables for our future validation plans. 
```python import pointblank as pb import polars as pl # Create two similar tables table_1 = pl.DataFrame({ "x": [1, 2, 3, 4, 5], "y": [5, 4, 3, 2, 1], "z": ["a", "b", "c", "d", "e"] }) table_2 = pl.DataFrame({ "x": [2, 4, 6, 8, 10], "y": [10, 8, 6, 4, 2], "z": ["f", "g", "h", "i", "j"] }) ``` Create a validation plan with the first table. ```python validation_table_1 = ( pb.Validate( data=table_1, tbl_name="Table 1", label="Validation applied to the first table" ) .col_vals_gt(columns="x", value=0) .col_vals_lt(columns="y", value=10) ) ``` Now apply the same validation plan to the second table. ```python validation_table_2 = ( validation_table_1 .set_tbl( tbl=table_2, tbl_name="Table 2", label="Validation applied to the second table" ) ) ``` Here is the interrogation of the first table: ```python validation_table_1.interrogate() ``` And the second table: ```python validation_table_2.interrogate() ``` get_tabular_report(self, title: 'str | None' = ':default:', incl_header: 'bool' = None, incl_footer: 'bool' = None) -> 'GT' Validation report as a GT table. The `get_tabular_report()` method returns a GT table object that represents the validation report. This validation table provides a summary of the validation results, including the validation steps, the number of test units, the number of failing test units, and the fraction of failing test units. The table also includes status indicators for the 'warning', 'error', and 'critical' levels. You could simply display the validation table without the use of the `get_tabular_report()` method. However, the method provides a way to customize the title of the report. In the future this method may provide additional options for customizing the report. Parameters ---------- title Options for customizing the title of the report. The default is the `":default:"` value which produces a generic title. Another option is `":tbl_name:"`, and that presents the name of the table as the title for the report. If no title is wanted, then `":none:"` can be used. Aside from keyword options, text can be provided for the title. This will be interpreted as Markdown text and transformed internally to HTML. Returns ------- GT A GT table object that represents the validation report. Examples -------- Let's create a `Validate` object with a few validation steps and then interrogate the data table to see how it performs against the validation plan. We can then generate a tabular report to get a summary of the results. ```python import pointblank as pb import polars as pl # Create a Polars DataFrame tbl_pl = pl.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7]}) # Validate data using Polars DataFrame validation = ( pb.Validate(data=tbl_pl, tbl_name="tbl_xy", thresholds=(2, 3, 4)) .col_vals_gt(columns="x", value=1) .col_vals_lt(columns="x", value=3) .col_vals_le(columns="y", value=7) .interrogate() ) # Look at the validation table validation ``` The validation table is displayed with a default title ('Validation Report'). We can use the `get_tabular_report()` method to customize the title of the report. For example, we can set the title to the name of the table by using the `title=":tbl_name:"` option. This will use the string provided in the `tbl_name=` argument of the `Validate` object. ```python validation.get_tabular_report(title=":tbl_name:") ``` The title of the report is now set to the name of the table, which is 'tbl_xy'. This can be useful if you have multiple tables and want to keep track of which table the validation report is for. 
Alternatively, you can provide your own title for the report. ```python validation.get_tabular_report(title="Report for Table XY") ``` The title of the report is now set to 'Report for Table XY'. This can be useful if you want to provide a more descriptive title for the report. get_step_report(self, i: 'int', columns_subset: 'str | list[str] | Column | None' = None, header: 'str' = ':default:', limit: 'int | None' = 10) -> 'GT' Get a detailed report for a single validation step. The `get_step_report()` method returns a report of what went well---or what failed spectacularly---for a given validation step. The report includes a summary of the validation step and a detailed breakdown of the interrogation results. The report is presented as a GT table object, which can be displayed in a notebook or exported to an HTML file. :::{.callout-warning} The `get_step_report()` method is still experimental. Please report any issues you encounter in the [Pointblank issue tracker](https://github.com/posit-dev/pointblank/issues). ::: Parameters ---------- i The step number for which to get the report. columns_subset The columns to display in a step report that shows errors in the input table. By default all columns are shown (`None`). If a subset of columns is desired, we can provide a list of column names, a string with a single column name, a `Column` object, or a `ColumnSelector` object. The last two options allow for more flexible column selection using column selector functions. Errors are raised if the column names provided don't match any columns in the table (when provided as a string or list of strings) or if column selector expressions don't resolve to any columns. header Options for customizing the header of the step report. The default is the `":default:"` value which produces a header with a standard title and set of details underneath. Aside from this default, free text can be provided for the header. This will be interpreted as Markdown text and transformed internally to HTML. You can provide one of two templating elements: `{title}` and `{details}`. The default header has the template `"{title}{details}"` so you can easily start from that and modify as you see fit. If you don't want a header at all, you can set `header=None` to remove it entirely. limit The number of rows to display for those validation steps that check values in rows (the `col_vals_*()` validation steps). The default is `10` rows and the limit can be removed entirely by setting `limit=None`. Returns ------- GT A GT table object that represents the detailed report for the validation step. Types of Step Reports --------------------- The `get_step_report()` method produces a report based on the *type* of validation step. 
The following column-value or row-based validation methods will produce a report that shows the rows of the data that failed: - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) - [`col_vals_in_set()`](`pointblank.Validate.col_vals_in_set`) - [`col_vals_not_in_set()`](`pointblank.Validate.col_vals_not_in_set`) - [`col_vals_increasing()`](`pointblank.Validate.col_vals_increasing`) - [`col_vals_decreasing()`](`pointblank.Validate.col_vals_decreasing`) - [`col_vals_null()`](`pointblank.Validate.col_vals_null`) - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`) - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`) - [`col_vals_within_spec()`](`pointblank.Validate.col_vals_within_spec`) - [`col_vals_expr()`](`pointblank.Validate.col_vals_expr`) - [`conjointly()`](`pointblank.Validate.conjointly`) - [`prompt()`](`pointblank.Validate.prompt`) - [`rows_complete()`](`pointblank.Validate.rows_complete`) The [`rows_distinct()`](`pointblank.Validate.rows_distinct`) validation step will produce a report that shows duplicate rows (or duplicate values in one or a set of columns as defined in that method's `columns_subset=` parameter). The [`col_schema_match()`](`pointblank.Validate.col_schema_match`) validation step will produce a report that shows the schema of the data table and the schema of the validation step. The report will indicate whether the schemas match or not. Examples -------- Let's create a validation plan with a few validation steps and interrogate the data. With that, we'll have a look at the validation reporting table for the entire collection of steps and what went well or what failed. ```python import pointblank as pb validation = ( pb.Validate( data=pb.load_dataset(dataset="small_table", tbl_type="pandas"), tbl_name="small_table", label="Example for the get_step_report() method", thresholds=(1, 0.20, 0.40) ) .col_vals_lt(columns="d", value=3500) .col_vals_between(columns="c", left=1, right=8) .col_vals_gt(columns="a", value=3) .col_vals_regex(columns="b", pattern=r"[0-9]-[a-z]{3}-[0-9]{3}") .interrogate() ) validation ``` There were four validation steps performed, where the first three steps had failing test units and the last step had no failures. Let's get a detailed report for the first step by using the `get_step_report()` method. ```python validation.get_step_report(i=1) ``` The report for the first step is displayed. The report includes a summary of the validation step and a detailed breakdown of the interrogation results. The report provides details on what the validation step was checking, the extent to which the test units failed, and a table that shows the failing rows of the data with the column of interest highlighted. The second and third steps also had failing test units. Reports for those steps can be viewed by using `get_step_report(i=2)` and `get_step_report(i=3)` respectively. The final step did not have any failing test units. A report for the final step can still be viewed by using `get_step_report(i=4)`. The report will indicate that every test unit passed and a preview of the target table will be provided.
```python validation.get_step_report(i=4) ``` If you'd like to trim down the number of columns shown in the report, you can provide a subset of columns to display. For example, if you only want to see the columns `a`, `b`, and `c`, you can provide those column names as a list. ```python validation.get_step_report(i=1, columns_subset=["a", "b", "c"]) ``` If you'd like to increase or reduce the maximum number of rows shown in the report, you can provide a different value for the `limit` parameter. For example, if you'd like to see only up to 5 rows, you can set `limit=5`. ```python validation.get_step_report(i=3, limit=5) ``` Step 3 actually had 7 failing test units, but only the first 5 rows are shown in the step report because of the `limit=5` parameter. get_json_report(self, use_fields: 'list[str] | None' = None, exclude_fields: 'list[str] | None' = None) -> 'str' Get a report of the validation results as a JSON-formatted string. The `get_json_report()` method provides a machine-readable report of validation results in JSON format. This is particularly useful for programmatic processing, storing validation results, or integrating with other systems. The report includes detailed information about each validation step, such as assertion type, columns validated, threshold values, test results, and more. By default, all available validation information fields are included in the report. However, you can customize the fields to include or exclude using the `use_fields=` and `exclude_fields=` parameters. Parameters ---------- use_fields An optional list of specific fields to include in the report. If provided, only these fields will be included in the JSON output. If `None` (the default), all standard validation report fields are included. Have a look at the *Available Report Fields* section below for a list of fields that can be included in the report. exclude_fields An optional list of fields to exclude from the report. If provided, these fields will be omitted from the JSON output. If `None` (the default), no fields are excluded. This parameter cannot be used together with `use_fields=`. The *Available Report Fields* provides a listing of fields that can be excluded from the report. Returns ------- str A JSON-formatted string representing the validation report, with each validation step as an object in the report array. Available Report Fields ----------------------- The JSON report can include any of the standard validation report fields, including: - `i`: the step number (1-indexed) - `i_o`: the original step index from the validation plan (pre-expansion) - `assertion_type`: the type of validation assertion (e.g., `"col_vals_gt"`, etc.) 
- `column`: the column being validated (or columns used in certain validations) - `values`: the comparison values or parameters used in the validation - `inclusive`: whether the comparison is inclusive (for range-based validations) - `na_pass`: whether `NA`/`Null` values are considered passing (for certain validations) - `pre`: preprocessing function applied before validation - `segments`: data segments to which the validation was applied - `thresholds`: threshold level statement that was used for the validation step - `label`: custom label for the validation step - `brief`: a brief description of the validation step - `active`: whether the validation step is active - `all_passed`: whether all test units passed in the step - `n`: total number of test units - `n_passed`, `n_failed`: number of test units that passed and failed - `f_passed`, `f_failed`: Fraction of test units that passed and failed - `warning`, `error`, `critical`: whether the namesake threshold level was exceeded (is `null` if threshold not set) - `time_processed`: when the validation step was processed (ISO 8601 format) - `proc_duration_s`: the processing duration in seconds Examples -------- Let's create a validation plan with a few validation steps and generate a JSON report of the results: ```python import pointblank as pb import polars as pl # Create a sample DataFrame tbl = pl.DataFrame({ "a": [5, 7, 8, 9], "b": [3, 4, 2, 1] }) # Create and execute a validation plan validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=6) .col_vals_lt(columns="b", value=4) .interrogate() ) # Get the full JSON report json_report = validation.get_json_report() print(json_report) ``` You can also customize which fields to include: ```python json_report = validation.get_json_report( use_fields=["i", "assertion_type", "column", "n_passed", "n_failed"] ) print(json_report) ``` Or which fields to exclude: ```python json_report = validation.get_json_report( exclude_fields=[ "i_o", "thresholds", "pre", "segments", "values", "na_pass", "inclusive", "label", "brief", "active", "time_processed", "proc_duration_s" ] ) print(json_report) ``` The JSON output can be further processed or analyzed programmatically: ```python import json # Parse the JSON report report_data = json.loads(validation.get_json_report()) # Extract and analyze validation results failing_steps = [step for step in report_data if step["n_failed"] > 0] print(f"Number of failing validation steps: {len(failing_steps)}") ``` See Also -------- - [`get_tabular_report()`](`pointblank.Validate.get_tabular_report`): Get a formatted HTML report as a GT table - [`get_data_extracts()`](`pointblank.Validate.get_data_extracts`): Get rows that failed validation get_sundered_data(self, type='pass') -> 'FrameT' Get the data that passed or failed the validation steps. Validation of the data is one thing but, sometimes, you want to use the best part of the input dataset for something else. The `get_sundered_data()` method works with a `Validate` object that has been interrogated (i.e., the [`interrogate()`](`pointblank.Validate.interrogate`) method was used). We can get either the 'pass' data piece (rows with no failing test units across all column-value based validation functions), or, the 'fail' data piece (rows with at least one failing test unit across the same series of validations). Details ------- There are some caveats to sundering. 
The validation steps considered for this splitting will only involve steps that meet these conditions: - the step is of a certain check type, where test units are cells checked down a column (e.g., the `col_vals_*()` methods) - `active=` is not set to `False` - `pre=` has not been given an expression for modifying the input table So long as these conditions are met, the data will be split into two constituent tables: one with the rows that passed all validation steps and another with the rows that failed at least one validation step. Parameters ---------- type The type of data to return. Options are `"pass"` or `"fail"`, where the former returns a table only containing rows where test units always passed validation steps, and the latter returns a table only containing rows that had test units failing in at least one validation step. Returns ------- FrameT A table containing the data that passed or failed the validation steps. Examples -------- Let's create a `Validate` object with three validation steps and then interrogate the data. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [7, 6, 9, 7, 3, 2], "b": [9, 8, 10, 5, 10, 6], "c": ["c", "d", "a", "b", "a", "b"] } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=5) .col_vals_in_set(columns="c", set=["a", "b"]) .interrogate() ) validation ``` From the validation table, we can see that the first and second steps each had 4 passing test units. A failing test unit will mark the entire row as failing in the context of the `get_sundered_data()` method. We can use this method to get the rows of data that passed during interrogation. ```python pb.preview(validation.get_sundered_data()) ``` The returned DataFrame contains the rows that passed all validation steps (we passed this object to [`preview()`](`pointblank.preview`) to show it in an HTML view). From the six-row input DataFrame, the first two rows and the last two rows had test units that failed validation. Thus the middle two rows are the only ones that passed all validation steps and that's what we see in the returned DataFrame. get_data_extracts(self, i: 'int | list[int] | None' = None, frame: 'bool' = False) -> 'dict[int, FrameT | None] | FrameT | None' Get the rows that failed for each validation step. After the [`interrogate()`](`pointblank.Validate.interrogate`) method has been called, the `get_data_extracts()` method can be used to extract the rows that failed in each column-value or row-based validation step (e.g., [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`), [`rows_distinct()`](`pointblank.Validate.rows_distinct`), etc.). The method returns a dictionary of tables containing the rows that failed in every validation step. If `frame=True` and `i=` is a scalar, the value is conveniently returned as a table (forgoing the dictionary structure). Parameters ---------- i The validation step number(s) from which the failed rows are obtained. Can be provided as a list of integers or a single integer. If `None`, all steps are included. frame If `True` and `i=` is a scalar, return the value as a DataFrame instead of a dictionary. Returns ------- dict[int, FrameT | None] | FrameT | None A dictionary of tables containing the rows that failed in every compatible validation step. Alternatively, it can be a DataFrame if `frame=True` and `i=` is a scalar.
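Because the default return value is a dictionary keyed by step number, the extracts can be looped over directly. A small sketch (assuming `validation` has already been interrogated; `len()` gives the row count for both Polars and Pandas tables):

```python
# Summarize how many failing rows were extracted at each step
extracts = validation.get_data_extracts()

for step, extract in extracts.items():
    n_rows = 0 if extract is None else len(extract)
    print(f"Step {step}: {n_rows} extracted row(s)")
```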
Compatible Validation Methods for Yielding Extracted Rows --------------------------------------------------------- The following validation methods operate on column values and will have rows extracted when there are failing test units. - [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) - [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) - [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) - [`col_vals_le()`](`pointblank.Validate.col_vals_le`) - [`col_vals_eq()`](`pointblank.Validate.col_vals_eq`) - [`col_vals_ne()`](`pointblank.Validate.col_vals_ne`) - [`col_vals_between()`](`pointblank.Validate.col_vals_between`) - [`col_vals_outside()`](`pointblank.Validate.col_vals_outside`) - [`col_vals_in_set()`](`pointblank.Validate.col_vals_in_set`) - [`col_vals_not_in_set()`](`pointblank.Validate.col_vals_not_in_set`) - [`col_vals_increasing()`](`pointblank.Validate.col_vals_increasing`) - [`col_vals_decreasing()`](`pointblank.Validate.col_vals_decreasing`) - [`col_vals_null()`](`pointblank.Validate.col_vals_null`) - [`col_vals_not_null()`](`pointblank.Validate.col_vals_not_null`) - [`col_vals_regex()`](`pointblank.Validate.col_vals_regex`) - [`col_vals_within_spec()`](`pointblank.Validate.col_vals_within_spec`) - [`col_vals_expr()`](`pointblank.Validate.col_vals_expr`) - [`conjointly()`](`pointblank.Validate.conjointly`) - [`prompt()`](`pointblank.Validate.prompt`) An extracted row for these validation methods means that a test unit failed for that row in the validation step. These row-based validation methods will also have rows extracted should there be failing rows: - [`rows_distinct()`](`pointblank.Validate.rows_distinct`) - [`rows_complete()`](`pointblank.Validate.rows_complete`) The extracted rows are a subset of the original table and are useful for further analysis or for understanding the nature of the failing test units. Examples -------- Let's perform a series of validation steps on a Polars DataFrame. We'll use [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) in the first step, [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) in the second step, and [`col_vals_ge()`](`pointblank.Validate.col_vals_ge`) in the third step. The [`interrogate()`](`pointblank.Validate.interrogate`) method executes the validation; then, we can extract the rows that failed for each validation step. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [5, 6, 5, 3, 6, 1], "b": [1, 2, 1, 5, 2, 6], "c": [3, 7, 2, 6, 3, 1], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=4) .col_vals_lt(columns="c", value=5) .col_vals_ge(columns="b", value=1) .interrogate() ) validation.get_data_extracts() ``` The `get_data_extracts()` method returns a dictionary of tables, where each table contains a subset of rows from the table. These are the rows that failed for each validation step. In the first step, the [`col_vals_gt()`](`pointblank.Validate.col_vals_gt`) method was used to check if the values in column `a` were greater than `4`. The extracted table shows the rows where this condition was not met; look at the `a` column: all values are less than `4`. In the second step, the [`col_vals_lt()`](`pointblank.Validate.col_vals_lt`) method was used to check if the values in column `c` were less than `5`. In the extracted two-row table, we see that the values in column `c` are greater than `5`. The third step ([`col_vals_ge()`](`pointblank.Validate.col_vals_ge`)) checked if the values in column `b` were greater than or equal to `1`.
There were no failing test units, so the extracted table is empty (i.e., has columns but no rows). The `i=` argument can be used to narrow down the extraction to one or more steps. For example, to extract the rows that failed in the first step only: ```python validation.get_data_extracts(i=1) ``` Note that the first validation step is indexed at `1` (not `0`). This 1-based indexing is in place here to match the step numbers reported in the validation table. What we get back is still a dictionary, but it only contains one table (the one for the first step). If you want to get the extracted table as a DataFrame, set `frame=True` and provide a scalar value for `i`. For example, to get the extracted table for the second step as a DataFrame: ```python pb.preview(validation.get_data_extracts(i=2, frame=True)) ``` The extracted table is now a DataFrame, which can serve as a more convenient format for further analysis or visualization. We further used the [`preview()`](`pointblank.preview`) function to show the DataFrame in an HTML view. all_passed(self) -> 'bool' Determine if every validation step passed perfectly, with no failing test units. The `all_passed()` method determines if every validation step passed perfectly, with no failing test units. This method is useful for quickly checking if the table passed all validation steps with flying colors. If there's even a single failing test unit in any validation step, this method will return `False`. This validation metric might be overly stringent for some validation plans where failing test units are generally expected (and the strategy is to monitor data quality over time). However, the value of `all_passed()` could be suitable for validation plans designed to ensure that every test unit passes perfectly (e.g., checks for column presence, null-checking tests, etc.). Returns ------- bool `True` if all validation steps had no failing test units, `False` otherwise. Examples -------- In the example below, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). There will be three validation steps, and the second step will have a failing test unit (the value `10` isn't less than `9`). After interrogation, the `all_passed()` method is used to determine if all validation steps passed perfectly. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [1, 2, 9, 5], "b": [5, 6, 10, 3], "c": ["a", "b", "a", "a"], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=0) .col_vals_lt(columns="b", value=9) .col_vals_in_set(columns="c", set=["a", "b"]) .interrogate() ) validation.all_passed() ``` The returned value is `False` since the second validation step had a failing test unit. If it weren't for that one failing test unit, the return value would have been `True`. assert_passing(self) -> 'None' Raise an `AssertionError` if not all tests are passing. The `assert_passing()` method will raise an `AssertionError` if a test does not pass. This method simply wraps `all_passed()` for convenient use in test suites. The step number and assertion made are printed in the `AssertionError` message if a failure occurs, ensuring some details are preserved. If the validation has not yet been interrogated, this method will automatically call [`interrogate()`](`pointblank.Validate.interrogate`) with default parameters before checking for passing tests. Raises ------- AssertionError If any validation step has failing test units.
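Because a failing check surfaces as an ordinary `AssertionError`, `assert_passing()` drops straight into a pytest test. A minimal sketch (the table and the column checks here are made up for illustration):

```python
import pointblank as pb
import polars as pl

def test_orders_are_valid():
    # Stand-in table; in practice this would come from your data source
    tbl = pl.DataFrame({"order_id": [1, 2, 3], "amount": [10.5, 20.0, 7.25]})

    (
        pb.Validate(data=tbl)
        .col_vals_not_null(columns="order_id")
        .col_vals_gt(columns="amount", value=0)
        .assert_passing()  # interrogates automatically; raises AssertionError on failure
    )
```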
Examples -------- In the example below, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). There will be three validation steps, and the second step will have a failing test unit (the value `10` isn't less than `9`). The `assert_passing()` method is used to assert that all validation steps passed perfectly, automatically performing the interrogation if needed. ```python #| error: True import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [1, 2, 9, 5], "b": [5, 6, 10, 3], "c": ["a", "b", "a", "a"], } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=0) .col_vals_lt(columns="b", value=9) # this assertion is false .col_vals_in_set(columns="c", set=["a", "b"]) ) # No need to call interrogate() explicitly validation.assert_passing() ``` assert_below_threshold(self, level: 'str' = 'warning', i: 'int | None' = None, message: 'str | None' = None) -> 'None' Raise an `AssertionError` if validation steps exceed a specified threshold level. The `assert_below_threshold()` method checks whether validation steps' failure rates are below a given threshold level (`"warning"`, `"error"`, or `"critical"`). This is particularly useful in automated testing environments where you want to ensure your data quality meets minimum standards before proceeding. If any validation step exceeds the specified threshold level, an `AssertionError` will be raised with details about which steps failed. If the validation has not yet been interrogated, this method will automatically call [`interrogate()`](`pointblank.Validate.interrogate`) with default parameters. Parameters ---------- level The threshold level to check against, which could be any of `"warning"` (the default), `"error"`, or `"critical"`. An `AssertionError` will be raised if any validation step exceeds this level. i Specific validation step number(s) to check. Can be provided as a single integer or a list of integers. If `None` (the default), all steps are checked. message Custom error message to use if the assertion fails. If `None`, a default message will be generated that lists the specific steps that exceeded the threshold. Returns ------- None Raises ------ AssertionError If any specified validation step exceeds the given threshold level. ValueError If an invalid threshold level is provided. Examples -------- Below are some examples of how to use the `assert_below_threshold()` method. First, we'll create a simple Polars DataFrame with two columns (`a` and `b`). ```python import polars as pl tbl = pl.DataFrame({ "a": [7, 4, 9, 7, 12], "b": [9, 8, 10, 5, 10] }) ``` Then a validation plan will be created with thresholds (`warning=0.1`, `error=0.2`, `critical=0.3`). After interrogating, we display the validation report table: ```python import pointblank as pb validation = ( pb.Validate(data=tbl, thresholds=(0.1, 0.2, 0.3)) .col_vals_gt(columns="a", value=5) # 1 failing test unit .col_vals_lt(columns="b", value=10) # 2 failing test units .interrogate() ) validation ``` Using `assert_below_threshold(level="warning")` will raise an `AssertionError` if any step exceeds the 'warning' threshold (as both steps do here). We can check a specific step against the 'critical' threshold using the `i=` parameter: ```python validation.assert_below_threshold(level="critical", i=1) # Won't raise an error ``` As the first step is below the 'critical' threshold (it exceeds the 'warning' and 'error' thresholds), no error is raised and nothing is printed.
We can also provide a custom error message with the `message=` parameter. Let's try that here: ```python try: validation.assert_below_threshold( level="error", message="Data quality too low for processing!" ) except AssertionError as e: print(f"Custom error: {e}") ``` See Also -------- - [`warning()`](`pointblank.Validate.warning`): get the 'warning' status for each validation step - [`error()`](`pointblank.Validate.error`): get the 'error' status for each validation step - [`critical()`](`pointblank.Validate.critical`): get the 'critical' status for each validation step - [`assert_passing()`](`pointblank.Validate.assert_passing`): assert all validations pass completely above_threshold(self, level: 'str' = 'warning', i: 'int | None' = None) -> 'bool' Check if any validation steps exceed a specified threshold level. The `above_threshold()` method checks whether validation steps exceed a given threshold level. This provides a non-exception-based alternative to [`assert_below_threshold()`](`pointblank.Validate.assert_below_threshold`) for conditional workflow control based on validation results. This method is useful in scenarios where you want to check if any validation steps failed beyond a certain threshold without raising an exception, allowing for more flexible programmatic responses to validation issues. Parameters ---------- level The threshold level to check against. Valid options are: `"warning"` (the least severe threshold level), `"error"` (the middle severity threshold level), and `"critical"` (the most severe threshold level). The default is `"warning"`. i Specific validation step number(s) to check. If a single integer, checks only that step. If a list of integers, checks all specified steps. If `None` (the default), checks all validation steps. Step numbers are 1-based (first step is `1`, not `0`). Returns ------- bool `True` if any of the specified validation steps exceed the given threshold level, `False` otherwise. Raises ------ ValueError If an invalid threshold level is provided. Examples -------- Below are some examples of how to use the `above_threshold()` method. First, we'll create a simple Polars DataFrame with a single column (`values`). Then a validation plan will be created with thresholds (`warning=0.1`, `error=0.2`, `critical=0.3`). After interrogating, we display the validation report table: ```python import pointblank as pb import polars as pl # A single-column table (example values chosen for illustration) tbl = pl.DataFrame({"values": [1, 2, 3, 4, 8, 12]}) validation = ( pb.Validate(data=tbl, thresholds=(0.1, 0.2, 0.3)) .col_vals_gt(columns="values", value=0) .col_vals_lt(columns="values", value=10) .col_vals_between(columns="values", left=0, right=5) .interrogate() ) validation ``` Let's check if any steps exceed the 'warning' threshold with the `above_threshold()` method. A message will be printed if that's the case: ```python if validation.above_threshold(level="warning"): print("Some steps have exceeded the warning threshold") ``` Restrict the check to steps 2 and 3 and test against the 'error' threshold through use of the `i=` argument: ```python if validation.above_threshold(level="error", i=[2, 3]): print("Steps 2 and/or 3 have exceeded the error threshold") ``` You can use this in a workflow to conditionally trigger processes.
Here's a snippet of how you might use this in a function: ```python def process_data(validation_obj): # Only continue processing if validation passes critical thresholds if not validation_obj.above_threshold(level="critical"): # Continue with processing print("Data meets critical quality thresholds, proceeding...") return True else: # Log failure and stop processing print("Data fails critical quality checks, aborting...") return False ``` Note that this is just a suggestion for how to implement conditional workflow processes. You should adapt this pattern to your specific requirements, which might include different threshold levels, custom logging mechanisms, or integration with your organization's data pipelines and notification systems. See Also -------- - [`assert_below_threshold()`](`pointblank.Validate.assert_below_threshold`): a similar method that raises an exception if thresholds are exceeded - [`warning()`](`pointblank.Validate.warning`): get the 'warning' status for each validation step - [`error()`](`pointblank.Validate.error`): get the 'error' status for each validation step - [`critical()`](`pointblank.Validate.critical`): get the 'critical' status for each validation step n(self, i: 'int | list[int] | None' = None, scalar: 'bool' = False) -> 'dict[int, int] | int' Provides a dictionary of the number of test units for each validation step. The `n()` method provides the number of test units for each validation step. This is the total number of test units that were evaluated in the validation step. It is always an integer value. Test units are the atomic units of the validation process. Different validations can have different numbers of test units. For example, a validation that checks for the presence of a column in a table will have a single test unit. A validation that checks for the presence of a value in a column will have as many test units as there are rows in the table. The method provides a dictionary of the number of test units for each validation step. If the `scalar=True` argument is provided and `i=` is a scalar, the value is returned as a scalar instead of a dictionary. The total number of test units for a validation step is the sum of the number of passing and failing test units (i.e., `n = n_passed + n_failed`). Parameters ---------- i The validation step number(s) from which the number of test units is obtained. Can be provided as a list of integers or a single integer. If `None`, all steps are included. scalar If `True` and `i=` is a scalar, return the value as a scalar instead of a dictionary. Returns ------- dict[int, int] | int A dictionary of the number of test units for each validation step or a scalar value. Examples -------- Different types of validation steps can have different numbers of test units. In the example below, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). There will be three validation steps, and the number of test units for each step will be a little bit different. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [1, 2, 9, 5], "b": [5, 6, 10, 3], "c": ["a", "b", "a", "a"], } ) # Define a preprocessing function def filter_by_a_gt_1(df): return df.filter(pl.col("a") > 1) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=0) .col_exists(columns="b") .col_vals_lt(columns="b", value=9, pre=filter_by_a_gt_1) .interrogate() ) ``` The first validation step checks that all values in column `a` are greater than `0`. 
Let's use the `n()` method to determine the number of test units for this validation step. ```python validation.n(i=1, scalar=True) ``` The returned value of `4` is the number of test units for the first validation step. This value is the same as the number of rows in the table. The second validation step checks for the existence of column `b`. Using the `n()` method we can get the number of test units for the second step. ```python validation.n(i=2, scalar=True) ``` There's a single test unit here because the validation step is checking for the presence of a single column. The third validation step checks that all values in column `b` are less than `9` after filtering the table to only include rows where the value in column `a` is greater than `1`. Because the table is filtered, the number of test units will be less than the total number of rows in the input table. Let's prove this by using the `n()` method. ```python validation.n(i=3, scalar=True) ``` The returned value of `3` is the number of test units for the third validation step. When using the `pre=` argument, the input table can be mutated before performing the validation. The `n()` method is a good way to determine whether the mutation performed as expected. In all of these examples, the `scalar=True` argument was used to return the value as a scalar integer value. If `scalar=False`, the method will return a dictionary with an entry for the validation step number (from the `i=` argument) and the number of test units. Furthermore, leaving out the `i=` argument altogether will return a dictionary filled with the number of test units for each validation step. Here's what that looks like: ```python validation.n() ``` n_passed(self, i: 'int | list[int] | None' = None, scalar: 'bool' = False) -> 'dict[int, int] | int' Provides a dictionary of the number of test units that passed for each validation step. The `n_passed()` method provides the number of test units that passed for each validation step. This is the number of test units that passed in the validation step. It is always some integer value between `0` and the total number of test units. Test units are the atomic units of the validation process. Different validations can have different numbers of test units. For example, a validation that checks for the presence of a column in a table will have a single test unit. A validation that checks for the presence of a value in a column will have as many test units as there are rows in the table. The method provides a dictionary of the number of passing test units for each validation step. If the `scalar=True` argument is provided and `i=` is a scalar, the value is returned as a scalar instead of a dictionary. Furthermore, a value obtained here will be the complement to the analogous value returned by the [`n_failed()`](`pointblank.Validate.n_failed`) method (i.e., `n - n_failed`). Parameters ---------- i The validation step number(s) from which the number of passing test units is obtained. Can be provided as a list of integers or a single integer. If `None`, all steps are included. scalar If `True` and `i=` is a scalar, return the value as a scalar instead of a dictionary. Returns ------- dict[int, int] | int A dictionary of the number of passing test units for each validation step or a scalar value. Examples -------- In the example below, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). There will be three validation steps and, as it turns out, all of them will have failing test units.
After interrogation, the `n_passed()` method is used to determine the number of passing test units for each validation step. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [7, 4, 9, 7, 12], "b": [9, 8, 10, 5, 10], "c": ["a", "b", "c", "a", "b"] } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=5) .col_vals_gt(columns="b", value=pb.col("a")) .col_vals_in_set(columns="c", set=["a", "b"]) .interrogate() ) validation.n_passed() ``` The returned dictionary shows that none of the validation steps had all test units passing (each value is less than `5`, which is the total number of test units for each step). If we wanted to check the number of passing test units for a single validation step, we can provide the step number. Also, we could forego the dictionary and get a scalar value by setting `scalar=True` (ensuring that `i=` is a scalar). ```python validation.n_passed(i=1) ``` The returned value of `4` is the number of passing test units for the first validation step. n_failed(self, i: 'int | list[int] | None' = None, scalar: 'bool' = False) -> 'dict[int, int] | int' Provides a dictionary of the number of test units that failed for each validation step. The `n_failed()` method provides the number of test units that failed for each validation step. This is the number of test units that did not pass in the validation step. It is always some integer value between `0` and the total number of test units. Test units are the atomic units of the validation process. Different validations can have different numbers of test units. For example, a validation that checks for the presence of a column in a table will have a single test unit. A validation that checks for the presence of a value in a column will have as many test units as there are rows in the table. The method provides a dictionary of the number of failing test units for each validation step. If the `scalar=True` argument is provided and `i=` is a scalar, the value is returned as a scalar instead of a dictionary. Furthermore, a value obtained here will be the complement to the analogous value returned by the [`n_passed()`](`pointblank.Validate.n_passed`) method (i.e., `n - n_passed`). Parameters ---------- i The validation step number(s) from which the number of failing test units is obtained. Can be provided as a list of integers or a single integer. If `None`, all steps are included. scalar If `True` and `i=` is a scalar, return the value as a scalar instead of a dictionary. Returns ------- dict[int, int] | int A dictionary of the number of failing test units for each validation step or a scalar value. Examples -------- In the example below, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). There will be three validation steps and, as it turns out, all of them will have failing test units. After interrogation, the `n_failed()` method is used to determine the number of failing test units for each validation step. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [7, 4, 9, 7, 12], "b": [9, 8, 10, 5, 10], "c": ["a", "b", "c", "a", "b"] } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=5) .col_vals_gt(columns="b", value=pb.col("a")) .col_vals_in_set(columns="c", set=["a", "b"]) .interrogate() ) validation.n_failed() ``` The returned dictionary shows that all validation steps had failing test units. If we wanted to check the number of failing test units for a single validation step, we can provide the step number.
Also, we could forego the dictionary and get a scalar value by setting `scalar=True` (ensuring that `i=` is a scalar). ```python validation.n_failed(i=1) ``` The returned value of `1` is the number of failing test units for the first validation step. f_passed(self, i: 'int | list[int] | None' = None, scalar: 'bool' = False) -> 'dict[int, float] | float' Provides a dictionary of the fraction of test units that passed for each validation step. A measure of the fraction of test units that passed is provided by the `f_passed` attribute. This is the fraction of test units that passed the validation step over the total number of test units. Given this is a fractional value, it will always be in the range of `0` to `1`. Test units are the atomic units of the validation process. Different validations can have different numbers of test units. For example, a validation that checks for the presence of a column in a table will have a single test unit. A validation that checks for the presence of a value in a column will have as many test units as there are rows in the table. This method provides a dictionary of the fraction of passing test units for each validation step. If the `scalar=True` argument is provided and `i=` is a scalar, the value is returned as a scalar instead of a dictionary. Furthermore, a value obtained here will be the complement to the analogous value returned by the [`f_failed()`](`pointblank.Validate.f_failed`) method (i.e., `1 - f_failed()`). Parameters ---------- i The validation step number(s) from which the fraction of passing test units is obtained. Can be provided as a list of integers or a single integer. If `None`, all steps are included. scalar If `True` and `i=` is a scalar, return the value as a scalar instead of a dictionary. Returns ------- dict[int, float] | float A dictionary of the fraction of passing test units for each validation step or a scalar value. Examples -------- In the example below, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). There will be three validation steps, all having some failing test units. After interrogation, the `f_passed()` method is used to determine the fraction of passing test units for each validation step. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [7, 4, 9, 7, 12, 3, 10], "b": [9, 8, 10, 5, 10, 6, 2], "c": ["a", "b", "c", "a", "b", "d", "c"] } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=5) .col_vals_gt(columns="b", value=pb.col("a")) .col_vals_in_set(columns="c", set=["a", "b"]) .interrogate() ) validation.f_passed() ``` The returned dictionary shows the fraction of passing test units for each validation step. The values are all less than `1` since there were failing test units in each step. If we wanted to check the fraction of passing test units for a single validation step, we can provide the step number. Also, we could have the value returned as a scalar by setting `scalar=True` (ensuring that `i=` is a scalar). ```python validation.f_passed(i=1) ``` The returned value is the proportion of passing test units for the first validation step (5 passing test units out of 7 total test units). f_failed(self, i: 'int | list[int] | None' = None, scalar: 'bool' = False) -> 'dict[int, float] | float' Provides a dictionary of the fraction of test units that failed for each validation step. A measure of the fraction of test units that failed is provided by the `f_failed` attribute. 
This is the fraction of test units that failed the validation step over the total number of test units. Given this is a fractional value, it will always be in the range of `0` to `1`. Test units are the atomic units of the validation process. Different validations can have different numbers of test units. For example, a validation that checks for the presence of a column in a table will have a single test unit. A validation that checks for the presence of a value in a column will have as many test units as there are rows in the table. This method provides a dictionary of the fraction of failing test units for each validation step. If the `scalar=True` argument is provided and `i=` is a scalar, the value is returned as a scalar instead of a dictionary. Furthermore, a value obtained here will be the complement to the analogous value returned by the [`f_passed()`](`pointblank.Validate.f_passed`) method (i.e., `1 - f_passed()`). Parameters ---------- i The validation step number(s) from which the fraction of failing test units is obtained. Can be provided as a list of integers or a single integer. If `None`, all steps are included. scalar If `True` and `i=` is a scalar, return the value as a scalar instead of a dictionary. Returns ------- dict[int, float] | float A dictionary of the fraction of failing test units for each validation step or a scalar value. Examples -------- In the example below, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). There will be three validation steps, all having some failing test units. After interrogation, the `f_failed()` method is used to determine the fraction of failing test units for each validation step. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [7, 4, 9, 7, 12, 3, 10], "b": [9, 8, 10, 5, 10, 6, 2], "c": ["a", "b", "c", "a", "b", "d", "c"] } ) validation = ( pb.Validate(data=tbl) .col_vals_gt(columns="a", value=5) .col_vals_gt(columns="b", value=pb.col("a")) .col_vals_in_set(columns="c", set=["a", "b"]) .interrogate() ) validation.f_failed() ``` The returned dictionary shows the fraction of failing test units for each validation step. The values are all greater than `0` since there were failing test units in each step. If we wanted to check the fraction of failing test units for a single validation step, we can provide the step number. Also, we could have the value returned as a scalar by setting `scalar=True` (ensuring that `i=` is a scalar). ```python validation.f_failed(i=1) ``` The returned value is the proportion of failing test units for the first validation step (2 failing test units out of 7 total test units). warning(self, i: 'int | list[int] | None' = None, scalar: 'bool' = False) -> 'dict[int, bool] | bool' Get the 'warning' level status for each validation step. The 'warning' status for a validation step is `True` if the fraction of failing test units meets or exceeds the threshold for the 'warning' level. Otherwise, the status is `False`. The ascribed name of 'warning' is semantic and does not imply that a warning message is generated, it is simply a status indicator that could be used to trigger some action to be taken. 
Here's how it fits in with other status indicators: - 'warning': the status obtained by calling 'warning()', least severe - 'error': the status obtained by calling [`error()`](`pointblank.Validate.error`), middle severity - 'critical': the status obtained by calling [`critical()`](`pointblank.Validate.critical`), most severe This method provides a dictionary of the 'warning' status for each validation step. If the `scalar=True` argument is provided and `i=` is a scalar, the value is returned as a scalar instead of a dictionary. Parameters ---------- i The validation step number(s) from which the 'warning' status is obtained. Can be provided as a list of integers or a single integer. If `None`, all steps are included. scalar If `True` and `i=` is a scalar, return the value as a scalar instead of a dictionary. Returns ------- dict[int, bool] | bool A dictionary of the 'warning' status for each validation step or a scalar value. Examples -------- In the example below, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). There will be three validation steps, and the first step will have some failing test units, the rest will be completely passing. We've set thresholds here for each of the steps by using `thresholds=(2, 4, 5)`, which means: - the 'warning' threshold is `2` failing test units - the 'error' threshold is `4` failing test units - the 'critical' threshold is `5` failing test units After interrogation, the `warning()` method is used to determine the 'warning' status for each validation step. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [7, 4, 9, 7, 12, 3, 10], "b": [9, 8, 10, 5, 10, 6, 2], "c": ["a", "b", "a", "a", "b", "b", "a"] } ) validation = ( pb.Validate(data=tbl, thresholds=(2, 4, 5)) .col_vals_gt(columns="a", value=5) .col_vals_lt(columns="b", value=15) .col_vals_in_set(columns="c", set=["a", "b"]) .interrogate() ) validation.warning() ``` The returned dictionary provides the 'warning' status for each validation step. The first step has a `True` value since the number of failing test units meets the threshold for the 'warning' level. The second and third steps have `False` values since the number of failing test units was `0`, which is below the threshold for the 'warning' level. We can also visually inspect the 'warning' status across all steps by viewing the validation table: ```python validation ``` We can see that there's a filled gray circle in the first step (look to the far right side, in the `W` column) indicating that the 'warning' threshold was met. The other steps have empty gray circles. This means that thresholds were 'set but not met' in those steps. If we wanted to check the 'warning' status for a single validation step, we can provide the step number. Also, we could have the value returned as a scalar by setting `scalar=True` (ensuring that `i=` is a scalar). ```python validation.warning(i=1) ``` The returned value is `True`, indicating that the first validation step had met the 'warning' threshold. error(self, i: 'int | list[int] | None' = None, scalar: 'bool' = False) -> 'dict[int, bool] | bool' Get the 'error' level status for each validation step. The 'error' status for a validation step is `True` if the fraction of failing test units meets or exceeds the threshold for the 'error' level. Otherwise, the status is `False`. 
The ascribed name of 'error' is semantic and does not imply that the validation process is halted, it is simply a status indicator that could be used to trigger some action to be taken. Here's how it fits in with other status indicators: - 'warning': the status obtained by calling [`warning()`](`pointblank.Validate.warning`), least severe - 'error': the status obtained by calling `error()`, middle severity - 'critical': the status obtained by calling [`critical()`](`pointblank.Validate.critical`), most severe This method provides a dictionary of the 'error' status for each validation step. If the `scalar=True` argument is provided and `i=` is a scalar, the value is returned as a scalar instead of a dictionary. Parameters ---------- i The validation step number(s) from which the 'error' status is obtained. Can be provided as a list of integers or a single integer. If `None`, all steps are included. scalar If `True` and `i=` is a scalar, return the value as a scalar instead of a dictionary. Returns ------- dict[int, bool] | bool A dictionary of the 'error' status for each validation step or a scalar value. Examples -------- In the example below, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). There will be three validation steps, and the first step will have some failing test units, the rest will be completely passing. We've set thresholds here for each of the steps by using `thresholds=(2, 4, 5)`, which means: - the 'warning' threshold is `2` failing test units - the 'error' threshold is `4` failing test units - the 'critical' threshold is `5` failing test units After interrogation, the `error()` method is used to determine the 'error' status for each validation step. ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": [3, 4, 9, 7, 2, 3, 8], "b": [9, 8, 10, 5, 10, 6, 2], "c": ["a", "b", "a", "a", "b", "b", "a"] } ) validation = ( pb.Validate(data=tbl, thresholds=(2, 4, 5)) .col_vals_gt(columns="a", value=5) .col_vals_lt(columns="b", value=15) .col_vals_in_set(columns="c", set=["a", "b"]) .interrogate() ) validation.error() ``` The returned dictionary provides the 'error' status for each validation step. The first step has a `True` value since the number of failing test units meets the threshold for the 'error' level. The second and third steps have `False` values since the number of failing test units was `0`, which is below the threshold for the 'error' level. We can also visually inspect the 'error' status across all steps by viewing the validation table: ```python validation ``` We can see that there are filled gray and yellow circles in the first step (far right side, in the `W` and `E` columns) indicating that the 'warning' and 'error' thresholds were met. The other steps have empty gray and yellow circles. This means that thresholds were 'set but not met' in those steps. If we wanted to check the 'error' status for a single validation step, we can provide the step number. Also, we could have the value returned as a scalar by setting `scalar=True` (ensuring that `i=` is a scalar). ```python validation.error(i=1) ``` The returned value is `True`, indicating that the first validation step had the 'error' threshold met. critical(self, i: 'int | list[int] | None' = None, scalar: 'bool' = False) -> 'dict[int, bool] | bool' Get the 'critical' level status for each validation step. The 'critical' status for a validation step is `True` if the fraction of failing test units meets or exceeds the threshold for the 'critical' level. 
Otherwise, the status is `False`. The ascribed name of 'critical' is semantic and is thus simply a status indicator that could be used to trigger some action to be taken. Here's how it fits in with other status indicators:

- 'warning': the status obtained by calling [`warning()`](`pointblank.Validate.warning`), least severe
- 'error': the status obtained by calling [`error()`](`pointblank.Validate.error`), middle severity
- 'critical': the status obtained by calling `critical()`, most severe

This method provides a dictionary of the 'critical' status for each validation step. If the `scalar=True` argument is provided and `i=` is a scalar, the value is returned as a scalar instead of a dictionary.

Parameters
----------
i
    The validation step number(s) from which the 'critical' status is obtained. Can be provided as a list of integers or a single integer. If `None`, all steps are included.
scalar
    If `True` and `i=` is a scalar, return the value as a scalar instead of a dictionary.

Returns
-------
dict[int, bool] | bool
    A dictionary of the 'critical' status for each validation step or a scalar value.

Examples
--------
In the example below, we'll use a simple Polars DataFrame with three columns (`a`, `b`, and `c`). There will be three validation steps, and the first step will have many failing test units, while the rest will be completely passing. We've set thresholds here for each of the steps by using `thresholds=(2, 4, 5)`, which means:

- the 'warning' threshold is `2` failing test units
- the 'error' threshold is `4` failing test units
- the 'critical' threshold is `5` failing test units

After interrogation, the `critical()` method is used to determine the 'critical' status for each validation step.

```python
import pointblank as pb
import polars as pl

tbl = pl.DataFrame(
    {
        "a": [2, 4, 4, 7, 2, 3, 8],
        "b": [9, 8, 10, 5, 10, 6, 2],
        "c": ["a", "b", "a", "a", "b", "b", "a"]
    }
)

validation = (
    pb.Validate(data=tbl, thresholds=(2, 4, 5))
    .col_vals_gt(columns="a", value=5)
    .col_vals_lt(columns="b", value=15)
    .col_vals_in_set(columns="c", set=["a", "b"])
    .interrogate()
)

validation.critical()
```

The returned dictionary provides the 'critical' status for each validation step. The first step has a `True` value since the number of failing test units meets the threshold for the 'critical' level. The second and third steps have `False` values since the number of failing test units was `0`, which is below the threshold for the 'critical' level. We can also visually inspect the 'critical' status across all steps by viewing the validation table:

```python
validation
```

We can see that there are filled gray, yellow, and red circles in the first step (far right side, in the `W`, `E`, and `C` columns) indicating that the 'warning', 'error', and 'critical' thresholds were met. The other steps have empty gray, yellow, and red circles. This means that thresholds were 'set but not met' in those steps. If we wanted to check the 'critical' status for a single validation step, we can provide the step number. Also, we could have the value returned as a scalar by setting `scalar=True` (ensuring that `i=` is a scalar).

```python
validation.critical(i=1)
```

The returned value is `True`, indicating that the first validation step met the 'critical' threshold.

## The Inspection and Assistance family

The *Inspection and Assistance* group contains functions that are helpful for getting to grips with a new data table.
Use the `DataScan` class to get a quick overview of the data, `preview()` to see the first and last few rows of a table, `col_summary_tbl()` for a column-level summary of a table, `missing_vals_tbl()` to see where there are missing values in a table, and `get_column_count()`/`get_row_count()` to get the number of columns and rows in a table. Several datasets included in the package can be accessed via the `load_dataset()` function. Finally, the `config()` utility lets us set global configuration parameters. Want to chat with an assistant? Use the `assistant()` function to get help with Pointblank.

DataScan(data: 'IntoFrameT', tbl_name: 'str | None' = None) -> 'None'

Get a summary of a dataset.

The `DataScan` class provides a way to get a summary of a dataset. The summary includes the following information:

- the name of the table (if provided)
- the type of the table (e.g., `"polars"`, `"pandas"`, etc.)
- the number of rows and columns in the table
- column-level information, including:
    - the column name
    - the column type
    - measures of missingness and distinctness
    - measures of negative, zero, and positive values (for numerical columns)
    - a sample of the data (the first 5 values)
    - statistics (if the column contains numbers, strings, or datetimes)

To obtain a dictionary representation of the summary, you can use the `to_dict()` method. To get a JSON representation of the summary, you can use the `to_json()` method. To save the JSON text to a file, the `save_to_json()` method could be used.

:::{.callout-warning}
The `DataScan()` class is still experimental. Please report any issues you encounter in the [Pointblank issue tracker](https://github.com/posit-dev/pointblank/issues).
:::

Parameters
----------
data
    The data to scan and summarize. This could be a DataFrame object, an Ibis table object, a CSV file path, a Parquet file path, a GitHub URL pointing to a CSV or Parquet file, or a database connection string.
tbl_name
    Optionally, the name of the table could be provided as `tbl_name`.

Measures of Missingness and Distinctness
----------------------------------------
For each column, the following measures are provided:

- `n_missing_values`: the number of missing values in the column
- `f_missing_values`: the fraction of missing values in the column
- `n_unique_values`: the number of unique values in the column
- `f_unique_values`: the fraction of unique values in the column

The fractions are calculated as the ratio of the measure to the total number of rows in the dataset.

Counts and Fractions of Negative, Zero, and Positive Values
-----------------------------------------------------------
For numerical columns, the following measures are provided:

- `n_negative_values`: the number of negative values in the column
- `f_negative_values`: the fraction of negative values in the column
- `n_zero_values`: the number of zero values in the column
- `f_zero_values`: the fraction of zero values in the column
- `n_positive_values`: the number of positive values in the column
- `f_positive_values`: the fraction of positive values in the column

The fractions are calculated as the ratio of the measure to the total number of rows in the dataset.

Statistics for Numerical and String Columns
-------------------------------------------
For numerical and string columns, several statistical measures are provided. Please note that for string columns, the statistics are based on the lengths of the strings in the column.
The following descriptive statistics are provided: - `mean`: the mean of the column - `std_dev`: the standard deviation of the column Additionally, the following quantiles are provided: - `min`: the minimum value in the column - `p05`: the 5th percentile of the column - `q_1`: the first quartile of the column - `med`: the median of the column - `q_3`: the third quartile of the column - `p95`: the 95th percentile of the column - `max`: the maximum value in the column - `iqr`: the interquartile range of the column Statistics for Date and Datetime Columns ---------------------------------------- For date/datetime columns, the following statistics are provided: - `min`: the minimum date/datetime in the column - `max`: the maximum date/datetime in the column Returns ------- DataScan A DataScan object. preview(data: 'FrameT | Any', columns_subset: 'str | list[str] | Column | None' = None, n_head: 'int' = 5, n_tail: 'int' = 5, limit: 'int' = 50, show_row_numbers: 'bool' = True, max_col_width: 'int' = 250, min_tbl_width: 'int' = 500, incl_header: 'bool' = None) -> 'GT' Display a table preview that shows some rows from the top, some from the bottom. To get a quick look at the data in a table, we can use the `preview()` function to display a preview of the table. The function shows a subset of the rows from the start and end of the table, with the number of rows from the start and end determined by the `n_head=` and `n_tail=` parameters (set to `5` by default). This function works with any table that is supported by the `pointblank` library, including Pandas, Polars, and Ibis backend tables (e.g., DuckDB, MySQL, PostgreSQL, SQLite, Parquet, etc.). The view is optimized for readability, with column names and data types displayed in a compact format. The column widths are sized to fit the column names, dtypes, and column content up to a configurable maximum width of `max_col_width=` pixels. The table can be scrolled horizontally to view even very large datasets. Since the output is a Great Tables (`GT`) object, it can be further customized using the `great_tables` API. Parameters ---------- data The table to preview, which could be a DataFrame object, an Ibis table object, a CSV file path, a Parquet file path, or a database connection string. When providing a CSV or Parquet file path (as a string or `pathlib.Path` object), the file will be automatically loaded using an available DataFrame library (Polars or Pandas). Parquet input also supports glob patterns, directories containing .parquet files, and Spark-style partitioned datasets. Connection strings enable direct database access via Ibis with optional table specification using the `::table_name` suffix. Read the *Supported Input Table Types* section for details on the supported table types. columns_subset The columns to display in the table, by default `None` (all columns are shown). This can be a string, a list of strings, a `Column` object, or a `ColumnSelector` object. The latter two options allow for more flexible column selection using column selector functions. Errors are raised if the column names provided don't match any columns in the table (when provided as a string or list of strings) or if column selector expressions don't resolve to any columns. n_head The number of rows to show from the start of the table. Set to `5` by default. n_tail The number of rows to show from the end of the table. Set to `5` by default. limit The limit value for the sum of `n_head=` and `n_tail=` (the total number of rows shown). 
If the sum of `n_head=` and `n_tail=` exceeds the limit, an error is raised. The default value is `50`.
show_row_numbers
    Should row numbers be shown? The numbers shown reflect the row numbers of the head and tail in the input `data=` table. By default, this is set to `True`.
max_col_width
    The maximum width of the columns (in pixels) before the text is truncated. The default value is `250` (`"250px"`).
min_tbl_width
    The minimum width of the table in pixels. If the sum of the column widths is less than this value, all columns are sized up to reach this minimum width value. The default value is `500` (`"500px"`).
incl_header
    Should the table include a header with the table type and table dimensions? Set to `True` by default.

Returns
-------
GT
    A GT object that displays the preview of the table.

Supported Input Table Types
---------------------------
The `data=` parameter can be given any of the following table types:

- Polars DataFrame (`"polars"`)
- Pandas DataFrame (`"pandas"`)
- PySpark table (`"pyspark"`)
- DuckDB table (`"duckdb"`)*
- MySQL table (`"mysql"`)*
- PostgreSQL table (`"postgresql"`)*
- SQLite table (`"sqlite"`)*
- Microsoft SQL Server table (`"mssql"`)*
- Snowflake table (`"snowflake"`)*
- Databricks table (`"databricks"`)*
- BigQuery table (`"bigquery"`)*
- Parquet table (`"parquet"`)*
- CSV files (string path or `pathlib.Path` object with `.csv` extension)
- Parquet files (string path, `pathlib.Path` object, glob pattern, directory with `.parquet` extension, or partitioned dataset)
- Database connection strings (URI format with optional table specification)

The table types marked with an asterisk need to be prepared as Ibis tables (with type of `ibis.expr.types.relations.Table`). Furthermore, using `preview()` with these types of tables requires the Ibis library (`v9.5.0` or above) to be installed. If the input table is a Polars or Pandas DataFrame, the availability of Ibis is not needed.

To use a CSV file, ensure that a string or `pathlib.Path` object with a `.csv` extension is provided. The file will be automatically detected and loaded using the best available DataFrame library. The loading preference is Polars first, then Pandas as a fallback.

Connection strings follow database URL formats and must also specify a table using the `::table_name` suffix. Examples include:

```
"duckdb:///path/to/database.ddb::table_name"
"sqlite:///path/to/database.db::table_name"
"postgresql://user:password@localhost:5432/database::table_name"
"mysql://user:password@localhost:3306/database::table_name"
"bigquery://project/dataset::table_name"
"snowflake://user:password@account/database/schema::table_name"
```

When using connection strings, the Ibis library with the appropriate backend driver is required.

Examples
--------
It's easy to preview a table using the `preview()` function. Here's an example using the `small_table` dataset (itself loaded using the [`load_dataset()`](`pointblank.load_dataset`) function). That dataset is a Polars DataFrame, but the `preview()` function works with any table supported by `pointblank`, including Pandas DataFrames and Ibis backend tables. Here's an example using a DuckDB table handled by Ibis:

```python
small_table_duckdb = pb.load_dataset("small_table", tbl_type="duckdb")

pb.preview(small_table_duckdb)
```

The blue dividing line marks the end of the first `n_head=` rows and the start of the last `n_tail=` rows. We can adjust the number of rows shown from the start and end of the table by setting the `n_head=` and `n_tail=` parameters.
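To have a concrete table on hand for the next adjustment, here's a minimal setup (assuming the default Polars flavor of the bundled `small_table` dataset) that loads the data into the `small_table_polars` object used in the example that follows:

```python
import pointblank as pb

# Load the small_table dataset as a Polars DataFrame and show the default preview
small_table_polars = pb.load_dataset(dataset="small_table", tbl_type="polars")

pb.preview(small_table_polars)
```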
Let's enlarge each of these to `10`:

```python
pb.preview(small_table_polars, n_head=10, n_tail=10)
```

In the above case, the entire dataset is shown since the sum of `n_head=` and `n_tail=` is greater than the number of rows in the table (which is 13).

The `columns_subset=` parameter can be used to show only specific columns in the table. You can provide a list of column names to make the selection. Let's try that with the `"game_revenue"` dataset as a Pandas DataFrame:

```python
game_revenue_pandas = pb.load_dataset("game_revenue", tbl_type="pandas")

pb.preview(game_revenue_pandas, columns_subset=["player_id", "item_name", "item_revenue"])
```

Alternatively, we can use column selector functions like [`starts_with()`](`pointblank.starts_with`) and [`matches()`](`pointblank.matches`) to select columns based on text or patterns:

```python
pb.preview(game_revenue_pandas, n_head=2, n_tail=2, columns_subset=pb.starts_with("session"))
```

Multiple column selector functions can be combined within [`col()`](`pointblank.col`) using operators like `|` and `&`:

```python
pb.preview(
    game_revenue_pandas,
    n_head=2,
    n_tail=2,
    columns_subset=pb.col(pb.starts_with("item") | pb.matches("player"))
)
```

### Working with CSV Files

The `preview()` function can directly accept CSV file paths, making it easy to preview data stored in CSV files without manual loading. Either a string path or a `pathlib.Path` object can be used to specify the CSV file.

### Working with Parquet Files

The `preview()` function can directly accept Parquet files and datasets in various formats, including glob patterns and directories:

```python
# Multiple Parquet files with glob patterns
pb.preview("data/sales_*.parquet")

# Directory containing Parquet files
pb.preview("parquet_data/")

# Partitioned Parquet dataset
pb.preview("sales_data/")  # Auto-discovers partition columns
```

### Working with Database Connection Strings

The `preview()` function supports database connection strings for direct preview of database tables. Connection strings must specify a table using the `::table_name` suffix. For comprehensive documentation on supported connection string formats, error handling, and installation requirements, see the [`connect_to_table()`](`pointblank.connect_to_table`) function.

col_summary_tbl(data: 'FrameT | Any', tbl_name: 'str | None' = None) -> 'GT'

Generate a column-level summary table of a dataset.

The `col_summary_tbl()` function generates a summary table of a dataset, focusing on providing column-level information about the dataset. The summary includes the following information:

- the type of the table (e.g., `"polars"`, `"pandas"`, etc.)
- the number of rows and columns in the table
- column-level information, including:
    - the column name
    - the column type
    - measures of missingness and distinctness
    - descriptive stats and quantiles
    - statistics for datetime columns

The summary table is returned as a GT object, which can be displayed in a notebook or saved to an HTML file.

:::{.callout-warning}
The `col_summary_tbl()` function is still experimental. Please report any issues you encounter in the [Pointblank issue tracker](https://github.com/posit-dev/pointblank/issues).
:::

Parameters
----------
data
    The table to summarize, which could be a DataFrame object, an Ibis table object, a CSV file path, a Parquet file path, or a database connection string. Read the *Supported Input Table Types* section for details on the supported table types.
tbl_name
    Optionally, the name of the table could be provided as `tbl_name=`.
Returns
-------
GT
    A GT object that displays the column-level summaries of the table.

Supported Input Table Types
---------------------------
The `data=` parameter can be given any of the following table types:

- Polars DataFrame (`"polars"`)
- Pandas DataFrame (`"pandas"`)
- DuckDB table (`"duckdb"`)*
- MySQL table (`"mysql"`)*
- PostgreSQL table (`"postgresql"`)*
- SQLite table (`"sqlite"`)*
- Parquet table (`"parquet"`)*
- CSV files (string path or `pathlib.Path` object with `.csv` extension)
- Parquet files (string path, `pathlib.Path` object, glob pattern, directory with `.parquet` extension, or partitioned dataset)
- GitHub URLs (direct links to CSV or Parquet files on GitHub)
- Database connection strings (URI format with optional table specification)

The table types marked with an asterisk need to be prepared as Ibis tables (with type of `ibis.expr.types.relations.Table`). Furthermore, using `col_summary_tbl()` with these types of tables requires the Ibis library (`v9.5.0` or above) to be installed. If the input table is a Polars or Pandas DataFrame, the availability of Ibis is not needed.

Examples
--------
It's easy to get a column-level summary of a table using the `col_summary_tbl()` function. Here's an example using the `small_table` dataset (itself loaded using the [`load_dataset()`](`pointblank.load_dataset`) function). That table is a Polars DataFrame, but the `col_summary_tbl()` function works with any table supported by `pointblank`, including Pandas DataFrames and Ibis backend tables. Here's an example using a DuckDB table handled by Ibis:

```python
nycflights = pb.load_dataset(dataset="nycflights", tbl_type="duckdb")

pb.col_summary_tbl(data=nycflights, tbl_name="nycflights")
```

missing_vals_tbl(data: 'FrameT | Any') -> 'GT'

Display a table that shows the missing values in the input table.

The `missing_vals_tbl()` function generates a table that shows the missing values in the input table. The table is displayed using the Great Tables API, which allows for further customization of the table's appearance if so desired.

Parameters
----------
data
    The table for which to display the missing values. This could be a DataFrame object, an Ibis table object, a CSV file path, a Parquet file path, or a database connection string. Read the *Supported Input Table Types* section for details on the supported table types.

Returns
-------
GT
    A GT object that displays the table of missing values in the input table.

Supported Input Table Types
---------------------------
The `data=` parameter can be given any of the following table types:

- Polars DataFrame (`"polars"`)
- Pandas DataFrame (`"pandas"`)
- PySpark table (`"pyspark"`)
- DuckDB table (`"duckdb"`)*
- MySQL table (`"mysql"`)*
- PostgreSQL table (`"postgresql"`)*
- SQLite table (`"sqlite"`)*
- Microsoft SQL Server table (`"mssql"`)*
- Snowflake table (`"snowflake"`)*
- Databricks table (`"databricks"`)*
- BigQuery table (`"bigquery"`)*
- Parquet table (`"parquet"`)*
- CSV files (string path or `pathlib.Path` object with `.csv` extension)
- Parquet files (string path, `pathlib.Path` object, glob pattern, directory with `.parquet` extension, or partitioned dataset)
- Database connection strings (URI format with optional table specification)

The table types marked with an asterisk need to be prepared as Ibis tables (with type of `ibis.expr.types.relations.Table`). Furthermore, using `missing_vals_tbl()` with these types of tables requires the Ibis library (`v9.5.0` or above) to be installed.
If the input table is a Polars or Pandas DataFrame, the availability of Ibis is not needed. The Missing Values Table ------------------------ The missing values table shows the proportion of missing values in each column of the input table. The table is divided into sectors, with each sector representing a range of rows in the table. The proportion of missing values in each sector is calculated for each column. The table is displayed using the Great Tables API, which allows for further customization of the table's appearance. To ensure that the table can scale to tables with many columns, each row in the reporting table represents a column in the input table. There are 10 sectors shown in the table, where the first sector represents the first 10% of the rows, the second sector represents the next 10% of the rows, and so on. Any sectors that are light blue indicate that there are no missing values in that sector. If there are missing values, the proportion of missing values is shown by a gray color (light gray for low proportions, dark gray to black for very high proportions). Examples -------- The `missing_vals_tbl()` function is useful for quickly identifying columns with missing values in a table. Here's an example using the `nycflights` dataset (loaded as a Polars DataFrame using the [`load_dataset()`](`pointblank.load_dataset`) function): The table shows the proportion of missing values in each column of the `nycflights` dataset. The table is divided into sectors, with each sector representing a range of rows in the table (with around 34,000 rows per sector). The proportion of missing values in each sector is calculated for each column. The various shades of gray indicate the proportion of missing values in each sector. Many columns have no missing values at all, and those sectors are colored light blue. assistant(model: 'str', data: 'FrameT | Any | None' = None, tbl_name: 'str | None' = None, api_key: 'str | None' = None, display: 'str | None' = None) -> 'None' Chat with the PbA (Pointblank Assistant) about your data validation needs. The `assistant()` function provides an interactive chat session with the PbA (Pointblank Assistant) to help you with your data validation needs. The PbA can help you with constructing validation plans, suggesting validation methods, and providing code snippets for using the Pointblank Python package. Feel free to ask the PbA about any aspect of the Pointblank package and it will do its best to assist you. The PbA can also help you with constructing validation plans for your data tables. If you provide a data table to the PbA, it will internally generate a JSON summary of the table and use that information to suggest validation methods that can be used with the Pointblank package. If using a Polars table as the data source, the PbA will be knowledgeable about the Polars API and can smartly suggest validation steps that use aggregate measures with up-to-date Polars methods. The PbA can be used with models from the following providers: - Anthropic - OpenAI - Ollama - Amazon Bedrock The PbA can be displayed in a browser (the default) or in the terminal. You can choose one or the other by setting the `display=` parameter to `"browser"` or `"terminal"`. :::{.callout-warning} The `assistant()` function is still experimental. Please report any issues you encounter in the [Pointblank issue tracker](https://github.com/posit-dev/pointblank/issues). ::: Parameters ---------- model The model to be used. 
This should be in the form of `provider:model` (e.g., `"anthropic:claude-sonnet-4-5"`). Supported providers are `"anthropic"`, `"openai"`, `"ollama"`, and `"bedrock"`.
data
    An optional data table to focus on during discussion with the PbA, which could be a DataFrame object, an Ibis table object, a CSV file path, a Parquet file path, or a database connection string. Read the *Supported Input Table Types* section for details on the supported table types.
tbl_name : str, optional
    The name of the data table. This is optional and is only used to provide a more detailed prompt to the PbA.
api_key : str, optional
    The API key to be used for the model.
display : str, optional
    The display mode to use for the chat session. Supported values are `"browser"` and `"terminal"`. If not provided, the default value is `"browser"`.

Returns
-------
None
    Nothing is returned. Rather, you get an interactive chat session with the PbA, which is displayed in a browser or in the terminal.

Constructing the `model` Argument
---------------------------------
The `model=` argument should be constructed using the provider and model name separated by a colon (`provider:model`). The provider text can be any of:

- `"anthropic"` (Anthropic)
- `"openai"` (OpenAI)
- `"ollama"` (Ollama)
- `"bedrock"` (Amazon Bedrock)

The model name should be the specific model to be used from the provider. Model names are subject to change so consult the provider's documentation for the most up-to-date model names.

Notes on Authentication
-----------------------
Providing a valid API key as a string in the `api_key` argument is adequate for getting started but you should consider using a more secure method for handling API keys. One way to do this is to load the API key from an environment variable and retrieve it using the `os` module (specifically the `os.getenv()` function). Places to store the API key might include `.bashrc`, `.bash_profile`, `.zshrc`, or `.zsh_profile`.

Another solution is to store one or more model provider API keys in an `.env` file (in the root of your project). If the API keys have correct names (e.g., `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`) then DraftValidation will automatically load the API key from the `.env` file and there's no need to provide the `api_key` argument. An `.env` file might look like this:

```plaintext
ANTHROPIC_API_KEY="your_anthropic_api_key_here"
OPENAI_API_KEY="your_openai_api_key_here"
```

There's no need to have the `python-dotenv` package installed when using `.env` files in this way.

Notes on Data Sent to the Model Provider
----------------------------------------
If `data=` is provided, then what is sent to the model provider is a JSON summary of the table. This data summary is generated internally by use of the `DataScan` class. The summary includes the following information:

- the number of rows and columns in the table
- the type of dataset (e.g., Polars, DuckDB, Pandas, etc.)
- the column names and their types
- column level statistics such as the number of missing values, min, max, mean, and median, etc.
- a short list of data values in each column

The JSON summary is used to provide the model with the necessary information to be knowledgeable about the data table. Compared to the size of the entire table, the JSON summary is quite small and can be safely sent to the model provider.

The Amazon Bedrock provider is a special case since it is a self-hosted model and security controls are in place to ensure that data is kept within the user's AWS environment.
If using an Ollama model, all data is handled locally.

Supported Input Table Types
---------------------------
The `data=` parameter can be given any of the following table types:

- Polars DataFrame (`"polars"`)
- Pandas DataFrame (`"pandas"`)
- PySpark table (`"pyspark"`)
- DuckDB table (`"duckdb"`)*
- MySQL table (`"mysql"`)*
- PostgreSQL table (`"postgresql"`)*
- SQLite table (`"sqlite"`)*
- Microsoft SQL Server table (`"mssql"`)*
- Snowflake table (`"snowflake"`)*
- Databricks table (`"databricks"`)*
- BigQuery table (`"bigquery"`)*
- Parquet table (`"parquet"`)*
- CSV files (string path or `pathlib.Path` object with `.csv` extension)
- Parquet files (string path, `pathlib.Path` object, glob pattern, directory with `.parquet` extension, or partitioned dataset)
- Database connection strings (URI format with optional table specification)

The table types marked with an asterisk need to be prepared as Ibis tables (with type of `ibis.expr.types.relations.Table`). Furthermore, using `assistant()` with these types of tables requires the Ibis library (`v9.5.0` or above) to be installed. If the input table is a Polars or Pandas DataFrame, the availability of Ibis is not needed.

To use a CSV file, ensure that a string or `pathlib.Path` object with a `.csv` extension is provided. The file will be automatically detected and loaded using the best available DataFrame library. The loading preference is Polars first, then Pandas as a fallback.

load_dataset(dataset: "Literal['small_table', 'game_revenue', 'nycflights', 'global_sales']" = 'small_table', tbl_type: "Literal['polars', 'pandas', 'duckdb']" = 'polars') -> 'FrameT | Any'

Load a dataset hosted in the library as a specified table type.

The Pointblank library includes several datasets that can be loaded using the `load_dataset()` function. The datasets can be loaded as a Polars DataFrame, a Pandas DataFrame, or as a DuckDB table (which uses the Ibis library backend). These datasets are used throughout the documentation's examples to demonstrate the functionality of the library. They're also useful for experimenting with the library and trying out different validation scenarios.

Parameters
----------
dataset
    The name of the dataset to load. Current options are `"small_table"`, `"game_revenue"`, `"nycflights"`, and `"global_sales"`.
tbl_type
    The type of table to generate from the dataset. The named options are `"polars"`, `"pandas"`, and `"duckdb"`.

Returns
-------
FrameT | Any
    The dataset for the `Validate` object. This could be a Polars DataFrame, a Pandas DataFrame, or a DuckDB table as an Ibis table.

Included Datasets
-----------------
There are four included datasets that can be loaded using the `load_dataset()` function:

- `"small_table"`: A small dataset with 13 rows and 8 columns. This dataset is useful for testing and demonstration purposes.
- `"game_revenue"`: A dataset with 2000 rows and 11 columns. Provides revenue data for a game development company. For the particular game, there are records of player sessions, the items they purchased, ads viewed, and the revenue generated.
- `"nycflights"`: A dataset with 336,776 rows and 18 columns. This dataset provides information about flights departing from New York City airports (JFK, LGA, or EWR) in 2013.
- `"global_sales"`: A dataset with 50,000 rows and 20 columns. Provides information about global sales of products across different regions and countries.
Supported DataFrame Types ------------------------- The `tbl_type=` parameter can be set to one of the following: - `"polars"`: A Polars DataFrame. - `"pandas"`: A Pandas DataFrame. - `"duckdb"`: An Ibis table for a DuckDB database. Examples -------- Load the `"small_table"` dataset as a Polars DataFrame by calling `load_dataset()` with `dataset="small_table"` and `tbl_type="polars"`: Note that the `"small_table"` dataset is a Polars DataFrame and using the [`preview()`](`pointblank.preview`) function will display the table in an HTML viewing environment. The `"game_revenue"` dataset can be loaded as a Pandas DataFrame by specifying the dataset name and setting `tbl_type="pandas"`: ```python game_revenue = pb.load_dataset(dataset="game_revenue", tbl_type="pandas") pb.preview(game_revenue) ``` The `"game_revenue"` dataset is a more real-world dataset with a mix of data types, and it's significantly larger than the `small_table` dataset at 2000 rows and 11 columns. The `"nycflights"` dataset can be loaded as a DuckDB table by specifying the dataset name and setting `tbl_type="duckdb"`: ```python nycflights = pb.load_dataset(dataset="nycflights", tbl_type="duckdb") pb.preview(nycflights) ``` The `"nycflights"` dataset is a large dataset with 336,776 rows and 18 columns. This dataset is truly a real-world dataset and provides information about flights originating from New York City airports in 2013. Finally, the `"global_sales"` dataset can be loaded as a Polars table by specifying the dataset name. Since `tbl_type=` is set to `"polars"` by default, we don't need to specify it: ```python global_sales = pb.load_dataset(dataset="global_sales") pb.preview(global_sales) ``` The `"global_sales"` dataset is a large dataset with 50,000 rows and 20 columns. Each record describes the sales of a particular product to a customer located in one of three global regions: North America, Europe, or Asia. get_data_path(dataset: "Literal['small_table', 'game_revenue', 'nycflights', 'global_sales']" = 'small_table', file_type: "Literal['csv', 'parquet', 'duckdb']" = 'csv') -> 'str' Get the file path to a dataset included with the Pointblank package. This function provides direct access to the file paths of datasets included with Pointblank. These paths can be used in examples and documentation to demonstrate file-based data loading without requiring the actual data files. The returned paths can be used with `Validate(data=path)` to demonstrate CSV and Parquet file loading capabilities. Parameters ---------- dataset The name of the dataset to get the path for. Current options are `"small_table"`, `"game_revenue"`, `"nycflights"`, and `"global_sales"`. file_type The file format to get the path for. Options are `"csv"`, `"parquet"`, or `"duckdb"`. Returns ------- str The file path to the requested dataset file. Included Datasets ----------------- The available datasets are the same as those in [`load_dataset()`](`pointblank.load_dataset`): - `"small_table"`: A small dataset with 13 rows and 8 columns. Ideal for testing and examples. - `"game_revenue"`: A dataset with 2000 rows and 11 columns. Revenue data for a game company. - `"nycflights"`: A dataset with 336,776 rows and 18 columns. Flight data from NYC airports. - `"global_sales"`: A dataset with 50,000 rows and 20 columns. Global sales data across regions. 
File Types ---------- Each dataset is available in multiple formats: - `"csv"`: Comma-separated values file (`.csv`) - `"parquet"`: Parquet file (`.parquet`) - `"duckdb"`: DuckDB database file (`.ddb`) Examples -------- Get the path to a CSV file and use it with `Validate`: ```python import pointblank as pb # Get path to the small_table CSV file csv_path = pb.get_data_path("small_table", "csv") print(csv_path) # Use the path directly with Validate validation = ( pb.Validate(data=csv_path) .col_exists(["a", "b", "c"]) .col_vals_gt(columns="d", value=0) .interrogate() ) validation ``` Get a Parquet file path for validation examples: ```python # Get path to the game_revenue Parquet file parquet_path = pb.get_data_path(dataset="game_revenue", file_type="parquet") # Validate the Parquet file directly validation = ( pb.Validate(data=parquet_path, label="Game Revenue Data Validation") .col_vals_not_null(columns=["player_id", "session_id"]) .col_vals_gt(columns="item_revenue", value=0) .interrogate() ) validation ``` This is particularly useful for documentation examples where you want to demonstrate file-based workflows without requiring users to have specific data files: ```python # Example showing CSV file validation sales_csv = pb.get_data_path(dataset="global_sales", file_type="csv") validation = ( pb.Validate(data=sales_csv, label="Sales Data Validation") .col_exists(["customer_id", "product_id", "amount"]) .col_vals_regex(columns="customer_id", pattern=r"CUST_[0-9]{6}") .interrogate() ) ``` See Also -------- [`load_dataset()`](`pointblank.load_dataset`) for loading datasets directly as table objects. connect_to_table(connection_string: 'str') -> 'Any' Connect to a database table using a connection string. This utility function tests whether a connection string leads to a valid table and returns the table object if successful. It provides helpful error messages when no table is specified or when backend dependencies are missing. Parameters ---------- connection_string A database connection string with a required table specification using the `::table_name` suffix. Supported formats are outlined in the *Supported Connection String Formats* section. Returns ------- Any An Ibis table object for the specified database table. Supported Connection String Formats ----------------------------------- The `connection_string` parameter must include a valid connection string with a table name specified using the `::` syntax. Here are some examples on how to format connection strings for various backends: ``` DuckDB: "duckdb:///path/to/database.ddb::table_name" SQLite: "sqlite:///path/to/database.db::table_name" PostgreSQL: "postgresql://user:password@localhost:5432/database::table_name" MySQL: "mysql://user:password@localhost:3306/database::table_name" BigQuery: "bigquery://project/dataset::table_name" Snowflake: "snowflake://user:password@account/database/schema::table_name" ``` If the connection string does not include a table name, the function will attempt to connect to the database and list available tables, providing guidance on how to specify a table. 
Examples -------- Connect to a DuckDB table: ```python import pointblank as pb # Get path to a DuckDB database file from package data duckdb_path = pb.get_data_path("game_revenue", "duckdb") # Connect to the `game_revenue` table in the DuckDB database game_revenue = pb.connect_to_table(f"duckdb:///{duckdb_path}::game_revenue") # Use with the `preview()` function pb.preview(game_revenue) ``` Here are some backend-specific connection examples: ```python # PostgreSQL pg_table = pb.connect_to_table( "postgresql://user:password@localhost:5432/warehouse::customer_data" ) # SQLite sqlite_table = pb.connect_to_table("sqlite:///local_data.db::products") # BigQuery bq_table = pb.connect_to_table("bigquery://my-project/analytics::daily_metrics") ``` This function requires the Ibis library with appropriate backend drivers: ```bash # You can install a set of common backends: pip install 'ibis-framework[duckdb,postgres,mysql,sqlite]' # ...or specific backends as needed: pip install 'ibis-framework[duckdb]' # for DuckDB pip install 'ibis-framework[postgres]' # for PostgreSQL ``` ## The YAML family The *YAML* group contains functions that allow for the use of YAML to orchestrate validation workflows. The `yaml_interrogate()` function can be used to run a validation workflow from YAML strings or files. The `validate_yaml()` function checks if the YAML configuration passes its own validity checks. The `yaml_to_python()` function converts YAML configuration to equivalent Python code. yaml_interrogate(yaml: 'Union[str, Path]', set_tbl: 'Union[FrameT, Any, None]' = None, namespaces: 'Optional[Union[Iterable[str], Mapping[str, str]]]' = None) -> 'Validate' Execute a YAML-based validation workflow. This is the main entry point for YAML-based validation workflows. It takes YAML configuration (as a string or file path) and returns a validated `Validate` object with interrogation results. The YAML configuration defines the data source, validation steps, and optional settings like thresholds and labels. This function automatically loads the data, builds the validation plan, executes all validation steps, and returns the interrogated results. Parameters ---------- yaml YAML configuration as string or file path. Can be: (1) a YAML string containing the validation configuration, or (2) a Path object or string path to a YAML file. set_tbl An optional table to override the table specified in the YAML configuration. This allows you to apply a YAML-defined validation workflow to a different table than what's specified in the configuration. If provided, this table will replace the table defined in the YAML's `tbl` field before executing the validation workflow. This can be any supported table type including DataFrame objects, Ibis table objects, CSV file paths, Parquet file paths, GitHub URLs, or database connection strings. namespaces Optional module namespaces to make available for Python code execution in YAML configurations. Can be a dictionary mapping aliases to module names or a list of module names. See the "Using Namespaces" section below for detailed examples. Returns ------- Validate An instance of the `Validate` class that has been configured based on the YAML input. This object contains the results of the validation steps defined in the YAML configuration. It includes metadata like table name, label, language, and thresholds if specified. Raises ------ YAMLValidationError If the YAML is invalid, malformed, or execution fails. 
This includes syntax errors, missing required fields, unknown validation methods, or data loading failures. Using Namespaces ---------------- The `namespaces=` parameter enables custom Python modules and functions in YAML configurations. This is particularly useful for custom action functions and advanced Python expressions. **Namespace formats:** - Dictionary format: `{"alias": "module.name"}` maps aliases to module names - List format: `["module.name", "another.module"]` imports modules directly **Option 1: Inline expressions (no namespaces needed)** ```python import pointblank as pb # Simple inline custom action yaml_config = ''' tbl: small_table thresholds: warning: 0.01 actions: warning: python: "lambda: print('Custom warning triggered')" steps: - col_vals_gt: columns: [a] value: 1000 ''' result = pb.yaml_interrogate(yaml_config) result ``` **Option 2: External functions with namespaces** ```python # Define a custom action function def my_custom_action(): print("Data validation failed: please check your data.") # Add to current module for demo import sys sys.modules[__name__].my_custom_action = my_custom_action # YAML that references the external function yaml_config = ''' tbl: small_table thresholds: warning: 0.01 actions: warning: python: actions.my_custom_action steps: - col_vals_gt: columns: [a] value: 1000 # This will fail ''' # Use namespaces to make the function available result = pb.yaml_interrogate(yaml_config, namespaces={'actions': '__main__'}) result ``` This approach enables modular, reusable validation workflows with custom business logic. Examples -------- For the examples here, we'll use YAML configurations to define validation workflows. Let's start with a basic YAML workflow that validates the built-in `small_table` dataset. ```python import pointblank as pb # Define a basic YAML validation workflow yaml_config = ''' tbl: small_table steps: - rows_distinct - col_exists: columns: [date, a, b] ''' # Execute the validation workflow result = pb.yaml_interrogate(yaml_config) result ``` The validation table shows the results of our YAML-defined workflow. We can see that the `rows_distinct()` validation failed (because there are duplicate rows in the table), while the column existence checks passed. Now let's create a more comprehensive validation workflow with thresholds and metadata: ```python # Advanced YAML configuration with thresholds and metadata yaml_config = ''' tbl: small_table tbl_name: small_table_demo label: Comprehensive data validation thresholds: warning: 0.1 error: 0.25 critical: 0.35 steps: - col_vals_gt: columns: [d] value: 100 - col_vals_regex: columns: [b] pattern: '[0-9]-[a-z]{3}-[0-9]{3}' - col_vals_not_null: columns: [date, a] ''' # Execute the validation workflow result = pb.yaml_interrogate(yaml_config) print(f"Table name: {result.tbl_name}") print(f"Label: {result.label}") print(f"Total validation steps: {len(result.validation_info)}") ``` The validation results now include our custom table name and label. The thresholds we defined will determine when validation steps are marked as warnings, errors, or critical failures. You can also load YAML configurations from files. 
Here's how you would work with a YAML file: ```python from pathlib import Path import tempfile # Create a temporary YAML file for demonstration yaml_content = ''' tbl: small_table tbl_name: File-based Validation steps: - col_vals_between: columns: [c] left: 1 right: 10 - col_vals_in_set: columns: [f] set: [low, mid, high] ''' with tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False) as f: f.write(yaml_content) yaml_file_path = Path(f.name) # Load and execute validation from file result = pb.yaml_interrogate(yaml_file_path) result ``` This approach is particularly useful for storing validation configurations as part of your data pipeline or version control system, allowing you to maintain validation rules alongside your code. ### Using `set_tbl=` to Override the Table The `set_tbl=` parameter allows you to override the table specified in the YAML configuration. This is useful when you have a template validation workflow but want to apply it to different tables: ```python import polars as pl # Create a test table with similar structure to small_table test_table = pl.DataFrame({ "date": ["2023-01-01", "2023-01-02", "2023-01-03"], "a": [1, 2, 3], "b": ["1-abc-123", "2-def-456", "3-ghi-789"], "d": [150, 200, 250] }) # Use the same YAML config but apply it to our test table yaml_config = ''' tbl: small_table # This will be overridden tbl_name: Test Table # This name will be used steps: - col_exists: columns: [date, a, b, d] - col_vals_gt: columns: [d] value: 100 ''' # Execute with table override result = pb.yaml_interrogate(yaml_config, set_tbl=test_table) print(f"Validation applied to: {result.tbl_name}") result ``` This feature makes YAML configurations more reusable and flexible, allowing you to define validation logic once and apply it to multiple similar tables. validate_yaml(yaml: 'Union[str, Path]') -> 'None' Validate YAML configuration against the expected structure. This function validates that a YAML configuration conforms to the expected structure for validation workflows. It checks for required fields, proper data types, and valid validation method names. This is useful for validating configurations before execution or for building configuration editors and validators. The function performs comprehensive validation including: - required fields ('tbl' and 'steps') - proper data types for all fields - valid threshold configurations - known validation method names - proper step configuration structure Parameters ---------- yaml YAML configuration as string or file path. Can be: (1) a YAML string containing the validation configuration, or (2) a Path object or string path to a YAML file. Raises ------ YAMLValidationError If the YAML is invalid, malformed, or execution fails. This includes syntax errors, missing required fields, unknown validation methods, or data loading failures. Examples -------- For the examples here, we'll demonstrate how to validate YAML configurations before using them with validation workflows. This is particularly useful for building robust data validation systems where you want to catch configuration errors early. 
Let's start with validating a basic configuration: ```python import pointblank as pb # Define a basic YAML validation configuration yaml_config = ''' tbl: small_table steps: - rows_distinct - col_exists: columns: [a, b] ''' # Validate the configuration: no exception means it's valid pb.validate_yaml(yaml_config) print("Basic YAML configuration is valid") ``` The function completed without raising an exception, which means our configuration is valid and follows the expected structure. Now let's validate a more complex configuration with thresholds and metadata: ```python # Complex YAML configuration with all optional fields yaml_config = ''' tbl: small_table tbl_name: My Dataset label: Quality check lang: en locale: en thresholds: warning: 0.1 error: 0.25 critical: 0.35 steps: - rows_distinct - col_vals_gt: columns: [d] value: 100 - col_vals_regex: columns: [b] pattern: '[0-9]-[a-z]{3}-[0-9]{3}' ''' # Validate the configuration pb.validate_yaml(yaml_config) print("Complex YAML configuration is valid") # Count the validation steps import pointblank.yaml as pby config = pby.load_yaml_config(yaml_config) print(f"Configuration has {len(config['steps'])} validation steps") ``` This configuration includes all the optional metadata fields and complex validation steps, demonstrating that the validation handles the full range of supported options. Let's see what happens when we try to validate an invalid configuration: ```python # Invalid YAML configuration: missing required 'tbl' field invalid_yaml = ''' steps: - rows_distinct ''' try: pb.validate_yaml(invalid_yaml) except pb.yaml.YAMLValidationError as e: print(f"Validation failed: {e}") ``` The validation correctly identifies that our configuration is missing the required `'tbl'` field. Here's a practical example of using validation in a workflow builder: ```python def safe_yaml_interrogate(yaml_config): """Safely execute a YAML configuration after validation.""" try: # Validate the YAML configuration first pb.validate_yaml(yaml_config) print("✓ YAML configuration is valid") # Then execute the workflow result = pb.yaml_interrogate(yaml_config) print(f"Validation completed with {len(result.validation_info)} steps") return result except pb.yaml.YAMLValidationError as e: print(f"Configuration error: {e}") return None # Test with a valid YAML configuration test_yaml = ''' tbl: small_table steps: - col_vals_between: columns: [c] left: 1 right: 10 ''' result = safe_yaml_interrogate(test_yaml) ``` This pattern of validating before executing helps build more reliable data validation pipelines by catching configuration errors early in the process. Note that this function only validates the structure and does not check if the specified data source ('tbl') exists or is accessible. Data source validation occurs during execution with `yaml_interrogate()`. See Also -------- yaml_interrogate : execute YAML-based validation workflows yaml_to_python(yaml: 'Union[str, Path]') -> 'str' Convert YAML validation configuration to equivalent Python code. This function takes a YAML validation configuration and generates the equivalent Python code that would produce the same validation workflow. This is useful for documentation, code generation, or learning how to translate YAML workflows into programmatic workflows. The generated Python code includes all necessary imports, data loading, validation steps, and interrogation execution, formatted as executable Python code. Parameters ---------- yaml YAML configuration as string or file path. 
Can be: (1) a YAML string containing the validation configuration, or (2) a Path object or string path to a YAML file. Returns ------- str A formatted Python code string enclosed in markdown code blocks that replicates the YAML workflow. The code includes import statements, data loading, validation method calls, and interrogation execution. Raises ------ YAMLValidationError If the YAML is invalid, malformed, or contains unknown validation methods. Examples -------- Convert a basic YAML configuration to Python code: ```python import pointblank as pb # Define a YAML validation workflow yaml_config = ''' tbl: small_table tbl_name: Data Quality Check steps: - col_vals_not_null: columns: [a, b] - col_vals_gt: columns: [c] value: 0 ''' # Generate equivalent Python code python_code = pb.yaml_to_python(yaml_config) print(python_code) ``` The generated Python code shows exactly how to replicate the YAML workflow programmatically. This is particularly useful when transitioning from YAML-based workflows to code-based workflows, or when generating documentation that shows both YAML and Python approaches. For more complex workflows with thresholds and metadata: ```python # Advanced YAML configuration yaml_config = ''' tbl: small_table tbl_name: Advanced Validation label: Production data check thresholds: warning: 0.1 error: 0.2 steps: - col_vals_between: columns: [c] left: 1 right: 10 - col_vals_regex: columns: [b] pattern: '[0-9]-[a-z]{3}-[0-9]{3}' ''' # Generate the equivalent Python code python_code = pb.yaml_to_python(yaml_config) print(python_code) ``` The generated code includes all configuration parameters, thresholds, and maintains the exact same validation logic as the original YAML workflow. This function is also useful for educational purposes, helping users understand how YAML configurations map to the underlying Python API calls. ## The Utility Functions family The Utility Functions group contains functions that are useful for accessing metadata about the target data. Use `get_column_count()` or `get_row_count()` to get the number of columns or rows in a table. The `get_action_metadata()` function is useful when building custom actions since it returns metadata about the validation step that's triggering the action. Lastly, the `config()` utility lets us set global configuration parameters. get_column_count(data: 'FrameT | Any') -> 'int' Get the number of columns in a table. The `get_column_count()` function returns the number of columns in a table. The function works with any table that is supported by the `pointblank` library, including Pandas, Polars, and Ibis backend tables (e.g., DuckDB, MySQL, PostgreSQL, SQLite, Parquet, etc.). It also supports direct input of CSV files, Parquet files, and database connection strings. Parameters ---------- data The table for which to get the column count, which could be a DataFrame object, an Ibis table object, a CSV file path, a Parquet file path, or a database connection string. Read the *Supported Input Table Types* section for details on the supported table types. Returns ------- int The number of columns in the table. 
Supported Input Table Types --------------------------- The `data=` parameter can be given any of the following table types: - Polars DataFrame (`"polars"`) - Pandas DataFrame (`"pandas"`) - PySpark table (`"pyspark"`) - DuckDB table (`"duckdb"`)* - MySQL table (`"mysql"`)* - PostgreSQL table (`"postgresql"`)* - SQLite table (`"sqlite"`)* - Microsoft SQL Server table (`"mssql"`)* - Snowflake table (`"snowflake"`)* - Databricks table (`"databricks"`)* - BigQuery table (`"bigquery"`)* - Parquet table (`"parquet"`)* - CSV files (string path or `pathlib.Path` object with `.csv` extension) - Parquet files (string path, `pathlib.Path` object, glob pattern, directory with `.parquet` extension, or partitioned dataset) - Database connection strings (URI format with optional table specification) The table types marked with an asterisk need to be prepared as Ibis tables (with type of `ibis.expr.types.relations.Table`). Furthermore, using `get_column_count()` with these types of tables requires the Ibis library (`v9.5.0` or above) to be installed. If the input table is a Polars or Pandas DataFrame, the availability of Ibis is not needed. To use a CSV file, ensure that a string or `pathlib.Path` object with a `.csv` extension is provided. The file will be automatically detected and loaded using the best available DataFrame library. The loading preference is Polars first, then Pandas as a fallback. GitHub URLs pointing to CSV or Parquet files are automatically detected and converted to raw content URLs for downloading. The URL format should be: `https://github.com/user/repo/blob/branch/path/file.csv` or `https://github.com/user/repo/blob/branch/path/file.parquet` Connection strings follow database URL formats and must also specify a table using the `::table_name` suffix. Examples include: ``` "duckdb:///path/to/database.ddb::table_name" "sqlite:///path/to/database.db::table_name" "postgresql://user:password@localhost:5432/database::table_name" "mysql://user:password@localhost:3306/database::table_name" "bigquery://project/dataset::table_name" "snowflake://user:password@account/database/schema::table_name" ``` When using connection strings, the Ibis library with the appropriate backend driver is required. Examples -------- To get the number of columns in a table, we can use the `get_column_count()` function. Here's an example using the `small_table` dataset (itself loaded using the [`load_dataset()`](`pointblank.load_dataset`) function): This table is a Polars DataFrame, but the `get_column_count()` function works with any table supported by `pointblank`, including Pandas DataFrames and Ibis backend tables. 
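As a minimal sketch of that basic Polars case (the value noted below is what `small_table` yields):

```python
import pointblank as pb

# Load the `small_table` dataset (a Polars DataFrame by default)
small_table = pb.load_dataset(dataset="small_table")

# Get the number of columns (returns 8 for `small_table`)
pb.get_column_count(small_table)
```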
Here's an example using a DuckDB table handled by Ibis: ```python small_table_duckdb = pb.load_dataset("small_table", tbl_type="duckdb") pb.get_column_count(small_table_duckdb) ``` #### Working with CSV Files The `get_column_count()` function can directly accept CSV file paths: #### Working with Parquet Files The function supports various Parquet input formats: You can also use glob patterns and directories: ```python # Multiple Parquet files with glob patterns pb.get_column_count("data/sales_*.parquet") # Directory containing Parquet files pb.get_column_count("parquet_data/") # Partitioned Parquet dataset pb.get_column_count("sales_data/") # Auto-discovers partition columns ``` #### Working with Database Connection Strings The function supports database connection strings for direct access to database tables: The function always returns the number of columns in the table as an integer value, which is `8` for the `small_table` dataset. get_row_count(data: 'FrameT | Any') -> 'int' Get the number of rows in a table. The `get_row_count()` function returns the number of rows in a table. The function works with any table that is supported by the `pointblank` library, including Pandas, Polars, and Ibis backend tables (e.g., DuckDB, MySQL, PostgreSQL, SQLite, Parquet, etc.). It also supports direct input of CSV files, Parquet files, and database connection strings. Parameters ---------- data The table for which to get the row count, which could be a DataFrame object, an Ibis table object, a CSV file path, a Parquet file path, or a database connection string. Read the *Supported Input Table Types* section for details on the supported table types. Returns ------- int The number of rows in the table. Supported Input Table Types --------------------------- The `data=` parameter can be given any of the following table types: - Polars DataFrame (`"polars"`) - Pandas DataFrame (`"pandas"`) - PySpark table (`"pyspark"`) - DuckDB table (`"duckdb"`)* - MySQL table (`"mysql"`)* - PostgreSQL table (`"postgresql"`)* - SQLite table (`"sqlite"`)* - Microsoft SQL Server table (`"mssql"`)* - Snowflake table (`"snowflake"`)* - Databricks table (`"databricks"`)* - BigQuery table (`"bigquery"`)* - Parquet table (`"parquet"`)* - CSV files (string path or `pathlib.Path` object with `.csv` extension) - Parquet files (string path, `pathlib.Path` object, glob pattern, directory with `.parquet` extension, or partitioned dataset) - GitHub URLs (direct links to CSV or Parquet files on GitHub) - Database connection strings (URI format with optional table specification) The table types marked with an asterisk need to be prepared as Ibis tables (with type of `ibis.expr.types.relations.Table`). Furthermore, using `get_row_count()` with these types of tables requires the Ibis library (`v9.5.0` or above) to be installed. If the input table is a Polars or Pandas DataFrame, the availability of Ibis is not needed. To use a CSV file, ensure that a string or `pathlib.Path` object with a `.csv` extension is provided. The file will be automatically detected and loaded using the best available DataFrame library. The loading preference is Polars first, then Pandas as a fallback. GitHub URLs pointing to CSV or Parquet files are automatically detected and converted to raw content URLs for downloading. 
The URL format should be: `https://github.com/user/repo/blob/branch/path/file.csv` or `https://github.com/user/repo/blob/branch/path/file.parquet` Connection strings follow database URL formats and must also specify a table using the `::table_name` suffix. Examples include: ``` "duckdb:///path/to/database.ddb::table_name" "sqlite:///path/to/database.db::table_name" "postgresql://user:password@localhost:5432/database::table_name" "mysql://user:password@localhost:3306/database::table_name" "bigquery://project/dataset::table_name" "snowflake://user:password@account/database/schema::table_name" ``` When using connection strings, the Ibis library with the appropriate backend driver is required. Examples -------- Getting the number of rows in a table is easily done by using the `get_row_count()` function. Here's an example using the `game_revenue` dataset (itself loaded using the [`load_dataset()`](`pointblank.load_dataset`) function): This table is a Polars DataFrame, but the `get_row_count()` function works with any table supported by `pointblank`, including Pandas DataFrames and Ibis backend tables. Here's an example using a DuckDB table handled by Ibis: ```python game_revenue_duckdb = pb.load_dataset("game_revenue", tbl_type="duckdb") pb.get_row_count(game_revenue_duckdb) ``` #### Working with CSV Files The `get_row_count()` function can directly accept CSV file paths: #### Working with Parquet Files The function supports various Parquet input formats: You can also use glob patterns and directories: ```python # Multiple Parquet files with glob patterns pb.get_row_count("data/sales_*.parquet") # Directory containing Parquet files pb.get_row_count("parquet_data/") # Partitioned Parquet dataset pb.get_row_count("sales_data/") # Auto-discovers partition columns ``` #### Working with Database Connection Strings The function supports database connection strings for direct access to database tables: The function always returns the number of rows in the table as an integer value, which is `2000` for the `game_revenue` dataset. get_action_metadata() -> 'dict | None' Access step-level metadata when authoring custom actions. Get the metadata for the validation step where an action was triggered. This can be called by user functions to get the metadata for the current action. This function can only be used within callables crafted for the [`Actions`](`pointblank.Actions`) class. Returns ------- dict | None A dictionary containing the metadata for the current step. If called outside of an action (i.e., when no action is being executed), this function will return `None`. Description of the Metadata Fields ---------------------------------- The metadata dictionary contains the following fields for a given validation step: - `step`: The step number. - `column`: The column name. - `value`: The value being compared (only available in certain validation steps). - `type`: The assertion type (e.g., `"col_vals_gt"`, etc.). - `time`: The time the validation step was executed (in ISO format). - `level`: The severity level (`"warning"`, `"error"`, or `"critical"`). - `level_num`: The severity level as a numeric value (`30`, `40`, or `50`). - `autobrief`: A localized and brief statement of the expectation for the step. - `failure_text`: Localized text that explains how the validation step failed. Examples -------- When creating a custom action, you can access the metadata for the current step using the `get_action_metadata()` function. 
Here's an example of a custom action that logs the metadata for the current step:

```python
import pointblank as pb

def log_issue():
    metadata = pb.get_action_metadata()
    print(f"Type: {metadata['type']}, Step: {metadata['step']}")

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="duckdb"),
        thresholds=pb.Thresholds(warning=0.05, error=0.10, critical=0.15),
        actions=pb.Actions(warning=log_issue),
    )
    .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}[0-9]{3}")
    .col_vals_gt(columns="item_revenue", value=0.05)
    .col_vals_gt(columns="session_duration", value=15)
    .interrogate()
)

validation
```

Key pieces to note in the above example:

- `log_issue()` (the custom action) collects `metadata` by calling `get_action_metadata()`
- the `metadata` is a dictionary that is used to craft the log message
- the action is passed as a bare function to the `Actions` object within the `Validate` object (placing it within `Validate(actions=)` ensures it's set as an action for every validation step)

See Also
--------

Have a look at [`Actions`](`pointblank.Actions`) for more information on how to create custom actions for validation steps that exceed a set threshold value.

get_validation_summary() -> 'dict | None'

Access validation summary information when authoring final actions.

This function provides a convenient way to access summary information about the validation process within a final action. It returns a dictionary with key metrics from the validation process. This function can only be used within callables crafted for the [`FinalActions`](`pointblank.FinalActions`) class.

Returns
-------
dict | None
    A dictionary containing validation metrics. If called outside of a final action context, this function will return `None`.

Description of the Summary Fields
---------------------------------

The summary dictionary contains the following fields:

- `n_steps` (`int`): The total number of validation steps.
- `n_passing_steps` (`int`): The number of validation steps where all test units passed.
- `n_failing_steps` (`int`): The number of validation steps that had some failing test units.
- `n_warning_steps` (`int`): The number of steps that exceeded a 'warning' threshold.
- `n_error_steps` (`int`): The number of steps that exceeded an 'error' threshold.
- `n_critical_steps` (`int`): The number of steps that exceeded a 'critical' threshold.
- `list_passing_steps` (`list[int]`): List of step numbers where all test units passed.
- `list_failing_steps` (`list[int]`): List of step numbers for steps having failing test units.
- `dict_n` (`dict`): The number of test units for each validation step.
- `dict_n_passed` (`dict`): The number of test units that passed for each validation step.
- `dict_n_failed` (`dict`): The number of test units that failed for each validation step.
- `dict_f_passed` (`dict`): The fraction of test units that passed for each validation step.
- `dict_f_failed` (`dict`): The fraction of test units that failed for each validation step.
- `dict_warning` (`dict`): The 'warning' level status for each validation step.
- `dict_error` (`dict`): The 'error' level status for each validation step.
- `dict_critical` (`dict`): The 'critical' level status for each validation step.
- `all_passed` (`bool`): Whether or not every validation step had no failing test units.
- `highest_severity` (`str`): The highest severity level encountered during validation. This can be one of the following: `"warning"`, `"error"`, `"critical"`, `"some failing"`, or `"all passed"`.
- `tbl_row_count` (`int`): The number of rows in the target table. - `tbl_column_count` (`int`): The number of columns in the target table. - `tbl_name` (`str`): The name of the target table. - `validation_duration` (`float`): The duration of the validation in seconds. Note that the summary dictionary is only available within the context of a final action. If called outside of a final action (i.e., when no final action is being executed), this function will return `None`. Examples -------- Final actions are executed after the completion of all validation steps. They provide an opportunity to take appropriate actions based on the overall validation results. Here's an example of a final action function (`send_report()`) that sends an alert when critical validation failures are detected: ```python import pointblank as pb def send_report(): summary = pb.get_validation_summary() if summary["highest_severity"] == "critical": # Send an alert email send_alert_email( subject=f"CRITICAL validation failures in {summary['tbl_name']}", body=f"{summary['n_critical_steps']} steps failed with critical severity." ) validation = ( pb.Validate( data=my_data, final_actions=pb.FinalActions(send_report) ) .col_vals_gt(columns="revenue", value=0) .interrogate() ) ``` Note that `send_alert_email()` in the example above is a placeholder function that would be implemented by the user to send email alerts. This function is not provided by the Pointblank package. The `get_validation_summary()` function can also be used to create custom reporting for validation results: ```python def log_validation_results(): summary = pb.get_validation_summary() print(f"Validation completed with status: {summary['highest_severity'].upper()}") print(f"Steps: {summary['n_steps']} total") print(f" - {summary['n_passing_steps']} passing, {summary['n_failing_steps']} failing") print( f" - Severity: {summary['n_warning_steps']} warnings, " f"{summary['n_error_steps']} errors, " f"{summary['n_critical_steps']} critical" ) if summary['highest_severity'] in ["error", "critical"]: print("⚠️ Action required: Please review failing validation steps!") ``` Final actions work well with both simple logging and more complex notification systems, allowing you to integrate validation results into your broader data quality workflows. See Also -------- Have a look at [`FinalActions`](`pointblank.FinalActions`) for more information on how to create custom actions that are executed after all validation steps have been completed. write_file(validation: 'Validate', filename: 'str', path: 'str | None' = None, keep_tbl: 'bool' = False, keep_extracts: 'bool' = False, quiet: 'bool' = False) -> 'None' Write a Validate object to disk as a serialized file. Writing a validation object to disk with `write_file()` can be useful for keeping data validation results close at hand for later retrieval (with `read_file()`). By default, any data table that the validation object holds will be removed before writing to disk (not applicable if no data table is present). This behavior can be changed by setting `keep_tbl=True`, but this only works when the table is not of a database type (e.g., DuckDB, PostgreSQL, etc.), as database connections cannot be serialized. Extract data from failing validation steps can also be preserved by setting `keep_extracts=True`, which is useful for later analysis of data quality issues. 
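As a quick sketch of those two options (fuller examples appear later in this section):

```python
import pointblank as pb

validation = (
    pb.Validate(data=pb.load_dataset("small_table"))
    .col_vals_gt(columns="d", value=100)
    .interrogate()
)

# Persist the results along with the table and any failing-row extracts
pb.write_file(validation, "my_validation", keep_tbl=True, keep_extracts=True)
```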
The serialized file uses Python's pickle format for storage of the validation object state, including all validation results, metadata, and optionally the source data. **Important note.** If your validation uses custom preprocessing functions (via the `pre=` parameter), these functions must be defined at the module level (not interactively or as lambda functions) to ensure they can be properly restored when loading the validation in a different Python session. Read the *Creating Serializable Validations* section below for more information. :::{.callout-warning} The `write_file()` function is currently experimental. Please report any issues you encounter in the [Pointblank issue tracker](https://github.com/posit-dev/pointblank/issues). ::: Parameters ---------- validation The `Validate` object to write to disk. filename The filename to create on disk for the validation object. Should not include the file extension as `.pkl` will be added automatically. path An optional directory path where the file should be saved. If not provided, the file will be saved in the current working directory. The directory will be created if it doesn't exist. keep_tbl An option to keep the data table that is associated with the validation object. The default is `False` where the data table is removed before writing to disk. For database tables (e.g., Ibis tables with database backends), the table is always removed even if `keep_tbl=True`, as database connections cannot be serialized. keep_extracts An option to keep any collected extract data for failing rows from validation steps. By default, this is `False` (i.e., extract data is removed to save space). quiet Should the function not inform when the file is written? By default, this is `False`, so a message will be printed when the file is successfully written. Returns ------- None This function doesn't return anything but saves the validation object to disk. Creating Serializable Validations --------------------------------- To ensure your validations work reliably across different Python sessions, the recommended approach is to use module-Level functions. So, create a separate Python file for your preprocessing functions: ```python # preprocessing_functions.py import polars as pl def multiply_by_100(df): return df.with_columns(pl.col("value") * 100) def add_computed_column(df): return df.with_columns(computed=pl.col("value") * 2 + 10) ``` Then import and use them in your validation: ```python # your_main_script.py import pointblank as pb from preprocessing_functions import multiply_by_100, add_computed_column validation = ( pb.Validate(data=my_data) .col_vals_gt(columns="value", value=500, pre=multiply_by_100) .col_vals_between(columns="computed", left=50, right=1000, pre=add_computed_column) .interrogate() ) # Save validation and it will work reliably across sessions pb.write_file(validation, "my_validation", keep_tbl=True) ``` ### Problematic Patterns to Avoid Don't use lambda functions as they will cause immediate errors. Don't use interactive function definitions (as they may fail when loading). 
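For instance, here is a sketch of the lambda anti-pattern (it uses the same placeholder `data` table and `value` column as the snippet that follows); `write_file()` will raise a clear error for it:

```python
import polars as pl
import pointblank as pb

# Don't do this: a lambda passed to `pre=` cannot be restored in another session
validation = pb.Validate(data).col_vals_gt(
    columns="value", value=100, pre=lambda df: df.with_columns(pl.col("value") * 2)
)
```

An interactively defined function is just as fragile: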
```python def my_function(df): # Defined in notebook/REPL return df.with_columns(pl.col("value") * 2) validation = pb.Validate(data).col_vals_gt( columns="value", value=100, pre=my_function ) ``` ### Automatic Analysis and Guidance When you call `write_file()`, it automatically analyzes your validation and provides: - confirmation when all functions will work reliably - warnings for functions that may cause cross-session issues - clear errors for unsupported patterns (lambda functions) - specific recommendations and code examples - loading instructions tailored to your validation ### Loading Your Validation To load a saved validation in a new Python session: ```python # In a new Python session import pointblank as pb # Import the same preprocessing functions used when creating the validation from preprocessing_functions import multiply_by_100, add_computed_column # Upon loading the validation, functions will be automatically restored validation = pb.read_file("my_validation.pkl") ``` ** Testing Your Validation:** To verify your validation works across sessions: 1. save your validation in one Python session 2. start a fresh Python session (restart kernel/interpreter) 3. import required preprocessing functions 4. load the validation using `read_file()` 5. test that preprocessing functions work as expected ### Performance and Storage - use `keep_tbl=False` (default) to reduce file size when you don't need the original data - use `keep_extracts=False` (default) to save space by excluding extract data - set `quiet=True` to suppress guidance messages in automated scripts - files are saved using pickle's highest protocol for optimal performance Examples -------- Let's create a simple validation and save it to disk: ```python import pointblank as pb # Create a validation validation = ( pb.Validate(data=pb.load_dataset("small_table"), label="My validation") .col_vals_gt(columns="d", value=100) .col_vals_regex(columns="b", pattern=r"[0-9]-[a-z]{3}-[0-9]{3}") .interrogate() ) # Save to disk (without the original table data) pb.write_file(validation, "my_validation") ``` To keep the original table data for later analysis: ```python # Save with the original table data included pb.write_file(validation, "my_validation_with_data", keep_tbl=True) ``` You can also specify a custom directory and keep extract data: ```python pb.write_file( validation, filename="detailed_validation", path="/path/to/validations", keep_tbl=True, keep_extracts=True ) ``` ### Working with Preprocessing Functions For validations that use preprocessing functions to be portable across sessions, define your functions in a separate `.py` file: ```python # In `preprocessing_functions.py` import polars as pl def multiply_by_100(df): return df.with_columns(pl.col("value") * 100) def add_computed_column(df): return df.with_columns(computed=pl.col("value") * 2 + 10) ``` Then import and use them in your validation: ```python # In your main script import pointblank as pb from preprocessing_functions import multiply_by_100, add_computed_column validation = ( pb.Validate(data=my_data) .col_vals_gt(columns="value", value=500, pre=multiply_by_100) .col_vals_between(columns="computed", left=50, right=1000, pre=add_computed_column) .interrogate() ) # This validation can now be saved and loaded reliably pb.write_file(validation, "my_validation", keep_tbl=True) ``` When you load this validation in a new session, simply import the preprocessing functions again and they will be automatically restored. 
See Also -------- Use the [`read_file()`](`pointblank.read_file`) function to load a validation object that was previously saved with `write_file()`. read_file(filepath: 'str | Path') -> 'Validate' Read a Validate object from disk that was previously saved with `write_file()`. This function loads a validation object that was previously serialized to disk using the `write_file()` function. The validation object will be restored with all its validation results, metadata, and optionally the source data (if it was saved with `keep_tbl=True`). :::{.callout-warning} The `read_file()` function is currently experimental. Please report any issues you encounter in the [Pointblank issue tracker](https://github.com/posit-dev/pointblank/issues). ::: Parameters ---------- filepath The path to the saved validation file. Can be a string or Path object. Returns ------- Validate The restored validation object with all its original state, validation results, and metadata. Examples -------- Load a validation object that was previously saved: ```python import pointblank as pb # Load a validation object from disk validation = pb.read_file("my_validation.pkl") # View the validation results validation ``` You can also load using just the filename (without extension): ```python # This will automatically look for "my_validation.pkl" validation = pb.read_file("my_validation") ``` The loaded validation object retains all its functionality: ```python # Get validation summary summary = validation.get_json_report() # Get sundered data (if original table was saved) if validation.data is not None: failing_rows = validation.get_sundered_data(type="fail") ``` See Also -------- Use the [`write_file()`](`pointblank.Validate.write_file`) method to save a validation object to disk for later retrieval with this function. config(report_incl_header: 'bool' = True, report_incl_footer: 'bool' = True, preview_incl_header: 'bool' = True) -> 'PointblankConfig' Configuration settings for the Pointblank library. Parameters ---------- report_incl_header This controls whether the header should be present in the validation table report. The header contains the table name, label information, and might contain global failure threshold levels (if set). report_incl_footer Should the footer of the validation table report be displayed? The footer contains the starting and ending times of the interrogation. preview_incl_header Whether the header should be present in any preview table (generated via the [`preview()`](`pointblank.preview`) function). Returns ------- PointblankConfig A `PointblankConfig` object with the specified configuration settings. ## The Prebuilt Actions family The Prebuilt Actions group contains a function that can be used to send a Slack notification when validation steps exceed failure threshold levels or just to provide a summary of the validation results, including the status, number of steps, passing and failing steps, table information, and timing details. send_slack_notification(webhook_url: 'str | None' = None, step_msg: 'str | None' = None, summary_msg: 'str | None' = None, debug: 'bool' = False) -> 'Callable' Create a Slack notification function using a webhook URL. This function can be used in two ways: 1. With [`Actions`](`pointblank.Actions`) to notify about individual validation step failures 2. With [`FinalActions`](`pointblank.FinalActions`) to provide a summary notification after all validation steps have undergone interrogation The function creates a callable that sends notifications through a Slack webhook. 
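In outline, the same callable can serve both roles (complete examples appear in the *Examples* section below):

```python
import pointblank as pb

# Create the notification callable once...
notify_slack = pb.send_slack_notification(
    webhook_url="https://hooks.slack.com/services/your/webhook/url"
)

# ...then attach it for per-step alerts, a final summary, or both
actions = pb.Actions(critical=notify_slack)
final_actions = pb.FinalActions(notify_slack)
```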
Message formatting can be customized using templates for both individual steps and summary reports. Parameters ---------- webhook_url The Slack webhook URL. If `None` (and `debug=True`), a dry run is performed (see the *Offline Testing* section below for information on this). step_msg Template string for step notifications. Some of the available variables include: `"{step}"`, `"{column}"`, `"{value}"`, `"{type}"`, `"{time}"`, `"{level}"`, etc. See the *Available Template Variables for Step Notifications* section below for more details. If not provided, a default step message template will be used. summary_msg Template string for summary notifications. Some of the available variables are: `"{n_steps}"`, `"{n_passing_steps}"`, `"{n_failing_steps}"`, `"{all_passed}"`, `"{highest_severity}"`, etc. See the *Available Template Variables for Summary Notifications* section below for more details. If not provided, a default summary message template will be used. debug Print debug information if `True`. This includes the message content and the response from Slack. This is useful for testing and debugging the notification function. If `webhook_url` is `None`, the function will print the message to the console instead of sending it to Slack. This is useful for debugging and ensuring that your templates are formatted correctly. Returns ------- Callable A function that sends notifications to Slack. Available Template Variables for Step Notifications --------------------------------------------------- When creating a custom template for validation step alerts (`step_msg=`), the following templating strings can be used: - `"{step}"`: The step number. - `"{column}"`: The column name. - `"{value}"`: The value being compared (only available in certain validation steps). - `"{type}"`: The assertion type (e.g., `"col_vals_gt"`, etc.). - `"{level}"`: The severity level (`"warning"`, `"error"`, or `"critical"`). - `"{level_num}"`: The severity level as a numeric value (`30`, `40`, or `50`). - `"{autobrief}"`: A localized and brief statement of the expectation for the step. - `"{failure_text}"`: Localized text that explains how the validation step failed. - `"{time}"`: The time of the notification. Here's an example of how to construct a `step_msg=` template: ```python step_msg = '''🚨 *Validation Step Alert* • Step Number: {step} • Column: {column} • Test Type: {type} • Value Tested: {value} • Severity: {level} (level {level_num}) • Brief: {autobrief} • Details: {failure_text} • Time: {time}''' ``` This template will be filled with the relevant information when a validation step fails. The placeholders will be replaced with actual values when the Slack notification is sent. Available Template Variables for Summary Notifications ------------------------------------------------------ When creating a custom template for a validation summary (`summary_msg=`), the following templating strings can be used: - `"{n_steps}"`: The total number of validation steps. - `"{n_passing_steps}"`: The number of validation steps where all test units passed. - `"{n_failing_steps}"`: The number of validation steps that had some failing test units. - `"{n_warning_steps}"`: The number of steps that exceeded a 'warning' threshold. - `"{n_error_steps}"`: The number of steps that exceeded an 'error' threshold. - `"{n_critical_steps}"`: The number of steps that exceeded a 'critical' threshold. - `"{all_passed}"`: Whether or not every validation step had no failing test units. 
- `"{highest_severity}"`: The highest severity level encountered during validation. This can be one of the following: `"warning"`, `"error"`, or `"critical"`, `"some failing"`, or `"all passed"`. - `"{tbl_row_count}"`: The number of rows in the target table. - `"{tbl_column_count}"`: The number of columns in the target table. - `"{tbl_name}"`: The name of the target table. - `"{validation_duration}"`: The duration of the validation in seconds. - `"{time}"`: The time of the notification. Here's an example of how to put together a `summary_msg=` template: ```python summary_msg = '''📊 *Validation Summary Report* *Overview* • Status: {highest_severity} • All Passed: {all_passed} • Total Steps: {n_steps} *Step Results* • Passing Steps: {n_passing_steps} • Failing Steps: {n_failing_steps} • Warning Level: {n_warning_steps} • Error Level: {n_error_steps} • Critical Level: {n_critical_steps} *Table Info* • Table Name: {tbl_name} • Row Count: {tbl_row_count} • Column Count: {tbl_column_count} *Timing* • Duration: {validation_duration}s • Completed: {time}''' ``` This template will be filled with the relevant information when the validation summary is generated. The placeholders will be replaced with actual values when the Slack notification is sent. Offline Testing --------------- If you want to test the function without sending actual notifications, you can leave the `webhook_url=` as `None` and set `debug=True`. This will print the message to the console instead of sending it to Slack. This is useful for debugging and ensuring that your templates are formatted correctly. Furthermore, the function could be run globally (i.e., outside of the context of a validation plan) to show the message templates with all possible variables. Here's an example of how to do this: ```python import pointblank as pb # Create a Slack notification function notify_slack = pb.send_slack_notification( webhook_url=None, # Leave as None for dry run debug=True, # Enable debug mode to print message previews ) # Call the function to see the message previews notify_slack() ``` This will print the step and summary message previews to the console, allowing you to see how the templates will look when filled with actual data. You can then adjust your templates as needed before using them in a real validation plan. When `step_msg=` and `summary_msg=` are not provided, the function will use default templates. However, you can customize the templates to include additional information or change the format to better suit your needs. Iterating on the templates can help you create more informative and visually appealing messages. Here's an example of that: ```python import pointblank as pb # Create a Slack notification function with custom templates notify_slack = pb.send_slack_notification( webhook_url=None, # Leave as None for dry run step_msg='''*Data Validation Alert* • Type: {type} • Level: {level} • Step: {step} • Column: {column} • Time: {time}''', summary_msg='''*Data Validation Summary* • Highest Severity: {highest_severity} • Total Steps: {n_steps} • Failed Steps: {n_failing_steps} • Time: {time}''', debug=True, # Enable debug mode to print message previews ) ``` These templates will be used with sample data when the function is called. The combination of `webhook_url=None` and `debug=True` allows you to test your custom templates without having to send actual notifications to Slack. 
Examples
--------

When using an action with one or more validation steps, you typically provide callables that fire when a set threshold of failed test units is met or exceeded. The callable can be a function or a lambda. The `send_slack_notification()` function creates a callable that sends a Slack notification when a validation step fails. Here is how it can be set up to work for multiple validation steps by using [`Actions`](`pointblank.Actions`):

```python
import pointblank as pb

# Create a Slack notification function
notify_slack = pb.send_slack_notification(
    webhook_url="https://hooks.slack.com/services/your/webhook/url"
)

# Create a validation plan
validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="duckdb"),
        thresholds=pb.Thresholds(warning=0.05, error=0.10, critical=0.15),
        actions=pb.Actions(critical=notify_slack),
    )
    .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}[0-9]{3}")
    .col_vals_gt(columns="item_revenue", value=0.05)
    .col_vals_gt(columns="session_duration", value=15)
    .interrogate()
)

validation
```

By placing the `notify_slack` callable in the `Validate(actions=Actions(critical=))` argument, you can ensure that the notification is sent whenever the 'critical' threshold is reached (as set here, when 15% or more of the test units fail). The notification will include information about the validation step that triggered the alert.

When using a [`FinalActions`](`pointblank.FinalActions`) object, the notification will be sent after all validation steps have been completed. This is useful for providing a summary of the validation process. Here is an example of how to set up a summary notification:

```python
import pointblank as pb

# Create a Slack notification function
notify_slack = pb.send_slack_notification(
    webhook_url="https://hooks.slack.com/services/your/webhook/url"
)

# Create a validation plan
validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="duckdb"),
        thresholds=pb.Thresholds(warning=0.05, error=0.10, critical=0.15),
        final_actions=pb.FinalActions(notify_slack),
    )
    .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}[0-9]{3}")
    .col_vals_gt(columns="item_revenue", value=0.05)
    .col_vals_gt(columns="session_duration", value=15)
    .interrogate()
)
```

In this case, the same `notify_slack` callable is used, but it is placed in `Validate(final_actions=FinalActions())`. This results in the summary notification being sent after all validation steps are completed, regardless of whether any steps failed. This is possible because `send_slack_notification()` creates a callable that can be used in both contexts: it automatically determines whether to send a step notification or a summary notification based on the context in which it is called.

We can customize the message templates for both step and summary notifications to make them more informative and visually appealing, for example by using Markdown formatting to improve readability.
Here is an example of how to customize the templates: ```python import pointblank as pb # Create a Slack notification function notify_slack = pb.send_slack_notification( webhook_url="https://hooks.slack.com/services/your/webhook/url", step_msg=''' 🚨 *Validation Step Alert* • Step Number: {step} • Column: {column} • Test Type: {type} • Value Tested: {value} • Severity: {level} (level {level_num}) • Brief: {autobrief} • Details: {failure_text} • Time: {time}''', summary_msg=''' 📊 *Validation Summary Report* *Overview* • Status: {highest_severity} • All Passed: {all_passed} • Total Steps: {n_steps} *Step Results* • Passing Steps: {n_passing_steps} • Failing Steps: {n_failing_steps} • Warning Level: {n_warning_steps} • Error Level: {n_error_steps} • Critical Level: {n_critical_steps} *Table Info* • Table Name: {tbl_name} • Row Count: {tbl_row_count} • Column Count: {tbl_column_count} *Timing* • Duration: {validation_duration}s • Completed: {time}''', ) # Create a validation plan validation = ( pb.Validate( data=pb.load_dataset(dataset="game_revenue", tbl_type="duckdb"), thresholds=pb.Thresholds(warning=0.05, error=0.10, critical=0.15), actions=pb.Actions(default=notify_slack), final_actions=pb.FinalActions(notify_slack), ) .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}[0-9]{3}") .col_vals_gt(columns="item_revenue", value=0.05) .col_vals_gt(columns="session_duration", value=15) .interrogate() ) ``` In this example, we have customized the templates for both step and summary notifications. The step notification includes details about the validation step, including the step number, column name, test type, value tested, severity level, brief description, and time of the notification. The summary notification includes an overview of the validation process, including the status, number of steps, passing and failing steps, table information, and timing details. ---------------------------------------------------------------------- This is a set of examples for the Pointblank library. 
---------------------------------------------------------------------- ### Starter Validation (https://posit-dev.github.io/pointblank/demos/01-starter/) A validation with the basics ```python import pointblank as pb validation = ( pb.Validate( # Use pb.Validate to start data=pb.load_dataset(dataset="small_table", tbl_type="polars"), tbl_name="small_table", label="A starter validation" ) .col_vals_gt(columns="d", value=1000) # STEP 1 | .col_vals_le(columns="c", value=5) # STEP 2 | <-- Build up a validation plan .col_exists(columns=["date", "date_time"]) # STEP 3 | .interrogate() # This will execute all validation steps and collect intel ) validation ``` ### Advanced Validation (https://posit-dev.github.io/pointblank/demos/02-advanced/) A validation with a comprehensive set of rules ```python import pointblank as pb import polars as pl validation = ( pb.Validate( data=pb.load_dataset(dataset="game_revenue", tbl_type="polars"), tbl_name="game_revenue", label="Comprehensive validation example", thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35), ) .col_vals_regex(columns="player_id", pattern=r"^[A-Z]{12}[0-9]{3}$") # STEP 1 .col_vals_gt(columns="session_duration", value=5) # STEP 2 .col_vals_ge(columns="item_revenue", value=0.02) # STEP 3 .col_vals_in_set(columns="item_type", set=["iap", "ad"]) # STEP 4 .col_vals_in_set( # STEP 5 columns="acquisition", set=["google", "facebook", "organic", "crosspromo", "other_campaign"] ) .col_vals_not_in_set(columns="country", set=["Mongolia", "Germany"]) # STEP 6 .col_vals_between( # STEP 7 columns="session_duration", left=10, right=50, pre = lambda df: df.select(pl.median("session_duration")) ) .rows_distinct(columns_subset=["player_id", "session_id", "time"]) # STEP 8 .row_count_match(count=2000) # STEP 9 .col_count_match(count=11) # STEP 10 .col_vals_not_null(columns=pb.starts_with("item")) # STEPS 11-13 .col_exists(columns="start_day") # STEP 14 .interrogate() ) validation ``` ### Data Extracts (https://posit-dev.github.io/pointblank/demos/03-data-extracts/) Pulling out data extracts that highlight rows with validation failures ```python import pointblank as pb validation = ( pb.Validate( data=pb.load_dataset(dataset="game_revenue"), tbl_name="game_revenue", label="Validation with test unit failures available as an extract" ) .col_vals_gt(columns="item_revenue", value=0) # STEP 1: no test unit failures .col_vals_ge(columns="session_duration", value=5) # STEP 2: 14 test unit failures -> extract .interrogate() ) ``` ```python pb.preview(validation.get_data_extracts(i=2, frame=True), n_head=20, n_tail=20) ``` ### Sundered Data (https://posit-dev.github.io/pointblank/demos/04-sundered-data/) Splitting your data into 'pass' and 'fail' subsets ```python import pointblank as pb import polars as pl validation = ( pb.Validate( data=pb.load_dataset(dataset="small_table", tbl_type="pandas"), tbl_name="small_table", label="Sundering Data" ) .col_vals_gt(columns="d", value=1000) .col_vals_le(columns="c", value=5) .interrogate() ) validation ``` ```python pb.preview(validation.get_sundered_data(type="pass")) ``` ### Step Report: Column Data Checks (https://posit-dev.github.io/pointblank/demos/05-step-report-column-check/) A step report for column checks shows what went wrong ```python import pointblank as pb validation = ( pb.Validate( data=pb.load_dataset(dataset="small_table"), tbl_name="small_table", label="Step reports for column data checks" ) .col_vals_ge(columns="c", value=4, na_pass=True) # has failing test units .col_vals_regex(columns="b", 
pattern=r"\d-[a-z]{3}-\d{3}") # no failing test units .interrogate() ) validation ``` ```python validation.get_step_report(i=1) ``` ```python validation.get_step_report(i=2) ``` ### Step Report: Schema Check (https://posit-dev.github.io/pointblank/demos/06-step-report-schema-check/) When a schema doesn't match, a step report gives you the details ```python import pointblank as pb # Create a schema for the target table (`small_table` as a DuckDB table) schema = pb.Schema( columns=[ ("date_time", "timestamp"), # this dtype doesn't match ("dates", "date"), # this column name doesn't match ("a", "int64"), ("b",), # omit dtype to not check for it ("c",), # "" "" "" "" ("d", "float64"), ("e", ["bool", "boolean"]), # try several dtypes (second one matches) ("f", "str"), # this dtype doesn't match ] ) # Use the `col_schema_match()` validation method to perform a schema check validation = ( pb.Validate( data=pb.load_dataset(dataset="small_table", tbl_type="duckdb"), tbl_name="small_table", label="Step report for a schema check" ) .col_schema_match(schema=schema) .interrogate() ) validation ``` ```python validation.get_step_report(i=1) ``` ### Apply Validation Rules to Multiple Columns (https://posit-dev.github.io/pointblank/demos/apply-checks-to-several-columns/) Create multiple validation steps by using a list of column names with `columns=` ```python import pointblank as pb validation = ( pb.Validate( data=pb.load_dataset(dataset="small_table", tbl_type="polars") ) .col_vals_ge(columns=["a", "c", "d"], value=0) # check values in 'a', 'c', and 'd' .col_exists(columns=["date_time", "date"]) # check for the existence of two columns .interrogate() ) validation ``` ### Verifying Row and Column Counts (https://posit-dev.github.io/pointblank/demos/check-row-column-counts/) Check the dimensions of the table with the `*_count_match()` validation methods ```python import pointblank as pb validation = ( pb.Validate( data=pb.load_dataset(dataset="game_revenue", tbl_type="duckdb") ) .col_count_match(count=11) # expect 11 columns in the table .row_count_match(count=2000) # expect 2,000 rows in the table .row_count_match(count=0, inverse=True) # expect that the table has rows .col_count_match( # compare column count against count=pb.load_dataset( # that of another table dataset="game_revenue", tbl_type="pandas" ) ) .interrogate() ) validation ``` ### Checks for Missing Values (https://posit-dev.github.io/pointblank/demos/checks-for-missing/) Perform validations that check whether missing/NA/Null values are present ```python import pointblank as pb validation = ( pb.Validate( data=pb.load_dataset(dataset="small_table", tbl_type="polars") ) .col_vals_not_null(columns="a") # expect no Null values .col_vals_not_null(columns="b") # "" "" .col_vals_not_null(columns="c") # "" "" .col_vals_not_null(columns="d") # "" "" .col_vals_null(columns="a") # expect all values to be Null .interrogate() ) validation ``` ### Custom Expression for Checking Column Values (https://posit-dev.github.io/pointblank/demos/col-vals-custom-expr/) A column expression can be used to check column values. 
Just use `col_vals_expr()` for this

```python
import pointblank as pb

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="pandas")
    )
    .col_vals_expr(expr=lambda df: (df["d"] % 1 != 0) & (df["a"] < 10))  # Pandas column expr
    .interrogate()
)

validation
```

### Column Selector Functions: Easily Pick Columns (https://posit-dev.github.io/pointblank/demos/column-selector-functions/)

Use column selector functions in the `columns=` argument to conveniently choose columns

```python
import pointblank as pb
import narwhals.selectors as ncs

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="polars")
    )
    .col_vals_ge(
        columns=pb.matches("rev|dur"),  # check values in columns having 'rev' or 'dur' in name
        value=0
    )
    .col_vals_regex(
        columns=pb.ends_with("_id"),    # check values in columns with names ending in '_id'
        pattern=r"^[A-Z]{12}\d{3}"
    )
    .col_vals_not_null(
        columns=pb.last_n(2)            # check that the last two columns don't have Null values
    )
    .col_vals_regex(
        columns=ncs.string(),           # check that all string columns are non-empty strings
        pattern=r"(.|\s)*\S(.|\s)*"
    )
    .interrogate()
)

validation
```

### Comparison Checks Across Columns (https://posit-dev.github.io/pointblank/demos/comparisons-across-columns/)

Perform comparisons of values in columns to values in other columns

```python
import pointblank as pb

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars")
    )
    .col_vals_lt(columns="a", value=pb.col("c"))  # values in 'a' < values in 'c'
    .col_vals_between(
        columns="d",        # values in 'd' are between values
        left=pb.col("c"),   # in 'c' and the fixed value of 12,000;
        right=12000,        # any missing values encountered result
        na_pass=True        # in a passing test unit
    )
    .interrogate()
)

validation
```

### Expect No Duplicate Rows (https://posit-dev.github.io/pointblank/demos/expect-no-duplicate-rows/)

We can check for duplicate rows in the table with `rows_distinct()`

```python
import pointblank as pb

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars")
    )
    .rows_distinct()  # expect no duplicate rows
    .interrogate()
)

validation
```

### Checking for Duplicate Values (https://posit-dev.github.io/pointblank/demos/expect-no-duplicate-values/)

To check for duplicate values down a column, use `rows_distinct()` with a `columns_subset=` value

```python
import pointblank as pb

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars")
    )
    .rows_distinct(columns_subset="b")  # expect no duplicate values in 'b'
    .interrogate()
)

validation
```

### Expectations with a Text Pattern (https://posit-dev.github.io/pointblank/demos/expect-text-pattern/)

With `col_vals_regex()`, check for conformance to a regular expression

```python
import pointblank as pb

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="small_table", tbl_type="polars")
    )
    .col_vals_regex(columns="b", pattern=r"^\d-[a-z]{3}-\d{3}$")  # check pattern in 'b'
    .col_vals_regex(columns="f", pattern=r"high|low|mid")         # check pattern in 'f'
    .interrogate()
)

validation
```

### Set Failure Threshold Levels (https://posit-dev.github.io/pointblank/demos/failure-thresholds/)

Set threshold levels to better gauge adverse data quality

```python
import pointblank as pb

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="duckdb"),
        thresholds=pb.Thresholds(  # setting relative threshold defaults for all steps
            warning=0.05,          # 5% failing test units: warning threshold (gray)
            error=0.10,            # 10% failed test
units: error threshold (yellow) critical=0.15 # 15% failed test units: critical threshold (red) ), ) .col_vals_in_set(columns="item_type", set=["iap", "ad"]) .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}\d{3}") .col_vals_gt(columns="item_revenue", value=0.05) .col_vals_gt( columns="session_duration", value=4, thresholds=(5, 10, 20) # setting absolute thresholds for *this* step (W, E, C) ) .col_exists(columns="end_day") .interrogate() ) validation ``` ### Mutate the Table in a Validation Step (https://posit-dev.github.io/pointblank/demos/mutate-table-in-step/) For far more specialized validations, modify the table with the `pre=` argument before checking it ```python import pointblank as pb import polars as pl import narwhals as nw # Define preprocessing functions def get_median_a(df): """Use a Polars expression to aggregate column `a`.""" return df.select(pl.median("a")) def add_b_length_column(df): """Use Narwhals to add a string length column `b_len`.""" return ( nw.from_native(df) .with_columns(b_len=nw.col("b").str.len_chars()) ) validation = ( pb.Validate( data=pb.load_dataset(dataset="small_table", tbl_type="polars") ) .col_vals_between( columns="a", left=3, right=6, pre=get_median_a ) .col_vals_eq( columns="b_len", value=9, pre=add_b_length_column ) .interrogate() ) validation ``` ### Numeric Comparisons (https://posit-dev.github.io/pointblank/demos/numeric-comparisons/) Perform comparisons of values in columns to fixed values ```python import pointblank as pb validation = ( pb.Validate( data=pb.load_dataset(dataset="small_table", tbl_type="polars") ) .col_vals_gt(columns="d", value=1000) # values in 'd' > 1000 .col_vals_lt(columns="d", value=10000) # values in 'd' < 10000 .col_vals_ge(columns="a", value=1) # values in 'a' >= 1 .col_vals_le(columns="c", value=5) # values in 'c' <= 5 .col_vals_ne(columns="a", value=7) # values in 'a' not equal to 7 .col_vals_between(columns="c", left=0, right=15) # 0 <= 'c' values <= 15 .interrogate() ) validation ``` ### Check the Schema of a Table (https://posit-dev.github.io/pointblank/demos/schema-check/) The schema of a table can be flexibly defined with `Schema` and verified with `col_schema_match()` ```python import pointblank as pb import polars as pl tbl = pl.DataFrame( { "a": ["apple", "banana", "cherry", "date"], "b": [1, 6, 3, 5], "c": [1.1, 2.2, 3.3, 4.4], } ) # Use the Schema class to define the column schema as loosely or rigorously as required schema = pb.Schema( columns=[ ("a", "String"), # Column 'a' has dtype 'String' ("b", ["Int", "Int64"]), # Column 'b' has dtype 'Int' or 'Int64' ("c", ) # Column 'c' follows 'b' but we don't specify a dtype here ] ) # Use the `col_schema_match()` validation method to perform the schema check validation = ( pb.Validate(data=tbl) .col_schema_match(schema=schema) .interrogate() ) validation ``` ### Set Membership (https://posit-dev.github.io/pointblank/demos/set-membership/) Perform validations that check whether values are part of a set (or *not* part of one) ```python import pointblank as pb validation = ( pb.Validate( data=pb.load_dataset(dataset="small_table", tbl_type="polars") ) .col_vals_in_set(columns="f", set=["low", "mid", "high"]) # part of this set .col_vals_not_in_set(columns="f", set=["zero", "infinity"]) # not part of this set .interrogate() ) validation ``` ### Using Parquet Data (https://posit-dev.github.io/pointblank/demos/using-parquet-data/) A Parquet dataset can be used for data validation, thanks to Ibis ```python import pointblank as pb import ibis game_revenue = 
ibis.read_parquet("data/game_revenue.parquet") validation = ( pb.Validate(data=game_revenue, label="Example using a Parquet dataset.") .col_vals_lt(columns="item_revenue", value=200) .col_vals_gt(columns="item_revenue", value=0) .col_vals_gt(columns="session_duration", value=5) .col_vals_in_set(columns="item_type", set=["iap", "ad"]) .col_vals_regex(columns="player_id", pattern=r"[A-Z]{12}\d{3}") .interrogate() ) validation ```