assistant

assistant(model, data=None, tbl_name=None, api_key=None, display=None)

Chat with the PbA (Pointblank Assistant) about your data validation needs.

The assistant() function provides an interactive chat session with the PbA (Pointblank Assistant) to help you with your data validation needs. The PbA can help you with constructing validation plans, suggesting validation methods, and providing code snippets for using the Pointblank Python package. Feel free to ask the PbA about any aspect of the Pointblank package and it will do its best to assist you.

The PbA can also help you with constructing validation plans for your data tables. If you provide a data table to the PbA, it will internally generate a JSON summary of the table and use that information to suggest validation methods that can be used with the Pointblank package. If using a Polars table as the data source, the PbA will be knowledgeable about the Polars API and can smartly suggest validation steps that use aggregate measures with up-to-date Polars methods.

The PbA can be used with models from the following providers:

  • Anthropic
  • OpenAI
  • Ollama
  • Amazon Bedrock

The PbA can be displayed in a browser (the default) or in the terminal. You can choose one or the other by setting the display= parameter to "browser" or "terminal".
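A minimal usage sketch follows. The `assistant()` call itself is commented out because it opens an interactive chat session and requires a provider API key; the table contents and `tbl_name` value are illustrative, not from the Pointblank docs.

```python
# The model= string uses the provider:model form described below.
model = "anthropic:claude-3-5-sonnet-latest"
provider, _, model_name = model.partition(":")

# Hedged sketch (not executed here):
#
# import polars as pl
# import pointblank as pb
#
# tbl = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
# pb.assistant(
#     model=model,
#     data=tbl,
#     tbl_name="my_table",   # optional: gives the PbA a more detailed prompt
#     display="terminal",    # or "browser" (the default)
# )
```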

Warning

The assistant() function is still experimental. Please report any issues you encounter in the Pointblank issue tracker.

Parameters

model : str

The model to be used. This should be in the form of provider:model (e.g., "anthropic:claude-3-5-sonnet-latest"). Supported providers are "anthropic", "openai", "ollama", and "bedrock".

data : FrameT | Any | None = None

An optional data table to focus on during discussion with the PbA, which could be a DataFrame object or an Ibis table object. Read the Supported Input Table Types section for details on the supported table types.

tbl_name : str | None = None

The name of the data table. This is optional and is only used to provide a more detailed prompt to the PbA.

api_key : str | None = None

The API key to be used for the model.

display : str | None = None

The display mode to use for the chat session. Supported values are "browser" and "terminal". If not provided, the default value is "browser".

Returns

: None

Nothing is returned. Rather, you get an interactive chat session with the PbA, displayed either in a browser or in the terminal.

Constructing the model Argument

The model= argument should be constructed using the provider and model name separated by a colon (provider:model). The provider text can be any of:

  • "anthropic" (Anthropic)
  • "openai" (OpenAI)
  • "ollama" (Ollama)
  • "bedrock" (Amazon Bedrock)

The model name should be the specific model to be used from the provider. Model names are subject to change so consult the provider’s documentation for the most up-to-date model names.
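The provider:model convention can be sketched with a small helper. This function is illustrative only, not part of the Pointblank API; it simply splits the string and checks the provider against the four supported names.

```python
# Illustrative helper (not part of the Pointblank API) for the
# provider:model convention described above.
SUPPORTED_PROVIDERS = {"anthropic", "openai", "ollama", "bedrock"}

def split_model_arg(model: str) -> tuple[str, str]:
    """Split a "provider:model" string and validate the provider name."""
    provider, sep, name = model.partition(":")
    if not sep or not name:
        raise ValueError("model= must be in the form 'provider:model'")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return provider, name
```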

Notes on Authentication

Providing a valid API key as a string via the api_key= argument is adequate for getting started, but you should consider using a more secure method for handling API keys.

One way to do this is to load the API key from an environment variable and retrieve it using the os module (specifically the os.getenv() function). Places to store the API key might include .bashrc, .bash_profile, .zshrc, or .zsh_profile.
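For example, a key exported in your shell profile can be read at runtime. The variable name below is the conventional one for Anthropic; adjust it for your provider.

```python
import os

# Returns None when the variable is unset, so the script fails fast
# rather than sending an empty key to the provider.
api_key = os.getenv("ANTHROPIC_API_KEY")

# Hedged sketch of passing it along (not executed here):
# pb.assistant(model="anthropic:claude-3-5-sonnet-latest", api_key=api_key)
```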

Another solution is to store one or more model provider API keys in an .env file (in the root of your project). If the API keys have the correct names (e.g., ANTHROPIC_API_KEY or OPENAI_API_KEY), they will be loaded automatically from the .env file and there's no need to provide the api_key= argument. An .env file might look like this:

ANTHROPIC_API_KEY="your_anthropic_api_key_here"
OPENAI_API_KEY="your_openai_api_key_here"

There’s no need to have the python-dotenv package installed when using .env files in this way.
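The mapping from .env lines to key/value pairs can be sketched as below. This is a simplified illustration of what the automatic loading does, not the package's actual implementation.

```python
# Minimal sketch: parse KEY="value" lines from .env-style text.
env_text = (
    'ANTHROPIC_API_KEY="your_anthropic_api_key_here"\n'
    'OPENAI_API_KEY="your_openai_api_key_here"\n'
)

parsed = {}
for line in env_text.splitlines():
    line = line.strip()
    if line and not line.startswith("#") and "=" in line:
        key, _, value = line.partition("=")
        # Strip surrounding quotes so the raw key string remains.
        parsed[key.strip()] = value.strip().strip('"')
```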

Notes on Data Sent to the Model Provider

If data= is provided, the table itself is not sent to the model provider; instead, a JSON summary of the table is sent. This summary is generated internally by the DataScan class and includes the following information:

  • the number of rows and columns in the table
  • the type of dataset (e.g., Polars, DuckDB, Pandas, etc.)
  • the column names and their types
  • column level statistics such as the number of missing values, min, max, mean, and median, etc.
  • a short list of data values in each column

The JSON summary gives the model the information it needs to be knowledgeable about the data table. Compared to the size of the entire table, the JSON summary is quite small and can be safely sent to the model provider.
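The shape of such a summary might look like the following. The field names here are hypothetical, chosen to mirror the bullets above; they are not the actual DataScan output schema.

```python
import json

# Hypothetical summary shape (field names are illustrative, not the
# actual DataScan schema), mirroring the bulleted items above.
summary = {
    "rows": 3,
    "columns": 2,
    "tbl_type": "polars",
    "column_schema": {"a": "Int64", "b": "String"},
    "column_stats": {"a": {"n_missing": 0, "min": 1, "max": 3, "mean": 2.0}},
    "sample_values": {"a": [1, 2, 3], "b": ["x", "y", "z"]},
}

# A payload like this, rather than the full table, is what gets sent.
payload = json.dumps(summary)
```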

The Amazon Bedrock provider is a special case since it is a self-hosted model and security controls are in place to ensure that data is kept within the user's AWS environment. If using an Ollama model, all data is handled locally.

Supported Input Table Types

The data= parameter can be given any of the following table types:

  • Polars DataFrame ("polars")
  • Pandas DataFrame ("pandas")
  • DuckDB table ("duckdb")*
  • MySQL table ("mysql")*
  • PostgreSQL table ("postgresql")*
  • SQLite table ("sqlite")*
  • Parquet table ("parquet")*

The table types marked with an asterisk need to be prepared as Ibis tables (with the type ibis.expr.types.relations.Table). Furthermore, using assistant() with these types of tables requires the Ibis library (v9.5.0 or above) to be installed. If the input table is a Polars or Pandas DataFrame, Ibis is not required.
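Preparing one of the asterisked table types might look like the sketch below. It is commented out because it requires ibis-framework (v9.5.0 or above) and a real database file; the file and table names are illustrative.

```python
# Hedged sketch of preparing a DuckDB-backed Ibis table for data=
# (not executed: needs ibis-framework >= 9.5.0 and a real .ddb file):
#
# import ibis
# import pointblank as pb
#
# con = ibis.duckdb.connect("my_database.ddb")
# tbl = con.table("my_table")  # an ibis.expr.types.relations.Table
# pb.assistant(model="openai:gpt-4o", data=tbl)

# The fully qualified type named in the text above:
expected_table_type = "ibis.expr.types.relations.Table"
```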