Provide context

querychat automatically gathers information about your table to help the LLM write accurate SQL queries. This includes column names and types, numerical ranges, and categorical value examples.1

Importantly, we are not sending your raw data to the LLM and asking it to do complicated math. The LLM only needs to understand the structure and schema of your data in order to write SQL queries.
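To make this concrete, here is an illustrative sketch using Python's built-in sqlite3 and a tiny made-up table (querychat manages the database connection for you; this is only to show the division of labor). Knowing just the column names and types, an LLM can write a query like the one below, and the database does the math:

```python
import sqlite3

# A tiny stand-in table; only its *schema* (names and types) is what
# the LLM needs to see -- not these actual rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE titanic (survived INTEGER, pclass INTEGER, fare REAL)")
con.executemany(
    "INSERT INTO titanic VALUES (?, ?, ?)",
    [(1, 1, 71.28), (0, 3, 7.25), (1, 3, 7.92)],
)

# The kind of SQL an LLM can write from the schema alone:
query = "SELECT pclass, AVG(fare) AS avg_fare FROM titanic GROUP BY pclass ORDER BY pclass"
for pclass, avg_fare in con.execute(query):
    print(pclass, avg_fare)
```

The computation happens in SQL, so the model never needs the row values themselves.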

You can get even better results by customizing the system prompt in three ways:

  1. Add a data description to provide more context about what the data represents
  2. Add custom instructions to guide the LLM’s behavior
  3. Use a fully custom prompt template if you want complete control (useful if you want to be certain the model cannot see any literal values from your data)

Default prompt

For full visibility into the system prompt that querychat generates for the LLM, you can inspect the system_prompt property. This is useful for debugging and understanding exactly what context the LLM is using:

from querychat import QueryChat
from querychat.data import titanic

qc = QueryChat(titanic(), "titanic")
print(qc.system_prompt)

By default, the system prompt contains the following components:

  1. The basic set of behaviors and guidelines the LLM must follow in order for querychat to work properly, including how to use tools to execute queries and update the app.
  2. The SQL schema of the data frame you provided. This includes:
    • Column names
    • Data types (integer, float, boolean, datetime, text)
    • For text columns with fewer than 10 unique values, which we assume are categorical variables, the list of those values
    • For integer and float columns, we include the range
  3. A data description (if provided via data_description)
  4. Additional instructions you want to use to guide querychat’s behavior (if provided via extra_instructions).
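To make components like the schema summary concrete, here is a rough pure-Python sketch of how such a summary could be assembled. This is illustrative only, not querychat's actual implementation:

```python
def summarize_columns(columns: dict[str, list]) -> list[str]:
    """Build a schema summary: column name, type, and either the
    categorical values (text columns with fewer than 10 unique values)
    or the numeric range. Illustrative sketch only."""
    lines = []
    for name, values in columns.items():
        if all(isinstance(v, str) for v in values):
            unique = sorted(set(values))
            if len(unique) < 10:
                # Few unique values: treat as categorical and list them
                lines.append(f"{name} (text): values {', '.join(unique)}")
            else:
                lines.append(f"{name} (text)")
        else:
            # Numeric column: report the observed range
            lines.append(f"{name} (numeric): range {min(values)} to {max(values)}")
    return lines

columns = {
    "sex": ["male", "female", "male"],
    "age": [22, 38, 26],
}
for line in summarize_columns(columns):
    print(line)
```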

Data description

If your column names are descriptive, querychat may already work well without additional context. However, if your columns are named x, V1, value, etc., you should provide a data description. Use the data_description parameter for this:

titanic-app.py
from pathlib import Path
from querychat import QueryChat
from querychat.data import titanic

qc = QueryChat(
    titanic(),
    "titanic",
    data_description=Path("data_description.md")
)
app = qc.app()

querychat doesn’t need this information in any particular format – just provide what a human would find helpful:

data_description.md
This dataset contains information about Titanic passengers, collected for predicting survival.

- survived: Survival (0 = No, 1 = Yes)
- pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
- sex: Sex of passenger
- age: Age in years
- sibsp: Number of siblings/spouses aboard
- parch: Number of parents/children aboard
- fare: Passenger fare
- embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

Additional instructions

You can add custom instructions to guide the LLM’s behavior using the extra_instructions parameter:

qc = QueryChat(
    titanic(),
    "titanic",
    extra_instructions=Path("instructions.md")
)

Or as a string:

instructions = """
- Use British spelling conventions
- Stay on topic and only discuss the data dashboard
- Refuse to answer unrelated questions
"""

qc = QueryChat(titanic(), "titanic", extra_instructions=instructions)
Warning

LLMs may not always follow your instructions perfectly. Test extensively when changing instructions or models.

Custom template

If you want more control over the system prompt, you can provide a custom prompt template using the prompt_template parameter. This is for more advanced users who want to fully customize the LLM’s behavior. See the API reference for details on the available template variables.
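As a rough illustration of how a prompt template works, the sketch below fills named placeholders into a prompt string. The placeholder names here ({table_name}, {schema}) are hypothetical; consult the API reference for the variable names querychat actually supports and for the templating syntax it uses:

```python
# Hypothetical template -- the real variable names and syntax are
# documented in the querychat API reference.
template = """You are a SQL assistant for the {table_name} table.

Schema:
{schema}

Never include literal data values in your answers."""

prompt = template.format(
    table_name="titanic",
    schema="survived INTEGER, pclass INTEGER, fare REAL",
)
print(prompt)
```

Because you control the whole template, you can leave out the categorical value examples entirely, guaranteeing the model sees no literal values from your data.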

Footnotes

  1. All of this information is provided to the LLM as part of the system prompt – a string of text containing instructions and context for the LLM to consider when responding to user queries.↩︎