querychat: Chat with Shiny apps (Python)

Imagine typing questions about your data directly into your Shiny dashboard and seeing the results in real time.

querychat is a drop-in component for Shiny that allows users to query a data frame using natural language. The results are available as a reactive data frame, so they can be easily used from Shiny outputs, reactive expressions, downloads, etc.

This is not as terrible an idea as you might think! We need to be very careful when bringing LLMs into data analysis, as we all know that they are prone to hallucinations and other classes of errors. querychat is designed to excel in reliability, transparency, and reproducibility by using this one technique: denying it raw access to the data, and forcing it to write SQL queries instead. See the section below on “How it works” for more.

Installation

pip install querychat

How to use

First, you’ll need access to an LLM that supports tools/function calling. querychat uses chatlas to interface with various providers.

Here’s a very minimal example that shows the three function calls you need to make:

import chatlas
from seaborn import load_dataset
from shiny import App, render, ui

import querychat as qc

titanic = load_dataset("titanic")

# 1. Configure querychat.
#    This is where you specify the dataset and can also
#    override options like the greeting message, system prompt, model, etc.


def use_github_models(system_prompt: str) -> chatlas.Chat:
    # GitHub models give us free rate-limited access to the latest LLMs
    # you will need to have GITHUB_PAT defined in your environment
    return chatlas.ChatGithub(
        model="gpt-4.1",
        system_prompt=system_prompt,
    )


querychat_config = qc.init(
    data_source=titanic,
    table_name="titanic",
    create_chat_callback=use_github_models,
)

# Create UI
app_ui = ui.page_sidebar(
    # 2. Use qc.sidebar(id) in a ui.page_sidebar.
    #    Alternatively, use qc.ui(id) elsewhere if you don't want your
    #    chat interface to live in a sidebar.
    qc.sidebar("chat"),
    ui.output_data_frame("data_table"),
)


# Define server logic
def server(input, output, session):
    # 3. Create a querychat object using the config from step 1.
    chat = qc.server("chat", querychat_config)

    # 4. Use the filtered/sorted data frame anywhere you wish, via the
    #    chat.df() reactive.
    @render.data_frame
    def data_table():
        return chat.df()


# Create Shiny app
app = App(app_ui, server)

GitHub Models and GitHub Personal Access Tokens

This example does not use the default OpenAI model directly from OpenAI, which would require you to create an OpenAI API key and save it in an environment variable named OPENAI_API_KEY. Instead, we use GitHub Models as a free, rate-limited way to access the latest LLMs. You can follow the instructions in the GitHub Docs or the Azure AI demo on creating a PAT.

We suggest you save your PAT into two environment variables: GITHUB_TOKEN and GITHUB_PAT.
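For example, in a Unix-like shell you might export both at once (the token value below is a placeholder, not a real credential):

```shell
# Paste the PAT you generated on GitHub; the value here is a placeholder.
export GITHUB_PAT="ghp_xxxxxxxxxxxxxxxxxxxx"
export GITHUB_TOKEN="$GITHUB_PAT"
```

Add these lines to your shell profile (or a local `.env` file) so they persist across sessions.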

How it works

Powered by LLMs

querychat’s natural language chat experience is powered by LLMs. You may use any model that chatlas supports that has the ability to do tool calls, but we currently recommend (as of March 2025):

  • GPT-4o
  • Claude 3.5 Sonnet
  • Claude 3.7 Sonnet

In our testing, we’ve found that those models strike a good balance between accuracy and latency. Smaller models like GPT-4o-mini are fine for simple queries but make surprising mistakes with moderately complex ones; and reasoning models like o3-mini slow down responses without providing meaningfully better results.

The small open source models (8B and below) we’ve tested have fared extremely poorly. Sorry. 🤷

Powered by SQL

querychat does not have direct access to the raw data; it can only read or filter the data by writing SQL SELECT statements. This is crucial for ensuring reliability, transparency, and reproducibility:

  • Reliability: Today’s LLMs are excellent at writing SQL, but bad at direct calculation.
  • Transparency: querychat always displays the SQL to the user, so it can be vetted instead of blindly trusted.
  • Reproducibility: The SQL query can be easily copied and reused.

Currently, querychat uses DuckDB for its SQL engine. It’s extremely fast and has a surprising number of statistical functions.
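The SELECT-only restriction can be sketched in a few lines. Note the assumptions: Python's built-in sqlite3 stands in for DuckDB (querychat's actual engine) so the sketch runs anywhere, and the guard function `run_readonly` is a hypothetical illustration of the idea, not querychat's API:

```python
import sqlite3

# sqlite3 (bundled with Python) stands in for DuckDB here; the guard idea
# is the same regardless of SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titanic (name TEXT, age REAL, survived INTEGER)")
conn.executemany(
    "INSERT INTO titanic VALUES (?, ?, ?)",
    [("Braund", 22.0, 0), ("Cumings", 38.0, 1), ("Allen", 29.0, 1)],
)

def run_readonly(sql: str) -> list:
    # Refuse anything that is not a plain SELECT before it touches the engine.
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()

# A query like the ones the LLM emits: filter + sort, shown to the user verbatim.
rows = run_readonly("SELECT name FROM titanic WHERE survived = 1 ORDER BY age")
```

Because the model never sees the rows themselves, it cannot hallucinate values into the result; it can only propose a query, which the user can read, vet, and rerun.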

Customizing querychat

Use a different LLM provider

By default, querychat uses GPT-4o via the OpenAI API. To use a different model, provide a create_chat_callback function that takes a system_prompt parameter and returns a chatlas Chat object:

import chatlas
from functools import partial

# Option 1: Define a function
def my_chat_func(system_prompt: str) -> chatlas.Chat:
    return chatlas.ChatAnthropic(
        model="claude-3-7-sonnet-latest",
        system_prompt=system_prompt
    )

# Option 2: Use partial
my_chat_func = partial(chatlas.ChatAnthropic, model="claude-3-7-sonnet-latest")

querychat_config = qc.init(
    data_source=titanic,
    table_name="titanic",
    create_chat_callback=my_chat_func,
)
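To see why Option 2 is equivalent to Option 1, here is the same pattern with a hypothetical stand-in function (`fake_chat` is not part of chatlas; it merely mimics a provider constructor that accepts model and system_prompt keywords):

```python
from functools import partial

# fake_chat is a hypothetical stand-in for a chatlas provider constructor.
def fake_chat(model: str, system_prompt: str) -> dict:
    return {"model": model, "system_prompt": system_prompt}

# partial() pre-binds the model, leaving a callable with the
# (system_prompt) -> Chat shape that create_chat_callback expects.
my_chat_func = partial(fake_chat, model="claude-3-7-sonnet-latest")
chat = my_chat_func(system_prompt="You answer questions by writing SQL.")
```

Either way, querychat only needs a callable it can hand a system prompt to and get a Chat back.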

This would use Claude 3.7 Sonnet instead, which requires an Anthropic API key. See the chatlas documentation for more information on how to authenticate with different providers.

Complete example

For a complete working example, see the examples/app-dataframe.py file in the repository. This example includes:

  • Loading a dataset
  • Reading greeting and data description from files
  • Setting up the querychat configuration
  • Creating a Shiny UI with the chat sidebar
  • Displaying the filtered data in the main panel

If you have Shiny installed, and want to get started right away, you can use our querychat template or sidebot template.