Introduction


Explore data using natural language queries


Querychat makes it easy to explore data with natural language through the power of Shiny and large language models (LLMs). Start chatting with your data in just one line of code. Or, with a few more lines, design your own rich user experience around data exploration and analysis through natural language.

Installation

Install the latest stable release from PyPI:

pip install querychat

Quick start

The main entry point is the QueryChat class. It requires a data source (e.g., a pandas or polars data frame) and a name for the data. It also accepts optional parameters to customize behavior, such as which LLM client and model to use. The quickest way to start chatting is to call the .app() method, which returns a Shiny app object.

titanic-app.py
from querychat import QueryChat
from querychat.data import titanic

qc = QueryChat(titanic(), "titanic", client="openai/gpt-4.1")
app = qc.app()

With the above code saved to titanic-app.py and an API key set¹, you can run the app from a terminal (or VS Code):

export OPENAI_API_KEY="your_api_key_here"
shiny run --reload titanic-app.py

Once running, you’ll notice three main views:

  1. A sidebar chat with suggestions on where to start exploring.
  2. A data table that updates to reflect filtering and sorting queries.
  3. The SQL query behind the data table, for transparency and reproducibility.

Screenshot of querychat's app with the titanic dataset.

Suppose we pick a suggestion like “Show me passengers who survived”. Since this is a filtering operation, both the data table and SQL query update accordingly.

Screenshot of the querychat's app with the titanic dataset filtered to passengers who survived.

Querychat can also handle more general questions about the data that require calculations and aggregations. For example, we can ask “What is the average age of passengers who survived?”. In this case, querychat will generate and execute a SQL query to perform the relevant calculation, and return the result in the chat:

Screenshot of the querychat's app with a summary statistic inlined in the chat.

As you’ll learn later in Build an app, you can also access the SQL query and the filtered/sorted data frame programmatically for use elsewhere in your app. This makes it seamless to pair natural language interaction with your data with other visualizations and analyses.

Before we build though, let’s take a moment to better understand how querychat works under the hood, and whether it’s right for you.

How it works

Querychat leverages LLMs’ remarkable ability to translate natural language into SQL queries. Frontier models are shockingly good at this task, but even the best models still need to know the overall data structure to perform well. For this reason, querychat supplies a system prompt with the schema of the data (i.e., column names, types, ranges, etc.), but never the raw data itself.
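To make the idea concrete, here is a minimal sketch of how a schema summary like this could be built from a pandas DataFrame. This is an illustration of the concept, not querychat's actual implementation; the function name and output format are invented for the example:

```python
import pandas as pd


def describe_schema(df: pd.DataFrame, name: str) -> str:
    """Summarize structure (column names, types, ranges) without exposing raw rows."""
    lines = [f"Table: {name}"]
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_numeric_dtype(s):
            # Numeric columns: report the observed range.
            lines.append(f"- {col} ({s.dtype}): range {s.min()} to {s.max()}")
        else:
            # Other columns: report cardinality only, never the values themselves.
            lines.append(f"- {col} ({s.dtype}): {s.nunique()} distinct values")
    return "\n".join(lines)


df = pd.DataFrame({"age": [22.0, 38.0, 26.0], "sex": ["male", "female", "female"]})
print(describe_schema(df, "titanic"))
```

A summary like this gives the model enough structure to write correct SQL while keeping individual rows out of the prompt.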

When the LLM generates a SQL query, querychat executes it against a SQL database (DuckDB² by default) to get results in a safe, reliable, and verifiable manner. In short, this execution is safe since only SELECT statements are allowed, reliable since the database engine handles all calculations, and verifiable since the user can always see the SQL query that was run. This makes querychat a trustworthy tool for data exploration, as every action taken by the LLM is transparent and independently reproducible.

Data privacy

See the Provide context and Tools articles to learn more about what information is provided to the LLM and what it’s capable of doing with code execution.

Bespoke interfaces

While the quickstart app is a great way to get started, querychat is designed to be highly extensible. You can not only customize the underlying model and data source, but also build fully custom Shiny apps around the core chat functionality.

For a motivating example, consider the following (sidebot) app that leverages querychat’s tooling to create reactive summaries and visualizations based on the user’s natural language queries:

Screenshot of sidebot, a custom shiny app built with querychat.

Next steps

From here, you might want to learn more about:

  • Models: customize the LLM behind querychat.
  • Data sources: different data sources you can use with querychat.
  • Provide context: provide the LLM with the context it needs to work well.
  • Build an app: design a custom Shiny app around querychat.

Footnotes

  1. By default, Querychat uses OpenAI to power the chat experience. So, for this example to work, you’ll need an OpenAI API key. See the Models page for details on how to set up credentials for other model providers.↩︎

  2. DuckDB is extremely fast and has a surprising number of statistical functions.↩︎