# Data Sources
querychat supports several different data sources, including:
- Any narwhals-compatible data frame.
- Polars LazyFrames for efficient handling of large datasets.
- Ibis Tables for lazy evaluation across many database backends.
- Any database supported by SQLAlchemy.
- Custom data sources implementing the `DataSource` interface/protocol.
The sections below describe how to use each type of data source with querychat.
## Data frames
You can use any narwhals-compatible data frame as a data source in querychat. This includes popular data frame libraries like pandas, polars, pyarrow, and many more.
**pandas-app.py**

```python
import pandas as pd

from querychat import QueryChat

mtcars = pd.read_csv(
    "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
)

qc = QueryChat(mtcars, "mtcars")
app = qc.app()
```

**polars-app.py**

```python
import polars as pl

from querychat import QueryChat

mtcars = pl.read_csv(
    "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
)

qc = QueryChat(mtcars, "mtcars")
app = qc.app()
```

**pyarrow-app.py**
```python
import pyarrow.csv as pv

from querychat import QueryChat

# pyarrow.csv.read_csv() already returns a pyarrow.Table
mtcars = pv.read_csv(
    "https://gist.githubusercontent.com/seankross/a412dfbd88b3db70b74b/raw/5f23f993cd87c283ce766e7ac6b329ee7cc2e1d1/mtcars.csv"
)

qc = QueryChat(mtcars, "mtcars")
app = qc.app()
```

If you’re building an app, note that you can read the queried data frame reactively using the `df()` method, which returns a `narwhals.DataFrame` (or a `narwhals.LazyFrame` for lazy sources). Call `.to_native()` on the result to get the underlying pandas or polars DataFrame.
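For example, here’s a minimal sketch of rendering the live result in a Shiny Express app. How querychat’s chat UI is mounted is omitted, and the output name is arbitrary; only `qc.df()` and `.to_native()` come from the snippet above:

```python
# Sketch: display the current query result reactively (assumes `qc` from
# above and that querychat's UI is mounted elsewhere in the app).
from shiny.express import render

@render.data_frame
def results():
    # qc.df() re-reads reactively as the user's query changes;
    # .to_native() unwraps the narwhals.DataFrame to the underlying frame
    return qc.df().to_native()
```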
## Polars LazyFrames
For large datasets, you can use Polars LazyFrames to keep data on disk until it’s actually needed. This is particularly useful when:
- Your dataset is too large to fit comfortably in memory
- You only need filtered or aggregated subsets of the data
- You want faster startup times for your application
With lazy evaluation, data stays on disk and queries are optimized by Polars before execution. Only the final results are loaded into memory.
**lazy-app.py**

```python
import polars as pl

from querychat import QueryChat

# Scan a large Parquet file (doesn't load data yet!)
lf = pl.scan_parquet("large_dataset.parquet")

# Pass the LazyFrame directly to QueryChat
qc = QueryChat(lf, "sales")
app = qc.app()
```

The lazy approach can be significantly faster for large datasets because:
- Deferred loading: Data stays on disk until actually needed, so startup is nearly instant
- Query optimization: Polars optimizes the query plan before execution, potentially skipping unnecessary columns and rows
- Reduced memory: Only the filtered/aggregated results are loaded into memory, not the entire dataset
This is especially beneficial when users typically query small subsets of a large dataset.
You can create LazyFrames from various sources:
```python
# From Parquet (most efficient)
lf = pl.scan_parquet("data.parquet")

# From CSV
lf = pl.scan_csv("data.csv")

# From multiple files
lf = pl.scan_parquet("data/*.parquet")

# From an existing DataFrame
df = pl.read_csv("data.csv")
lf = df.lazy()
```

When using a LazyFrame source, the `df()` method returns a `narwhals.LazyFrame`. Call `.collect()` to materialize the results when needed:
```python
# Get the lazy result
result_lazy = qc.df()

# Materialize when ready
result_df = result_lazy.collect()
```

## Ibis Tables
Ibis is a Python DataFrame API that provides a unified interface to many different database backends. It enables lazy evaluation and query optimization, making it ideal for working with large datasets that live in databases or data warehouses.
Key benefits of using Ibis with querychat:
- Backend flexibility: Write code once, run on DuckDB, PostgreSQL, BigQuery, Snowflake, and many more
- Lazy evaluation: Queries are optimized and only executed when results are needed
- Data stays in place: No need to move data out of your database or warehouse
- Chainable operations: Query results are Ibis Tables that can be further transformed
```bash
pip install ibis-framework[duckdb]
```

**ibis-duckdb-app.py**

```python
import ibis

from querychat import QueryChat

# Connect to a DuckDB database
conn = ibis.duckdb.connect("my_database.duckdb")
table = conn.table("my_table")

qc = QueryChat(table, "my_table")
app = qc.app()
```

```bash
pip install ibis-framework[postgres]
```
**ibis-postgres-app.py**

```python
import ibis

from querychat import QueryChat

# Connect to PostgreSQL
conn = ibis.postgres.connect(
    host="localhost",
    port=5432,
    database="mydatabase",
    user="user",
    password="password",
)
table = conn.table("my_table")

qc = QueryChat(table, "my_table")
app = qc.app()
```

```bash
pip install ibis-framework[bigquery]
```
**ibis-bigquery-app.py**

```python
import ibis

from querychat import QueryChat

# Connect to BigQuery (uses default credentials from the environment)
conn = ibis.bigquery.connect(
    project_id="my-project",
    dataset_id="my_dataset",
)
table = conn.table("my_table")

qc = QueryChat(table, "my_table")
app = qc.app()
```

When using an Ibis source, the `df()` method returns an Ibis Table (lazy). You can chain additional Ibis operations or call `.execute()` to materialize the results as a pandas DataFrame:
```python
# Get the lazy Ibis Table
result_table = qc.df()

# Chain additional operations
filtered = result_table.filter(result_table.amount > 100)

# Materialize when ready
result_df = filtered.execute()
```

Both Ibis and SQLAlchemy can connect to databases, but they serve different purposes:
- Use Ibis when you want lazy evaluation, need to chain DataFrame-style operations on query results, or want to work with cloud data warehouses like BigQuery or Snowflake
- Use SQLAlchemy when you need a simple, direct SQL connection and are comfortable working with eagerly-evaluated DataFrames
## Databases
You can also connect querychat directly to a table in any database supported by SQLAlchemy. This includes popular databases like SQLite, DuckDB, PostgreSQL, MySQL, and many more.
Assuming you have a database set up and accessible, you can pass a SQLAlchemy database URL to create_engine(), and then pass the resulting engine to querychat. Below are some examples for common databases.
```bash
pip install duckdb duckdb-engine
```

**duckdb-app.py**

```python
from pathlib import Path

from sqlalchemy import create_engine

from querychat import QueryChat

# Assumes my_database.duckdb is in the same directory as this script
db_path = Path(__file__).parent / "my_database.duckdb"
engine = create_engine(f"duckdb:///{db_path}")

qc = QueryChat(engine, "my_table")
app = qc.app()
```

**sqlite-app.py**
```python
from pathlib import Path

from sqlalchemy import create_engine

from querychat import QueryChat

# Assumes my_database.db is in the same directory as this script
db_path = Path(__file__).parent / "my_database.db"
engine = create_engine(f"sqlite:///{db_path}")

qc = QueryChat(engine, "my_table")
app = qc.app()
```

```bash
pip install psycopg2-binary
```
**postgresql-app.py**

```python
from sqlalchemy import create_engine

from querychat import QueryChat

engine = create_engine("postgresql+psycopg2://user:password@localhost:5432/mydatabase")

qc = QueryChat(engine, "my_table")
app = qc.app()
```

```bash
pip install pymysql
```
**mysql-app.py**

```python
from sqlalchemy import create_engine

from querychat import QueryChat

engine = create_engine("mysql+pymysql://user:password@localhost:3306/mydatabase")

qc = QueryChat(engine, "my_table")
app = qc.app()
```

If you don’t have a database set up, you can easily create a local DuckDB database from a CSV file using the following code:
**create-duckdb.py**

```python
import duckdb

conn = duckdb.connect("my_database.duckdb")
conn.execute("""
    CREATE TABLE my_table AS
    SELECT * FROM read_csv_auto('path/to/your/file.csv')
""")
conn.close()
```

Or, if you have a pandas DataFrame, you can create the DuckDB database like so:
**create-duckdb-from-pandas.py**

```python
import duckdb

from querychat.data import titanic

conn = duckdb.connect("my_database.duckdb")

# Register the pandas DataFrame as a view, then copy it into a table
conn.register("titanic_df", titanic())
conn.execute("""
    CREATE TABLE titanic AS
    SELECT * FROM titanic_df
""")
conn.close()
```

Then you can connect to this database using the DuckDB example above, changing the table name as appropriate.
## Custom sources
If you have a custom data source that doesn’t fit into the above categories, you can implement the `DataSource` interface/protocol yourself. This requires implementing methods for getting schema information and for executing queries.
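As a rough illustration of the shape such a class might take, here is a sketch backed by DuckDB. The method names (`get_schema`, `execute_query`) are hypothetical placeholders, not querychat’s actual protocol; consult the `DataSource` definition in querychat for the real method names and signatures:

```python
# Hypothetical sketch only -- the method names below are placeholders,
# not querychat's actual DataSource protocol.
import duckdb


class CSVDataSource:
    """Illustrative data source that answers SQL queries over a CSV file."""

    def __init__(self, csv_path: str, table_name: str):
        self.table_name = table_name
        self.conn = duckdb.connect()
        # Expose the CSV as a named view so generated SQL can reference it
        self.conn.execute(
            f"CREATE VIEW {table_name} AS "
            f"SELECT * FROM read_csv_auto('{csv_path}')"
        )

    def get_schema(self) -> str:
        # Schema information (column names and types) for the LLM's prompt
        rows = self.conn.execute(f"DESCRIBE {self.table_name}").fetchall()
        return "\n".join(f"{name}: {dtype}" for name, dtype, *_ in rows)

    def execute_query(self, query: str):
        # Run model-generated SQL and return the result as a pandas DataFrame
        return self.conn.execute(query).df()
```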