# Getting Started

ChromaDB is an open-source vector database designed for AI applications. raghilda's `ChromaDBStore` provides a convenient interface for storing and retrieving document chunks using ChromaDB as the backend.

## Installation

ChromaDB is an optional dependency. Install it with:

```shell
pip install chromadb
```

Or install raghilda with ChromaDB support:

```shell
pip install "raghilda[chromadb]"
```

## Creating a Store

Create a new ChromaDB store with `ChromaDBStore.create()`:

```python
from raghilda.store import ChromaDBStore
from raghilda.embedding import EmbeddingOpenAI

# Create a persistent store
store = ChromaDBStore.create(
    location="my_vector_store",
    name="documents",
    embed=EmbeddingOpenAI(),
)
```
### Parameters

| Parameter | Description |
|---|---|
| `location` | Path for persistent storage. Use `":memory:"` or `None` for in-memory storage. |
| `name` | Collection name within the store. Defaults to `"raghilda_chroma"`. |
| `title` | Human-readable title for the store. |
| `embed` | Embedding function (a raghilda provider or a ChromaDB embedding function). |
| `overwrite` | If `True`, delete any existing collection with the same name. |
| `client` | Optional pre-configured ChromaDB client (e.g., `HttpClient`). |
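The interaction between `name` and `overwrite` can be pictured with a toy collection registry. This is only a sketch of the documented behaviour, not raghilda's code, and the assumption that creating an existing collection without `overwrite` raises an error is an illustration (the real behaviour may differ):

```python
class ToyStore:
    """Minimal model of create()'s name/overwrite semantics (illustrative only)."""

    _collections: dict = {}

    @classmethod
    def create(cls, name="raghilda_chroma", overwrite=False):
        # Assumed behaviour: refuse to clobber an existing collection
        # unless overwrite=True is passed explicitly.
        if name in cls._collections and not overwrite:
            raise ValueError(f"collection {name!r} already exists")
        # overwrite=True replaces any existing data under this name
        cls._collections[name] = []
        return cls._collections[name]
```
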
## In-Memory vs Persistent Storage

```python
# In-memory store (data is lost when the process ends)
store = ChromaDBStore.create(location=":memory:", embed=EmbeddingOpenAI())

# Persistent store (data is saved to disk)
store = ChromaDBStore.create(location="./my_store", embed=EmbeddingOpenAI())
```

## Inserting Documents
Insert chunked documents into the store:

```python
from raghilda.document import MarkdownDocument
from raghilda.chunker import MarkdownChunker

# Create and chunk a document
doc = MarkdownDocument(
    origin="example.md",
    content="# Hello World\n\nThis is a sample document with some content.",
)

chunker = MarkdownChunker()
chunked_doc = chunker.chunk(doc)

# Insert into the store
store.upsert(chunked_doc)
```

### Multiple Documents
For multiple documents, read and chunk each item before calling `upsert()`:

```python
from raghilda.read import read_as_markdown
from raghilda.chunker import MarkdownChunker

files = [
    "docs/guide.md",
    "docs/reference.md",
    "docs/tutorial.md",
]

chunker = MarkdownChunker(chunk_size=500)
for uri in files:
    doc = read_as_markdown(uri)
    store.upsert(chunker.chunk(doc))
```

## Retrieving Documents
Search for relevant chunks using semantic similarity:

```python
# Find the 5 most relevant chunks
results = store.retrieve("How do I get started?", top_k=5)

for chunk in results:
    print(f"Score: {chunk.metrics[0].value:.4f}")
    print(f"Text: {chunk.text[:100]}...")
    print()
```

### Deoverlapping Results
By default, overlapping chunks from the same document are merged:

```python
# Merge overlapping chunks (default)
results = store.retrieve("query", top_k=5, deoverlap=True)

# Keep chunks separate
results = store.retrieve("query", top_k=5, deoverlap=False)
```

When chunks are merged, metric values are preserved and user attributes are aggregated into per-chunk lists in start order. The merged chunk keeps the context from the first overlapping chunk.
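To make the merging behaviour concrete, here is a toy sketch of de-overlapping over plain dicts. It is an illustration of the idea only, not raghilda's implementation; the `start`/`end`/`text`/`attrs` field names are inventions for this example:

```python
def deoverlap(chunks):
    """Merge chunks whose character spans overlap, in start order."""
    merged = []
    for c in sorted(chunks, key=lambda c: c["start"]):
        if merged and c["start"] < merged[-1]["end"]:
            prev = merged[-1]
            overlap = prev["end"] - c["start"]
            # Splice on only the non-overlapping tail of the new chunk
            prev["text"] += c["text"][overlap:]
            prev["end"] = max(prev["end"], c["end"])
            # Aggregate user attributes into a per-chunk list, in start order
            prev["attrs"].append(c["attrs"])
        else:
            merged.append({**c, "attrs": [c["attrs"]]})
    return merged
```
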
## Attribute Filtering

Use `attributes_filter` to narrow retrieval results by declared attributes and built-in filterable columns (for example `origin`):

```python
# Filter by document origin
results = store.retrieve(
    "query",
    top_k=5,
    attributes_filter="origin = 'guide.md'",
)
```

The built-in filterable columns for Chroma are: `chunk_id`, `start_index`, `end_index`, `char_count`, `context`, and `origin`.

For advanced Chroma-specific filters, you can still pass `where=...` directly.
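The semantics of a simple equality filter can be pictured with a toy evaluator for the `column = 'value'` form shown above. This assumes nothing about how raghilda actually parses filter expressions; it only illustrates what such a filter selects:

```python
import re

def apply_filter(rows, expr):
    """Toy evaluator for filters of the form "column = 'value'" (illustrative only)."""
    m = re.fullmatch(r"\s*(\w+)\s*=\s*'([^']*)'\s*", expr)
    if m is None:
        raise ValueError(f"unsupported filter: {expr!r}")
    column, value = m.groups()
    # Keep only rows whose column matches the quoted value exactly
    return [row for row in rows if row.get(column) == value]
```
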
## Connecting to Existing Stores

Reconnect to a previously created store:

```python
# Connect to an existing store
store = ChromaDBStore.connect(
    name="documents",
    location="my_vector_store",
)

# Check how many documents are stored
print(f"Documents in store: {store.size()}")
```

When using ChromaDB's built-in embedding functions or raghilda's `EmbeddingOpenAI`/`EmbeddingCohere`, the embedding function is automatically restored from the stored configuration. See Embedding Functions for details.
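The restore-on-connect behaviour can be pictured as a provider registry keyed by the stored configuration. This is a hypothetical sketch: the registry, the `provider` key, and the `options` field are all inventions for illustration, not raghilda's actual mechanism:

```python
import json

# Hypothetical provider registry; real entries would build embedding objects
REGISTRY = {
    "identity": lambda **options: (lambda texts: [t.lower() for t in texts]),
}

def restore_embedder(stored_config):
    """Rebuild an embedding function from a JSON config saved with the collection."""
    cfg = json.loads(stored_config)
    factory = REGISTRY[cfg["provider"]]
    return factory(**cfg.get("options", {}))
```
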
## Using a Remote ChromaDB Server

Connect to a ChromaDB server running elsewhere:

```python
import chromadb

# Connect to a remote ChromaDB server
client = chromadb.HttpClient(host="localhost", port=8000)

# Use the client with raghilda
store = ChromaDBStore.create(
    client=client,
    name="documents",
    embed=EmbeddingOpenAI(),
)
```

## Complete Example
Here's a complete workflow from document preparation to retrieval:

```python
from raghilda.store import ChromaDBStore
from raghilda.embedding import EmbeddingOpenAI
from raghilda.read import read_as_markdown
from raghilda.chunker import MarkdownChunker

# 1. Create the store
store = ChromaDBStore.create(
    location="knowledge_base",
    name="docs",
    embed=EmbeddingOpenAI(),
    overwrite=True,
)

# 2. Prepare and insert documents
chunker = MarkdownChunker()
for path in [
    "README.md",
    "docs/getting-started.md",
    "docs/api-reference.md",
]:
    store.upsert(chunker.chunk(read_as_markdown(path)))

print(f"Indexed {store.size()} documents")

# 3. Search
results = store.retrieve("How do I install the package?", top_k=3)
for i, chunk in enumerate(results, 1):
    print(f"\n--- Result {i} ---")
    print(f"From: {chunk.origin}")
    print(f"Text: {chunk.text[:200]}...")
```

## Next Steps
- Learn about Embedding Functions for advanced embedding configuration
- Explore the API Reference for all available methods