store.ChromaDBStore

A vector store backed by ChromaDB.

Usage

store.ChromaDBStore()

ChromaDBStore provides local vector storage using Chroma’s embedded client. Documents are chunked by raghilda, and embeddings are generated by the configured embedding function (Chroma’s built-in embedding function by default).

Examples

from raghilda.store import ChromaDBStore

store = ChromaDBStore.create(location="raghilda_chroma", name="docs")

store.upsert(markdown_doc)  # markdown_doc: a Markdown document prepared elsewhere
chunks = store.retrieve("hello world", top_k=3)

Methods

Name        Description
connect()   Connect to an existing ChromaDB store.
create()    Create a new ChromaDB store.
retrieve()  Retrieve the most similar chunks to the given text.

connect()

Connect to an existing ChromaDB store.

Usage

connect(name, location=None, *, client=None)
Parameters
name: str

Collection name for the store.

location: str | Path | None = None

Path where ChromaDB persists its data. Use ":memory:" or None for an in-memory store.

client: Any = None

Optional pre-configured Chroma client (e.g., HttpClient).

Returns
ChromaDBStore
A connected store instance.

create()

Create a new ChromaDB store.

Usage

create(
    location=None,
    *,
    overwrite=False,
    name=None,
    title=None,
    embed=None,
    collection_metadata=None,
    attributes=None,
    client=None
)
Parameters
location: str | Path | None = None

Path where ChromaDB will persist its data. Use ":memory:" or None for an in-memory store.

overwrite: bool = False

Whether to overwrite an existing collection with the same name.

name: Optional[str] = None

Collection name for the store.

title: Optional[str] = None

Human-readable title for the store.

embed: EmbeddingProvider | chromadb.api.types.EmbeddingFunction[chromadb.api.types.Documents] | None = None

Optional embedding function. Can be either a raghilda EmbeddingProvider (e.g., EmbeddingOpenAI, EmbeddingCohere) or a ChromaDB embedding function. Raghilda providers are adapted internally to Chroma-compatible embedding functions. If None, Chroma’s default embedding function is used.

collection_metadata: Optional[dict[str, Any]] = None

Additional metadata to attach to the Chroma collection.

attributes: Optional[AttributesSchemaSpec] = None

Optional schema for user-defined attribute columns. Attribute names use identifier-style syntax. The store also provides built-in filterable columns: chunk_id, start_index, end_index, char_count, context, and origin.

client: Any = None

Optional pre-configured Chroma client (e.g., HttpClient).

Returns
ChromaDBStore
A newly created store instance.
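
The note on embed above says raghilda providers are adapted internally to Chroma-compatible embedding functions. A minimal, illustrative sketch of that adapter pattern follows; ToyProvider and the embed(texts) protocol are assumptions for illustration, not raghilda’s actual classes. Chroma invokes an embedding function as a callable that takes a list of documents and returns one vector per document.

```python
# Sketch of adapting a provider-style embedder to the callable interface
# Chroma expects (__call__(input) -> one vector per document). ToyProvider
# is a stand-in, not a real raghilda class.
class ToyProvider:
    """Stand-in embedding provider: maps each text to a tiny 2-d vector."""
    def embed(self, texts):
        return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]

class ProviderEmbeddingFunction:
    """Adapter giving a provider the call signature Chroma invokes."""
    def __init__(self, provider):
        self._provider = provider

    def __call__(self, input):
        # Chroma passes a list of documents and expects one vector each.
        return self._provider.embed(input)

embed_fn = ProviderEmbeddingFunction(ToyProvider())
vectors = embed_fn(["hello", "world"])
print(vectors)  # one 2-dimensional vector per input document
```

The same shape works for real providers: anything exposing a batch embed method can be wrapped once and handed to create(embed=...).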

retrieve()

Retrieve the most similar chunks to the given text.

Usage

retrieve(text, top_k, *, deoverlap=True, attributes_filter=None, **kwargs)

Uses ChromaDB’s vector similarity search to find relevant chunks, then optionally merges overlapping chunks from the same document.

Parameters
text: str

The query text to search for.

top_k: int

The maximum number of chunks to return.

deoverlap: bool = True

If True (default), merge overlapping chunks from the same document. Overlapping chunks are identified by their start_index and end_index positions. When merged, the resulting chunk spans the union of the original ranges, combines metrics, and aggregates attribute values into per-chunk lists in start-order. The context value is kept from the first chunk in each merged overlap group.

attributes_filter: Optional[AttributeFilter] = None

Optional attribute filter, given either as a SQL-like string or as a dict AST. Example string: "tenant = 'docs' AND priority >= 2". Supports declared attributes plus the built-in columns: chunk_id, start_index, end_index, char_count, context, and origin.

**kwargs

Additional arguments passed to ChromaDB’s query() method.
Returns
Sequence[RetrievedChromaDBMarkdownChunk]
The retrieved chunks with their relevance metrics.
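
The merging that deoverlap=True performs can be sketched in plain Python. This is a simplified illustration of the behaviour described above, covering only the union of start_index/end_index ranges and the kept-first context; the metric combination and per-chunk attribute aggregation are omitted, and the Chunk class here is a stand-in, not raghilda’s actual chunk type.

```python
# Simplified sketch of overlap merging: chunks from the same origin whose
# character ranges overlap are collapsed into one chunk spanning the union
# of the ranges. Not raghilda's actual implementation.
from dataclasses import dataclass

@dataclass
class Chunk:
    origin: str       # source document identifier
    start_index: int  # character offset where the chunk begins
    end_index: int    # character offset where the chunk ends
    context: str      # heading context carried by the chunk

def deoverlap(chunks):
    """Merge chunks from the same origin whose character ranges overlap."""
    merged = []
    # Walk chunks in start-order, grouped by origin.
    for chunk in sorted(chunks, key=lambda c: (c.origin, c.start_index)):
        last = merged[-1] if merged else None
        if last and last.origin == chunk.origin and chunk.start_index <= last.end_index:
            # Overlap: extend the previous chunk to span the union of ranges.
            # The context of the first chunk in the group is kept.
            last.end_index = max(last.end_index, chunk.end_index)
        else:
            merged.append(Chunk(chunk.origin, chunk.start_index, chunk.end_index, chunk.context))
    return merged

chunks = [
    Chunk("doc.md", 0, 120, "Intro"),
    Chunk("doc.md", 100, 220, "Intro"),  # overlaps the previous chunk
    Chunk("doc.md", 300, 400, "Usage"),  # disjoint: kept separate
]
print(deoverlap(chunks))
```

With the inputs above, the first two chunks merge into a single span (0, 220) while the disjoint third chunk is kept as-is.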
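
The SQL-like filter strings accepted by attributes_filter map naturally onto ChromaDB’s where operators ($and, $eq, $gte, and friends). The toy translator below handles only flat AND-conjunctions of simple comparisons; it is an illustration of the idea, not raghilda’s actual parser, which supports a richer grammar.

```python
import re

# Maps SQL-style comparison operators to Chroma where-operators.
_OPS = {"=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}

def to_where(filter_str):
    """Translate a flat "a = 'x' AND b >= 2" filter into a Chroma where-dict."""
    clauses = []
    for part in re.split(r"\s+AND\s+", filter_str, flags=re.IGNORECASE):
        m = re.fullmatch(r"(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+)", part.strip())
        if not m:
            raise ValueError(f"unsupported clause: {part!r}")
        field, op, raw = m.groups()
        # Strip quotes from string literals; otherwise parse as a number.
        value = raw.strip("'") if raw.startswith("'") else int(raw)
        clauses.append({field: {_OPS[op]: value}})
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}

print(to_where("tenant = 'docs' AND priority >= 2"))
# {'$and': [{'tenant': {'$eq': 'docs'}}, {'priority': {'$gte': 2}}]}
```

A single clause yields a bare comparison dict, while multiple clauses are wrapped in $and, matching how Chroma expects conjunctions to be expressed.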