store.ChromaDBStore

A vector store backed by ChromaDB.

Usage

store.ChromaDBStore()

ChromaDBStore provides local vector storage using Chroma’s embedded client. Documents are chunked by raghilda, and embeddings are generated by the configured embedding function (Chroma’s built-in embedding function by default).

Examples

from raghilda.store import ChromaDBStore

store = ChromaDBStore.create(location="raghilda_chroma", name="docs")

store.upsert(markdown_doc)  # markdown_doc: a Markdown document prepared elsewhere
chunks = store.retrieve("hello world", top_k=3)

Methods

Name        Description
connect()   Connect to an existing ChromaDB store.
create()    Create a new ChromaDB store.
retrieve()  Retrieve the most similar chunks to the given text.

connect()

Connect to an existing ChromaDB store.

Usage

connect(name, location=None, *, client=None)
Parameters
name: str

Collection name for the store.

location: str | Path | None = None

Path where ChromaDB persists its data. Use ":memory:" or None for an in-memory store.

client: Any = None

Optional pre-configured Chroma client (e.g., HttpClient).

Returns
ChromaDBStore
A connected store instance.

create()

Create a new ChromaDB store.

Usage

create(
    location=None,
    *,
    overwrite=False,
    name=None,
    title=None,
    embed=None,
    collection_metadata=None,
    attributes=None,
    client=None
)
Parameters
location: str | Path | None = None

Path where ChromaDB will persist its data. Use ":memory:" or None for an in-memory store.

overwrite: bool = False

Whether to overwrite an existing collection with the same name.

name: Optional[str] = None

Collection name for the store.

title: Optional[str] = None

Human-readable title for the store.

embed: EmbeddingProvider | chromadb.api.types.EmbeddingFunction[chromadb.api.types.Documents] | None = None

Optional embedding function. Can be either a raghilda EmbeddingProvider (e.g., EmbeddingOpenAI, EmbeddingCohere) or a ChromaDB embedding function. Raghilda providers are adapted internally to Chroma-compatible embedding functions. If None, Chroma’s default embedding function is used.

collection_metadata: Optional[dict[str, Any]] = None

Additional metadata to attach to the Chroma collection.

attributes: Optional[AttributesSchemaSpec] = None

Optional schema for user-defined attribute columns. Attribute names use identifier-style syntax. The store also provides built-in filterable columns: chunk_id, start_index, end_index, char_count, context, and origin.

client: Any = None

Optional pre-configured Chroma client (e.g., HttpClient).

Returns
ChromaDBStore
A newly created store instance.
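
The note on embed above says raghilda providers are adapted internally to Chroma-compatible embedding functions. A minimal, illustrative sketch of that adapter pattern follows; ToyProvider and the embed(texts) protocol are assumptions for illustration, not raghilda’s actual classes. Chroma invokes an embedding function as a callable that takes a list of documents and returns one vector per document.

```python
# Sketch of adapting a provider-style embedder to the callable interface
# Chroma expects (__call__(input) -> one vector per document). ToyProvider
# is a stand-in, not a real raghilda class.
class ToyProvider:
    """Stand-in embedding provider: maps each text to a tiny 2-d vector."""
    def embed(self, texts):
        return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]

class ProviderEmbeddingFunction:
    """Adapter giving a provider the call signature Chroma invokes."""
    def __init__(self, provider):
        self._provider = provider

    def __call__(self, input):
        # Chroma passes a list of documents and expects one vector each.
        return self._provider.embed(input)

embed_fn = ProviderEmbeddingFunction(ToyProvider())
vectors = embed_fn(["hello", "world"])
print(vectors)  # one 2-dimensional vector per input document
```

The same shape works for real providers: anything exposing a batch embed method can be wrapped once and handed to create(embed=...).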

retrieve()

Retrieve the most similar chunks to the given text.

Usage

retrieve(text, top_k, *, deoverlap=True, attributes_filter=None, **kwargs)

Uses ChromaDB’s vector similarity search to find relevant chunks, then optionally merges overlapping chunks from the same document.

Parameters
text: str

The query text to search for.

top_k: int

The maximum number of chunks to return.

deoverlap: bool = True

If True (default), merge overlapping chunks from the same document. Overlapping chunks are identified by their start_index and end_index positions. When merged, the resulting chunk spans the union of the original ranges, combines metrics, and aggregates attribute values into per-chunk lists in start-order. The context value is kept from the first chunk in each merged overlap group.

attributes_filter: Optional[AttributeFilter] = None

Optional attribute filter, given either as a SQL-like string or as a dict AST. Example string: "tenant = 'docs' AND priority >= 2". Supports declared attributes plus the built-in columns: chunk_id, start_index, end_index, char_count, context, and origin.

**kwargs

Additional arguments passed to ChromaDB’s query() method.
Returns
Sequence[RetrievedChromaDBMarkdownChunk]
The retrieved chunks with their relevance metrics.
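
The merging that deoverlap=True performs can be sketched in plain Python. This is a simplified illustration of the behaviour described above, covering only the union of start_index/end_index ranges and the kept-first context; the metric combination and per-chunk attribute aggregation are omitted, and the Chunk class here is a stand-in, not raghilda’s actual chunk type.

```python
# Simplified sketch of overlap merging: chunks from the same origin whose
# character ranges overlap are collapsed into one chunk spanning the union
# of the ranges. Not raghilda's actual implementation.
from dataclasses import dataclass

@dataclass
class Chunk:
    origin: str       # source document identifier
    start_index: int  # character offset where the chunk begins
    end_index: int    # character offset where the chunk ends
    context: str      # heading context carried by the chunk

def deoverlap(chunks):
    """Merge chunks from the same origin whose character ranges overlap."""
    merged = []
    # Walk chunks in start-order, grouped by origin.
    for chunk in sorted(chunks, key=lambda c: (c.origin, c.start_index)):
        last = merged[-1] if merged else None
        if last and last.origin == chunk.origin and chunk.start_index <= last.end_index:
            # Overlap: extend the previous chunk to span the union of ranges.
            # The context of the first chunk in the group is kept.
            last.end_index = max(last.end_index, chunk.end_index)
        else:
            merged.append(Chunk(chunk.origin, chunk.start_index, chunk.end_index, chunk.context))
    return merged

chunks = [
    Chunk("doc.md", 0, 120, "Intro"),
    Chunk("doc.md", 100, 220, "Intro"),  # overlaps the previous chunk
    Chunk("doc.md", 300, 400, "Usage"),  # disjoint: kept separate
]
print(deoverlap(chunks))
```

With the inputs above, the first two chunks merge into a single span (0, 220) while the disjoint third chunk is kept as-is.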
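
The SQL-like filter strings accepted by attributes_filter map naturally onto ChromaDB’s where operators ($and, $eq, $gte, and friends). The toy translator below handles only flat AND-conjunctions of simple comparisons; it is an illustration of the idea, not raghilda’s actual parser, which supports a richer grammar.

```python
import re

# Maps SQL-style comparison operators to Chroma where-operators.
_OPS = {"=": "$eq", "!=": "$ne", ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte"}

def to_where(filter_str):
    """Translate a flat "a = 'x' AND b >= 2" filter into a Chroma where-dict."""
    clauses = []
    for part in re.split(r"\s+AND\s+", filter_str, flags=re.IGNORECASE):
        m = re.fullmatch(r"(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+)", part.strip())
        if not m:
            raise ValueError(f"unsupported clause: {part!r}")
        field, op, raw = m.groups()
        # Strip quotes from string literals; otherwise parse as a number.
        value = raw.strip("'") if raw.startswith("'") else int(raw)
        clauses.append({field: {_OPS[op]: value}})
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}

print(to_where("tenant = 'docs' AND priority >= 2"))
# {'$and': [{'tenant': {'$eq': 'docs'}}, {'priority': {'$gte': 2}}]}
```

A single clause yields a bare comparison dict, while multiple clauses are wrapped in $and, matching how Chroma expects conjunctions to be expressed.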