store.ChromaDBStore
A vector store backed by ChromaDB.
Usage
store.ChromaDBStore()
ChromaDBStore provides local vector storage using Chroma’s embedded client. Documents are chunked by raghilda and embeddings are generated by Chroma’s embedding function (defaults to Chroma’s built-in embedding).
Examples
from raghilda.store import ChromaDBStore
store = ChromaDBStore.create(location="raghilda_chroma", name="docs")
store.upsert(markdown_doc)
chunks = store.retrieve("hello world", top_k=3)
Methods
| Name | Description |
|---|---|
| connect() | Connect to an existing ChromaDB store. |
| create() | Create a new ChromaDB store. |
| retrieve() | Retrieve the most similar chunks to the given text. |
connect()
Connect to an existing ChromaDB store.
Usage
connect(name, location=None, *, client=None)
Parameters
name: str
Collection name for the store.
location: str | Path | None = None
Path where ChromaDB persists its data. Use ":memory:" or None for an in-memory store.
client: Any = None
Optional pre-configured Chroma client (e.g., HttpClient).
Returns
ChromaDBStore
A connected store instance.
create()
Create a new ChromaDB store.
Usage
create(
    location=None,
    *,
    overwrite=False,
    name=None,
    title=None,
    embed=None,
    collection_metadata=None,
    attributes=None,
    client=None
)
Parameters
location: str | Path | None = None
Path where ChromaDB will persist its data. Use ":memory:" or None for an in-memory store.
overwrite: bool = False
Whether to overwrite an existing collection with the same name.
name: Optional[str] = None
Collection name for the store.
title: Optional[str] = None
Human-readable title for the store.
embed: EmbeddingProvider | chromadb.api.types.EmbeddingFunction[chromadb.api.types.Documents] | None = None
Optional embedding function. Can be either a raghilda EmbeddingProvider (e.g., EmbeddingOpenAI, EmbeddingCohere) or a ChromaDB embedding function. Raghilda providers are adapted internally to Chroma-compatible embedding functions. If None, Chroma’s default embedding function is used.
collection_metadata: Optional[dict[str, Any]] = None
Additional metadata to attach to the Chroma collection.
attributes: Optional[AttributesSchemaSpec] = None
Optional schema for user-defined attribute columns. Attribute names use identifier-style syntax. Chroma also provides built-in filterable columns: chunk_id, start_index, end_index, char_count, context, and origin.
client: Any = None
Optional pre-configured Chroma client (e.g., HttpClient).
Returns
ChromaDBStore
A newly created store instance.
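The embed parameter notes that raghilda providers are adapted to Chroma-compatible embedding functions. The sketch below illustrates the general adapter idea only. The TextEmbedder protocol, its embed method, ChromaEmbeddingAdapter, and ToyEmbedder are all hypothetical names invented for illustration; raghilda's real EmbeddingProvider interface and its internal adapter may look different, and recent Chroma releases may expect extra methods on embedding functions beyond the bare call shown here.

```python
from typing import Protocol, Sequence


class TextEmbedder(Protocol):
    """Hypothetical provider interface (an assumption, not raghilda's API)."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class ChromaEmbeddingAdapter:
    """Wrap a provider so it can be called the way Chroma calls embedding functions."""

    def __init__(self, provider: TextEmbedder) -> None:
        self._provider = provider

    def __call__(self, input: Sequence[str]) -> list[list[float]]:
        # Chroma passes the batch of documents under the name `input`.
        return self._provider.embed(list(input))


class ToyEmbedder:
    """Stand-in provider: embeds each text as [length, number of spaces]."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [[float(len(t)), float(t.count(" "))] for t in texts]


adapter = ChromaEmbeddingAdapter(ToyEmbedder())
vectors = adapter(["hello world", "hi"])
```

Keeping the adapter this thin means the provider stays unaware of Chroma, which is one plausible reason for adapting providers internally rather than requiring users to write Chroma embedding functions themselves.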
retrieve()
Retrieve the most similar chunks to the given text.
Usage
retrieve(text, top_k, *, deoverlap=True, attributes_filter=None, **kwargs)
Uses ChromaDB’s vector similarity search to find relevant chunks, then optionally merges overlapping chunks from the same document.
Parameters
text: str
The query text to search for.
top_k: int
The maximum number of chunks to return.
deoverlap: bool = True
If True (default), merge overlapping chunks from the same document. Overlapping chunks are identified by their start_index and end_index positions. When merged, the resulting chunk spans the union of the original ranges, combines metrics, and aggregates attribute values into per-chunk lists in start order. The context value is kept from the first chunk in each merged overlap group.
attributes_filter: Optional[AttributeFilter] = None
Optional attribute filter as SQL-like string or dict AST. Example string: "tenant = 'docs' AND priority >= 2". Supports declared attributes plus built-in columns: chunk_id, start_index, end_index, char_count, context, and origin.
**kwargs
Additional arguments passed to ChromaDB’s query() method.
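To make the SQL-like filter string more concrete, here is a hypothetical sketch of how a simple filter could be translated into Chroma's where operators ($eq, $gte, $and, and so on). This is not raghilda's parser, and raghilda's dict AST format is not shown in this page; to_chroma_where and its helpers are invented names. The sketch assumes only AND-joined comparisons, uppercase AND, and values that are single-quoted strings or numbers, and it does not handle AND appearing inside a quoted string.

```python
import re

# Map SQL-like comparison operators onto Chroma `where` operators.
_OPS = {"=": "$eq", "!=": "$ne", ">=": "$gte", "<=": "$lte", ">": "$gt", "<": "$lt"}
# Two-character operators are listed first so ">=" is not read as ">".
_CLAUSE = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(.+?)\s*$")


def _parse_value(raw: str):
    """Turn a literal into a str (single-quoted), int, or float."""
    if len(raw) >= 2 and raw[0] == raw[-1] == "'":
        return raw[1:-1]
    try:
        return int(raw)
    except ValueError:
        return float(raw)


def to_chroma_where(expr: str) -> dict:
    """Translate 'a = 1 AND b >= 2' style filters into a Chroma `where` dict."""
    clauses = []
    for part in re.split(r"\s+AND\s+", expr):
        match = _CLAUSE.match(part)
        if match is None:
            raise ValueError(f"unsupported clause: {part!r}")
        name, op, raw = match.groups()
        clauses.append({name: {_OPS[op]: _parse_value(raw)}})
    # Chroma's $and expects at least two operands, so a lone clause is returned bare.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


where = to_chroma_where("tenant = 'docs' AND priority >= 2")
```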
Returns
Sequence[RetrievedChromaDBMarkdownChunk]
The retrieved chunks with their relevance metrics.
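The deoverlap behaviour described for retrieve() can be illustrated with a small standalone sketch. It is not raghilda's implementation: the Chunk dataclass and deoverlap function are hypothetical, origin is assumed to identify the source document, ranges are assumed to be half-open (two chunks overlap when one starts before the other ends), and the combining of metrics is left out because the page does not say how it is done.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved chunk."""

    origin: str
    start_index: int
    end_index: int
    context: str
    attributes: dict[str, Any] = field(default_factory=dict)


def deoverlap(chunks: list[Chunk]) -> list[Chunk]:
    """Merge overlapping chunks that share an origin, in start order."""
    merged: list[Chunk] = []
    for chunk in sorted(chunks, key=lambda c: (c.origin, c.start_index)):
        last = merged[-1] if merged else None
        overlaps = (
            last is not None
            and last.origin == chunk.origin
            and chunk.start_index < last.end_index
        )
        if overlaps:
            # Extend the range to the union; keep the first chunk's context.
            last.end_index = max(last.end_index, chunk.end_index)
            for key, value in chunk.attributes.items():
                last.attributes.setdefault(key, []).append(value)
        else:
            # Start a new group; attribute values become one-element lists.
            merged.append(
                Chunk(
                    chunk.origin,
                    chunk.start_index,
                    chunk.end_index,
                    chunk.context,
                    {k: [v] for k, v in chunk.attributes.items()},
                )
            )
    return merged


chunks = [
    Chunk("a.md", 0, 100, "Intro", {"priority": 1}),
    Chunk("a.md", 80, 180, "Body", {"priority": 2}),
    Chunk("b.md", 50, 120, "Other", {"priority": 3}),
    Chunk("a.md", 300, 400, "End", {"priority": 4}),
]
result = deoverlap(chunks)
```

Note that this sketch returns chunks sorted by origin and start position, whereas a real retriever would more likely preserve relevance order.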