Reference
Document Stores
Backend storage systems for documents and embeddings.
- BaseDocumentStore: Abstract base class for document stores.
- DuckDBDocumentStore: DuckDB-backed document store with vector search.
- PostgreSQLDocumentStore: PostgreSQL-backed document store with pgvector.
DuckDBDocumentStore Methods
Methods for the DuckDBDocumentStore class.
- DuckDBDocumentStore.upsert_documents(): Insert or update documents in the store.
- DuckDBDocumentStore.ingest_from_directory(): Ingest all documents from a directory.
- DuckDBDocumentStore.retrieve_by_similarity(): Retrieve documents by vector similarity search.
- DuckDBDocumentStore.retrieve_by_bm25_score(): Retrieve documents using BM25 text scoring.
- DuckDBDocumentStore.retrieve_hybrid_combination(): Retrieve documents using a hybrid combination of vector similarity and BM25 scores.
- DuckDBDocumentStore.build_vector_index(): Build or rebuild the vector similarity index.
- DuckDBDocumentStore.get_collection_size(): Return the number of documents in the store.
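As an illustration of what a hybrid vector + BM25 combination can look like, here is a minimal sketch. It is not the implementation behind retrieve_hybrid_combination(); the function names, the alpha weight, and the choice of min-max normalization followed by a weighted sum are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine_hybrid(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    alpha: float = 0.5,  # hypothetical weight given to the vector score
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized vector and BM25 scores, best first."""
    vec = min_max_normalize(vector_scores)
    bm25 = min_max_normalize(bm25_scores)
    combined = {
        # A document missing from one ranking contributes 0.0 for that side.
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * bm25.get(doc_id, 0.0)
        for doc_id in set(vec) | set(bm25)
    }
    # Sort by descending score, breaking ties by document id for stability.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
    bm25_scores = {"b": 12.0, "c": 3.0, "d": 6.0}
    for doc_id, score in combine_hybrid(vector_scores, bm25_scores):
        print(doc_id, round(score, 3))
```

Normalizing first matters because cosine similarities and BM25 scores live on different scales. Reciprocal rank fusion is a common alternative that avoids score normalization altogether.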
Embedding Providers
Services for generating vector embeddings.
- EmbeddingProvider: Base class for embedding providers.
- OpenAIEmbeddingProvider: OpenAI embedding provider using text-embedding models.
- CohereEmbeddingProvider: Cohere embedding provider with input type support.
Chunker Strategies
Strategies for splitting documents into chunks.
- BaseChunkerStrategy: Abstract base class for document chunking strategies.
- MarkdownChunkerStrategy: Markdown-aware chunking strategy that respects heading boundaries.
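To make "respects heading boundaries" concrete, the following is a minimal sketch of heading-based splitting. It is not the MarkdownChunkerStrategy implementation; the function name split_on_headings is hypothetical, and the sketch assumes ATX-style headings (lines starting with one to six # characters) and does not handle # lines inside fenced code blocks.

```python
import re
from typing import List

# ATX heading: one to six '#' characters followed by whitespace.
HEADING = re.compile(r"^#{1,6}\s")


def split_on_headings(markdown: str) -> List[str]:
    """Start a new chunk at every heading line so no chunk spans two sections."""
    chunks: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            # Close the chunk collected so far before starting the new section.
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that contained only blank lines.
    return [chunk for chunk in chunks if chunk]


if __name__ == "__main__":
    text = "Intro\n\n# A\nbody a\n## B\nbody b"
    for chunk in split_on_headings(text):
        print(repr(chunk))
```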
Data Types
Type definitions and result containers.
- RetrievedDocumentChunk: A document chunk returned from a retrieval query.
- DocumentMetadataConfig: Configuration for document metadata extraction.
- EmbeddingVectorResult: Result container for embedding vector operations.
Plain Text Names
Classes with long names containing no special characters.
- documentstorewithvectorsearchcapabilities: A store for documents supporting vector search.
- EMBEDDINGPROVIDERWITHBATCHPROCESSINGSUPPORT: All-uppercase embedding provider class.
- Chunkerstrategywithoverlapdetection: Initial-cap chunker strategy class.
documentstorewithvectorsearchcapabilities Methods
Methods for the documentstorewithvectorsearchcapabilities class.
- documentstorewithvectorsearchcapabilities.insertdocumentswithembeddings(): Insert documents along with their embedding vectors.
- documentstorewithvectorsearchcapabilities.searchbyvectorsimilarity(): Search for documents by vector similarity.
- documentstorewithvectorsearchcapabilities.rebuildvectorsearchindex(): Rebuild the internal vector search index.
- documentstorewithvectorsearchcapabilities.deletedocumentsbyidentifier(): Delete a document by its unique identifier.
- documentstorewithvectorsearchcapabilities.countdocumentsincollection(): Return the total number of documents stored.
- documentstorewithvectorsearchcapabilities.exportcollectiontojsonlines(): Export all documents to a JSON Lines file.
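For readers unfamiliar with the JSON Lines format mentioned above, here is a minimal sketch of such an export: one JSON object per line, newline-terminated. The function name export_to_jsonl and the dictionary shape of a document are assumptions for this example, not the signature of exportcollectiontojsonlines().

```python
import json
from typing import Any, Dict, Iterable


def export_to_jsonl(documents: Iterable[Dict[str, Any]], path: str) -> int:
    """Write each document as one JSON object per line; return the count written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for doc in documents:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            handle.write(json.dumps(doc, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    docs = [{"id": "a", "text": "first"}, {"id": "b", "text": "second\nline"}]
    print(export_to_jsonl(docs, "documents.jsonl"))
```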
EMBEDDINGPROVIDERWITHBATCHPROCESSINGSUPPORT Methods
Methods for the EMBEDDINGPROVIDERWITHBATCHPROCESSINGSUPPORT class.
- EMBEDDINGPROVIDERWITHBATCHPROCESSINGSUPPORT.GENERATEEMBEDDINGSFROMTEXTINPUT(): Generate embeddings from a list of text inputs.
- EMBEDDINGPROVIDERWITHBATCHPROCESSINGSUPPORT.CALCULATETOKENCOUNTFORTEXTS(): Calculate total token count for the given texts.
- EMBEDDINGPROVIDERWITHBATCHPROCESSINGSUPPORT.RETRIEVEMODELCONFIGURATION(): Retrieve the current model configuration.
- EMBEDDINGPROVIDERWITHBATCHPROCESSINGSUPPORT.VALIDATEINPUTTEXTLENGTHS(): Validate that all input texts are within length limits.
- EMBEDDINGPROVIDERWITHBATCHPROCESSINGSUPPORT.EXPORTEMBEDDINGSTOFILE(): Export computed embeddings to a file.
- EMBEDDINGPROVIDERWITHBATCHPROCESSINGSUPPORT.RESETINTERNALBATCHCOUNTER(): Reset the internal batch processing counter.
Chunkerstrategywithoverlapdetection Methods
Methods for the Chunkerstrategywithoverlapdetection class.
- Chunkerstrategywithoverlapdetection.splitcontentintochunks(): Split document content into overlapping chunks.
- Chunkerstrategywithoverlapdetection.detectoverlapboundaries(): Detect optimal overlap boundary positions.
- Chunkerstrategywithoverlapdetection.mergeundersizedfragments(): Merge fragments that are too small to stand alone.
- Chunkerstrategywithoverlapdetection.calculateoverlappercentage(): Calculate the average overlap percentage between chunks.
- Chunkerstrategywithoverlapdetection.exportchunkswithoverlap(): Export chunks with overlap markers to a file.
- Chunkerstrategywithoverlapdetection.resetinternalchunkcache(): Reset the internal chunk processing cache.
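The idea of overlapping chunks and an average overlap percentage can be sketched with a fixed-size sliding window. This is only an illustration under stated assumptions: the function names, the character-based window, and the chunk_size and overlap parameters are hypothetical and do not describe how Chunkerstrategywithoverlapdetection detects boundaries.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Slide a fixed-size character window; consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


def average_overlap_percentage(chunks: List[str], overlap: int) -> float:
    """Mean share of each chunk (after the first) that repeats its predecessor."""
    if len(chunks) < 2:
        return 0.0
    ratios = [min(overlap, len(chunk)) / len(chunk) for chunk in chunks[1:]]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    pieces = split_with_overlap("abcdefghij", chunk_size=4, overlap=2)
    print(pieces)
    print(average_overlap_percentage(pieces, overlap=2))
```

A character window keeps the sketch short; a production chunker would more likely count tokens or align window edges to sentence boundaries.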