chunker.BaseChunker

Base class for chunkers.

Usage

chunker.BaseChunker()

A chunker splits a raghilda.document.Document into a raghilda.document.ChunkedDocument containing smaller text segments suitable for embedding and retrieval.

Subclasses must implement both chunk() and chunk_text() to provide a concrete chunking strategy.
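The subclassing contract can be sketched as follows. This is a minimal illustration, not the library's implementation: the Document, Chunk, ChunkedDocument, and BaseChunker definitions below are stand-ins written so the example runs without raghilda installed, and their field names (content, chunks, start, end) are assumptions. LineChunker is a hypothetical strategy invented for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Sequence


# Stand-in types so the sketch runs on its own. The real raghilda classes
# may have different fields and constructors.
@dataclass
class Chunk:
    text: str
    start: int  # assumed: character offset where the chunk begins
    end: int    # assumed: character offset just past the chunk's last character


@dataclass
class Document:
    content: str


@dataclass
class ChunkedDocument(Document):
    chunks: Sequence[Chunk] = field(default_factory=list)


class BaseChunker(ABC):
    """Stand-in for chunker.BaseChunker with its two required methods."""

    @abstractmethod
    def chunk(self, document: Document) -> ChunkedDocument: ...

    @abstractmethod
    def chunk_text(self, text: str) -> Sequence[Chunk]: ...


class LineChunker(BaseChunker):
    """Hypothetical strategy: one chunk per non-blank line."""

    def chunk_text(self, text: str) -> Sequence[Chunk]:
        chunks = []
        offset = 0
        for line in text.splitlines(keepends=True):
            body = line.rstrip("\r\n")
            if body.strip():
                chunks.append(Chunk(body, offset, offset + len(body)))
            offset += len(line)  # keep offsets relative to the original text
        return chunks

    def chunk(self, document: Document) -> ChunkedDocument:
        # Delegate the splitting to chunk_text() and attach the result.
        return ChunkedDocument(
            content=document.content,
            chunks=self.chunk_text(document.content),
        )


chunked = LineChunker().chunk(Document("first line\n\nsecond line\n"))
```

Having chunk() delegate to chunk_text() keeps the splitting logic in one place, so the same strategy applies to documents and to raw strings.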

Methods

Name Description
chunk() Chunk a document into a raghilda.document.ChunkedDocument.
chunk_text() Chunk raw text into a sequence of raghilda.chunk.Chunk objects.

chunk()

Chunk a document into a raghilda.document.ChunkedDocument.

Usage

chunk(document)
Parameters
document: Document
The document to chunk.
Returns
ChunkedDocument
The document with chunks attached.

chunk_text()

Chunk raw text into a sequence of raghilda.chunk.Chunk objects.

Usage

chunk_text(text)
Parameters
text: str
The text to chunk.
Returns
Sequence[Chunk]
The resulting chunks with positional information.
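To illustrate what "positional information" might look like in practice, here is a hedged sketch of a fixed-size, overlapping splitter. It assumes each chunk records start and end character offsets into the input; the Chunk dataclass is a stand-in for raghilda.chunk.Chunk, and the size and overlap parameters are hypothetical (the documented signature takes only text, so a real subclass would more likely receive such settings through its constructor).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    # Stand-in for raghilda.chunk.Chunk; the field names are assumptions.
    text: str
    start: int
    end: int


def chunk_text(text: str, size: int = 20, overlap: int = 5) -> List[Chunk]:
    """Split text into windows of at most `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        chunks.append(Chunk(text[start:end], start, end))
        if end == len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "Chunkers split long documents into retrievable pieces."
chunks = chunk_text(text, size=20, overlap=5)
```

With offsets stored this way, slicing the original text with a chunk's start and end reproduces that chunk's text, which makes it possible to map a retrieved chunk back to its place in the source document.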