chunker.BaseChunker
Base class for chunkers.
Usage
chunker.BaseChunker()
A chunker splits a raghilda.document.Document into a raghilda.document.ChunkedDocument containing smaller text segments suitable for embedding and retrieval.
Subclasses must implement chunk() and chunk_text() to provide a concrete chunking strategy:
- raghilda.chunker.MarkdownChunker: splits Markdown documents at semantic boundaries (headings, paragraphs, sentences).
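The subclassing contract can be illustrated with a minimal, self-contained sketch. It does not import raghilda: `Document`, `Chunk`, `ChunkedDocument`, and `BaseChunkerSketch` are hypothetical stand-ins, their field names (`content`, `text`, `start`, `end`, `chunks`) are assumptions, and `FixedSizeChunker` is a toy strategy invented for illustration. Whether the real base class enforces the contract through `abc` is also an assumption.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Sequence


# Hypothetical stand-ins for raghilda.document.Document, raghilda.chunk.Chunk,
# and raghilda.document.ChunkedDocument. The real classes may look different.
@dataclass
class Document:
    content: str


@dataclass
class Chunk:
    text: str
    start: int  # assumed: offset of the first character in the source text
    end: int    # assumed: offset one past the last character


@dataclass
class ChunkedDocument:
    content: str
    chunks: Sequence[Chunk] = field(default_factory=list)


class BaseChunkerSketch(ABC):
    """Stand-in base class: both methods are abstract, so it cannot be instantiated."""

    @abstractmethod
    def chunk(self, document: Document) -> ChunkedDocument: ...

    @abstractmethod
    def chunk_text(self, text: str) -> Sequence[Chunk]: ...


class FixedSizeChunker(BaseChunkerSketch):
    """Toy strategy (not part of raghilda): cut the text every `size` characters."""

    def __init__(self, size: int = 20) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size

    def chunk_text(self, text: str) -> Sequence[Chunk]:
        return [
            Chunk(text[i : i + self.size], i, min(i + self.size, len(text)))
            for i in range(0, len(text), self.size)
        ]

    def chunk(self, document: Document) -> ChunkedDocument:
        # Delegate to chunk_text() and attach the result to a new document object.
        return ChunkedDocument(document.content, self.chunk_text(document.content))


doc = Document("A chunker splits a document into smaller text segments.")
chunked = FixedSizeChunker(size=20).chunk(doc)
for c in chunked.chunks:
    print(c.start, c.end, repr(c.text))
```

In this sketch `chunk()` is a thin wrapper around `chunk_text()`, so a strategy only has to be written once. A real subclass may need more, for example copying document metadata onto the result.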
Methods
| Name | Description |
|---|---|
| chunk() | Chunk a document into a raghilda.document.ChunkedDocument. |
| chunk_text() | Chunk raw text into a sequence of raghilda.chunk.Chunk objects. |
chunk()
Chunk a document into a raghilda.document.ChunkedDocument.
Usage
chunk(document)
Parameters
document: Document - The document to chunk.
Returns
ChunkedDocument - The document with chunks attached.
chunk_text()
Chunk raw text into a sequence of raghilda.chunk.Chunk objects.
Usage
chunk_text(text)
Parameters
text: str - The text to chunk.
Returns
Sequence[Chunk] - The resulting chunks with positional information.
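To make "positional information" concrete, here is a rough sketch of a paragraph-level `chunk_text()` that records character offsets. It is not raghilda's implementation: the `Chunk` dataclass is a hypothetical stand-in, and the field names `start` and `end` (and their half-open offset convention) are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical stand-in for raghilda.chunk.Chunk; field names are assumed."""
    text: str
    start: int  # offset of the first character in the source text
    end: int    # offset one past the last character


def chunk_text(text: str) -> List[Chunk]:
    """Split text at blank lines, keeping each paragraph's character offsets."""
    chunks: List[Chunk] = []
    pos = 0
    # The capturing group makes re.split() return the separators as well,
    # so the lengths of all parts add up to len(text) and offsets stay exact.
    for part in re.split(r"(\n\s*\n)", text):
        if part.strip():  # skip separators and empty pieces
            chunks.append(Chunk(part, pos, pos + len(part)))
        pos += len(part)
    return chunks


text = "Intro paragraph.\n\nSecond paragraph,\nstill second.\n\nThird."
chunks = chunk_text(text)
for c in chunks:
    # Each chunk can be recovered by slicing the original text.
    assert text[c.start:c.end] == c.text
    print(c.start, c.end, repr(c.text))
```

Storing offsets this way lets a retrieval result be mapped back to its exact location in the source document, which is the kind of use the positional information suggests.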