```python
from raghilda.store import DuckDBStore
from raghilda.embedding import EmbeddingOpenAI

store = DuckDBStore.create(
    location="quarto_docs.db",
    embed=EmbeddingOpenAI(),
    name="quarto",
    title="Quarto Documentation",
    overwrite=True,
)
```

## Core Concepts
Large language models (LLMs) sometimes generate confident but incorrect information—a phenomenon known as hallucination. This happens because LLMs work by predicting the most likely next words based on patterns learned during training, without any inherent concept of truth or factual accuracy.
## Why RAG?
Retrieval-Augmented Generation (RAG) addresses this by grounding LLM responses in trusted source material. Instead of relying solely on the model’s training data, RAG retrieves relevant content from a curated knowledge base and includes it in the prompt. This shifts the model’s role from open-ended generation to summarizing vetted content.
While RAG doesn’t eliminate hallucinations entirely, it significantly reduces them for domain-specific applications by ensuring responses are anchored in authoritative sources.
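The grounding idea can be sketched without any framework: score stored passages against the question, then splice the best match into the prompt. The toy example below uses word overlap in place of real embeddings; all names here are illustrative and not part of raghilda.

```python
# Toy retrieval-augmented prompt: word overlap stands in for embedding similarity.
docs = [
    "Quarto renders .qmd files to HTML, PDF, and revealjs presentations.",
    "Python is a general-purpose programming language.",
]

def score(query: str, doc: str) -> int:
    """Count lowercase words shared between the query and a document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_prompt(query: str) -> str:
    """Retrieve the best-scoring passage and anchor the prompt to it."""
    best = max(docs, key=lambda d: score(query, d))
    return f"Answer using only this context:\n{best}\n\nQuestion: {query}"

prompt = build_prompt("What formats does Quarto render?")
print(prompt)
```

The retrieved passage, not the model's parametric memory, becomes the source of truth for the answer.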
## Building a RAG System
A RAG system has two main phases:
- Preparation: Building a searchable knowledge store from your documents
- Retrieval: Finding relevant content to augment LLM prompts
Let’s walk through building a RAG system using the Quarto documentation as our knowledge base.
## Creating a Store
First, create a store with an embedding provider, as shown in the snippet at the top of this page. The store holds your document chunks and their vector embeddings.
raghilda supports multiple embedding providers (OpenAI, Cohere) and storage backends (DuckDB, ChromaDB, OpenAI Vector Stores, PostgreSQL). See the API Reference for all options.
## Finding Documents
Next, identify the documents to include. The find_links() function can crawl a website to discover pages:
```python
from raghilda.scrape import find_links

links = find_links(
    "https://quarto.org/docs/guide/",
    depth=1,  # follow links 1 level deep from the starting page
    children_only=True,
)
print(f"Found {len(links)} pages")
```

```
Found 2 pages
```
The `depth` parameter controls how many levels of links to follow, and `children_only=True` restricts crawling to pages under the starting URL.
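The children-only restriction amounts to a URL-prefix test. A minimal sketch of that rule (the helper name is illustrative, not raghilda's internal implementation):

```python
from urllib.parse import urlparse

def is_child(start: str, candidate: str) -> bool:
    """True if candidate lives on the same host, at or under the starting URL's path."""
    s, c = urlparse(start), urlparse(candidate)
    return s.netloc == c.netloc and c.path.startswith(s.path)

start = "https://quarto.org/docs/guide/"
print(is_child(start, "https://quarto.org/docs/guide/authoring.html"))  # True
print(is_child(start, "https://quarto.org/docs/reference/"))            # False
```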
You can also work with local files or provide a list of URLs directly:
```python
# Local files
links = ["docs/guide.md", "docs/reference.md", "docs/tutorial.md"]

# Or use glob patterns with pathlib
from pathlib import Path
links = list(Path("docs").glob("**/*.md"))
```

## Preparing Documents
Prepare each document explicitly by reading it, chunking it, and passing the result to `upsert()`:
```python
from raghilda.read import read_as_markdown
from raghilda.chunker import MarkdownChunker

chunker = MarkdownChunker()
for link in links:
    document = read_as_markdown(link)
    chunked = chunker.chunk(document)
    store.upsert(chunked)

print(f"Indexed {store.size()} documents")
```

```
Indexed 2 documents
```
That is the full preparation phase. Each document is converted to Markdown, split into overlapping chunks, embedded, and written to the store through explicit calls that keep the indexing pipeline visible.
### What Happens During Preparation
Each item you index typically goes through two steps before it is stored:
1. Convert to Markdown — `read_as_markdown()` converts the item (a URL or file path) into a Markdown document. It handles HTML pages, PDFs, DOCX files, and more using MarkItDown. For HTML, it extracts the `<main>` element and removes `<nav>` elements by default.
```python
from raghilda.read import read_as_markdown

doc = read_as_markdown("https://quarto.org/docs/guide/")
print(doc.content[:500])
```

```
# Guide – Quarto
# Guide
Comprehensive guide to using Quarto. If you are just starting out, you may want to explore the [tutorials](../../docs/get-started/index.html) to learn the basics.
#### Authoring
###### Create content with markdown
* [Markdown Basics](../../docs/authoring/markdown-basics.html)
* [Figures](../../docs/authoring/figures.html)
* [Tables](../../docs/authoring/tables.html)
* [Diagrams](../../docs/authoring/diagrams.html)
* [Citations](../../docs/authoring/citations.html)
*
```
2. Chunk the document — `MarkdownChunker` splits the Markdown into overlapping chunks at semantic boundaries (headings, paragraphs, sentences). The defaults are a chunk size of 1600 characters with 50% overlap between chunks. Each chunk retains the heading hierarchy it falls under as context.
```python
from raghilda.chunker import MarkdownChunker

chunker = MarkdownChunker(
    chunk_size=1600,      # target size in characters
    target_overlap=0.5,   # 50% overlap between chunks
)
chunked_doc = chunker.chunk(doc)
print(f"Created {len(chunked_doc.chunks)} chunks")
print(f"\nFirst chunk context: {chunked_doc.chunks[0].context}")
print(f"First chunk text:\n{chunked_doc.chunks[0].text[:200]}...")
```

```
Created 6 chunks

First chunk context: None
First chunk text:
# Guide – Quarto
# Guide
Comprehensive guide to using Quarto. If you are just starting out, you may want to explore the [tutorials](../../docs/get-started/index.html) to learn the basics.
#### Auth...
```
After chunking, `upsert()` embeds the chunks using the store’s embedding provider and writes them to the database.
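The effect of chunk size and overlap can be pictured as a plain sliding window. This is only an illustration of the geometry; `MarkdownChunker` additionally respects semantic boundaries, which this sketch does not:

```python
def sliding_chunks(text: str, chunk_size: int = 1600, overlap: float = 0.5) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share `overlap` of their length."""
    step = max(1, int(chunk_size * (1 - overlap)))  # 800-char stride at 50% overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step) if text[i:i + chunk_size]]

chunks = sliding_chunks("abcdefghij" * 400)  # a 4000-character document
print(len(chunks))      # 5 windows
print(len(chunks[0]))   # 1600
```

With 50% overlap, the second half of each chunk reappears as the first half of the next, so a sentence cut off at a window edge is always whole in the neighboring chunk.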
### Customizing Preparation
You can wrap your preferred reading and chunking logic in a helper that returns a chunked Document:
```python
from raghilda.read import read_as_markdown
from raghilda.chunker import MarkdownChunker

chunker = MarkdownChunker(chunk_size=800, target_overlap=0.3)

def prepare(uri):
    doc = read_as_markdown(uri)
    return chunker.chunk(doc)

for link in links:
    store.upsert(prepare(link))
```

Common reasons to customize `prepare`:
- Adjust chunk size or overlap — smaller chunks for more precise retrieval, larger for more context.
- Set hard heading boundaries — use `segment_by_heading_levels=[1, 2]` to prevent chunks from crossing major sections.
- Control HTML extraction — pass `html_extract_selectors` or `html_zap_selectors` to `read_as_markdown()`.
- Use a different chunker — any chunker that returns a chunked document will work. See the Chunking guide for more options, including chonkie integration.
## Building Indexes
After ingestion, build indexes to speed up retrieval:
```python
store.build_index()
```

This creates both a vector similarity index (HNSW) for semantic search and a BM25 index for keyword search.
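To see what the BM25 side contributes, here is a compact, self-contained sketch of classic BM25 scoring over tokenized documents. raghilda builds its actual index inside the database; this only illustrates the scoring idea, and all names are ours:

```python
import math

def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query with the BM25 formula."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    scores = []
    for d in docs:
        s = 0.0
        for term in query:
            df = sum(term in doc for doc in docs)               # document frequency
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)     # rare terms weigh more
            tf = d.count(term)                                  # term frequency, saturated below
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [["quarto", "presentation", "revealjs"], ["python", "packaging"], ["quarto", "pdf"]]
print(bm25_scores(["quarto", "presentation"], docs))
```

Exact keyword matches like this complement the vector index, which instead catches paraphrases that share no literal terms with the query.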
## Retrieving Content
Now you can search your knowledge base:
```python
chunks = store.retrieve("How do I create a Quarto presentation?", top_k=5)
for chunk in chunks:
    print(f"Score: {chunk.metrics[0].value:.4f}")
    print(chunk.text[:200])
    print("---")
```

```
Score: 0.4994
# Guide – Quarto
# Guide
Comprehensive guide to using Quarto. If you are just starting out, you may want to explore the [tutorials](../../docs/get-started/index.html) to learn the basics.
#### Auth
---
Score: 0.4994
# Guide – Quarto
# Guide
Comprehensive guide to using Quarto. If you are just starting out, you may want to explore the [tutorials](../../docs/get-started/index.html) to learn the basics.
#### Auth
---
```
The `retrieve()` method combines vector similarity search (semantic matching) with BM25 (keyword matching) for hybrid retrieval. By default, overlapping chunks from the same document are merged (`deoverlap=True`) to produce more coherent results; on merged chunks, metrics are concatenated and attribute values are collected into lists ordered by chunk start. The merged chunk keeps the context of the first overlapping chunk.
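The deoverlap step can be pictured as interval merging over character offsets: spans from the same document that touch or overlap collapse into one, keeping the earliest start. A simplified sketch of that rule (the function name and tuple representation are illustrative):

```python
def merge_overlapping(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge (start, end) character spans that overlap or touch."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Extend the previous span instead of emitting a duplicate region.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Two 1600-char chunks with 50% overlap collapse into one 2400-char span.
print(merge_overlapping([(0, 1600), (800, 2400), (5000, 6600)]))  # [(0, 2400), (5000, 6600)]
```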
You can also use the individual search methods:
```python
# Vector similarity search only
chunks = store.retrieve_vss("presentations", top_k=5)

# BM25 keyword search only
chunks = store.retrieve_bm25("presentations", top_k=5)
```

## Using with an LLM
The retrieved chunks can augment your LLM prompts. Here’s an example using chatlas:
```python
from chatlas import ChatOpenAI
from raghilda.store import DuckDBStore

# Connect to the existing store
store = DuckDBStore.connect("quarto_docs.db", read_only=True)

# Define a search tool
def search_docs(query: str) -> str:
    """Search the Quarto documentation for relevant information."""
    import json
    chunks = store.retrieve(query, top_k=5, deoverlap=True)
    return json.dumps([{"text": chunk.text, "context": chunk.context} for chunk in chunks])

# Create a chat with the RAG tool
chat = ChatOpenAI(
    model="gpt-4o-mini",
    system_prompt="""Answer questions about Quarto using the search tool.
    Always search the documentation before answering.""",
)
chat.register_tool(search_docs)

# Ask a question
chat.chat("How do I add citations to a Quarto document?")
```

## Reconnecting to a Store
To reuse an existing store:
```python
store = DuckDBStore.connect("quarto_docs.db")
print(f"Store contains {store.size()} chunks")
```

The embedding configuration is automatically restored, so you can immediately start retrieving.
## Complete Example
Here’s the full workflow in one script:
```python
from raghilda.store import DuckDBStore
from raghilda.embedding import EmbeddingOpenAI
from raghilda.scrape import find_links
from raghilda.read import read_as_markdown
from raghilda.chunker import MarkdownChunker

# 1. Create store
store = DuckDBStore.create(
    location="quarto_docs.db",
    embed=EmbeddingOpenAI(),
    name="quarto",
    title="Quarto Documentation",
    overwrite=True,
)

# 2. Find documents
links = find_links(
    "https://quarto.org/docs/guide/",
    depth=1,
    children_only=True,
)

# 3. Prepare and insert documents
chunker = MarkdownChunker()
for link in links:
    store.upsert(chunker.chunk(read_as_markdown(link)))

# 4. Build indexes
store.build_index()

# 5. Retrieve
chunks = store.retrieve("How do I create a presentation?", top_k=3)
for chunk in chunks:
    print(f"\n## {chunk.context}")
    print(chunk.text[:300])
```

## Next Steps
- Learn about chunk sizing, overlap, and alternative chunkers in the Chunking guide
- Learn schema-based retrieval scoping with Attribute Filters
- Explore ChromaDB Store for an alternative storage backend
- See the API Reference for detailed documentation
- Check out the examples for more use cases