store.PostgreSQLStore

A store backed by a PostgreSQL database with pgvector.

Usage

Source

store.PostgreSQLStore()

Uses PostgreSQL for storage with two retrieval methods:

  • Full-text search via retrieve_fts(): uses PostgreSQL’s built-in tsvector/tsquery with ts_rank for ranking. A pre-computed tsvector column with a GIN index is created automatically.
  • Vector similarity search via retrieve_vss(): uses pgvector for nearest-neighbor search over embeddings. An HNSW index for cosine distance is created automatically when an embedding provider is given. Use build_index() to add indexes for other distance methods (L2, inner product).
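A minimal end-to-end sketch of how the two retrieval methods are typically used. It is wrapped in a function because it needs a live PostgreSQL server with pgvector; the connection string and embedding provider are placeholders, and the `raghilda.store` import path is assumed from the page title:

```python
# Sketch only: requires a running PostgreSQL server with the pgvector
# extension and an EmbeddingProvider instance (passed in as a placeholder).
def example(embed_provider):
    from raghilda.store import PostgreSQLStore  # assumed import path

    store = PostgreSQLStore.create(
        "postgresql://user:pass@localhost/mydb",  # placeholder DSN
        embed=embed_provider,
    )
    # Full-text search uses the pre-computed tsvector column + GIN index.
    fts_hits = store.retrieve_fts("vector databases", top_k=5)
    # Vector search uses the HNSW cosine-distance index built by create().
    vss_hits = store.retrieve_vss("vector databases", top_k=5)
    store.close()
    return fts_hits, vss_hits
```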

Methods

Name Description
build_index() Build an HNSW index on the embedding column for the given distance method.
close() Close the store’s database connection.
connect() Connect to an existing PostgreSQL store.
create() Create a new PostgreSQL store.
retrieve() Retrieve the most similar chunks to the given text.
retrieve_fts() Retrieve chunks using PostgreSQL full-text search.
retrieve_vss() Retrieve chunks using pgvector similarity search.
size() Count the number of documents in the store.
upsert() Upsert a document into the store.

build_index()

Build an HNSW index on the embedding column for the given distance method.

Usage

Source

build_index(method)

A cosine distance index is created by default when calling create() with an embedding provider. Use this method to add indexes for other distance methods.

Parameters
method: str
The distance method to index for.
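As a sketch, the non-default indexes could be added right after create(). The method names below are taken from the retrieve_vss() documentation on this page:

```python
# Valid distance methods per retrieve_vss(); cosine is indexed by default.
DISTANCE_METHODS = ("cosine_distance", "l2_distance", "inner_product")

def add_extra_indexes(store):
    # Sketch: build HNSW indexes so l2_distance and inner_product
    # queries are also index-accelerated (requires a live store).
    for method in DISTANCE_METHODS[1:]:
        store.build_index(method)
```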

close()

Close the store’s database connection.

Usage

Source

close()

connect()

Connect to an existing PostgreSQL store.

Usage

Source

connect(con, schema="raghilda")
Parameters
con: ConnectionLike

A PostgreSQL connection string (e.g. "postgresql://user:pass@localhost/mydb").

schema: str = "raghilda"
PostgreSQL schema where the store tables live. Defaults to "raghilda".
Returns
PostgreSQLStore
A connected store instance.

create()

Create a new PostgreSQL store.

Usage

Source

create(
    con,
    embed,
    name=None,
    title=None,
    attributes=None,
    vss_index="cosine_distance",
    schema="raghilda",
    overwrite=False
)
Parameters
con: ConnectionLike

A PostgreSQL connection string (e.g. "postgresql://user:pass@localhost/mydb").

embed: Optional[EmbeddingProvider]

Embedding provider for generating vector embeddings. If None, only full-text search will be available.

name: Optional[str] = None

Internal name for the store.

title: Optional[str] = None

Human-readable title for the store.

attributes: Optional[AttributesSchemaSpec] = None

Optional schema for user-defined attribute columns stored per chunk.

vss_index: Optional[str] = "cosine_distance"

The distance method to build an HNSW index for. Defaults to cosine distance. Set to None to skip creating a VSS index. Ignored when embed is None.

schema: str = "raghilda"

PostgreSQL schema to create the store tables in. Defaults to "raghilda". The schema is created if it does not exist.

overwrite: bool = False
If False (default), raise an error when the schema already contains store tables. Set to True to drop the existing store and recreate it.
Returns
PostgreSQLStore
A newly created store instance.
Raises
ValueError
If overwrite is False and the schema already contains a store.
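A sketch of creating a full-text-only store (no embedding provider). The DSN is a placeholder and the import path is assumed; it needs a live PostgreSQL server to actually run:

```python
def create_fts_only_store():
    from raghilda.store import PostgreSQLStore  # assumed import path

    # With embed=None only retrieve_fts() is available; vss_index is ignored.
    return PostgreSQLStore.create(
        "postgresql://user:pass@localhost/mydb",  # placeholder DSN
        embed=None,
        title="Docs index",
        overwrite=True,  # drop and recreate any existing store in the schema
    )
```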

retrieve()

Retrieve the most similar chunks to the given text.

Usage

Source

retrieve(text, top_k=3, *, deoverlap=True, attributes_filter=None)

Combines results from vector similarity search (if embeddings are available) and full-text search, then deduplicates by chunk id, merging metrics from both methods.

Parameters
text: str

The query text to search for.

top_k: int = 3

The maximum number of chunks to return from each retrieval method (VSS and FTS). Because results from both methods are combined before deoverlapping, the final count may differ from top_k.

deoverlap: bool = True

If True (default), merge overlapping chunks from the same document. Overlapping chunks are identified by their start_index and end_index positions.

attributes_filter: Optional[AttributeFilter] = None
Optional filter to scope retrieval using attribute columns. Can be a SQL-like string or a dict AST. Example string: "tenant = 'docs' AND priority >= 2".
Returns
list[RetrievedChunk]
The retrieved chunks with their relevance metrics.
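A sketch of a scoped hybrid query against a connected store (the filter string follows the example syntax shown above; the query text is a placeholder):

```python
def retrieve_scoped(store):
    # Combines VSS + FTS results, deoverlaps, and restricts matches
    # to chunks whose attribute columns satisfy the filter.
    return store.retrieve(
        "how do I rotate credentials?",  # placeholder query
        top_k=5,
        deoverlap=True,
        attributes_filter="tenant = 'docs' AND priority >= 2",
    )
```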

retrieve_fts()

Retrieve chunks using PostgreSQL full-text search.

Usage

Source

retrieve_fts(text, top_k=3, *, attributes_filter=None)

Uses to_tsvector / plainto_tsquery with ts_rank for ranking results.

Parameters
text: str

The query text to search for.

top_k: int = 3

The maximum number of chunks to return.

attributes_filter: Optional[AttributeFilter] = None
Optional filter to scope retrieval using attribute columns. Can be a SQL-like string or a dict AST.
Returns
list[RetrievedChunk]
The matching chunks ranked by ts_rank score.

retrieve_vss()

Retrieve chunks using pgvector similarity search.

Usage

Source

retrieve_vss(query, top_k=3, *, method=None, attributes_filter=None)

Uses pgvector distance operators for nearest-neighbor search. For best performance, ensure an HNSW index exists for the chosen distance method (created automatically for cosine distance, or via build_index() for others).

Parameters
query: str | Sequence[float]

The query text or embedding vector. If a string is provided, it will be embedded using the store’s embedding provider.

top_k: int = 3

The maximum number of chunks to return.

method: Optional[str] = None

The distance method to use. One of "cosine_distance", "l2_distance", or "inner_product". Defaults to "cosine_distance".

attributes_filter: Optional[AttributeFilter] = None
Optional filter to scope retrieval using attribute columns. Can be a SQL-like string or a dict AST.
Returns
list[RetrievedChunk]
The most similar chunks with distance metrics.
Raises
ValueError
If query is a string but no embedding provider is configured.
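A sketch of querying with a precomputed embedding vector, which sidesteps the ValueError above since no embedding provider is needed:

```python
def nearest_by_vector(store, vec):
    # vec: a precomputed embedding (Sequence[float]) matching the
    # store's embedding dimension. l2_distance queries are only
    # index-accelerated if build_index("l2_distance") was run first.
    return store.retrieve_vss(vec, top_k=10, method="l2_distance")
```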

size()

Count the number of documents in the store.

Usage

Source

size()
Returns
int
The number of documents (not chunks) in the store.

upsert()

Upsert a document into the store.

Usage

Source

upsert(document, *, skip_if_unchanged=True)

The document must be a raghilda.document.ChunkedMarkdownDocument. Use raghilda.chunker.MarkdownChunker to chunk a raghilda.document.MarkdownDocument before upserting.

Parameters
document: Document

The chunked document to upsert.

skip_if_unchanged: bool = True
If True (default), skip the write when the existing document for the same origin already has identical content. This avoids re-computing embeddings.
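A sketch of chunking a Markdown file and upserting it. The class names come from the description above, but the constructor and chunking method signatures are assumptions and may differ from the real API:

```python
def index_markdown(store, path):
    # Class names per these docs; constructor/method args are assumptions.
    from raghilda.document import MarkdownDocument
    from raghilda.chunker import MarkdownChunker

    with open(path) as f:
        doc = MarkdownDocument(f.read())   # hypothetical constructor
    chunked = MarkdownChunker().chunk(doc)  # hypothetical method name
    # skip_if_unchanged avoids re-computing embeddings when the stored
    # document with the same origin already has identical content.
    store.upsert(chunked, skip_if_unchanged=True)
```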