store.PostgreSQLStore

A store backed by a PostgreSQL database with pgvector.

Usage

Source

store.PostgreSQLStore()

Uses PostgreSQL for storage with two retrieval methods:

  • Full-text search via retrieve_fts(): uses PostgreSQL’s built-in tsvector/tsquery with ts_rank for ranking. A pre-computed tsvector column with a GIN index is created automatically.
  • Vector similarity search via retrieve_vss(): uses pgvector for nearest-neighbor search over embeddings. An HNSW index for cosine distance is created automatically when an embedding provider is given. Use build_index() to add indexes for other distance methods (L2, inner product).
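A minimal end-to-end sketch of how the two retrieval methods are typically used. It is wrapped in a function because it needs a live PostgreSQL server with pgvector; the connection string and embedding provider are placeholders, and the `raghilda.store` import path is assumed from the page title:

```python
# Sketch only: requires a running PostgreSQL server with the pgvector
# extension and an EmbeddingProvider instance (passed in as a placeholder).
def example(embed_provider):
    from raghilda.store import PostgreSQLStore  # assumed import path

    store = PostgreSQLStore.create(
        "postgresql://user:pass@localhost/mydb",  # placeholder DSN
        embed=embed_provider,
    )
    # Full-text search uses the pre-computed tsvector column + GIN index.
    fts_hits = store.retrieve_fts("vector databases", top_k=5)
    # Vector search uses the HNSW cosine-distance index built by create().
    vss_hits = store.retrieve_vss("vector databases", top_k=5)
    store.close()
    return fts_hits, vss_hits
```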

Methods

Name Description
build_index() Build an HNSW index on the embedding column for the given distance method.
close() Close the store’s database connection.
connect() Connect to an existing PostgreSQL store.
create() Create a new PostgreSQL store.
retrieve() Retrieve the most similar chunks to the given text.
retrieve_fts() Retrieve chunks using PostgreSQL full-text search.
retrieve_vss() Retrieve chunks using pgvector similarity search.
size() Count the number of documents in the store.
upsert() Upsert a document into the store.

build_index()

Build an HNSW index on the embedding column for the given distance method.

Usage

Source

build_index(method)

A cosine distance index is created by default when calling create() with an embedding provider. Use this method to add indexes for other distance methods.

Parameters
method: str
The distance method to index for.
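As a sketch, the non-default indexes could be added right after create(). The method names below are taken from the retrieve_vss() documentation on this page:

```python
# Valid distance methods per retrieve_vss(); cosine is indexed by default.
DISTANCE_METHODS = ("cosine_distance", "l2_distance", "inner_product")

def add_extra_indexes(store):
    # Sketch: build HNSW indexes so l2_distance and inner_product
    # queries are also index-accelerated (requires a live store).
    for method in DISTANCE_METHODS[1:]:
        store.build_index(method)
```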

close()

Close the store’s database connection.

Usage

Source

close()

connect()

Connect to an existing PostgreSQL store.

Usage

Source

connect(con, schema="raghilda")
Parameters
con: ConnectionLike

A PostgreSQL connection string (e.g. "postgresql://user:pass@localhost/mydb").

schema: str = "raghilda"
PostgreSQL schema where the store tables live. Defaults to "raghilda".
Returns
PostgreSQLStore
A connected store instance.

create()

Create a new PostgreSQL store.

Usage

Source

create(
    con,
    embed,
    name=None,
    title=None,
    attributes=None,
    vss_index="cosine_distance",
    schema="raghilda",
    overwrite=False
)
Parameters
con: ConnectionLike

A PostgreSQL connection string (e.g. "postgresql://user:pass@localhost/mydb").

embed: Optional[EmbeddingProvider]

Embedding provider for generating vector embeddings. If None, only full-text search will be available.

name: Optional[str] = None

Internal name for the store.

title: Optional[str] = None

Human-readable title for the store.

attributes: Optional[AttributesSchemaSpec] = None

Optional schema for user-defined attribute columns stored per chunk.

vss_index: Optional[str] = "cosine_distance"

The distance method to build an HNSW index for. Defaults to cosine distance. Set to None to skip creating a VSS index. Ignored when embed is None.

schema: str = "raghilda"

PostgreSQL schema to create the store tables in. Defaults to "raghilda". The schema is created if it does not exist.

overwrite: bool = False
If False (default), raise an error when the schema already contains store tables. Set to True to drop the existing store and recreate it.
Returns
PostgreSQLStore
A newly created store instance.
Raises
ValueError
If overwrite is False and the schema already contains a store.
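A sketch of creating a full-text-only store (no embedding provider). The DSN is a placeholder and the import path is assumed; it needs a live PostgreSQL server to actually run:

```python
def create_fts_only_store():
    from raghilda.store import PostgreSQLStore  # assumed import path

    # With embed=None only retrieve_fts() is available; vss_index is ignored.
    return PostgreSQLStore.create(
        "postgresql://user:pass@localhost/mydb",  # placeholder DSN
        embed=None,
        title="Docs index",
        overwrite=True,  # drop and recreate any existing store in the schema
    )
```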

retrieve()

Retrieve the most similar chunks to the given text.

Usage

Source

retrieve(text, top_k=3, *, deoverlap=True, attributes_filter=None)

Combines results from vector similarity search (if embeddings are available) and full-text search, then deduplicates by chunk id, merging metrics from both methods.

Parameters
text: str

The query text to search for.

top_k: int = 3

The maximum number of chunks to return from each retrieval method (VSS and FTS). Because results from both methods are combined before deoverlapping, the final count may differ from top_k.

deoverlap: bool = True

If True (default), merge overlapping chunks from the same document. Overlapping chunks are identified by their start_index and end_index positions.

attributes_filter: Optional[AttributeFilter] = None
Optional filter to scope retrieval using attribute columns. Can be a SQL-like string or a dict AST. Example string: "tenant = 'docs' AND priority >= 2".
Returns
list[RetrievedChunk]
The retrieved chunks with their relevance metrics.
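A sketch of a scoped hybrid query against a connected store (the filter string follows the example syntax shown above; the query text is a placeholder):

```python
def retrieve_scoped(store):
    # Combines VSS + FTS results, deoverlaps, and restricts matches
    # to chunks whose attribute columns satisfy the filter.
    return store.retrieve(
        "how do I rotate credentials?",  # placeholder query
        top_k=5,
        deoverlap=True,
        attributes_filter="tenant = 'docs' AND priority >= 2",
    )
```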

retrieve_fts()

Retrieve chunks using PostgreSQL full-text search.

Usage

Source

retrieve_fts(text, top_k=3, *, attributes_filter=None)

Uses to_tsvector / plainto_tsquery with ts_rank for ranking results.

Parameters
text: str

The query text to search for.

top_k: int = 3

The maximum number of chunks to return.

attributes_filter: Optional[AttributeFilter] = None
Optional filter to scope retrieval using attribute columns. Can be a SQL-like string or a dict AST.
Returns
list[RetrievedChunk]
The matching chunks ranked by ts_rank score.

retrieve_vss()

Retrieve chunks using pgvector similarity search.

Usage

Source

retrieve_vss(query, top_k=3, *, method=None, attributes_filter=None)

Uses pgvector distance operators for nearest-neighbor search. For best performance, ensure an HNSW index exists for the chosen distance method (created automatically for cosine distance, or via build_index() for others).

Parameters
query: str | Sequence[float]

The query text or embedding vector. If a string is provided, it will be embedded using the store’s embedding provider.

top_k: int = 3

The maximum number of chunks to return.

method: Optional[str] = None

The distance method to use. One of "cosine_distance", "l2_distance", or "inner_product". Defaults to "cosine_distance".

attributes_filter: Optional[AttributeFilter] = None
Optional filter to scope retrieval using attribute columns. Can be a SQL-like string or a dict AST.
Returns
list[RetrievedChunk]
The most similar chunks with distance metrics.
Raises
ValueError
If query is a string but no embedding provider is configured.
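A sketch of querying with a precomputed embedding vector, which sidesteps the ValueError above since no embedding provider is needed:

```python
def nearest_by_vector(store, vec):
    # vec: a precomputed embedding (Sequence[float]) matching the
    # store's embedding dimension. l2_distance queries are only
    # index-accelerated if build_index("l2_distance") was run first.
    return store.retrieve_vss(vec, top_k=10, method="l2_distance")
```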

size()

Count the number of documents in the store.

Usage

Source

size()
Returns
int
The number of documents (not chunks) in the store.

upsert()

Upsert a document into the store.

Usage

Source

upsert(document, *, skip_if_unchanged=True)

The document must be a raghilda.document.ChunkedMarkdownDocument. Use raghilda.chunker.MarkdownChunker to chunk a raghilda.document.MarkdownDocument before upserting.

Parameters
document: Document

The chunked document to upsert.

skip_if_unchanged: bool = True
If True (default), skip the write when the existing document for the same origin already has identical content. This avoids re-computing embeddings.
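A sketch of chunking a Markdown file and upserting it. The class names come from the description above, but the constructor and chunking method signatures are assumptions and may differ from the real API:

```python
def index_markdown(store, path):
    # Class names per these docs; constructor/method args are assumptions.
    from raghilda.document import MarkdownDocument
    from raghilda.chunker import MarkdownChunker

    with open(path) as f:
        doc = MarkdownDocument(f.read())   # hypothetical constructor
    chunked = MarkdownChunker().chunk(doc)  # hypothetical method name
    # skip_if_unchanged avoids re-computing embeddings when the stored
    # document with the same origin already has identical content.
    store.upsert(chunked, skip_if_unchanged=True)
```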