store.PostgreSQLStore
A store backed by a PostgreSQL database with pgvector.
Usage
store.PostgreSQLStore()

Uses PostgreSQL for storage with two retrieval methods:
- Full-text search via retrieve_fts(): uses PostgreSQL’s built-in tsvector/tsquery with ts_rank for ranking. A pre-computed tsvector column with a GIN index is created automatically.
- Vector similarity search via retrieve_vss(): uses pgvector for nearest-neighbor search over embeddings. An HNSW index for cosine distance is created automatically when an embedding provider is given. Use build_index() to add indexes for other distance methods (L2, inner product).
Methods
| Name | Description |
|---|---|
| build_index() | Build an HNSW index on the embedding column for the given distance method. |
| close() | Close the store’s database connection. |
| connect() | Connect to an existing PostgreSQL store. |
| create() | Create a new PostgreSQL store. |
| retrieve() | Retrieve the most similar chunks to the given text. |
| retrieve_fts() | Retrieve chunks using PostgreSQL full-text search. |
| retrieve_vss() | Retrieve chunks using pgvector similarity search. |
| size() | Count the number of documents in the store. |
| upsert() | Upsert a document into the store. |
build_index()
Build an HNSW index on the embedding column for the given distance method.
Usage
build_index(method)

A cosine distance index is created by default when calling create() with an embedding provider. Use this method to add indexes for other distance methods.
Parameters
method: str - The distance method to index for.
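For instance, to add an L2 index alongside the default cosine one (a sketch; the store variable and the connection string are placeholders):

```python
# Assumes an existing store created with an embedding provider,
# so the cosine distance index already exists from create().
store = PostgreSQLStore.connect("postgresql://user:pass@localhost/mydb")

# Add an HNSW index for L2 distance as well.
store.build_index("l2_distance")

# Queries using this metric can now be served by the new index.
chunks = store.retrieve_vss("query text", method="l2_distance")
```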
close()
Close the store’s database connection.
Usage
close()

connect()
Connect to an existing PostgreSQL store.
Usage
connect(con, schema="raghilda")

Parameters
con: ConnectionLike - A PostgreSQL connection string (e.g. "postgresql://user:pass@localhost/mydb").
schema: str = "raghilda" - PostgreSQL schema where the store tables live. Defaults to "raghilda".
Returns
PostgreSQLStore - A connected store instance.
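A minimal sketch of connecting to a previously created store (the connection string is a placeholder):

```python
store = PostgreSQLStore.connect(
    "postgresql://user:pass@localhost/mydb",
    schema="raghilda",  # the default; shown here for clarity
)
print(store.size())  # number of documents in the store
store.close()        # release the database connection when done
```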
create()
Create a new PostgreSQL store.
Usage
create(
con,
embed,
name=None,
title=None,
attributes=None,
vss_index="cosine_distance",
schema="raghilda",
overwrite=False
)

Parameters
con: ConnectionLike - A PostgreSQL connection string (e.g. "postgresql://user:pass@localhost/mydb").
embed: Optional[EmbeddingProvider] - Embedding provider for generating vector embeddings. If None, only full-text search will be available.
name: Optional[str] = None - Internal name for the store.
title: Optional[str] = None - Human-readable title for the store.
attributes: Optional[AttributesSchemaSpec] = None - Optional schema for user-defined attribute columns stored per chunk.
vss_index: Optional[str] = "cosine_distance" - The distance method to build an HNSW index for. Defaults to cosine distance. Set to None to skip creating a VSS index. Ignored when embed is None.
schema: str = "raghilda" - PostgreSQL schema to create the store tables in. Defaults to "raghilda". The schema is created if it does not exist.
overwrite: bool = False - If False (default), raise an error when the schema already contains store tables. Set to True to drop the existing store and recreate it.
Returns
PostgreSQLStore - A newly created store instance.
Raises
ValueError - If overwrite is False and the schema already contains a store.
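Putting the parameters together, creating a store might look like the following sketch. The embedding provider class is illustrative; substitute whichever EmbeddingProvider implementation you use:

```python
embed = SomeEmbeddingProvider()  # hypothetical provider

store = PostgreSQLStore.create(
    "postgresql://user:pass@localhost/mydb",
    embed=embed,
    name="docs",
    title="Product documentation",
    vss_index="cosine_distance",  # default; pass None to skip the VSS index
    overwrite=False,              # raise if the schema already holds a store
)
```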
retrieve()
Retrieve the most similar chunks to the given text.
Usage
retrieve(text, top_k=3, *, deoverlap=True, attributes_filter=None)

Combines results from vector similarity search (if embeddings are available) and full-text search, then deduplicates by chunk id, merging metrics from both methods.
Parameters
text: str - The query text to search for.
top_k: int = 3 - The maximum number of chunks to return from each retrieval method (VSS and FTS). Because results from both methods are combined before deoverlapping, the final count may differ from top_k.
deoverlap: bool = True - If True (default), merge overlapping chunks from the same document. Overlapping chunks are identified by their start_index and end_index positions.
attributes_filter: Optional[AttributeFilter] = None - Optional filter to scope retrieval using attribute columns. Can be a SQL-like string or a dict AST. Example string: "tenant = 'docs' AND priority >= 2".
Returns
list[RetrievedChunk] - The retrieved chunks with their relevance metrics.
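A sketch of hybrid retrieval with an attribute filter (the store variable is assumed; the metrics attribute name on RetrievedChunk is an assumption for illustration):

```python
chunks = store.retrieve(
    "how do I rotate credentials?",
    top_k=5,  # up to 5 results from VSS and 5 from FTS before merging
    deoverlap=True,
    attributes_filter="tenant = 'docs' AND priority >= 2",
)
for chunk in chunks:
    # A chunk found by both methods carries metrics from each.
    print(chunk.metrics)  # hypothetical attribute name
```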
retrieve_fts()
Retrieve chunks using PostgreSQL full-text search.
Usage
retrieve_fts(text, top_k=3, *, attributes_filter=None)

Uses to_tsvector / plainto_tsquery with ts_rank for ranking results.
Parameters
text: str - The query text to search for.
top_k: int = 3 - The maximum number of chunks to return.
attributes_filter: Optional[AttributeFilter] = None - Optional filter to scope retrieval using attribute columns. Can be a SQL-like string or a dict AST.
Returns
list[RetrievedChunk] - The matching chunks ranked by ts_rank score.
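Because full-text search needs no embedding provider, this method also works on stores created with embed=None (sketch, store variable assumed):

```python
# Keyword-style search; no embeddings are computed or required.
chunks = store.retrieve_fts("backup schedule", top_k=3)
```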
retrieve_vss()
Retrieve chunks using pgvector similarity search.
Usage
retrieve_vss(query, top_k=3, *, method=None, attributes_filter=None)

Uses pgvector distance operators for nearest-neighbor search. For best performance, ensure an HNSW index exists for the chosen distance method (created automatically for cosine distance, or via build_index() for others).
Parameters
query: str | Sequence[float] - The query text or embedding vector. If a string is provided, it will be embedded using the store’s embedding provider.
top_k: int = 3 - The maximum number of chunks to return.
method: Optional[str] = None - The distance method to use. Defaults to cosine distance. One of "cosine_distance", "l2_distance", or "inner_product".
attributes_filter: Optional[AttributeFilter] = None - Optional filter to scope retrieval using attribute columns. Can be a SQL-like string or a dict AST.
Returns
list[RetrievedChunk] - The most similar chunks with distance metrics.
Raises
ValueError - If query is a string but no embedding provider is configured.
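A sketch showing both query forms; the embedding-provider call is a hypothetical method name, and the non-default method assumes a matching index was built via build_index():

```python
# Query by text: the store's embedding provider embeds it first.
chunks = store.retrieve_vss("reset password", top_k=3)

# Or query by a precomputed vector, choosing the metric explicitly.
vector = embed.embed("reset password")  # hypothetical provider call
chunks = store.retrieve_vss(vector, method="l2_distance")
```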
size()
Count the number of documents in the store.
Usage
size()

Returns
int - The number of documents (not chunks) in the store.
upsert()
Upsert a document into the store.
Usage
upsert(document, *, skip_if_unchanged=True)

The document must be a raghilda.document.ChunkedMarkdownDocument. Use raghilda.chunker.MarkdownChunker to chunk a raghilda.document.MarkdownDocument before upserting.
Parameters
document: Document - The chunked document to upsert.
skip_if_unchanged: bool = True - If True (default), skip the write when the existing document for the same origin already has identical content. This avoids re-computing embeddings.
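The chunk-then-upsert flow might look like this sketch; the document constructor and chunker method names are assumptions, since only the class names appear above:

```python
doc = MarkdownDocument.from_file("guide.md")  # hypothetical constructor
chunked = MarkdownChunker().chunk(doc)        # hypothetical method name

store.upsert(chunked)

# Re-upserting identical content is cheap: with skip_if_unchanged=True
# (the default), the write and the embedding computation are skipped.
store.upsert(chunked, skip_if_unchanged=True)
```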