```python
from raghilda.embedding import EmbeddingSentenceTransformers

provider = EmbeddingSentenceTransformers(model="all-MiniLM-L6-v2")
embeddings = provider.embed(["hello world", "testing embeddings"])

print(len(embeddings))     # Number of input texts
print(len(embeddings[0]))  # Dimension of the embedding
```

# embedding.EmbeddingSentenceTransformers
Creates an embedding function provider backed by sentence-transformers models.
## Usage

```python
embedding.EmbeddingSentenceTransformers()
```

Implements the `EmbeddingProvider` interface.
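The `EmbeddingProvider` interface itself is not shown on this page. A purely illustrative sketch of its shape, with the `embed` signature inferred from the examples below (raghilda's actual definition may differ):

```python
# Hypothetical sketch of the EmbeddingProvider interface, inferred from the
# examples on this page; raghilda's actual definition may differ.
from typing import Optional, Protocol, Sequence

class EmbeddingProvider(Protocol):
    def embed(
        self,
        texts: Sequence[str],
        input_type: Optional["EmbedInputType"] = None,
    ) -> list[list[float]]:
        """Return one embedding vector per input text."""
        ...
```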
This provider runs models locally using the sentence-transformers library, enabling offline/private embedding without external API calls.
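Since encoding happens locally, the provider is essentially a wrapper around `sentence_transformers.SentenceTransformer`. A minimal sketch of the equivalent direct call (illustrative only, not raghilda's actual implementation):

```python
# Direct sentence-transformers equivalent of what this provider wraps
# (a sketch; raghilda's internals may differ).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # downloaded once, then runs locally
vectors = model.encode(["hello world"], batch_size=64)
print(vectors.shape)  # (1, 384) for all-MiniLM-L6-v2
```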
## Parameters

- `model: str = "all-MiniLM-L6-v2"`
  The sentence-transformers model to use. Any model from the Hugging Face Hub that is compatible with sentence-transformers can be used.
- `device: Optional[str] = None`
  The device to run the model on (e.g., `"cpu"`, `"cuda"`, `"mps"`). If `None`, sentence-transformers will auto-detect the best available device.
- `batch_size: int = 64`
  The number of texts to process in each batch.
- `prompts: Optional[dict[EmbedInputType, str]] = None`
  Optional mapping from `EmbedInputType` to a prefix string to prepend to each text before encoding. This is useful for models that require task-specific prefixes (e.g., nomic-embed-text uses `"search_query: "` and `"search_document: "`); see the sketch after this list.
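The prefixing described for `prompts` amounts to plain string concatenation before encoding. A minimal sketch of the assumed semantics (string keys stand in for `EmbedInputType` members; this is not raghilda's actual code):

```python
# Assumed semantics of the `prompts` mapping: prepend the prefix matching
# the input type, and leave the text unchanged when no prefix is mapped.
def apply_prompt(texts: list[str], input_type, prompts: dict | None) -> list[str]:
    prefix = (prompts or {}).get(input_type, "")
    return [prefix + text for text in texts]

prompts = {"query": "search_query: ", "document": "search_document: "}
print(apply_prompt(["Who is Laurens van der Maaten?"], "query", prompts))
# ['search_query: Who is Laurens van der Maaten?']
```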
## Examples

Install raghilda with sentence-transformers support:

```bash
pip install raghilda[sentence-transformers]
```

For models that use task-specific prefixes:
```python
from raghilda.embedding import EmbeddingSentenceTransformers, EmbedInputType

provider = EmbeddingSentenceTransformers(
    model="nomic-ai/nomic-embed-text-v1.5",
    prompts={
        EmbedInputType.QUERY: "search_query: ",
        EmbedInputType.DOCUMENT: "search_document: ",
    },
)

# Queries get "search_query: " prepended automatically
query_emb = provider.embed(["Who is Laurens van der Maaten?"], EmbedInputType.QUERY)

# Documents get "search_document: " prepended automatically
doc_emb = provider.embed(["TSNE is a dimensionality reduction algorithm"])
```
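A common follow-up is scoring the document against the query. A sketch using numpy and cosine similarity, assuming `embed` returns one sequence of floats per input text:

```python
import numpy as np

# Cosine similarity between the query and document vectors
# (assumes embed() returns one sequence of floats per text).
q = np.asarray(query_emb[0], dtype=float)
d = np.asarray(doc_emb[0], dtype=float)
score = float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
print(f"cosine similarity: {score:.3f}")
```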