Sentence Transformers

A local embedding pipeline using sentence-transformers with real all-MiniLM-L6-v2 model inference. It demonstrates zero-cost local attribution with real vector operations: cosine similarity search using numpy, synthesis-approach decisions, and retrieval-quality assessment. All embedding generation runs locally with no API costs.

Environment variables

This example requires OPENAI_API_KEY (for LLM synthesis only), WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys. Embedding generation is always local and free.

Architecture

Key Code

Real local embeddings with @tool(embedding)

Model loading and encoding are recorded as embedding tool calls -- all run locally.

@waxell.tool(tool_type="embedding", name="load_model")
def load_model(model_name: str = "all-MiniLM-L6-v2") -> dict:
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_name)
    return {"model": model_name, "dimensions": model.get_sentence_embedding_dimension(),
            "_model_obj": model}

@waxell.tool(tool_type="embedding", name="encode_documents")
def encode_documents(model, texts: list[str]) -> dict:
    embeddings = model.encode(texts, show_progress_bar=False)
    return {"count": len(texts), "dimensions": embeddings.shape[1], "_embeddings": embeddings}
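Because the model object is an ordinary argument, the encoding step can be exercised with a stand-in model. The sketch below uses a hypothetical stub in place of the real SentenceTransformer (the 384 dimension matches all-MiniLM-L6-v2's output size) to show the shape of the result dict:

```python
import numpy as np

class StubModel:
    """Hypothetical stand-in for a SentenceTransformer, avoiding a model download."""
    def encode(self, texts, show_progress_bar=False):
        # all-MiniLM-L6-v2 produces 384-dim vectors; we fake that shape with random data.
        rng = np.random.default_rng(0)
        return rng.standard_normal((len(texts), 384))

def encode_documents(model, texts):
    embeddings = model.encode(texts, show_progress_bar=False)
    return {"count": len(texts), "dimensions": embeddings.shape[1], "_embeddings": embeddings}

result = encode_documents(StubModel(), ["hello world", "local embeddings"])
print(result["count"], result["dimensions"])  # 2 384
```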

Real cosine similarity with @retrieval and numpy

Similarity search uses real numpy dot product on actual embeddings.

@waxell.retrieval(source="sentence_transformers")
def cosine_similarity_search(query_embedding, doc_embeddings, texts, top_k=3) -> list[dict]:
    q_norm = query_embedding / np.linalg.norm(query_embedding)
    d_norms = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d_norms @ q_norm
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [{"id": str(idx), "text": texts[idx], "score": round(float(scores[idx]), 4)}
            for idx in top_indices]
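The normalize-then-dot-product math can be checked standalone with toy 3-D vectors (the data below is made up for illustration; the real demo uses 384-dim MiniLM embeddings):

```python
import numpy as np

def cosine_similarity_search(query_embedding, doc_embeddings, texts, top_k=3):
    # Normalize query and documents, then a single matrix-vector product
    # gives the cosine similarity of the query against every document.
    q_norm = query_embedding / np.linalg.norm(query_embedding)
    d_norms = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d_norms @ q_norm
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [{"id": str(idx), "text": texts[idx], "score": round(float(scores[idx]), 4)}
            for idx in top_indices]

docs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]])
query = np.array([1.0, 0.0, 0.0])
hits = cosine_similarity_search(query, docs, ["a", "b", "c"], top_k=2)
# "a" is a perfect match (score 1.0); "c" is close; "b" is orthogonal and excluded.
```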

@waxell.decision(name="choose_synthesis_approach", options=["direct_quote", "paraphrase", "summary"])
def choose_synthesis_approach(results: list[dict]) -> dict:
    top_score = results[0]["score"]
    if top_score > 0.9:
        return {"chosen": "direct_quote", "reasoning": f"Top score={top_score:.3f}"}
    return {"chosen": "paraphrase", "reasoning": "Mixed relevance scores"}

What this demonstrates

  • @waxell.observe -- single-agent pipeline with full decorator coverage
  • @waxell.tool -- model loading, document encoding, and query encoding with tool_type="embedding"
  • @waxell.retrieval -- real cosine similarity search with source="sentence_transformers"
  • @waxell.decision -- synthesis approach selection based on score distribution
  • @waxell.reasoning_dec -- retrieval quality evaluation
  • waxell.score() -- average cosine similarity and top relevance scores
  • Zero-cost local embeddings -- all embedding generation is local and free
  • Real vector math -- actual numpy cosine similarity on real embeddings
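Put together, the retrieve-then-decide flow above can be sketched end to end with stub embeddings standing in for the real model output (all data here is illustrative, not from the demo):

```python
import numpy as np

corpus = ["cats purr", "dogs bark", "cats meow"]

# Stub embeddings: in the real demo these come from all-MiniLM-L6-v2.
rng = np.random.default_rng(42)
doc_embeddings = rng.standard_normal((len(corpus), 8))
# A query vector nearly identical to the "cats meow" document.
query_embedding = doc_embeddings[2] + 0.01 * rng.standard_normal(8)

# Cosine similarity search, as in the retrieval step.
q = query_embedding / np.linalg.norm(query_embedding)
d = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
scores = d @ q
best = int(np.argsort(scores)[::-1][0])

# Synthesis decision, as in the decision step: quote directly on a near-exact hit.
approach = "direct_quote" if scores[best] > 0.9 else "paraphrase"
print(corpus[best], approach)  # cats meow direct_quote
```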

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.sentence_transformers_agent --dry-run

# Live (real OpenAI for synthesis only)
export OPENAI_API_KEY="sk-..."
python -m app.demos.sentence_transformers_agent

Source

dev/waxell-dev/app/demos/sentence_transformers_agent.py