Sentence Transformers
A local embedding pipeline using sentence-transformers with real all-MiniLM-L6-v2 model inference. Demonstrates zero-cost local attribution with real vector operations -- cosine similarity search using numpy, synthesis approach decisions, and retrieval quality assessment. All embedding generation runs locally with no API costs.
Environment variables
This example requires OPENAI_API_KEY (for LLM synthesis only), WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys. Embedding generation is always local and free.
Architecture
Key Code
Real local embeddings with @tool(embedding)
Model loading and encoding are recorded as embedding tool calls -- all run locally.
@waxell.tool(tool_type="embedding", name="load_model")
def load_model(model_name: str = "all-MiniLM-L6-v2") -> dict:
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_name)
    return {"model": model_name,
            "dimensions": model.get_sentence_embedding_dimension(),
            "_model_obj": model}
@waxell.tool(tool_type="embedding", name="encode_documents")
def encode_documents(model, texts: list[str]) -> dict:
    embeddings = model.encode(texts, show_progress_bar=False)
    return {"count": len(texts), "dimensions": embeddings.shape[1], "_embeddings": embeddings}
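The two tools chain together: `load_model` returns the model object, which `encode_documents` then uses. A minimal sketch of that contract, with a stub object standing in for `SentenceTransformer` so it runs offline without downloading all-MiniLM-L6-v2 (the stub and its 8-dim vectors are illustrative, not part of the demo):

```python
import numpy as np

class StubModel:
    """Stand-in for SentenceTransformer: deterministic 8-dim vectors."""
    def get_sentence_embedding_dimension(self):
        return 8

    def encode(self, texts, show_progress_bar=False):
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(texts), 8))

def encode_documents(model, texts):
    # Same shape contract as the decorated tool above: one row per input text.
    embeddings = model.encode(texts, show_progress_bar=False)
    return {"count": len(texts), "dimensions": embeddings.shape[1], "_embeddings": embeddings}

out = encode_documents(StubModel(), ["doc one", "doc two"])
# out["count"] == 2, out["dimensions"] == 8, out["_embeddings"].shape == (2, 8)
```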
Real cosine similarity with @retrieval and numpy
Similarity search uses real numpy dot product on actual embeddings.
import numpy as np

@waxell.retrieval(source="sentence_transformers")
def cosine_similarity_search(query_embedding, doc_embeddings, texts, top_k=3) -> list[dict]:
    q_norm = query_embedding / np.linalg.norm(query_embedding)
    d_norms = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d_norms @ q_norm
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [{"id": str(idx), "text": texts[idx], "score": round(float(scores[idx]), 4)}
            for idx in top_indices]
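The normalized dot-product math can be checked in isolation with hand-built vectors (a minimal sketch without the waxell decorator; `cosine_top_k` is a plain stand-in name):

```python
import numpy as np

def cosine_top_k(query_embedding, doc_embeddings, texts, top_k=3):
    # Normalize the query and each document row; for unit vectors the
    # dot product is exactly cosine similarity.
    q_norm = query_embedding / np.linalg.norm(query_embedding)
    d_norms = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d_norms @ q_norm
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [{"id": str(idx), "text": texts[idx], "score": round(float(scores[idx]), 4)}
            for idx in top_indices]

docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
results = cosine_top_k(query, docs, ["x-axis", "y-axis", "diagonal"], top_k=2)
# The x-axis doc is a perfect match (score 1.0); the diagonal scores ~0.7071.
```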
Synthesis approach decision with @decision
The synthesis strategy is chosen from the score distribution of the retrieved results.
@waxell.decision(name="choose_synthesis_approach", options=["direct_quote", "paraphrase", "summary"])
def choose_synthesis_approach(results: list[dict]) -> dict:
    top_score = results[0]["score"]
    if top_score > 0.9:
        return {"chosen": "direct_quote", "reasoning": f"Top score={top_score:.3f}"}
    return {"chosen": "paraphrase", "reasoning": "Mixed relevance scores"}
What this demonstrates
- @waxell.observe -- single-agent pipeline with full decorator coverage
- @waxell.tool -- model loading, document encoding, and query encoding with tool_type="embedding"
- @waxell.retrieval -- real cosine similarity search with source="sentence_transformers"
- @waxell.decision -- synthesis approach selection based on score distribution
- @waxell.reasoning_dec -- retrieval quality evaluation
- waxell.score() -- average cosine similarity and top relevance scores
- Zero-cost local embeddings -- all embedding generation is local and free
- Real vector math -- actual numpy cosine similarity on real embeddings
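The pieces above compose into a small end-to-end flow: encode, search, then decide. A minimal sketch using mock 4-dim embeddings in place of real 384-dim MiniLM vectors, so it runs without the model download (function names are plain stand-ins, not the waxell-decorated versions):

```python
import numpy as np

def cosine_search(query_emb, doc_embs, texts, top_k=3):
    # Same normalized dot-product search as the retrieval step above.
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(scores)[::-1][:top_k]
    return [{"text": texts[i], "score": round(float(scores[i]), 4)} for i in order]

def choose_approach(results):
    # Mirrors the decision logic above: quote verbatim only on a near-exact hit.
    return "direct_quote" if results[0]["score"] > 0.9 else "paraphrase"

# Mock "embeddings" standing in for real model output.
texts = ["cats sleep a lot", "dogs bark loudly", "felines nap often"]
doc_embs = np.array([[0.9, 0.1, 0.0, 0.1],
                     [0.0, 0.9, 0.4, 0.0],
                     [0.8, 0.2, 0.1, 0.1]])
query_emb = np.array([0.9, 0.1, 0.0, 0.1])  # same direction as doc 0

results = cosine_search(query_emb, doc_embs, texts, top_k=2)
approach = choose_approach(results)
# Doc 0 matches exactly (score 1.0), so the decision is "direct_quote".
```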
Run it
# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.sentence_transformers_agent --dry-run
# Live (real OpenAI for synthesis only)
export OPENAI_API_KEY="sk-..."
python -m app.demos.sentence_transformers_agent