Sentence Transformers

A local embedding pipeline using sentence-transformers with real all-MiniLM-L6-v2 model inference. It demonstrates zero-cost local attribution with real vector operations: cosine similarity search using numpy, synthesis-approach decisions, and retrieval-quality assessment. All embedding generation runs locally with no API costs.

Environment variables

This example requires OPENAI_API_KEY (for LLM synthesis only), WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys. Embedding generation is always local and free.

Architecture

Key Code

Real local embeddings with @tool(embedding)

Model loading and encoding are recorded as embedding tool calls -- all run locally.

@waxell.tool(tool_type="embedding", name="load_model")
def load_model(model_name: str = "all-MiniLM-L6-v2") -> dict:
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(model_name)
    return {"model": model_name, "dimensions": model.get_sentence_embedding_dimension(),
            "_model_obj": model}

@waxell.tool(tool_type="embedding", name="encode_documents")
def encode_documents(model, texts: list[str]) -> dict:
    embeddings = model.encode(texts, show_progress_bar=False)
    return {"count": len(texts), "dimensions": embeddings.shape[1], "_embeddings": embeddings}
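Because the model object is an ordinary argument, the encoding step can be exercised with a stand-in model. The sketch below uses a hypothetical stub in place of the real SentenceTransformer (the 384 dimension matches all-MiniLM-L6-v2's output size) to show the shape of the result dict:

```python
import numpy as np

class StubModel:
    """Hypothetical stand-in for a SentenceTransformer, avoiding a model download."""
    def encode(self, texts, show_progress_bar=False):
        # all-MiniLM-L6-v2 produces 384-dim vectors; we fake that shape with random data.
        rng = np.random.default_rng(0)
        return rng.standard_normal((len(texts), 384))

def encode_documents(model, texts):
    embeddings = model.encode(texts, show_progress_bar=False)
    return {"count": len(texts), "dimensions": embeddings.shape[1], "_embeddings": embeddings}

result = encode_documents(StubModel(), ["hello world", "local embeddings"])
print(result["count"], result["dimensions"])  # 2 384
```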

Real cosine similarity with @retrieval and numpy

Similarity search uses real numpy dot product on actual embeddings.

@waxell.retrieval(source="sentence_transformers")
def cosine_similarity_search(query_embedding, doc_embeddings, texts, top_k=3) -> list[dict]:
    q_norm = query_embedding / np.linalg.norm(query_embedding)
    d_norms = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d_norms @ q_norm
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [{"id": str(idx), "text": texts[idx], "score": round(float(scores[idx]), 4)}
            for idx in top_indices]
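The normalize-then-dot-product math can be checked standalone with toy 3-D vectors (the data below is made up for illustration; the real demo uses 384-dim MiniLM embeddings):

```python
import numpy as np

def cosine_similarity_search(query_embedding, doc_embeddings, texts, top_k=3):
    # Normalize query and documents, then a single matrix-vector product
    # gives the cosine similarity of the query against every document.
    q_norm = query_embedding / np.linalg.norm(query_embedding)
    d_norms = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    scores = d_norms @ q_norm
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [{"id": str(idx), "text": texts[idx], "score": round(float(scores[idx]), 4)}
            for idx in top_indices]

docs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]])
query = np.array([1.0, 0.0, 0.0])
hits = cosine_similarity_search(query, docs, ["a", "b", "c"], top_k=2)
# "a" is a perfect match (score 1.0); "c" is close; "b" is orthogonal and excluded.
```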

@waxell.decision(name="choose_synthesis_approach", options=["direct_quote", "paraphrase", "summary"])
def choose_synthesis_approach(results: list[dict]) -> dict:
    top_score = results[0]["score"]
    if top_score > 0.9:
        return {"chosen": "direct_quote", "reasoning": f"Top score={top_score:.3f}"}
    return {"chosen": "paraphrase", "reasoning": "Mixed relevance scores"}

What this demonstrates

  • @waxell.observe -- single-agent pipeline with full decorator coverage
  • @waxell.tool -- model loading, document encoding, and query encoding with tool_type="embedding"
  • @waxell.retrieval -- real cosine similarity search with source="sentence_transformers"
  • @waxell.decision -- synthesis approach selection based on score distribution
  • @waxell.reasoning_dec -- retrieval quality evaluation
  • waxell.score() -- average cosine similarity and top relevance scores
  • Zero-cost local embeddings -- all embedding generation is local and free
  • Real vector math -- actual numpy cosine similarity on real embeddings
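Put together, the retrieve-then-decide flow above can be sketched end to end with stub embeddings standing in for the real model output (all data here is illustrative, not from the demo):

```python
import numpy as np

corpus = ["cats purr", "dogs bark", "cats meow"]

# Stub embeddings: in the real demo these come from all-MiniLM-L6-v2.
rng = np.random.default_rng(42)
doc_embeddings = rng.standard_normal((len(corpus), 8))
# A query vector nearly identical to the "cats meow" document.
query_embedding = doc_embeddings[2] + 0.01 * rng.standard_normal(8)

# Cosine similarity search, as in the retrieval step.
q = query_embedding / np.linalg.norm(query_embedding)
d = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
scores = d @ q
best = int(np.argsort(scores)[::-1][0])

# Synthesis decision, as in the decision step: quote directly on a near-exact hit.
approach = "direct_quote" if scores[best] > 0.9 else "paraphrase"
print(corpus[best], approach)  # cats meow direct_quote
```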

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.sentence_transformers_agent --dry-run

# Live (real OpenAI for synthesis only)
export OPENAI_API_KEY="sk-..."
python -m app.demos.sentence_transformers_agent

Source

dev/waxell-dev/app/demos/sentence_transformers_agent.py