Embedding Models Comparison
A multi-provider embedding aggregator that compares 6 embedding providers (BGE, E5, Instructor, TEI, Mixedbread, Transformers) across 3 agents. The comparator runs every provider and analyzes output dimensions and latency; the evaluator picks the best provider via quality-speed scoring and synthesizes recommendations with OpenAI.
Environment variables
This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys. All embedding providers use mock data.
Architecture
Key Code
Six embedding providers as @tool(embedding) calls
Each provider is recorded as a separate embedding tool call with latency and dimension tracking.
@waxell.tool(tool_type="embedding")
def embed_bge(texts: list) -> dict:
    """Embed with BGE (BAAI/bge-large-en-v1.5)."""
    bge = MockFlagModel()
    out = bge.encode(texts)
    return {"model": bge.model_name, "dimensions": len(out[0]),
            "vectors": len(out), "latency_ms": 45}

@waxell.tool(tool_type="embedding")
def embed_instructor(texts: list, instruction: str = "Represent the document:") -> dict:
    """Embed with Instructor (hkunlp/instructor-xl)."""
    instructor = MockInstructorModel()
    out = instructor.encode([[instruction, t] for t in texts])
    return {"model": instructor._model_name, "dimensions": len(out[0]),
            "vectors": len(out), "instruction": instruction, "latency_ms": 68}
Quality-speed scoring for best provider selection
The evaluator scores each provider on a weighted metric: 60% quality (dimensions relative to the largest provider) and 40% speed (inverse latency relative to the slowest provider).
@waxell.decision(name="pick_best_provider",
                 options=["bge", "e5", "instructor", "tei", "mixedbread", "transformers"])
async def pick_best(comp: dict) -> dict:
    comparison_list = comp["comparison"]
    max_dim = max(c["dimensions"] for c in comparison_list)
    max_lat = max(c["latency_ms"] for c in comparison_list) or 1
    scored = []
    for c in comparison_list:
        quality = c["dimensions"] / max_dim
        speed = 1.0 - (c["latency_ms"] / max_lat)
        scored.append((c["provider"], 0.6 * quality + 0.4 * speed))
    best = max(scored, key=lambda x: x[1])
    return {"chosen": best[0], "reasoning": f"Balanced score {best[1]:.3f}"}
waxell.score("embedding_coverage", 1.0, comment="all 6 providers tested")
waxell.score("recommendation_quality", 0.91, comment="comprehensive comparison")
waxell.score("cost_efficiency", 0.95, comment="5 of 6 providers are free")
What this demonstrates
- `@waxell.observe` -- parent-child agent hierarchy with automatic lineage
- `@waxell.step_dec` -- query preprocessing and embedding comparison steps
- `@waxell.tool` -- six embedding providers recorded with `tool_type="embedding"`
- `@waxell.decision` -- strategy selection, best provider pick, and routing decisions
- `waxell.decide()` -- execution scope and synthesis provider routing
- `@waxell.reasoning_dec` -- dimension analysis and overall quality evaluation
- `waxell.score()` -- coverage, recommendation quality, and cost efficiency scores
- 6-provider comparison -- BGE, E5, Instructor, TEI, Mixedbread, Transformers
- Quality-speed tradeoff -- weighted scoring for provider recommendation
Run it
# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.embedding_models_agent --dry-run
# Live (real OpenAI for synthesis)
export OPENAI_API_KEY="sk-..."
python -m app.demos.embedding_models_agent