Embedding Models Comparison

A multi-provider embedding aggregator comparing 6 embedding providers (BGE, E5, Instructor, TEI, Mixedbread, Transformers) across 3 agents. The comparator runs all providers and analyzes dimensions and latency; the evaluator picks the best provider via quality-speed scoring and synthesizes recommendations with OpenAI.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys. All embedding providers use mock data, so no provider-specific embedding keys are needed.
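
For live runs, a quick pre-flight check can catch missing variables before the agents start. A minimal sketch (this check is illustrative and not part of the demo itself):

import os

# Illustrative pre-flight: verify the variables a live run expects.
required = ("OPENAI_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")
missing = [v for v in required if not os.environ.get(v)]
if missing:
    raise SystemExit(f"missing environment variables: {', '.join(missing)}")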

Architecture

Key Code

Six embedding providers as @tool(embedding) calls

Each provider is recorded as a separate embedding tool call with latency and dimension tracking.

@waxell.tool(tool_type="embedding")
def embed_bge(texts: list) -> dict:
    """Embed with BGE (BAAI/bge-large-en-v1.5)."""
    bge = MockFlagModel()
    out = bge.encode(texts)
    return {"model": bge.model_name, "dimensions": len(out[0]),
            "vectors": len(out), "latency_ms": 45}

@waxell.tool(tool_type="embedding")
def embed_instructor(texts: list, instruction: str = "Represent the document:") -> dict:
    """Embed with Instructor (hkunlp/instructor-xl)."""
    instructor = MockInstructorModel()
    out = instructor.encode([[instruction, t] for t in texts])
    return {"model": instructor._model_name, "dimensions": len(out[0]),
            "vectors": len(out), "instruction": instruction, "latency_ms": 68}

Quality-speed scoring for best provider selection

The evaluator scores each provider on a weighted metric: quality (embedding dimensions normalized to the highest dimension, weighted 60%) plus speed (one minus latency normalized to the slowest provider, weighted 40%).

@waxell.decision(name="pick_best_provider",
                 options=["bge", "e5", "instructor", "tei", "mixedbread", "transformers"])
async def pick_best(comp: dict) -> dict:
    comparison_list = comp["comparison"]
    max_dim = max(c["dimensions"] for c in comparison_list)
    max_lat = max(c["latency_ms"] for c in comparison_list) or 1  # "or 1" guards against division by zero
    scored = []
    for c in comparison_list:
        quality = c["dimensions"] / max_dim         # wider vectors score higher
        speed = 1.0 - (c["latency_ms"] / max_lat)   # faster providers score higher
        scored.append((c["provider"], 0.6 * quality + 0.4 * speed))
    best = max(scored, key=lambda x: x[1])
    return {"chosen": best[0], "reasoning": f"Balanced score {best[1]:.3f}"}

waxell.score("embedding_coverage", 1.0, comment="all 6 providers tested")
waxell.score("recommendation_quality", 0.91, comment="comprehensive comparison")
waxell.score("cost_efficiency", 0.95, comment="5 of 6 providers are free")

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy with automatic lineage (see the wiring sketch after this list)
  • @waxell.step_dec -- query preprocessing and embedding comparison steps
  • @waxell.tool -- six embedding providers recorded with tool_type="embedding"
  • @waxell.decision -- strategy selection, best provider pick, and routing decisions
  • waxell.decide() -- execution scope and synthesis provider routing
  • @waxell.reasoning_dec -- dimension analysis and overall quality evaluation
  • waxell.score() -- coverage, recommendation quality, and cost efficiency scores
  • 6-provider comparison -- BGE, E5, Instructor, TEI, Mixedbread, Transformers
  • Quality-speed tradeoff -- weighted scoring for provider recommendation
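
The parent-child wiring itself is not shown above. A minimal sketch of how the comparator and evaluator might be composed, assuming @waxell.observe can be applied bare and that tool and decision calls made inside an observed function are linked to it automatically (both are assumptions based only on the bullets above; only two of the six providers are wired in for brevity):

@waxell.observe
async def comparator(texts: list) -> dict:
    # Each embedding tool call is recorded as a child of this agent.
    rows = []
    for provider, tool in [("bge", embed_bge), ("instructor", embed_instructor)]:
        out = tool(texts)
        rows.append({"provider": provider,
                     "dimensions": out["dimensions"],
                     "latency_ms": out["latency_ms"]})
    return {"comparison": rows}

@waxell.observe
async def evaluator(comp: dict) -> dict:
    # Delegates provider selection to the pick_best decision defined earlier.
    return await pick_best(comp)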

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.embedding_models_agent --dry-run

# Live (real OpenAI for synthesis)
export OPENAI_API_KEY="sk-..."
python -m app.demos.embedding_models_agent

Source

dev/waxell-dev/app/demos/embedding_models_agent.py