# Lightweight Vector Engines Agent
A multi-agent comparison pipeline across 5 lightweight/in-process vector search engines (Annoy, hnswlib, USearch, ScaNN, DuckDB). A parent orchestrator coordinates 3 child agents -- a builder that constructs all indices, a searcher that queries all engines and collects results, and an evaluator that reasons about quality trade-offs and synthesizes a recommendation with an LLM.
## Environment variables
This example requires `OPENAI_API_KEY`, `WAXELL_API_KEY`, and `WAXELL_API_URL`. Use `--dry-run` to skip real API calls. All engine libraries are mocked for portability.
## Architecture
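The pipeline runs in three sequential phases: the builder constructs the indices, the searcher queries them, and the evaluator synthesizes a recommendation. A minimal sketch of that fan-out, with hypothetical child-agent names and bodies (the real demo wraps each agent in `@waxell.observe` for span lineage):

```python
import asyncio

# Illustrative stand-ins for the demo's three child agents; names and
# return payloads are assumptions, not the demo's actual code.
async def builder_agent() -> dict:
    return {"indices_built": 5}

async def searcher_agent(build: dict) -> dict:
    return {"engines_queried": build["indices_built"]}

async def evaluator_agent(search: dict) -> dict:
    return {"engines_compared": search["engines_queried"]}

async def orchestrator() -> dict:
    # The three phases run in sequence: build -> search -> evaluate.
    build = await builder_agent()
    search = await searcher_agent(build)
    return await evaluator_agent(search)

result = asyncio.run(orchestrator())  # {"engines_compared": 5}
```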
## Key Code
### Index Construction and Search with `@tool`
The builder constructs 5 different indices. The searcher queries all 5 engines, each recorded as a separate tool span.
```python
@waxell.tool(tool_type="vector_db")
def build_annoy_index(dim: int, n_docs: int, mock_vectors: list) -> dict:
    """Build an Annoy index from vectors."""
    idx = MockAnnoyIndex(f=dim, metric="angular")
    for i, vec in enumerate(mock_vectors):
        idx.add_item(i, vec)
    idx.build(10)
    return {"index": idx, "n_trees": 10, "n_items": idx.get_n_items()}

@waxell.tool(tool_type="vector_db")
def search_hnsw(index, query_vector: list, k: int = 3) -> dict:
    """Search an HNSW index for the k nearest neighbors."""
    labels, distances = index.knn_query([query_vector], k=k)
    return {"labels": labels[0], "distances": distances[0], "num_results": len(labels[0]), "latency_ms": 0}

@waxell.tool(tool_type="vector_db")
def search_duckdb(connection, query_vector: list, k: int = 3) -> dict:
    """Search DuckDB via the array_cosine_similarity SQL function."""
    # Bind the query vector and k as positional parameters; higher
    # similarity means a closer match, so order descending.
    result = connection.execute(
        "SELECT id, title, array_cosine_similarity(embedding, $1) AS similarity "
        "FROM documents ORDER BY similarity DESC LIMIT $2",
        [query_vector, k],
    )
    return {"rows": result.fetchall(), "sql_function": "array_cosine_similarity", "latency_ms": 5}
```
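All five engines answer the same angular nearest-neighbor query (the Annoy index above is built with `metric="angular"`). A brute-force cosine top-k baseline is useful for sanity-checking the mocked results; this sketch is standalone and not part of the demo:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def brute_force_search(vectors: list[list[float]], query: list[float], k: int = 3) -> list[tuple[int, float]]:
    # Score every vector and keep the k most similar -- the exact answer
    # each approximate engine tries to match.
    scored = [(i, cosine_similarity(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
top = brute_force_search(docs, [1.0, 0.0], k=2)  # doc 0 is an exact match
```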
### Multi-Engine Collection and Evaluation
```python
@waxell.retrieval(source="multi-engine")
def collect_search_results(engine_results: dict) -> list[dict]:
    """Collect and normalize search results from all engines."""
    collected = [
        {"engine": name, "num_results": r["num_results"], "latency_ms": r["latency_ms"]}
        for name, r in engine_results.items()
    ]
    collected.sort(key=lambda x: x["latency_ms"])
    return collected

@waxell.reasoning_dec(step="engine_quality_assessment")
async def evaluate_engines(comparison: list[dict]) -> dict:
    """Assess engine quality from the latency-sorted comparison."""
    fastest = comparison[0]["engine"]  # comparison is pre-sorted by latency
    return {
        "thought": f"Compared {len(comparison)} engines. Fastest: {fastest}.",
        "evidence": [f"{c['engine']}: {c['num_results']} results in {c['latency_ms']}ms" for c in comparison],
        "conclusion": f"{fastest} best latency; graph-based engines best recall-latency trade-off",
    }
```
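The collection step only depends on each engine reporting `num_results` and `latency_ms`, so its logic can be exercised without the decorator. A decorator-free sketch (the input payload below is made up for illustration):

```python
def collect_results(engine_results: dict) -> list[dict]:
    # Normalize per-engine payloads and rank by latency, fastest first --
    # the same logic collect_search_results applies under @waxell.retrieval.
    collected = [
        {"engine": name, "num_results": r["num_results"], "latency_ms": r["latency_ms"]}
        for name, r in engine_results.items()
    ]
    collected.sort(key=lambda x: x["latency_ms"])
    return collected

ranked = collect_results({
    "duckdb": {"num_results": 3, "latency_ms": 5},
    "hnswlib": {"num_results": 3, "latency_ms": 0},
})  # hnswlib ranks first on latency
```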
## What this demonstrates
- `@waxell.observe` -- parent-child agent hierarchy (orchestrator + 3 child agents) with automatic lineage via `WaxellContext`
- `@waxell.tool(tool_type="vector_db")` -- 10 tool spans (5 build + 5 search) across Annoy, hnswlib, USearch, ScaNN, and DuckDB
- `@waxell.retrieval(source="multi-engine")` -- cross-engine result collection normalized by latency
- `@waxell.decision` -- engine family selection (all, tree-based, graph-based, sql-based)
- `waxell.decide()` -- engine recommendation with confidence score
- `@waxell.reasoning_dec` -- engine quality assessment comparing recall vs latency trade-offs
- `waxell.score()` -- comparison coverage and recommendation confidence scores
- Auto-instrumented LLM calls -- OpenAI synthesis captured without extra code
- 5 in-process engines -- Annoy (trees), hnswlib (HNSW), USearch (HNSW), ScaNN (hybrid), DuckDB (SQL) compared
## Run it
```bash
# Dry-run mode (no API key needed)
cd dev/waxell-dev
python -m app.demos.lightweight_vector_agent --dry-run

# Live mode
export OPENAI_API_KEY="sk-..."
python -m app.demos.lightweight_vector_agent

# Custom query
python -m app.demos.lightweight_vector_agent --dry-run --query "Compare vector search engines"
```