
LanceDB Agent

A multi-agent, serverless vector search pipeline built on LanceDB. A parent orchestrator coordinates three child agents: an indexer that creates tables and adds vectors in the Lance columnar format, a searcher that runs vector search and evaluates coverage, and a synthesizer that builds a synthesis prompt, generates an answer, and assesses its quality. The pipeline exercises the SDK primitives on LanceDB-specific operations, including zero-copy table creation and distance-based search.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to skip real API calls.
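
A pre-flight check for these variables could look like the sketch below. The helper name `missing_env_vars` is hypothetical and not part of the demo, and it assumes that dry-run mode needs none of the three variables.

```python
import os

# Variable names taken from the requirement stated above.
REQUIRED_VARS = ("OPENAI_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")


def missing_env_vars(env=None, dry_run=False):
    """Return the required variables that are unset or empty.

    Hypothetical helper: assumes dry-run mode makes no real API calls
    and therefore needs none of the variables.
    """
    if dry_run:
        return []
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before a live run gives a list that can be printed as a single error message instead of failing on the first missing key.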

Architecture
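
The orchestrator-plus-three-children layout described in the overview can be sketched in plain Python. This is an illustration only: every function name, the table name `demo_docs`, and the word-overlap ranking are hypothetical stand-ins, with no Waxell decorators, LanceDB calls, or LLM calls involved.

```python
def run_indexer(documents):
    """Child 1: pretend to create a table and add one row per document."""
    return {"table": "demo_docs", "rows": len(documents)}  # hypothetical table name


def run_searcher(documents, query, limit=3):
    """Child 2: naive word-overlap ranking as a stand-in for vector search."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for overlap, doc in ranked[:limit] if overlap > 0]


def run_synthesizer(query, hits):
    """Child 3: build a synthesis prompt; a real run would send it to an LLM."""
    context = "\n".join(f"- {hit}" for hit in hits)
    return f"Question: {query}\nContext:\n{context}"


def orchestrate(documents, query):
    """Parent: call the three children in order and collect their outputs."""
    index_info = run_indexer(documents)
    hits = run_searcher(documents, query)
    prompt = run_synthesizer(query, hits)
    return {"index": index_info, "hits": hits, "prompt": prompt}
```

The parent owns the control flow and the children stay independent of one another, which is the shape that lets each child be traced as its own unit.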

Key Code

LanceDB Table and Vector Operations with @tool

The indexer creates a table in Lance format and adds vectors. The searcher performs distance-based vector search.

@waxell.tool(tool_type="vector_db")
def create_table(table_name: str, num_rows: int,
                 storage_format: str = "lance") -> dict:
    """Create a LanceDB table with vectorized documents."""
    return {"table": table_name, "rows": num_rows, "format": storage_format}

@waxell.tool(tool_type="vector_db")
def add_vectors(table_name: str, documents: list) -> dict:
    """Add vector embeddings to a LanceDB table."""
    return {"table": table_name, "added": len(documents), "format": "lance"}

@waxell.tool(tool_type="vector_db")
def vector_search(table_name: str, query: str, limit: int = 3) -> dict:
    """Search a LanceDB table for similar vectors."""
    # `results` comes from the table query, which this excerpt omits.
    return {"results": len(results), "top_distance": results[0]["distance"]}
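
To make "distance-based search" concrete, here is a minimal standard-library sketch that ranks rows by distance to a query vector and then builds the same summary dictionary as `vector_search`. It is not how LanceDB executes a query. The function names, the row layout, and the choice of Euclidean (L2) distance are assumptions made for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(rows, query_vector, limit=3):
    """Rank rows by distance to the query; smaller distance means more similar."""
    scored = [
        {"id": row["id"], "distance": l2_distance(row["vector"], query_vector)}
        for row in rows
    ]
    scored.sort(key=lambda r: r["distance"])
    return scored[:limit]


def summarize(results):
    """Build the summary dict, guarding against an empty result list."""
    if not results:
        return {"results": 0, "top_distance": None}
    return {"results": len(results), "top_distance": results[0]["distance"]}
```

The guard in `summarize` matters because indexing `results[0]` on an empty list raises `IndexError`, which can happen whenever a search returns no rows.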

Coverage Evaluation with @reasoning

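The searcher evaluates how well the results cover the query intent: it lists the categories the results span and averages their relevance scores, treating an average above 0.5 as good coverage.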

@waxell.reasoning_dec(step="search_coverage_evaluation")
async def evaluate_search_coverage(results: list, query: str) -> dict:
    categories = [r.get("metadata", {}).get("category", "unknown") for r in results]
    avg_score = sum(r.get("score", 0) for r in results) / max(len(results), 1)
    return {
        "thought": f"LanceDB returned {len(results)} results with avg relevance {avg_score:.3f}. "
                   f"Categories covered: {', '.join(set(categories))}.",
        "evidence": [f"{r['id']}: {r.get('metadata', {}).get('category', '?')} (score: {r.get('score', 0):.3f})"
                     for r in results],
        "conclusion": "Good coverage" if avg_score > 0.5 else "Coverage may be insufficient",
    }
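
The scoring logic can be tried in isolation with a decorator-free, synchronous variant. The function name `evaluate_coverage`, the trimmed return shape, and the sample rows below are hypothetical. The averaging and the 0.5 threshold mirror the excerpt above.

```python
def evaluate_coverage(results):
    """Same averaging and threshold as the excerpt, minus the Waxell decorator."""
    categories = [r.get("metadata", {}).get("category", "unknown") for r in results]
    # max(..., 1) avoids division by zero when there are no results.
    avg_score = sum(r.get("score", 0) for r in results) / max(len(results), 1)
    return {
        "categories": sorted(set(categories)),  # sorted for a stable order
        "avg_score": avg_score,
        "conclusion": "Good coverage" if avg_score > 0.5 else "Coverage may be insufficient",
    }


# Made-up sample rows whose scores average exactly 0.5.
sample = [
    {"id": "doc-1", "score": 0.75, "metadata": {"category": "storage"}},
    {"id": "doc-2", "score": 0.25, "metadata": {"category": "search"}},
]
report = evaluate_coverage(sample)
```

Because the comparison is strict (`> 0.5`), this sample lands on "Coverage may be insufficient", and an empty result list does too, since its average is 0.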

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy (orchestrator + 3 child agents) with automatic lineage via WaxellContext
  • @waxell.tool(tool_type="vector_db") -- LanceDB operations (create table, add vectors, vector search) recorded as tool spans
  • @waxell.retrieval(source="lancedb") -- distance-based result ranking recorded with LanceDB as the source
  • @waxell.decision -- search strategy classification via OpenAI (semantic, hybrid, keyword)
  • waxell.decide() -- manual table routing decision based on search strategy
  • @waxell.reasoning_dec -- search coverage evaluation and answer quality assessment
  • @waxell.step_dec -- query preprocessing and synthesis prompt construction
  • waxell.score() -- answer quality and retrieval relevance scores attached to the trace
  • waxell.tag() / waxell.metadata() -- vector DB type, table name, storage format (Lance), and corpus size
  • Auto-instrumented LLM calls -- OpenAI calls captured without extra code
  • Serverless vector search -- Lance columnar format with zero-copy operations
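
The strategy classification and table routing steps listed above can be pictured with the sketch below. It is a rough illustration under stated assumptions: the demo classifies with OpenAI, whereas this stand-in uses simple rules, and the table names, function names, and rules are all hypothetical.

```python
# Strategy labels named in the list above.
STRATEGIES = ("semantic", "hybrid", "keyword")

# Hypothetical table names; the demo's real tables are not shown in this page.
TABLE_BY_STRATEGY = {
    "semantic": "docs_vectors",
    "hybrid": "docs_hybrid",
    "keyword": "docs_fts",
}


def classify_strategy(query):
    """Rule-based stand-in for the LLM classifier (assumed rules)."""
    if '"' in query:
        return "keyword"  # quoted phrases suggest exact matching
    if len(query.split()) <= 2:
        return "hybrid"  # very short queries get both signals
    return "semantic"


def route_table(strategy, default="docs_vectors"):
    """Pick a table for the strategy, falling back to a default."""
    return TABLE_BY_STRATEGY.get(strategy, default)
```

Keeping the routing in a lookup table with a default means an unexpected classifier label degrades to a known table instead of raising.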

Run it

# Dry-run mode (no API key needed)
cd dev/waxell-dev
python -m app.demos.lancedb_agent --dry-run

# Live mode (WAXELL_API_KEY and WAXELL_API_URL must also be set)
export OPENAI_API_KEY="sk-..."
python -m app.demos.lancedb_agent

# Custom query
python -m app.demos.lancedb_agent --dry-run --query "Find zero-copy vector patterns"
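
The two flags used above could be parsed as in this `argparse` sketch. It is an assumption about the shape of the command line, not the demo's actual parser, and the `None` default for `--query` is a placeholder.

```python
import argparse


def build_parser():
    """Hypothetical parser covering the two flags shown in the commands above."""
    parser = argparse.ArgumentParser(prog="lancedb_agent")
    parser.add_argument("--dry-run", action="store_true",
                        help="skip real API calls")
    parser.add_argument("--query", default=None,
                        help="custom query text (placeholder default)")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--dry-run", "--query", "Find zero-copy vector patterns"]
)
```

`argparse` exposes `--dry-run` as `args.dry_run`, so the flag can gate every network call behind a single boolean.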

Source

dev/waxell-dev/app/demos/lancedb_agent.py