LanceDB Agent

A multi-agent serverless vector search pipeline using LanceDB. A parent orchestrator coordinates 3 child agents -- an indexer that creates tables and adds vectors in Lance columnar format, a searcher that performs vector search and evaluates coverage, and a synthesizer that builds a synthesis prompt, generates an answer, and assesses quality. The pipeline demonstrates SDK primitives across LanceDB-specific operations including zero-copy table creation and distance-based search.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to skip real API calls.

Architecture

Key Code

LanceDB Table and Vector Operations with `@tool`

The indexer creates a table in Lance format and adds vectors. The searcher performs distance-based vector search.

@waxell.tool(tool_type="vector_db")
def create_table(table_name: str, num_rows: int,
                 storage_format: str = "lance") -> dict:
    """Create a LanceDB table with vectorized documents."""
    return {"table": table_name, "rows": num_rows, "format": storage_format}

@waxell.tool(tool_type="vector_db")
def add_vectors(table_name: str, documents: list) -> dict:
    """Add vector embeddings to a LanceDB table."""
    return {"table": table_name, "added": len(documents), "format": "lance"}

@waxell.tool(tool_type="vector_db")
def vector_search(table_name: str, query: str, limit: int = 3) -> dict:
    """Search a LanceDB table for similar vectors."""
    return {"results": len(results), "top_distance": results[0]["distance"]}

Coverage Evaluation with `@reasoning`

The searcher evaluates how well results cover the query intent using category analysis.

@waxell.reasoning_dec(step="search_coverage_evaluation")
async def evaluate_search_coverage(results: list, query: str) -> dict:
    categories = [r.get("metadata", {}).get("category", "unknown") for r in results]
    avg_score = sum(r.get("score", 0) for r in results) / max(len(results), 1)
    return {
        "thought": f"LanceDB returned {len(results)} results with avg relevance {avg_score:.3f}. "
                   f"Categories covered: {', '.join(set(categories))}.",
        "evidence": [f"{r['id']}: {r.get('metadata', {}).get('category', '?')} (score: {r.get('score', 0):.3f})"
                     for r in results],
        "conclusion": "Good coverage" if avg_score > 0.5 else "Coverage may be insufficient",
    }

What this demonstrates

@waxell.observe -- parent-child agent hierarchy (orchestrator + 3 child agents) with automatic lineage via WaxellContext
@waxell.tool(tool_type="vector_db") -- LanceDB operations (create table, add vectors, vector search) recorded as tool spans
@waxell.retrieval(source="lancedb") -- distance-based result ranking recorded with LanceDB as the source
@waxell.decision -- search strategy classification via OpenAI (semantic, hybrid, keyword)
waxell.decide() -- manual table routing decision based on search strategy
@waxell.reasoning_dec -- search coverage evaluation and answer quality assessment
@waxell.step_dec -- query preprocessing and synthesis prompt construction
waxell.score() -- answer quality and retrieval relevance scores attached to the trace
waxell.tag() / waxell.metadata() -- vector DB type, table name, storage format (Lance), and corpus size
Auto-instrumented LLM calls -- OpenAI calls captured without extra code
Serverless vector search -- Lance columnar format with zero-copy operations

Run it

# Dry-run mode (no API key needed)
cd dev/waxell-dev
python -m app.demos.lancedb_agent --dry-run

# Live mode
export OPENAI_API_KEY="sk-..."
python -m app.demos.lancedb_agent

# Custom query
python -m app.demos.lancedb_agent --dry-run --query "Find zero-copy vector patterns"

Source

dev/waxell-dev/app/demos/lancedb_agent.py

Architecture​

Key Code​

LanceDB Table and Vector Operations with @tool​

Coverage Evaluation with @reasoning​

What this demonstrates​

Run it​

Source​