Cohere Rerank Agent

A multi-agent RAG pipeline with Cohere embedding and reranking. A parent orchestrator coordinates three child agents: a retriever that embeds the query and documents with Cohere and ranks initial results by cosine similarity, a reranker that rescores those results with Cohere's rerank API and selects the top-k, and a synthesizer that generates an answer and assesses its quality. The example demonstrates embedding and reranking tool spans with token tracking.

Environment variables

This example requires OPENAI_API_KEY, COHERE_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to skip real API calls.
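
As a rough illustration of the requirement above, the following sketch checks for the four variables before a live run and skips the check in dry-run mode. The helper `missing_env_vars` and the constant `REQUIRED_VARS` are hypothetical names introduced here; the demo itself may validate its environment differently or not at all.

```python
import os

# Variable names are taken from the text above; the helper is hypothetical.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "COHERE_API_KEY",
    "WAXELL_API_KEY",
    "WAXELL_API_URL",
)


def missing_env_vars(environ=None, dry_run=False):
    """Return the required variables that are unset or empty.

    In dry-run mode nothing is required, because no real API calls are made.
    """
    if dry_run:
        return []
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
partial_env = {"OPENAI_API_KEY": "sk-test", "COHERE_API_KEY": "test"}
print(missing_env_vars(partial_env))
```

Passing the environment as an argument keeps the check easy to exercise without touching real credentials.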

Architecture

Key Code

Cohere Embedding with `@tool(tool_type="embedding")`

The retriever embeds both the query and candidate documents using Cohere's embedding API.

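@waxell.tool(tool_type="embedding")
def embed_query(query: str, model: str = "embed-english-v3.0") -> dict:
    """Embed the query text using a Cohere embedding model."""
    # Excerpt: the Cohere embed call that produces query_embedding and
    # token_count is omitted here; see the source file for the full body.
    return {"embedding": query_embedding, "model": model, "tokens": token_count}

@waxell.tool(tool_type="embedding")
def embed_documents(documents: list, model: str = "embed-english-v3.0") -> dict:
    """Embed the candidate documents using a Cohere embedding model."""
    # Excerpt: the Cohere embed call that produces doc_embeddings and
    # total_tokens is omitted here; see the source file for the full body.
    return {"embeddings": doc_embeddings, "count": len(documents), "tokens": total_tokens}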
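
The retriever's initial ranking by cosine similarity can be sketched without any API calls. The example below is a minimal, standard-library version that assumes embeddings are plain lists of floats; the function names `cosine_similarity` and `rank_by_cosine` and the toy vectors are illustrative and are not taken from the demo source.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_embedding, doc_embeddings, documents):
    """Score each document against the query and sort best-first."""
    scored = [
        {"text": doc, "score": cosine_similarity(query_embedding, emb)}
        for doc, emb in zip(documents, doc_embeddings)
    ]
    return sorted(scored, key=lambda d: d["score"], reverse=True)


# Toy 2-dimensional vectors standing in for real Cohere embeddings.
initial = rank_by_cosine(
    [1.0, 0.0],
    [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    ["a", "b", "c"],
)
print([d["text"] for d in initial])
```

Real embeddings have far more dimensions, but the ranking logic is the same: higher cosine similarity means the document vector points in a direction closer to the query vector.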

Cohere Reranking with `@tool(tool_type="reranking")`

The reranker rescores initial results using Cohere's cross-encoder rerank model.

@waxell.tool(tool_type="reranking")
def cohere_rerank(query: str, documents: list,
                  model: str = "rerank-english-v3.0", top_n: int = 3) -> dict:
    """Rerank documents using the Cohere rerank API."""
    # Excerpt: the Cohere rerank call that produces reranked_docs and
    # token_count is omitted here; see the source file for the full body.
    return {
        "reranked": reranked_docs,
        "top_score": reranked_docs[0]["relevance_score"],
        "model": model,
        "tokens": token_count,
    }

@waxell.retrieval(source="cohere")
def collect_reranked_results(reranked_docs: list) -> list[dict]:
    """Collect reranked results with scores."""
    return [{"text": d["text"], "score": d["relevance_score"], "rank": i}
            for i, d in enumerate(reranked_docs)]
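
To make the shape of the reranked data concrete, here is a small sketch of top-k selection followed by result collection. It uses hand-written relevance scores in place of a real rerank call, and `select_top_n` is a hypothetical helper; `collect` mirrors the comprehension shown above without the Waxell decorator.

```python
def select_top_n(scored_docs, top_n=3):
    """Keep the top_n documents by relevance_score, best first."""
    ordered = sorted(scored_docs, key=lambda d: d["relevance_score"], reverse=True)
    return ordered[:top_n]


def collect(reranked_docs):
    """Same shape as collect_reranked_results above, minus the decorator."""
    return [
        {"text": d["text"], "score": d["relevance_score"], "rank": i}
        for i, d in enumerate(reranked_docs)
    ]


# Made-up scores standing in for the output of a rerank model.
candidates = [
    {"text": "chunking strategies", "relevance_score": 0.41},
    {"text": "reranking with cross-encoders", "relevance_score": 0.93},
    {"text": "unrelated changelog", "relevance_score": 0.05},
    {"text": "hybrid search", "relevance_score": 0.67},
]
results = collect(select_top_n(candidates, top_n=3))
print(results)
```

Note that `enumerate` starts at 0, so the best result carries `rank` 0 rather than 1.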

What this demonstrates

@waxell.observe -- parent-child agent hierarchy (orchestrator + 3 child agents) with automatic lineage
@waxell.tool(tool_type="embedding") -- Cohere embed-english-v3.0 calls for query and document embedding
@waxell.tool(tool_type="reranking") -- Cohere rerank-english-v3.0 call with relevance rescoring
@waxell.retrieval(source="cohere") -- initial ranking and reranked result collection
@waxell.decision -- embedding model selection and top-k threshold selection
@waxell.reasoning_dec -- answer quality assessment
@waxell.step_dec -- query preprocessing
Token tracking -- embedding and reranking token counts recorded in tool outputs
Auto-instrumented LLM calls -- OpenAI synthesis captured without extra code
Embed + Rerank pipeline -- Cohere embedding followed by cross-encoder reranking
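
Because each tool output shown earlier carries a `tokens` field, a per-run total can be computed by summing over the outputs. The sketch below assumes the outputs are plain dicts; `total_tokens` is a hypothetical helper and the token counts are made-up values for illustration, not measurements from the demo.

```python
def total_tokens(tool_outputs):
    """Sum the 'tokens' field across tool outputs, treating a missing field as 0."""
    return sum(out.get("tokens", 0) for out in tool_outputs)


# Illustrative outputs in the same shape as the tool return values above.
outputs = [
    {"embedding": [0.1, 0.2], "model": "embed-english-v3.0", "tokens": 8},
    {"embeddings": [[0.1, 0.2]], "count": 1, "tokens": 120},
    {"reranked": [], "model": "rerank-english-v3.0", "tokens": 45},
]
print(total_tokens(outputs))
```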

Run it

# Dry-run mode (no API keys needed)
cd dev/waxell-dev
python -m app.demos.cohere_rerank_agent --dry-run

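# Live mode (all four variables listed under "Environment variables")
export COHERE_API_KEY="..."
export OPENAI_API_KEY="sk-..."
export WAXELL_API_KEY="..."
export WAXELL_API_URL="..."
python -m app.demos.cohere_rerank_agent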

# Custom query
python -m app.demos.cohere_rerank_agent --dry-run --query "Best practices for RAG optimization"
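
The two flags used above could be parsed with `argparse` roughly as follows. This is a sketch, not the demo's actual argument handling: the flag names come from the commands above, while the `prog` name, help strings, and the `None` default for `--query` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the two flags shown in the commands above."""
    parser = argparse.ArgumentParser(prog="cohere_rerank_agent")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="skip real API calls",
    )
    parser.add_argument(
        "--query",
        default=None,  # assumed default; the demo may ship its own query
        help="custom query to run through the pipeline",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--dry-run", "--query", "Best practices for RAG optimization"]
)
print(args.dry_run, args.query)
```

`argparse` converts `--dry-run` to the attribute `dry_run`, so the flag is read as `args.dry_run` in code.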

Source

dev/waxell-dev/app/demos/cohere_rerank_agent.py
