Cohere Rerank Agent

A multi-agent RAG pipeline with Cohere embedding and reranking. A parent orchestrator coordinates three child agents: a retriever that embeds the query and documents with Cohere and ranks the initial results by cosine similarity, a reranker that rescores those results with Cohere's rerank API and selects the top-k, and a synthesizer that generates answers with a quality assessment. The example demonstrates embedding and reranking tool spans with token tracking.

Environment variables

This example requires OPENAI_API_KEY, COHERE_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to skip real API calls.
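As a rough illustration of the requirement above, a startup check along these lines could report which variables are missing before any live call is made. This is a minimal sketch, not the demo's actual code: the helper name `missing_env_vars` is hypothetical, and it assumes that dry-run mode needs none of the four variables.

```python
import os

# Variable names come from the list above; the helper itself is hypothetical.
REQUIRED_ENV_VARS = (
    "OPENAI_API_KEY",
    "COHERE_API_KEY",
    "WAXELL_API_KEY",
    "WAXELL_API_URL",
)


def missing_env_vars(environ=None, dry_run=False):
    """Return the required variables that are unset or empty.

    Assumption: in dry-run mode nothing is required, because no real
    API calls are made.
    """
    if dry_run:
        return []
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dict as `environ` keeps the check easy to exercise without touching the real environment.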

Architecture

Key Code

Cohere Embedding with @tool(tool_type="embedding")

The retriever embeds both the query and candidate documents using Cohere's embedding API.

@waxell.tool(tool_type="embedding")
def embed_query(query: str, model: str = "embed-english-v3.0") -> dict:
    """Embed query text using Cohere embedding model."""
    # The Cohere embed call that produces query_embedding and token_count
    # is omitted from this excerpt; see the source file.
    return {"embedding": query_embedding, "model": model, "tokens": token_count}

@waxell.tool(tool_type="embedding")
def embed_documents(documents: list, model: str = "embed-english-v3.0") -> dict:
    """Embed candidate documents using Cohere embedding model."""
    # The Cohere embed call that produces doc_embeddings and total_tokens
    # is omitted from this excerpt; see the source file.
    return {"embeddings": doc_embeddings, "count": len(documents), "tokens": total_tokens}
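Once the query and documents are embedded, the retriever ranks the initial results by cosine similarity. The sketch below shows one way that step might look in plain Python. The function names `cosine_similarity` and `rank_by_cosine` and the `{"text", "score"}` result shape are assumptions for illustration, not names taken from the demo source.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_embedding, doc_embeddings, documents):
    """Pair each document with its similarity to the query, best first."""
    scored = [
        {"text": text, "score": cosine_similarity(query_embedding, embedding)}
        for text, embedding in zip(documents, doc_embeddings)
    ]
    return sorted(scored, key=lambda d: d["score"], reverse=True)


# Toy 2-dimensional vectors stand in for real Cohere embeddings.
initial_ranking = rank_by_cosine(
    [1.0, 0.0],
    [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    ["a", "b", "c"],
)
```

The initial ranking is cheap because it reuses precomputed embeddings; the reranker then spends more compute on only the best candidates.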

Cohere Reranking with @tool(tool_type="reranking")

The reranker rescores initial results using Cohere's cross-encoder rerank model.

@waxell.tool(tool_type="reranking")
def cohere_rerank(query: str, documents: list,
                  model: str = "rerank-english-v3.0", top_n: int = 3) -> dict:
    """Rerank documents using Cohere rerank API."""
    # The Cohere rerank call that produces reranked_docs and token_count
    # is omitted from this excerpt; see the source file.
    return {
        "reranked": reranked_docs,
        "top_score": reranked_docs[0]["relevance_score"],
        "model": model,
        "tokens": token_count,
    }

@waxell.retrieval(source="cohere")
def collect_reranked_results(reranked_docs: list) -> list[dict]:
    """Collect reranked results with scores."""
    return [{"text": d["text"], "score": d["relevance_score"], "rank": i}
            for i, d in enumerate(reranked_docs)]
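The excerpt above does not show how `reranked_docs` is built. One plausible approach, sketched here under stated assumptions, is to call the client's `rerank` method and map each result back to its source text. The sketch assumes a Cohere-style client whose `rerank(model=..., query=..., documents=..., top_n=...)` response exposes `results` with `index` and `relevance_score`. `rerank_with_client` and `FakeRerankClient` are hypothetical names, and the fake client's word-overlap score is only an offline stand-in for the real cross-encoder.

```python
from types import SimpleNamespace


def rerank_with_client(client, query, documents,
                       model="rerank-english-v3.0", top_n=3):
    """Call client.rerank and map each result back to its source text."""
    response = client.rerank(
        model=model, query=query, documents=documents, top_n=top_n
    )
    # Assumption: each result exposes .index (position in `documents`)
    # and .relevance_score.
    return [
        {"text": documents[r.index], "relevance_score": r.relevance_score}
        for r in response.results
    ]


class FakeRerankClient:
    """Offline stand-in that scores documents by word overlap with the query."""

    def rerank(self, model, query, documents, top_n):
        query_words = set(query.lower().split())
        results = []
        for index, text in enumerate(documents):
            doc_words = set(text.lower().split())
            overlap = len(query_words & doc_words) / max(len(query_words), 1)
            results.append(SimpleNamespace(index=index, relevance_score=overlap))
        results.sort(key=lambda r: r.relevance_score, reverse=True)
        return SimpleNamespace(results=results[:top_n])


reranked_docs = rerank_with_client(
    FakeRerankClient(),
    "rag optimization",
    ["cats sleep", "rag optimization tips", "rag basics"],
    top_n=2,
)
```

Passing the client in as an argument makes it straightforward to swap a real client for a stand-in, which is roughly what a dry-run mode needs.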

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy (orchestrator + 3 child agents) with automatic lineage
  • @waxell.tool(tool_type="embedding") -- Cohere embed-english-v3.0 calls for query and document embedding
  • @waxell.tool(tool_type="reranking") -- Cohere rerank-english-v3.0 call with relevance rescoring
  • @waxell.retrieval(source="cohere") -- initial ranking and reranked result collection
  • @waxell.decision -- embedding model selection and top-k threshold selection
  • @waxell.reasoning_dec -- answer quality assessment
  • @waxell.step_dec -- query preprocessing
  • Token tracking -- embedding and reranking token counts recorded in tool outputs
  • Auto-instrumented LLM calls -- OpenAI synthesis captured without extra code
  • Embed + Rerank pipeline -- Cohere embedding followed by cross-encoder reranking
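Because each tool output carries a `tokens` field, the counts can be totalled per run. The following is a small sketch of that idea; `summarize_tokens` is a hypothetical helper, and the token numbers in the example are made up for illustration.

```python
def summarize_tokens(outputs_by_tool):
    """Collect the "tokens" field from each tool output and total them.

    Outputs without a "tokens" key count as zero.
    """
    per_tool = {
        name: int(output.get("tokens", 0))
        for name, output in outputs_by_tool.items()
    }
    return {"per_tool": per_tool, "total": sum(per_tool.values())}


# Illustrative values only; real counts come from the tool outputs.
token_summary = summarize_tokens({
    "embed_query": {"tokens": 5},
    "embed_documents": {"tokens": 40},
    "cohere_rerank": {"tokens": 12},
})
```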

Run it

# Dry-run mode (no API key needed)
cd dev/waxell-dev
python -m app.demos.cohere_rerank_agent --dry-run

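# Live mode (all four variables listed under "Environment variables" must be set)
export COHERE_API_KEY="..."
export OPENAI_API_KEY="sk-..."
export WAXELL_API_KEY="..."
export WAXELL_API_URL="..."
python -m app.demos.cohere_rerank_agent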

# Custom query
python -m app.demos.cohere_rerank_agent --dry-run --query "Best practices for RAG optimization"
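The two flags used in these commands could be parsed with `argparse` roughly as follows. This is a sketch of one possible command-line interface, not the demo's actual parser: `build_parser` is a hypothetical name, and the default of `None` for `--query` is an assumption, since the demo's built-in query is not shown here.

```python
import argparse


def build_parser():
    """Build a parser for the --dry-run and --query flags shown above."""
    parser = argparse.ArgumentParser(prog="cohere_rerank_agent")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="skip real API calls",
    )
    parser.add_argument(
        "--query",
        default=None,  # assumption: the demo falls back to its own query
        help="custom query to run through the pipeline",
    )
    return parser


args = build_parser().parse_args(
    ["--dry-run", "--query", "Best practices for RAG optimization"]
)
```

`argparse` converts `--dry-run` to the attribute `args.dry_run`, so the rest of the program can branch on a plain boolean.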

Source

dev/waxell-dev/app/demos/cohere_rerank_agent.py