Multi-Agent Coordination Agent

A 15-step cross-framework stress test that simulates 3 agent frameworks (CrewAI, AutoGen, Agno) coordinating on a research task with shared Zep memory, Pinecone vector retrieval, and DeepEval evaluation. The CrewAI phase runs researcher + writer agents, the AutoGen phase runs a 2-round fact-checker + critic group chat with reasoning captured via @waxell.reasoning_dec, and the Agno phase synthesizes the final output using Groq for fast inference. The run produces 3 evaluation scores and an accept/revise verdict.

Environment variables

This example runs in dry-run mode by default (no API key needed). For live mode, set OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL.
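
For reference, one way the live/dry-run switch can be expressed in code. This is a sketch only; the helper below is an assumption, not the demo's actual logic (the demo exposes a --dry-run flag, shown under "Run it").

import os

# Sketch: live mode needs all three documented variables; if any is missing,
# stay in dry-run mode. Assumed helper, not part of the demo's API.
LIVE_MODE_VARS = ("OPENAI_API_KEY", "WAXELL_API_KEY", "WAXELL_API_URL")

def live_mode_ready() -> bool:
    """True only when every variable required for live mode is set."""
    return all(os.getenv(name) for name in LIVE_MODE_VARS)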

Architecture
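
At a high level the run is sequential: CrewAI drafts, AutoGen critiques, Agno synthesizes, and a final evaluation pass gates the result. A minimal sketch of that ordering follows; the phase labels and helper are illustrative, not the demo's actual step names.

# Illustrative phase summary; see the source file for the real 15 steps.
PHASES = [
    ("crewai",  "researcher + writer agents draft the report, stored in Zep"),
    ("autogen", "fact-checker + critic group chat, 2 rounds, results stored in Zep"),
    ("agno",    "Groq-backed synthesis from shared Zep memory and Pinecone context"),
    ("eval",    "DeepEval scoring plus the accept/revise verdict"),
]

def print_flow() -> None:
    """Print the coordination phases in execution order."""
    for name, summary in PHASES:
        print(f"{name:>8}: {summary}")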

Key Code

Cross-framework tool calls with shared Zep memory

Each framework stores its results in shared Zep memory for downstream consumption.

@waxell.tool(tool_type="memory")
def zep_memory_add(zep_client, session_id, messages, phase="") -> dict:
    # Write one phase's results into the shared Zep session.
    return zep_client.memory.add(session_id=session_id, messages=messages)

@waxell.tool(tool_type="agent_framework")
def crewai_agent_execute(role: str, task: str) -> dict:
    # CrewAI agent construction and kickoff are elided in this excerpt;
    # `response` is the text the agent produced for the given role and task.
    return {"output_length": len(response), "status": "completed"}

@waxell.tool(tool_type="agent_framework")
def agno_agent_run(agent, prompt, context) -> dict:
    result = agent.run(prompt=prompt, context=context)
    # Handling of `result` and the SYNTHESIS_RESPONSE text is elided in this excerpt.
    return {"output_length": len(SYNTHESIS_RESPONSE), "status": "completed"}
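
Downstream phases read that shared state back out. Below is a minimal sketch of the Zep search side; using @waxell.retrieval bare and the exact Zep client search method are assumptions, not the demo's verified code.

@waxell.retrieval
def zep_memory_search(zep_client, session_id, query) -> dict:
    # Return prior-phase results so the AutoGen and Agno phases can build on them.
    # Assumes the Zep SDK exposes memory.search(); check your client version
    # for the exact method name and arguments.
    return zep_client.memory.search(session_id=session_id, text=query)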

AutoGen multi-round reasoning and accept/revise verdict

Four @waxell.reasoning_dec steps capture the fact-checker and critic debate (two rounds each), followed by the accept/revise decision.

@waxell.reasoning_dec(step="fact_check_round1")
def reasoning_fact_check_r1() -> dict:
    return {
        "thought": "CrewAI claims verified. Agno characterization oversimplified. Governance omitted.",
        "evidence": ["CrewAI docs", "AutoGen docs", "Agno README"],
        "conclusion": "Research factually sound but has completeness gaps.",
    }

@waxell.decision(name="accept_or_revise", options=["accept", "revise"])
def decide_accept_or_revise(eval_scores, avg_score, all_passed) -> dict:
    verdict = "accept" if all_passed and avg_score >= 0.80 else "revise"
    return {
        "chosen": verdict,
        "reasoning": f"all_passed={all_passed}, avg {avg_score:.2f}",
        "confidence": 0.93,
    }

What this demonstrates

  • 3 agent frameworks -- CrewAI (supervisor-worker), AutoGen (group chat), and Agno (reactive pipeline) coordinated in a single trace.
  • @waxell.tool(tool_type="memory") -- 4 Zep memory operations (3 adds, 1 search) for shared state.
  • @waxell.tool(tool_type="agent_framework") -- CrewAI and Agno agent execution recorded.
  • @waxell.retrieval -- Pinecone vector retrieval and Zep memory retrieval (the Pinecone side is sketched after this list).
  • @waxell.reasoning_dec -- 4 reasoning chains (2 fact-checker rounds, 2 critic rounds).
  • @waxell.decision -- accept/revise verdict based on evaluation scores.
  • waxell.score() -- faithfulness, coherence, completeness, and pipeline_quality scores.
  • @waxell.step_dec -- 15 step recordings across all phases.
  • 15-step pipeline -- maximum-depth cross-framework coordination stress test.
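
The Key Code section shows only the Zep side of retrieval, so here is a minimal sketch of the Pinecone side. Wrapping it bare in @waxell.retrieval and the pre-built index and query_embedding inputs are assumptions; the query call itself is the standard Pinecone client API.

@waxell.retrieval
def pinecone_retrieve(index, query_embedding, top_k=5) -> dict:
    # Nearest-neighbour lookup with metadata so downstream agents keep source context.
    return index.query(vector=query_embedding, top_k=top_k, include_metadata=True)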

Run it

# Dry-run (no API key needed)
python -m app.demos.multi_agent_coordination_agent --dry-run

# Live mode
OPENAI_API_KEY=sk-... python -m app.demos.multi_agent_coordination_agent

Source

dev/waxell-dev/app/demos/multi_agent_coordination_agent.py