AutoGen

An AutoGen-style multi-agent group chat with a parent orchestrator coordinating two child agents -- a runner and an evaluator. The runner executes planner, engineer, and reviewer agents in a round-robin group chat, while the evaluator reviews the conversation and scores collaboration quality.

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.

Architecture

Key Code

Orchestrator with group chat initialization

The parent agent initializes the group chat config and delegates execution to child agents.

@waxell.observe(agent_name="autogen-orchestrator", workflow_name="autogen-group-chat")
async def run_agent(query: str, dry_run: bool = False, waxell_ctx=None, **kwargs):
    waxell.tag("demo", "autogen")
    waxell.metadata("chat_type", "group_chat")

    # @step -- initialize group chat configuration
    chat_config = await init_groupchat(
        agents=["planner", "engineer", "reviewer"], max_rounds=5,
    )

    # Child agents execute sequentially
    runner_result = await run_agents(query=query, client=client)
    evaluator_result = await run_evaluator(
        query=query, plan=runner_result["plan"],
        implementation=runner_result["implementation"], client=client,
    )

Runner with @step and @decision for speaker selection

Each agent turn is recorded as a step, with round-robin speaker selection captured as a decision.

@waxell.decision(name="select_speaker", options=["planner", "engineer", "reviewer"])
async def select_speaker(round_num: int, agents: list[str]) -> dict:
    chosen = agents[round_num % len(agents)]
    return {
        "chosen": chosen,
        "reasoning": f"Round-robin: round {round_num} maps to '{chosen}'",
    }
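The round-robin mapping itself is plain modular arithmetic. A minimal standalone sketch of the same selection logic, without the @waxell.decision decorator (the agent list is the one from the demo):

```python
# Round-robin speaker selection: round N maps to agents[N % len(agents)].
AGENTS = ["planner", "engineer", "reviewer"]

def select_speaker(round_num: int, agents: list[str]) -> dict:
    chosen = agents[round_num % len(agents)]
    return {
        "chosen": chosen,
        "reasoning": f"Round-robin: round {round_num} maps to '{chosen}'",
    }

# Five rounds cycle through the three agents and wrap around.
order = [select_speaker(r, AGENTS)["chosen"] for r in range(5)]
# → ['planner', 'engineer', 'reviewer', 'planner', 'engineer']
```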

@waxell.step_dec(name="agent_planner")
async def run_planner_step(query: str, client) -> dict:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": "Break down the task into steps."},
                  {"role": "user", "content": query}],
    )
    return {"plan": response.choices[0].message.content}
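In --dry-run mode no API keys are needed, which implies the steps run against something other than a live client. One way such a stub can satisfy the same client.chat.completions.create call shape is sketched below; this is an illustrative assumption, not the demo's actual implementation, and DryRunClient is a hypothetical name:

```python
# Hypothetical dry-run stub: mimics the OpenAI client call shape used by
# run_planner_step with a canned response, so no API key is required.
import asyncio
from types import SimpleNamespace

class DryRunClient:
    def __init__(self):
        # Expose client.chat.completions.create like the real client.
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    async def _create(self, model: str, messages: list[dict], **kwargs):
        # Echo the user message back in a canned "plan".
        user_msg = next(m["content"] for m in messages if m["role"] == "user")
        text = f"[dry-run:{model}] plan for: {user_msg}"
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=text))]
        )

async def main() -> str:
    client = DryRunClient()
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": "Break down the task into steps."},
                  {"role": "user", "content": "Design a monitoring strategy"}],
    )
    return response.choices[0].message.content

print(asyncio.run(main()))
# → [dry-run:gpt-4o-mini] plan for: Design a monitoring strategy
```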

Evaluator with @reasoning and score()

The evaluator reviews the conversation and assesses collaboration quality.

@waxell.reasoning_dec(step="conversation_evaluation")
async def evaluate_conversation(plan: str, implementation: str, review: str) -> dict:
    plan_has_steps = any(c.isdigit() for c in plan[:50])
    impl_has_detail = len(implementation) > 100
    quality_score = sum([plan_has_steps, impl_has_detail, len(review) > 50]) / 3.0
    return {
        "thought": f"Plan {'includes' if plan_has_steps else 'lacks'} structured steps.",
        "evidence": [f"Plan length: {len(plan)} chars"],
        "conclusion": f"Conversation quality: {quality_score:.0%}",
    }

waxell.score("conversation_quality", 0.85, comment="multi-agent collaboration")
waxell.score("review_approved", True, data_type="boolean")
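The quality heuristic averages three boolean checks into a 0-1 score. A standalone sketch of that arithmetic, without the waxell decorator:

```python
# The evaluator's heuristic: three boolean checks, averaged.
# bool is an int subclass in Python, so sum([...]) counts the passing checks.
def quality(plan: str, implementation: str, review: str) -> float:
    plan_has_steps = any(c.isdigit() for c in plan[:50])  # e.g. "1." numbering
    impl_has_detail = len(implementation) > 100
    review_has_detail = len(review) > 50
    return sum([plan_has_steps, impl_has_detail, review_has_detail]) / 3.0

score = quality("1. Parse input\n2. Transform", "x" * 150, "y" * 60)
# → 1.0  (all three checks pass)
```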

What this demonstrates

  • @waxell.observe -- parent-child agent hierarchy with automatic lineage
  • @waxell.step_dec -- group chat init, planner, and engineer agent turns recorded as steps
  • @waxell.decision -- round-robin speaker selection at each group chat round
  • @waxell.reasoning_dec -- chain-of-thought conversation quality evaluation
  • waxell.score() -- conversation quality and reviewer approval scores
  • waxell.tag() / waxell.metadata() -- framework, chat type, and agent role metadata
  • Auto-instrumented LLM calls -- three OpenAI gpt-4o-mini calls captured automatically
  • AutoGen group chat pattern -- planner, engineer, reviewer in sequential round-robin

Run it

# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.autogen_agent --dry-run

# Live (real OpenAI)
export OPENAI_API_KEY="sk-..."
python -m app.demos.autogen_agent

# Custom query
python -m app.demos.autogen_agent --query "Design a monitoring strategy"

Source

dev/waxell-dev/app/demos/autogen_agent.py