
Workflow Agents

A multi-agent workflow framework comparison that exercises three structured workflow frameworks -- Julep (session chat, task creation, execution), Langroid (LLM response, task run), and ControlFlow (task run, agent call) -- through a workflow-runner child agent and a workflow-evaluator child agent. Mock objects expose the exact methods that each instrumentor wraps via wrapt.wrap_function_wrapper. The demo combines @tool, @retrieval, @reasoning, @decision, and waxell.decide() in a single pipeline.
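The patching pattern behind wrapt.wrap_function_wrapper can be sketched with the standard library alone: replace a method on a class with a wrapper that calls through to the original and records the call. The class and method names below are illustrative stand-ins, not the demo's actual mocks.

```python
# Stdlib-only sketch of the instrumentation pattern that
# wrapt.wrap_function_wrapper implements: swap a method for a wrapper
# that delegates to the original callable and records each call.
# MockSessions and chat() are hypothetical stand-ins for a wrapt target.

class MockSessions:
    def chat(self, session_id, messages, model):
        return {"session_id": session_id, "model": model, "turns": len(messages)}

calls = []  # a stand-in for the instrumentor's trace buffer

def instrument(cls, name):
    wrapped = getattr(cls, name)
    def wrapper(self, *args, **kwargs):
        result = wrapped(self, *args, **kwargs)
        calls.append((name, kwargs))  # record the call for trace attribution
        return result
    setattr(cls, name, wrapper)

instrument(MockSessions, "chat")
out = MockSessions().chat(session_id="s1", messages=["hi"], model="gpt-4o-mini")
print(out["turns"], len(calls))  # → 1 1
```

wrapt does the same job more robustly (it preserves signatures and handles descriptors), which is why instrumentors target it; the sketch only shows the shape of the interception.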

Environment variables

This example runs in dry-run mode by default (no API key needed). For live mode, set OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL.

Architecture

Key Code

Framework-specific workflow tool wrappers

Each framework's exact wrapt-target methods are wrapped with @waxell.tool for trace attribution.

@waxell.tool(tool_type="workflow")
def julep_session_chat(sessions, session_id, messages, model) -> dict:
    result = sessions.chat(session_id=session_id, messages=messages, model=model)
    return {"response": result.choices[0]["message"]["content"], "model": model}

@waxell.tool(tool_type="workflow")
def langroid_llm_response(agent, message) -> dict:
    return {"agent_name": agent.config.name, "response": agent.llm_response(message)}

@waxell.tool(tool_type="workflow")
def controlflow_task_run(task) -> dict:
    return {"objective": task.objective, "result": task.run(), "status": "completed"}
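The mock objects these wrappers receive only need to expose the exact call shapes shown above: sessions.chat(...), agent.llm_response(...), and task.run(). A minimal sketch, with all names and return shapes as illustrative assumptions rather than the demo's actual mocks:

```python
# Hypothetical mocks shaped like the wrapt targets the wrappers call:
# sessions.chat(...), agent.llm_response(...), task.run().
from types import SimpleNamespace

class MockSessions:
    def chat(self, session_id, messages, model):
        # Mirror the Julep response shape used above: result.choices[0]["message"]["content"]
        msg = {"message": {"content": f"julep reply to {len(messages)} msgs"}}
        return SimpleNamespace(choices=[msg])

class MockAgent:
    config = SimpleNamespace(name="langroid-agent")
    def llm_response(self, message):
        return f"langroid reply: {message}"

class MockTask:
    objective = "summarize results"
    def run(self):
        return "controlflow result"

# Exercise the same call shapes the decorated wrappers use:
r1 = MockSessions().chat(session_id="s1", messages=["hi"], model="gpt-4o-mini")
print(r1.choices[0]["message"]["content"])  # → julep reply to 1 msgs
print(MockAgent().llm_response("hello"))    # → langroid reply: hello
print(MockTask().run())                     # → controlflow result
```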

Comparison and synthesis

The evaluator compares workflow approaches using reasoning and produces a quality score.

@waxell.reasoning_dec(step="framework_comparison")
async def compare_framework_approaches(julep_result, langroid_result, cf_result) -> dict:
    return {
        "thought": "Julep excels at stateful multi-turn workflows. "
                   "Langroid provides clean LLM abstractions. "
                   "ControlFlow offers structured task execution.",
        "evidence": ["julep: session-based", "langroid: agent-centric", "controlflow: task-centric"],
        "conclusion": "Each framework optimizes for different workflow dimensions",
    }

What this demonstrates

  • @waxell.tool(tool_type="workflow") -- 7 tool calls covering Julep (session_chat, task_create, execution_create), Langroid (llm_response, task_run), and ControlFlow (task_run, agent_call).
  • @waxell.retrieval -- collection of framework results as a retrieval step.
  • @waxell.step_dec -- comparison preparation step.
  • @waxell.decision -- primary framework selection.
  • waxell.decide() -- inline synthesis strategy decision.
  • @waxell.reasoning_dec -- cross-framework approach comparison.
  • waxell.score() -- framework_coverage and synthesis_quality scores.
  • Auto-instrumented LLM calls -- OpenAI synthesis call.
  • 3 workflow frameworks -- Julep (stateful sessions), Langroid (LLM-centric), ControlFlow (structured tasks).
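The framework_coverage score above can be illustrated as a ratio of observed to expected tool calls across the three frameworks. The formula and field names below are assumptions for illustration, not the demo's actual waxell.score() implementation.

```python
# Illustrative derivation of a framework_coverage score: fraction of
# the 7 expected wrapt-target calls actually observed per framework.
# The formula is an assumption; only the expected tool names come from
# the demo description.
EXPECTED_TOOLS = {
    "julep": {"session_chat", "task_create", "execution_create"},
    "langroid": {"llm_response", "task_run"},
    "controlflow": {"task_run", "agent_call"},
}

def framework_coverage(observed: dict) -> float:
    total = sum(len(v) for v in EXPECTED_TOOLS.values())
    hit = sum(len(EXPECTED_TOOLS[k] & observed.get(k, set())) for k in EXPECTED_TOOLS)
    return hit / total

observed = {
    "julep": {"session_chat", "task_create", "execution_create"},
    "langroid": {"llm_response", "task_run"},
    "controlflow": {"task_run"},  # agent_call not yet seen
}
print(round(framework_coverage(observed), 3))  # → 0.857 (6 of 7 calls)
```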

Run it

# Dry-run (no API key needed)
python -m app.demos.workflow_agents_agent --dry-run

# Live mode with OpenAI
OPENAI_API_KEY=sk-... python -m app.demos.workflow_agents_agent

Source

dev/waxell-dev/app/demos/workflow_agents_agent.py