# CrewAI vs Waxell
This page compares three approaches to building the same agent: CrewAI alone, CrewAI enhanced with Waxell Observe, and a fully native Waxell implementation. The use case is a research agent that searches for information and produces a summary report.
## A) CrewAI Alone
A standard CrewAI setup with an Agent, Task, and Crew. CrewAI provides a high-level abstraction for multi-agent orchestration, but leaves observability, cost tracking, and governance to you.
```python
from crewai import Agent, Task, Crew
from crewai_tools import SerperDevTool

# Define tools
search_tool = SerperDevTool()

# Define the research agent
researcher = Agent(
    role="Research Analyst",
    goal="Find comprehensive information about the given topic",
    backstory="You are an expert research analyst skilled at finding and synthesizing information.",
    tools=[search_tool],
    llm="gpt-4o",
    verbose=True,
)

# Define the summarizer agent
summarizer = Agent(
    role="Report Writer",
    goal="Produce a clear, well-structured summary report",
    backstory="You are a skilled technical writer who distills complex research into actionable summaries.",
    llm="gpt-4o",
    verbose=True,
)

# Define tasks
research_task = Task(
    description="Research the topic: {topic}. Find key facts, recent developments, and expert opinions.",
    expected_output="A detailed research brief with sources",
    agent=researcher,
)
summary_task = Task(
    description="Using the research brief, write a concise executive summary with key findings and recommendations.",
    expected_output="A structured executive summary in markdown",
    agent=summarizer,
)

# Build and run the crew
crew = Crew(
    agents=[researcher, summarizer],
    tasks=[research_task, summary_task],
    verbose=True,
)

result = crew.kickoff(inputs={"topic": "AI governance trends 2025"})
print(result)
```
What is missing:
- No centralized tracking of LLM calls across agents
- No cost visibility (how much did this crew run cost?)
- No policy enforcement (no budget limits, no content filtering)
- No audit trail beyond verbose console output
- No durability (a crash means re-running the entire crew from scratch)
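To make the cost gap concrete: with plain CrewAI, cost visibility is glue code you write yourself. A minimal sketch of that glue follows; the per-million-token prices are illustrative placeholders (assumptions, not official rates), and `estimate_cost` is a hypothetical helper, not part of CrewAI.

```python
# Hand-rolled cost estimation -- the kind of bookkeeping plain CrewAI
# leaves to you. Prices are illustrative placeholders in USD per 1M tokens.
PRICES = {"gpt-4o": {"in": 2.50, "out": 10.00}}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Estimate the dollar cost of a run from its token counts."""
    p = PRICES[model]
    return (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000
```

You would still have to thread this through every agent and every run yourself, which is exactly the bookkeeping the next two approaches absorb.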
## B) CrewAI + Waxell Observe
The same CrewAI workflow wrapped with WaxellContext for full observability. Your CrewAI code stays the same -- you just wrap the execution in a context manager.
```python
from crewai import Agent, Task, Crew
from crewai_tools import SerperDevTool
from waxell_observe import WaxellContext

search_tool = SerperDevTool()

researcher = Agent(
    role="Research Analyst",
    goal="Find comprehensive information about the given topic",
    backstory="You are an expert research analyst skilled at finding and synthesizing information.",
    tools=[search_tool],
    llm="gpt-4o",
    verbose=True,
)
summarizer = Agent(
    role="Report Writer",
    goal="Produce a clear, well-structured summary report",
    backstory="You are a skilled technical writer who distills complex research into actionable summaries.",
    llm="gpt-4o",
    verbose=True,
)
research_task = Task(
    description="Research the topic: {topic}. Find key facts, recent developments, and expert opinions.",
    expected_output="A detailed research brief with sources",
    agent=researcher,
)
summary_task = Task(
    description="Using the research brief, write a concise executive summary with key findings and recommendations.",
    expected_output="A structured executive summary in markdown",
    agent=summarizer,
)
crew = Crew(
    agents=[researcher, summarizer],
    tasks=[research_task, summary_task],
    verbose=True,
)

# Wrap execution with Waxell Observe
async with WaxellContext(agent_name="research-crew") as ctx:
    result = crew.kickoff(inputs={"topic": "AI governance trends 2025"})

    # Record LLM calls (CrewAI exposes token usage in its output)
    if hasattr(result, "token_usage"):
        ctx.record_llm_call(
            model="gpt-4o",
            tokens_in=result.token_usage.get("prompt_tokens", 0),
            tokens_out=result.token_usage.get("completion_tokens", 0),
            task="research_crew",
        )

    # Record execution steps
    ctx.record_step("research", output={"status": "complete"})
    ctx.record_step("summarize", output={"status": "complete"})
    ctx.set_result({"summary": str(result)})
```
You can also use the `@waxell_agent` decorator for simpler CrewAI setups:
```python
from waxell_observe import waxell_agent

@waxell_agent(agent_name="research-crew")
async def run_research(topic: str, waxell_ctx=None) -> str:
    crew = Crew(agents=[researcher, summarizer], tasks=[research_task, summary_task])
    result = crew.kickoff(inputs={"topic": topic})
    if waxell_ctx and hasattr(result, "token_usage"):
        waxell_ctx.record_llm_call(
            model="gpt-4o",
            tokens_in=result.token_usage.get("prompt_tokens", 0),
            tokens_out=result.token_usage.get("completion_tokens", 0),
        )
    return str(result)
```
What you now get -- with minimal changes to your CrewAI code:
- Every crew execution tracked as a run with inputs, outputs, and status
- LLM call recording with cost estimates
- Step-by-step execution trail for debugging
- Pre-execution policy checks (budget enforcement, rate limiting)
- Full audit trail visible in the Waxell dashboard
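One practical caveat when recording token usage: the shape of CrewAI's `token_usage` has varied across versions (a plain dict in some, an object with attributes in others), so `.get(...)` alone can break. A defensive helper like the following sketch tolerates both shapes; the helper name and fallback-to-zero behavior are assumptions for illustration, not part of either library.

```python
def extract_token_usage(result) -> tuple[int, int]:
    """Return (prompt_tokens, completion_tokens) from a CrewAI result,
    tolerating both dict-shaped and attribute-shaped usage objects."""
    usage = getattr(result, "token_usage", None)
    if usage is None:
        return 0, 0
    if isinstance(usage, dict):
        return usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)
    # Object-shaped usage (e.g. a metrics model with attributes)
    return getattr(usage, "prompt_tokens", 0), getattr(usage, "completion_tokens", 0)
```

Feeding its output into `ctx.record_llm_call(...)` keeps the recording code working across CrewAI upgrades.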
## C) Waxell Native
The same research-and-summarize use case built natively with the Waxell SDK. The workflow, tools, and LLM calls are all governed and tracked by default.
```python
from waxell_sdk import agent, workflow, tool, WorkflowContext

@agent(
    name="research-agent",
    description="Researches topics and produces executive summaries",
    signals=["research_request"],
    domains=["search"],
)
class ResearchAgent:
    @tool
    async def web_search(self, ctx: WorkflowContext, query: str) -> dict:
        """Search the web for information on a topic."""
        return await ctx.domain("search", "web_search", query=query)

    @workflow("research_and_summarize")
    async def research_and_summarize(self, ctx: WorkflowContext, topic: str) -> dict:
        """Research a topic and produce an executive summary."""
        # Step 1: Search for information
        search_results = await ctx.tool("web_search", query=topic)
        ctx.log_step("research_complete", {"results_count": len(search_results)})

        # Step 2: Synthesize research into a brief
        research_brief = await ctx.llm.generate(
            prompt=(
                f"Synthesize these search results into a research brief:\n"
                f"Topic: {topic}\n"
                f"Results: {search_results}"
            ),
            output_format="text",
            task="research_synthesis",
        )

        # Step 3: Generate executive summary
        summary = await ctx.llm.generate(
            prompt=(
                f"Write a concise executive summary with key findings "
                f"and recommendations based on this research brief:\n\n"
                f"{research_brief}"
            ),
            output_format="json",
            task="executive_summary",
        )
        return summary
```
What you gain with native Waxell:
- Single-agent simplicity: No need to define separate "agents" for research and summarization -- workflows handle orchestration
- Durable execution: Each step is a checkpoint. If the process crashes after the search completes, it resumes at the synthesis step
- LLM routing: `ctx.llm.generate()` automatically selects models based on task type, applies rate limiting, and handles fallbacks
- Domain abstraction: `ctx.domain("search", "web_search", ...)` routes through a governed endpoint, not a hardcoded API call
- Zero instrumentation: Every LLM call, tool invocation, and step is tracked in the control plane automatically
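To make "each step is a checkpoint" concrete, here is what checkpoint/resume means mechanically, as a plain-Python sketch. This illustrates the idea only, not Waxell's internals; the file-based state store and the `run_with_checkpoints` helper are assumptions for illustration.

```python
import json
import os

def run_with_checkpoints(steps, state_file="run_state.json"):
    """Run named steps in order, persisting each result so that a
    restart after a crash skips steps that already completed.

    steps: list of (name, fn) pairs, where fn(state) returns the
    step's JSON-serializable output.
    """
    state = {}
    if os.path.exists(state_file):
        with open(state_file) as f:
            state = json.load(f)  # resume from the last checkpoint
    for name, fn in steps:
        if name in state:
            continue  # completed before the crash; do not re-run
        state[name] = fn(state)
        with open(state_file, "w") as f:
            json.dump(state, f)  # checkpoint after each step
    return state
```

In the research workflow above, this is why a crash after the search step resumes at synthesis instead of re-paying for the search.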
## Comparison Table
| Capability | CrewAI | CrewAI + Observe | Waxell Native |
|---|---|---|---|
| Agent definition | Agent/Task/Crew classes | Unchanged | Declarative (@agent, @workflow, @tool) |
| Multi-agent orchestration | Built-in (Crew) | Unchanged | Workflows with steps |
| Observability | Verbose console output | Run tracking via WaxellContext | Built-in, zero instrumentation |
| LLM cost tracking | Not included | Manual recording with auto-estimation | Built-in with tenant-level overrides |
| Policy enforcement | Not included | Pre-execution checks | Full lifecycle governance |
| Budget limits | Not included | Supported via policies | Built-in with tenant/agent scoping |
| Durable workflows | Not included | Not included | Checkpoint/resume with WorkflowEnvelope |
| Approval workflows | Not included | Not included | Built-in with pause/resume |
| Multi-tenancy | Not included | Tenant-scoped via control plane | Native tenant isolation |
| Audit logging | Not included | Run-level audit trail | Full execution trace with agent_trace |
## Which Approach Should You Choose?
If you already have CrewAI agents running, start with CrewAI + Observe. Wrap your `crew.kickoff()` calls with `WaxellContext` and you immediately get cost tracking, policy enforcement, and audit trails. You can migrate to native Waxell later, when you need durable workflows.
See the Progressive Migration guide for a phased approach to adopting Waxell.