Best Practices

This guide covers recommended patterns and practices for building production-ready Waxell agents.

Agent Design

Keep Agents Focused

Each agent should have a single, clear responsibility:

# Good: Single responsibility
@agent(name="email-classifier")
class EmailClassifier:
    """Classifies incoming emails."""
    pass

@agent(name="email-responder")
class EmailResponder:
    """Generates email responses."""
    pass

# Avoid: Multiple responsibilities
@agent(name="email-handler")
class EmailHandler:
    """Classifies, responds, and archives emails."""  # Too broad
    pass

Use Capabilities for Reuse

Extract common patterns into capabilities:

@capability(name="classification")
class ClassificationCapability:
    @decision
    def classify(self, ctx, text: str, categories: list[str]):
        return ctx.llm.classify(text, categories=categories)

@agent(
    name="support-agent",
    capabilities=[ClassificationCapability]
)
class SupportAgent:
    pass

Workflow Design

Prefer Small Steps

Break workflows into small, checkpointable steps:

@workflow
def process_order(self, ctx):
    # Good: Each step is independently checkpointable
    validated = ctx.call(self.validate)
    enriched = ctx.call(self.enrich, data=validated)
    result = ctx.call(self.process, data=enriched)
    return result

Handle Errors Gracefully

Catch expected failure types at each step and route them to a recovery path:

@workflow
def resilient_workflow(self, ctx):
    try:
        result = ctx.call(self.risky_operation)
    except ValidationError:
        return ctx.call(self.handle_validation_error)
    except ExternalServiceError:
        return ctx.call(self.retry_with_backoff)

    return result
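The `retry_with_backoff` recovery path above is left abstract. A minimal sketch of the underlying pattern, in plain Python rather than any Waxell API (the helper and its parameter names are illustrative, not part of the framework), is exponential backoff with jitter:

```python
import random
import time

def retry_with_backoff(operation, max_attempts=4, base_delay=0.5):
    """Call `operation`, retrying with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay * 2^attempt, plus up to 100 ms of random jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Example: an operation that fails twice, then succeeds on the third call
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"
```

The jitter spreads retries out so many agents recovering from the same outage do not hammer the external service in lockstep.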

Decision Design

Provide Clear Prompts

Be explicit about what you want from the LLM:

@decision
def classify_intent(self, ctx):
    # Good: Clear, specific prompt
    return ctx.llm.classify(
        text=ctx.input.message,
        categories=["billing", "technical", "general"],
        instructions="Classify based on the primary topic discussed"
    )

Use Structured Output

Prefer structured output for reliable parsing:

from pydantic import BaseModel

class Analysis(BaseModel):
    sentiment: str
    confidence: float
    topics: list[str]

@decision
def analyze(self, ctx):
    return ctx.llm.generate(
        prompt=f"Analyze: {ctx.input.text}",
        response_model=Analysis  # Structured output
    )
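To see why a response model beats free-text parsing, here is a framework-agnostic sketch of the validation it buys you, using only the standard library (the `parse_analysis` helper and the sample reply are illustrative, not Waxell internals): a structured reply either matches the schema or fails loudly, instead of silently producing garbage.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class Analysis:
    sentiment: str
    confidence: float
    topics: list

def parse_analysis(raw: str) -> Analysis:
    """Parse a JSON LLM reply and check it against the Analysis schema."""
    data = json.loads(raw)
    expected = {f.name for f in fields(Analysis)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Analysis(**{k: data[k] for k in expected})

# A hypothetical well-formed model reply
reply = '{"sentiment": "positive", "confidence": 0.92, "topics": ["login"]}'
analysis = parse_analysis(reply)
```

A malformed reply (missing `confidence`, say) raises immediately at the decision boundary rather than propagating a bad value into later workflow steps.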

Governance

Set Appropriate Rate Limits

Protect your resources with rate limits:

@agent(
    name="email-sender",
    rate_limit={
        "requests_per_minute": 10,
        "tokens_per_minute": 50000
    }
)
class EmailSender:
    pass
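Rate limits like these are commonly enforced with a token bucket: requests drain tokens, tokens refill at a fixed rate, and bursts are capped by the bucket's capacity. A minimal sketch in plain Python (this is a generic illustration of the mechanism, not Waxell's implementation):

```python
import time

class TokenBucket:
    """Refills `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# 10 requests/minute = one token every 6 seconds, with a burst allowance of 10
limiter = TokenBucket(rate=10 / 60, capacity=10)
```

A `requests_per_minute` limit maps to the refill rate; token-based limits work the same way with `cost` set to the token count of each call.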

Require Approval for Sensitive Operations

Add human oversight for risky actions:

@tool(requires_approval=True)
def delete_account(self, ctx, user_id: str):
    """Delete user account - requires human approval."""
    pass

Testing

Test at Multiple Levels

  1. Unit tests: Test individual decisions and tools
  2. Integration tests: Test workflows end-to-end
  3. Simulation tests: Test with mock LLM responses

from waxell_infra.testing import AgentTestHarness

def test_classification_workflow():
    harness = AgentTestHarness(ClassifierAgent)

    result = harness.run(
        workflow="classify",
        input={"message": "I can't log in"},
        mock_llm_responses={"classify": "technical"}
    )

    assert result["category"] == "technical"

Monitoring

Enable Observability

Use structured logging and tracing:

@workflow
def monitored_workflow(self, ctx):
    ctx.log.info("Starting workflow", input_size=len(ctx.input))

    result = ctx.call(self.process)

    ctx.log.info("Workflow complete", result_status=result["status"])
    return result

Set Up Alerts

Monitor key metrics:

  • Execution success rate
  • Latency percentiles
  • Token consumption
  • Error rates by type
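The structured logs above give you everything needed to compute these metrics. As a sketch of the aggregation (plain Python over hypothetical execution records; the field names `latency_ms`, `ok`, `tokens`, and `error` are illustrative, not a Waxell schema):

```python
import statistics

def summarize_runs(runs):
    """Aggregate alert-worthy metrics from a list of execution records."""
    latencies = sorted(r["latency_ms"] for r in runs)
    errors = [r for r in runs if not r["ok"]]
    by_type = {}
    for r in errors:
        by_type[r["error"]] = by_type.get(r["error"], 0) + 1
    return {
        "success_rate": 1 - len(errors) / len(runs),
        "p50_ms": statistics.median(latencies),
        # Nearest-rank p95; fine for a sketch, use a real metrics store in production
        "p95_ms": latencies[min(len(latencies) - 1, int(0.95 * len(latencies)))],
        "tokens": sum(r["tokens"] for r in runs),
        "errors_by_type": by_type,
    }

runs = [
    {"latency_ms": 120, "ok": True, "tokens": 300, "error": None},
    {"latency_ms": 340, "ok": False, "tokens": 120, "error": "timeout"},
    {"latency_ms": 95, "ok": True, "tokens": 280, "error": None},
    {"latency_ms": 210, "ok": True, "tokens": 310, "error": None},
]
summary = summarize_runs(runs)
```

Alert on thresholds over these aggregates (for example, success rate below 99% or p95 latency above your SLO) rather than on individual failures, which are expected and handled by the retry logic above.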
