Waxell Observe
You already have agents -- add observability in 2 lines of code.
Waxell Observe is a lightweight Python package that brings LLM call tracking, cost management, and policy enforcement to any AI agent. It works with any Python agent framework -- LangChain, LlamaIndex, CrewAI, custom code, or anything else. No vendor lock-in, no runtime changes, no migration required.
Fastest Path: Auto-Instrumentation
Two lines to automatically trace all LLM calls across 200+ providers:
import waxell_observe as waxell
waxell.init(api_key="wax_sk_...", api_url="https://acme.waxell.dev")
# Import LLM SDKs AFTER init() -- they're now auto-instrumented
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Automatically traced with model, tokens, cost, latency
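Why import order can matter: the page does not say how auto-instrumentation works internally. If it wraps SDK entry points when init() runs, as many tracing libraries do, then a reference taken before init() may still point at the unwrapped function. The sketch below shows that general effect using only the standard library. The fake_sdk module, create function, and instrument helper are hypothetical stand-ins, not part of Waxell or any real SDK.

```python
import types

# Hypothetical stand-in for an LLM SDK module; not a real library.
fake_sdk = types.ModuleType("fake_sdk")


def create(prompt):
    return f"response to {prompt}"


fake_sdk.create = create

calls = []  # prompts seen by the tracing wrapper

# Reference taken BEFORE instrumentation, like importing a name before init().
early_create = fake_sdk.create


def instrument(module):
    """Replace module.create with a wrapper that records each call."""
    original = module.create

    def traced(prompt):
        calls.append(prompt)
        return original(prompt)

    module.create = traced


instrument(fake_sdk)

# Reference taken AFTER instrumentation sees the wrapper.
late_create = fake_sdk.create

early_create("a")  # bypasses the wrapper, so nothing is recorded
late_create("b")   # goes through the wrapper and is recorded
```

Only the call made through the late reference ends up in calls, which is the situation the "import after init()" comment guards against under this assumption.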
The Decorator Pattern (Recommended)
Decorators are the primary way to instrument your agents. Wrap functions with @observe and behavior decorators to get structured, rich traces with minimal code:
import waxell_observe as waxell
waxell.init()

from openai import AsyncOpenAI
client = AsyncOpenAI()

# vector_store and analysis_service below stand in for your own clients

@waxell.retrieval(source="pinecone")
async def search_docs(query: str) -> list[dict]:
    return await vector_store.search(query, top_k=10)

@waxell.decision(name="approach", options=["summarize", "compare", "deep_dive"])
async def choose_approach(query: str) -> dict:
    return {"chosen": "deep_dive", "reasoning": "Query asks for detailed analysis"}

@waxell.tool(tool_type="api")
async def run_analysis(docs: list) -> dict:
    return await analysis_service.analyze(docs)

@waxell.observe(agent_name="research-pipeline")
async def run_pipeline(query: str):
    docs = await search_docs(query)
    approach = await choose_approach(query)
    analysis = await run_analysis(docs)
    # Inline enrichment
    waxell.score("quality", 0.92)
    waxell.tag("domain", "research")
    return {"result": analysis, "approach": approach["chosen"]}
Every decorated function inside @observe is automatically recorded as a structured span. No manual ctx.record_*() calls needed.
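To make the nesting idea concrete, here is a minimal sketch of how a run-boundary decorator and a behavior decorator can cooperate through a context variable. It is not Waxell's implementation: the observe and tool functions below are simplified, hypothetical stand-ins written with the standard library only, and they store spans in an in-memory list instead of sending them anywhere.

```python
import asyncio
import contextvars
import functools
import time

# Span list of the run currently in progress, or None outside any run.
_current_spans = contextvars.ContextVar("current_spans", default=None)
completed_runs = []  # finished run records, kept in memory for the demo


def observe(agent_name):
    """Hypothetical run-boundary decorator (a stand-in, not Waxell's code)."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            spans = []
            token = _current_spans.set(spans)
            try:
                return await fn(*args, **kwargs)
            finally:
                _current_spans.reset(token)
                completed_runs.append({"agent": agent_name, "spans": spans})
        return wrapper
    return decorator


def tool(fn):
    """Hypothetical behavior decorator: records name, status, and duration."""
    @functools.wraps(fn)
    async def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "error"
        try:
            result = await fn(*args, **kwargs)
            status = "ok"
            return result
        finally:
            spans = _current_spans.get()
            if spans is not None:  # outside a run, record nothing
                spans.append({
                    "name": fn.__name__,
                    "status": status,
                    "duration_s": time.perf_counter() - start,
                })
    return wrapper


@tool
async def double(x):
    return x * 2


@observe("demo-agent")
async def pipeline(x):
    return await double(x) + 1


answer = asyncio.run(pipeline(20))
```

Because the inner decorator looks up the active run itself, the decorated function never has to pass a context object around, which is the property the paragraph above describes.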
Decorator Reference
| Decorator | Purpose | What it captures |
|---|---|---|
| @waxell.observe() | Agent run boundary | Inputs, outputs, policy checks, run lifecycle |
| @waxell.tool() | Tool/function calls | Name, inputs, output, duration, status |
| @waxell.retrieval() | RAG search operations | Query, documents, scores, source |
| @waxell.decision() | Routing/classification | Chosen option, reasoning, confidence |
| @waxell.reasoning_dec() | Chain-of-thought | Thought, evidence, conclusion |
| @waxell.step_dec() | Pipeline steps | Step name and output |
| @waxell.retry_dec() | Retry/fallback logic | Attempt count, strategy, errors |
Convenience Functions
Use these anywhere inside an @observe scope for inline enrichment:
| Function | Purpose |
|---|---|
| waxell.score(name, value) | Quality scores (numeric, boolean, categorical) |
| waxell.tag(key, value) | Searchable key-value tags |
| waxell.metadata(key, value) | Arbitrary structured metadata |
| waxell.step(name, output=) | Quick step recording |
| waxell.decide(name, chosen=) | Inline decision recording |
| waxell.retrieve(query=, documents=) | Inline retrieval recording |
| waxell.reason(step=, thought=) | Inline reasoning recording |
| waxell.retry(attempt=, reason=) | Inline retry recording |
| waxell.user_message(content) | Record inbound user message |
| waxell.agent_response(content) | Record outbound agent response |
| waxell.communication(channel=) | Record outbound messages (Slack, email, etc.) |
| waxell.flush() / waxell.flush_sync() | Flush buffered data for long-running agents |
| waxell.diagnose() | Introspect SDK state and configuration |
Advanced: Context Manager
For complex scenarios where decorators don't fit -- multi-step orchestration, batch processing, conditional context creation -- use WaxellContext directly:
from waxell_observe import WaxellContext
async with WaxellContext(
    agent_name="research-agent",
    session_id="sess_abc123",
    user_id="user_456",
) as ctx:
    result = await run_research_pipeline(query)
    ctx.record_llm_call(model="claude-sonnet-4", tokens_in=500, tokens_out=200)
    ctx.record_step("summarize", output={"summary": result})
    ctx.set_result({"answer": result})
See the Context Manager page for the full API.
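Conditional context creation is one case where a context manager is easier to work with than a decorator. The sketch below shows one way to open a tracing context only for some items in a batch, using contextlib.AsyncExitStack from the standard library. FakeContext is a hypothetical stand-in so the example runs without the SDK. It mirrors only the record_step call shown above, and the real WaxellContext may behave differently.

```python
import asyncio
import contextlib


class FakeContext:
    """Hypothetical stand-in for a tracing context such as WaxellContext."""

    def __init__(self, agent_name):
        self.agent_name = agent_name
        self.steps = []
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True
        return False  # do not swallow exceptions

    def record_step(self, name, output=None):
        self.steps.append((name, output))


created = []  # contexts opened during the demo


async def process(item, traced):
    # The exit stack closes whatever was entered, even if nothing was.
    async with contextlib.AsyncExitStack() as stack:
        ctx = None
        if traced:
            ctx = await stack.enter_async_context(FakeContext("batch-agent"))
            created.append(ctx)
        result = item.upper()
        if ctx is not None:
            ctx.record_step("uppercase", output={"value": result})
        return result


async def main():
    return [await process("a", traced=True), await process("b", traced=False)]


results = asyncio.run(main())
```

Both items are processed by the same code path, but only the first opens, uses, and closes a context.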
LangChain Integration
Drop-in callback handler for any LangChain chain or agent:
from waxell_observe.integrations.langchain import WaxellLangChainHandler
handler = WaxellLangChainHandler(agent_name="langchain-agent")
result = chain.invoke(input, config={"callbacks": [handler]})
handler.flush_sync(result={"output": result})
What You Get
| Feature | Description |
|---|---|
| LLM Call Tracking | Model, token counts, cost, prompt/response previews for every LLM call |
| LLM Call Explorer | Browse, filter, and inspect every LLM call with prompt/response viewer |
| Session Tracking | Group related runs by session for conversation-level analytics |
| User Tracking | Per-user cost attribution, usage patterns, and analytics |
| Scoring | Capture quality scores via SDK or UI annotations |
| Annotation Queues | Human review workflows for manual quality assessment |
| Prompt Management | Version-controlled prompts with labels, playground, and SDK retrieval |
| Cost Analytics | Model usage breakdown, per-user costs, custom pricing overrides |
| Policy Enforcement | Pre-execution and mid-execution checks with allow/block/warn/throttle actions |
| Behavior Tracking | Structured spans for tools, retrievals, decisions, reasoning, retries |
| Approval Workflows | Human-in-the-loop approval for policy-blocked actions |
| Conversation Tracking | Auto-captured conversation state, context utilization, message counts |
Framework Compatibility
Waxell Observe works with any Python agent framework:
- OpenAI -- auto-instrumentation or decorators
- Anthropic -- auto-instrumentation or decorators
- LangChain / LangGraph -- first-class callback handler
- LiteLLM -- unified API for 100+ providers
- LlamaIndex -- auto-instrumentation or decorators
- CrewAI -- auto-instrumentation or decorators
- Custom frameworks -- decorators or context manager
- Any Python code -- if it runs Python, you can observe it
Next Steps
- Quickstart -- Get up and running in 5 minutes
- Decorator Pattern -- Full @observe reference with all parameters
- Auto-Instrumentation -- Zero-code tracing for 200+ libraries
- Behavior Tracking -- Deep dive into tools, retrievals, decisions, reasoning
- Cookbook -- Working examples for every provider and pattern
- FAQ -- Answers to common questions