# Quickstart: Observe Your Agents
Add observability to any Python AI agent in under 5 minutes. By the end you'll have LLM call tracking, structured behavior recording (tools, retrievals, decisions, reasoning), quality scores, and session/user attribution.
## Prerequisites
- Python 3.10+
- A Waxell API key (get one from your Waxell control plane dashboard)
## Step 1: Install

```bash
pip install waxell-observe
```
## Step 2: Initialize and Auto-Instrument

Call `init()` before importing any LLM SDK. This auto-instruments 157+ libraries (OpenAI, Anthropic, Groq, LiteLLM, Cohere, Mistral, vector DBs, and more) with zero code changes.
```python
import waxell_observe as waxell

waxell.init(api_key="wax_sk_...", api_url="https://acme.waxell.dev")

# Import LLM SDKs AFTER init()
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Automatically traced: model, tokens, cost, latency
```
You can also configure via environment variables:

```bash
export WAXELL_API_URL="https://acme.waxell.dev"
export WAXELL_API_KEY="wax_sk_..."
```

Then just call `waxell.init()` without arguments.
## Step 3: Wrap Your Agent with `@observe`

The `@observe` decorator creates a tracked execution run for each call, capturing inputs and outputs and applying policy enforcement:
```python
import waxell_observe as waxell

waxell.init()

from openai import OpenAI

client = OpenAI()

@waxell.observe(agent_name="support-bot")
async def handle_ticket(query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

# Every call creates a tracked run with IO capture
result = await handle_ticket("How do I reset my password?")
```
## Step 4: Add Behavior Decorators

This is where Waxell traces become rich and useful. Wrap your internal functions with behavior decorators to record what your agent does -- not just its LLM calls.
### `@tool` -- Record Tool Calls

```python
@waxell.tool(tool_type="vector_db")
def search_knowledge_base(query: str, top_k: int = 5) -> list[dict]:
    """Every call auto-records: name, inputs, output, duration, status."""
    results = index.search(query, top_k=top_k)
    return [{"id": r.id, "title": r.title, "score": r.score} for r in results]
```
### `@retrieval` -- Record RAG Retrievals

```python
@waxell.retrieval(source="pinecone")
async def retrieve_docs(query: str) -> list[dict]:
    """Auto-records: query, documents, scores, source, duration."""
    results = await vector_store.search(query, top_k=10)
    return [{"id": r.id, "title": r.title, "score": r.score} for r in results]
```
### `@decision` -- Record Routing/Classification Decisions

```python
@waxell.decision(name="route_query", options=["faq", "technical", "billing"])
async def classify_query(query: str) -> dict:
    """Auto-records: chosen option, reasoning, confidence."""
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Classify: {query}"}],
    )
    # In practice, derive these fields from the model response
    return {"chosen": "technical", "reasoning": "User asks about API integration"}
```
### `@reasoning_dec` -- Record Chain-of-Thought

```python
@waxell.reasoning_dec(step="quality_check")
async def assess_answer(answer: str, sources: list) -> dict:
    """Auto-records: thought process, evidence, conclusion."""
    return {
        "thought": "Answer covers all source material with citations",
        "evidence": [f"Source: {s['title']}" for s in sources],
        "conclusion": "High quality, ready to present",
    }
```
### `@step_dec` -- Record Execution Steps

```python
@waxell.step_dec(name="preprocess")
def clean_input(query: str) -> dict:
    """Auto-records as a named execution step with output."""
    cleaned = query.strip().lower()
    return {"original": query, "cleaned": cleaned}
```
### `@retry_dec` -- Record Retry/Fallback Logic

```python
@waxell.retry_dec(max_attempts=3, strategy="retry")
async def call_llm_with_retry(prompt: str) -> str:
    """On failure, retries up to 3 times, recording each attempt."""
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```
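To make the retry-and-record semantics concrete, here is a conceptual sketch in plain Python of what a decorator like this does: record every attempt, return on the first success, and re-raise the last error once attempts are exhausted. This is illustrative only, not Waxell's actual internals.

```python
import functools

# Hypothetical retry-with-recording decorator (illustrative, not Waxell's
# implementation): each attempt is recorded via the `record` callback.
def retry_with_recording(max_attempts: int = 3, record=print):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for attempt in range(1, max_attempts + 1):
                try:
                    result = fn(*args, **kwargs)
                    record({"attempt": attempt, "status": "success"})
                    return result
                except Exception as exc:
                    last_error = exc
                    record({"attempt": attempt, "status": "error", "reason": str(exc)})
            raise last_error  # all attempts exhausted
        return wrapper
    return decorator

attempts = []

@retry_with_recording(max_attempts=3, record=attempts.append)
def flaky():
    if len(attempts) < 2:  # fail on the first two attempts
        raise RuntimeError("transient failure")
    return "ok"

result = flaky()  # succeeds on the third attempt; all three are recorded
```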
## Full Example: Putting It Together

```python
import waxell_observe as waxell

waxell.init()

from openai import OpenAI

client = OpenAI()

@waxell.retrieval(source="knowledge_base")
async def search(query: str) -> list[dict]:
    return await vector_store.search(query, top_k=10)

@waxell.decision(name="approach", options=["summarize", "compare", "deep_dive"])
async def choose_approach(query: str) -> dict:
    return {"chosen": "deep_dive", "reasoning": "Query asks for detailed analysis"}

@waxell.tool(tool_type="api")
async def run_analysis(docs: list) -> dict:
    return await analysis_service.analyze(docs)

@waxell.reasoning_dec(step="quality_check")
async def check_quality(result: dict) -> dict:
    return {"thought": "Analysis is thorough", "conclusion": "Ready to present"}

@waxell.observe(agent_name="research-pipeline")
async def run_pipeline(query: str):
    docs = await search(query)
    approach = await choose_approach(query)
    analysis = await run_analysis(docs)
    quality = await check_quality(analysis)
    # Inline enrichment (see Step 5)
    waxell.score("quality", 0.92)
    waxell.tag("domain", "research")
    return {"result": analysis, "approach": approach["chosen"]}
```
Every decorated function inside `@observe` is automatically recorded as a structured span in the trace. No manual `ctx.record_*()` calls needed.

Behavior decorators are no-ops when called outside an `@observe` or `WaxellContext` scope -- your functions work normally with zero overhead.
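The no-op pattern described above can be sketched in plain Python: the decorator records a span only when a trace context is active, and otherwise just calls the wrapped function. This is a conceptual illustration of the mechanism, not Waxell's actual internals.

```python
import contextvars
import functools

# Hypothetical behavior decorator (illustrative, not Waxell's internals).
_active_trace = contextvars.ContextVar("active_trace", default=None)

def tool(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        trace = _active_trace.get()
        if trace is not None:
            # Inside an observed run: record a structured span
            trace.append({"tool": fn.__name__, "output": result})
        return result  # Outside any scope: a plain function call
    return wrapper

@tool
def add(a: int, b: int) -> int:
    return a + b

out = add(2, 3)        # no active trace: behaves like a normal function

trace = []
_active_trace.set(trace)
out2 = add(2, 3)       # active trace: the call is recorded as a span
```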
## Step 5: Add Inline Enrichment

Use the top-level convenience functions anywhere inside an `@observe` or `WaxellContext` scope:
```python
@waxell.observe(agent_name="support-bot")
async def handle_ticket(query: str) -> str:
    response = await call_llm(query)

    # Record a quick step inline (no decorator needed)
    waxell.step("generate_response", output={"length": len(response)})

    # Attach a quality score
    waxell.score("relevance", 0.95)

    # Add searchable tags
    waxell.tag("category", "password-reset")
    waxell.tag("priority", "low")

    # Add arbitrary metadata
    waxell.metadata("model_version", "gpt-4o-2024-08-06")

    return response
```
Full reference of convenience functions:

| Function | What it does |
|---|---|
| `waxell.step(name, output=)` | Record a named execution step |
| `waxell.score(name, value)` | Attach a quality score (numeric, categorical, or boolean) |
| `waxell.tag(key, value)` | Add a searchable key-value tag |
| `waxell.metadata(key, value)` | Add arbitrary metadata to the trace |
| `waxell.decide(name, chosen=, options=)` | Record a decision inline |
| `waxell.retrieve(query=, documents=, source=)` | Record a retrieval inline |
| `waxell.reason(step=, thought=, conclusion=)` | Record reasoning inline |
| `waxell.retry(attempt=, reason=)` | Record a retry event inline |
## Step 6: Add Session and User Tracking

Group related runs and attribute costs to end users with `WaxellContext`:
```python
from waxell_observe import WaxellContext

async with WaxellContext(
    agent_name="chat-agent",
    session_id="session-abc123",  # Groups related runs
    user_id="user-456",           # Per-user cost attribution
) as ctx:
    response = await call_llm(prompt)
    ctx.set_result({"output": response})
```
Or pass them directly to `@observe`:

```python
@waxell.observe(
    agent_name="chat-agent",
    session_id="session-abc123",
    user_id="user-456",
)
async def handle_message(message: str) -> str:
    return await call_llm(message)
```
## LangChain Integration

If you use LangChain, LLM calls are captured automatically via the callback handler:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from waxell_observe.integrations.langchain import WaxellLangChainHandler

handler = WaxellLangChainHandler(agent_name="langchain-bot")

llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Answer: {question}")
chain = prompt | llm

result = chain.invoke(
    {"question": "What is Waxell?"},
    config={"callbacks": [handler]},
)
handler.flush_sync(result={"output": result.content})
```
## What You Get in the Dashboard

After instrumenting your agent, every run appears in the Waxell dashboard with:
- Agent name, workflow, and execution status
- Captured inputs and outputs (from `@observe`)
- LLM calls with model, tokens, cost, latency, and prompt/response previews
- Behavior spans -- tool calls, retrievals, decisions, reasoning steps (from behavior decorators)
- Scores, tags, and metadata (from enrichment functions)
- Session timeline grouping related runs
- Per-user cost attribution
## Summary: Instrumentation Surface

| Layer | How | What you get |
|---|---|---|
| `waxell.init()` | 1 line | Auto-instrumented LLM calls for 157+ providers |
| `@waxell.observe()` | Wrap agent function | Named runs, IO capture, policy enforcement |
| `@waxell.tool()` | Wrap tool functions | Structured tool call spans |
| `@waxell.retrieval()` | Wrap retrieval functions | Query, documents, scores, source |
| `@waxell.decision()` | Wrap routing functions | Chosen option, reasoning, confidence |
| `@waxell.reasoning_dec()` | Wrap reasoning functions | Thought, evidence, conclusion |
| `@waxell.step_dec()` | Wrap pipeline steps | Named step with output |
| `@waxell.retry_dec()` | Wrap retryable functions | Attempt tracking, fallback recording |
| `waxell.score()` | Inline call | Quality scores attached to runs |
| `waxell.tag()` | Inline call | Searchable key-value tags |
| `waxell.metadata()` | Inline call | Arbitrary trace metadata |
| `waxell.step()` | Inline call | Quick step recording without a decorator |
| `WaxellContext` | Context manager | Session/user tracking, explicit lifecycle control |
## Next Steps

- Behavior Tracking -- Full reference for all behavior decorators and methods
- Auto-Instrumentation -- Full list of 157+ auto-instrumented libraries
- Decorator Pattern -- Full `@observe` reference with all parameters
- Scoring -- Quality metrics and score analytics
- Cost Management -- Track and control LLM spending
- Policy & Governance -- Pre-execution and mid-execution policy checks
- Cookbook -- Working examples for every provider and pattern