Quickstart: Observe Your Agents

Add observability to any Python AI agent in under 5 minutes. By the end you'll have LLM call tracking, structured behavior recording (tools, retrievals, decisions, reasoning), quality scores, and session/user attribution.

Prerequisites

  • Python 3.10+
  • A Waxell API key (get one from your Waxell control plane dashboard)

Step 1: Install

pip install waxell-observe

Step 2: Initialize and Auto-Instrument

Call init() before importing any LLM SDK. This auto-instruments 157+ libraries (OpenAI, Anthropic, Groq, LiteLLM, Cohere, Mistral, vector DBs, and more) with zero code changes.

import waxell_observe as waxell

waxell.init(api_key="wax_sk_...", api_url="https://acme.waxell.dev")

# Import LLM SDKs AFTER init()
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
# Automatically traced: model, tokens, cost, latency

Environment Variables

You can also configure via environment variables:

export WAXELL_API_URL="https://acme.waxell.dev"
export WAXELL_API_KEY="wax_sk_..."

Then just call waxell.init() without arguments.
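
For example, the environment-driven setup from Step 2 reduces to:

import waxell_observe as waxell

# Reads WAXELL_API_KEY and WAXELL_API_URL from the environment
waxell.init()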

Step 3: Wrap Your Agent with @observe

The @observe decorator creates a tracked execution run for each call, capturing inputs, outputs, and policy enforcement:

import asyncio

import waxell_observe as waxell

waxell.init()

from openai import OpenAI

client = OpenAI()

@waxell.observe(agent_name="support-bot")
async def handle_ticket(query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

# Every call creates a tracked run with IO capture
result = asyncio.run(handle_ticket("How do I reset my password?"))

Step 4: Add Behavior Decorators

This is where Waxell traces become rich and useful. Wrap your internal functions with behavior decorators to record what your agent does -- not just LLM calls.

@tool -- Record Tool Calls

@waxell.tool(tool_type="vector_db")
def search_knowledge_base(query: str, top_k: int = 5) -> list[dict]:
    """Every call auto-records: name, inputs, output, duration, status."""
    results = index.search(query, top_k=top_k)
    return [{"id": r.id, "title": r.title, "score": r.score} for r in results]

@retrieval -- Record RAG Retrievals

@waxell.retrieval(source="pinecone")
async def retrieve_docs(query: str) -> list[dict]:
    """Auto-records: query, documents, scores, source, duration."""
    results = await vector_store.search(query, top_k=10)
    return [{"id": r.id, "title": r.title, "score": r.score} for r in results]

@decision -- Record Routing/Classification Decisions

@waxell.decision(name="route_query", options=["faq", "technical", "billing"])
async def classify_query(query: str) -> dict:
    """Auto-records: chosen option, reasoning, confidence."""
    result = await client.chat.completions.create(  # assumes an AsyncOpenAI client
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Classify: {query}"}],
    )
    chosen = result.choices[0].message.content.strip()
    return {"chosen": chosen, "reasoning": f"Model classified the query as {chosen!r}"}

@reasoning_dec -- Record Chain-of-Thought

@waxell.reasoning_dec(step="quality_check")
async def assess_answer(answer: str, sources: list) -> dict:
    """Auto-records: thought process, evidence, conclusion."""
    return {
        "thought": "Answer covers all source material with citations",
        "evidence": [f"Source: {s['title']}" for s in sources],
        "conclusion": "High quality, ready to present",
    }

@step_dec -- Record Execution Steps

@waxell.step_dec(name="preprocess")
def clean_input(query: str) -> dict:
    """Auto-records as a named execution step with output."""
    cleaned = query.strip().lower()
    return {"original": query, "cleaned": cleaned}

@retry_dec -- Record Retry/Fallback Logic

@waxell.retry_dec(max_attempts=3, strategy="retry")
async def call_llm_with_retry(prompt: str) -> str:
    """On failure, retries up to 3 times, recording each attempt."""
    response = await client.chat.completions.create(  # assumes an AsyncOpenAI client
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

Full Example: Putting It Together

import waxell_observe as waxell

waxell.init()

from openai import OpenAI

client = OpenAI()


@waxell.retrieval(source="knowledge_base")
async def search(query: str) -> list[dict]:
    return await vector_store.search(query, top_k=10)


@waxell.decision(name="approach", options=["summarize", "compare", "deep_dive"])
async def choose_approach(query: str) -> dict:
    return {"chosen": "deep_dive", "reasoning": "Query asks for detailed analysis"}


@waxell.tool(tool_type="api")
async def run_analysis(docs: list) -> dict:
    return await analysis_service.analyze(docs)


@waxell.reasoning_dec(step="quality_check")
async def check_quality(result: dict) -> dict:
    return {"thought": "Analysis is thorough", "conclusion": "Ready to present"}


@waxell.observe(agent_name="research-pipeline")
async def run_pipeline(query: str):
    docs = await search(query)
    approach = await choose_approach(query)
    analysis = await run_analysis(docs)
    quality = await check_quality(analysis)

    # Inline enrichment (see Step 5)
    waxell.score("quality", 0.92)
    waxell.tag("domain", "research")

    return {"result": analysis, "approach": approach["chosen"]}

Every decorated function inside @observe is automatically recorded as a structured span in the trace. No manual ctx.record_*() calls needed.

Behavior outside @observe

Behavior decorators are no-ops when called outside an @observe or WaxellContext scope -- your functions work normally with zero overhead.
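
For example, a decorated helper can be called standalone in tests or scripts:

# No @observe / WaxellContext scope is active here, so nothing is recorded;
# the function behaves exactly like plain undecorated Python.
results = search_knowledge_base("how do I reset my password?", top_k=3)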

Step 5: Add Inline Enrichment

Use top-level convenience functions anywhere inside an @observe or WaxellContext scope:

@waxell.observe(agent_name="support-bot")
async def handle_ticket(query: str) -> str:
    response = await call_llm(query)

    # Record a quick step inline (no decorator needed)
    waxell.step("generate_response", output={"length": len(response)})

    # Attach a quality score
    waxell.score("relevance", 0.95)

    # Add searchable tags
    waxell.tag("category", "password-reset")
    waxell.tag("priority", "low")

    # Add arbitrary metadata
    waxell.metadata("model_version", "gpt-4o-2024-08-06")

    return response

Full reference of convenience functions:

Function | What it does
waxell.step(name, output=...) | Record a named execution step
waxell.score(name, value) | Attach a quality score (numeric, categorical, or boolean)
waxell.tag(key, value) | Add a searchable key-value tag
waxell.metadata(key, value) | Add arbitrary metadata to the trace
waxell.decide(name, chosen=..., options=...) | Record a decision inline
waxell.retrieve(query=..., documents=..., source=...) | Record a retrieval inline
waxell.reason(step=..., thought=..., conclusion=...) | Record reasoning inline
waxell.retry(attempt=..., reason=...) | Record a retry event inline
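
The last four mirror the behavior decorators when wrapping a function doesn't fit. A minimal sketch using the signatures from the table above (the literal values are illustrative placeholders):

@waxell.observe(agent_name="support-bot")
async def handle_ticket(query: str) -> str:
    docs = [{"id": "kb-1", "title": "Password FAQ", "score": 0.91}]  # illustrative
    waxell.retrieve(query=query, documents=docs, source="knowledge_base")
    waxell.decide(name="route_query", chosen="faq", options=["faq", "technical", "billing"])
    waxell.reason(step="sanity_check", thought="FAQ article matches the query", conclusion="Answer from FAQ")
    waxell.retry(attempt=1, reason="upstream timeout on first attempt")
    return "See the Password FAQ for reset steps."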

Step 6: Add Session and User Tracking

Group related runs and attribute costs to end users with WaxellContext:

from waxell_observe import WaxellContext

async with WaxellContext(
    agent_name="chat-agent",
    session_id="session-abc123",  # Groups related runs
    user_id="user-456",           # Per-user cost attribution
) as ctx:
    response = await call_llm(prompt)
    ctx.set_result({"output": response})

Or pass them directly to @observe:

@waxell.observe(
    agent_name="chat-agent",
    session_id="session-abc123",
    user_id="user-456",
)
async def handle_message(message: str) -> str:
    return await call_llm(message)

LangChain Integration

If you use LangChain, LLM calls are captured automatically via the callback handler:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from waxell_observe.integrations.langchain import WaxellLangChainHandler

handler = WaxellLangChainHandler(agent_name="langchain-bot")

llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Answer: {question}")
chain = prompt | llm

result = chain.invoke(
    {"question": "What is Waxell?"},
    config={"callbacks": [handler]},
)

handler.flush_sync(result={"output": result.content})

What You Get in the Dashboard

After instrumenting your agent, every run appears in the Waxell dashboard with:

  • Agent name, workflow, and execution status
  • Captured inputs and outputs (from @observe)
  • LLM calls with model, tokens, cost, latency, prompt/response previews
  • Behavior spans -- tool calls, retrievals, decisions, reasoning steps (from behavior decorators)
  • Scores, tags, and metadata (from enrichment functions)
  • Session timeline grouping related runs
  • Per-user cost attribution

Summary: Instrumentation Surface

Layer | How | What you get
waxell.init() | 1 line | Auto-instrumented LLM calls for 157+ providers
@waxell.observe() | Wrap agent function | Named runs, IO capture, policy enforcement
@waxell.tool() | Wrap tool functions | Structured tool call spans
@waxell.retrieval() | Wrap retrieval functions | Query, documents, scores, source
@waxell.decision() | Wrap routing functions | Chosen option, reasoning, confidence
@waxell.reasoning_dec() | Wrap reasoning functions | Thought, evidence, conclusion
@waxell.step_dec() | Wrap pipeline steps | Named step with output
@waxell.retry_dec() | Wrap retryable functions | Attempt tracking, fallback recording
waxell.score() | Inline call | Quality scores attached to runs
waxell.tag() | Inline call | Searchable key-value tags
waxell.metadata() | Inline call | Arbitrary trace metadata
waxell.step() | Inline call | Quick step recording without a decorator
WaxellContext | Context manager | Session/user tracking, explicit lifecycle control

Next Steps