# Instrument OpenAI Directly
This tutorial covers four ways to add observability to OpenAI (or any LLM provider) calls, ordered from the simplest to the one offering the most control.

Most users only need Step 1 (auto-instrumentation). It captures every LLM call with zero code changes. Add decorators (Step 2) when you want named traces and enrichment.
## Prerequisites

- Python 3.10+
- `waxell-observe` installed (`pip install waxell-observe`)
- OpenAI API key set as `OPENAI_API_KEY`
- Waxell API credentials configured
## What You'll Learn
- Four instrumentation approaches: auto-instrumentation, decorator, context manager, and manual
- When to use each approach
- How to add metadata, tags, and scores
- How to view results in the LLM Calls explorer
## Setup

Configure your Waxell credentials via environment variables:

```bash
export WAXELL_API_URL="https://acme.waxell.dev"
export WAXELL_API_KEY="wax_sk_..."
export OPENAI_API_KEY="sk-..."
```

Then initialize in your code:

```python
import waxell_observe as waxell

waxell.init()  # reads WAXELL_API_URL and WAXELL_API_KEY from env
```

Or pass credentials directly:

```python
import waxell_observe as waxell

waxell.init(api_key="wax_sk_...", api_url="https://acme.waxell.dev")
```
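As a mental model for this precedence, here is a minimal sketch of how an `init()`-style helper might resolve credentials, preferring explicit arguments over environment variables. The function name and error message are illustrative, not the actual `waxell_observe` internals:

```python
import os

def resolve_credentials(api_key=None, api_url=None):
    """Return (api_key, api_url): explicit arguments win, then env vars."""
    key = api_key or os.environ.get("WAXELL_API_KEY")
    url = api_url or os.environ.get("WAXELL_API_URL")
    if not key or not url:
        # Fail fast with a clear message rather than at the first API call
        raise RuntimeError("Waxell credentials missing: set WAXELL_API_KEY and WAXELL_API_URL")
    return key, url
```

Failing fast at startup is preferable to a cryptic authentication error on the first traced call.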
## Step 1: Auto-Instrumentation (Recommended)

The simplest approach: call `init()` before importing OpenAI and every call is captured automatically. No decorators, no context managers, no manual recording.

```python
import waxell_observe as waxell

waxell.init()  # must come before importing openai

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
# Automatically traced with model, tokens, cost, latency
```
That's it. Every OpenAI call is now captured with model name, token counts, cost estimates, and latency. LLM calls made outside any decorator or context manager are automatically buffered and flushed to auto-generated runs.
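To build intuition for what auto-instrumentation does under the hood, here is a hedged sketch of the usual technique: wrapping the provider client's completion method so every call is timed and its usage recorded. `FakeClient`, `instrument`, and `captured` are illustrative stand-ins, not the `waxell_observe` implementation:

```python
import time

captured = []  # call records appended by the wrapper

class FakeClient:
    """Stand-in for an OpenAI-style client returning usage data."""
    def create(self, model, messages):
        return {"model": model, "usage": {"prompt_tokens": 9, "completion_tokens": 7}}

def instrument(client):
    """Replace client.create with a wrapper that records each call."""
    original = client.create
    def wrapper(model, messages):
        start = time.perf_counter()
        response = original(model=model, messages=messages)
        captured.append({
            "model": response["model"],
            "tokens_in": response["usage"]["prompt_tokens"],
            "tokens_out": response["usage"]["completion_tokens"],
            "latency_s": time.perf_counter() - start,
        })
        return response  # caller sees the unmodified response
    client.create = wrapper
    return client

client = instrument(FakeClient())
client.create(model="gpt-4o", messages=[{"role": "user", "content": "hi"}])
```

Because the wrapper returns the original response untouched, calling code needs no changes, which is why importing the provider after `init()` matters: the patch must be in place before the client is created.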
## Step 2: @observe Decorator (Named Traces)

Add `@observe` when you want named traces with automatic IO capture, enrichment, and policy enforcement. Combined with `init()`, LLM calls are recorded automatically -- you just add structure and metadata.

```python
import waxell_observe as waxell

waxell.init()

from openai import OpenAI

client = OpenAI()

@waxell.observe(agent_name="my-chatbot")
def chat(message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}],
    )
    text = response.choices[0].message.content
    # Enrich the trace with top-level convenience functions
    waxell.tag("domain", "general-knowledge")
    waxell.score("answer_length", len(text))
    waxell.metadata("model", "gpt-4o")
    return text

# Use it -- just call your function normally
result = chat("What is the capital of France?")
print(result)
```
The decorator handles:

- Starting and completing the execution run
- Capturing function inputs and outputs
- Checking policies before execution (set `enforce_policy=False` to skip)
- Creating an OTel trace span
With `init()` active, OpenAI calls are recorded automatically. You don't need `waxell_ctx.record_llm_call()` unless you're using a provider that isn't auto-instrumented.
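The run lifecycle the decorator manages can be sketched in plain Python. This is a minimal, assumed mechanic (the `observe` stub and `runs` list below are illustrative, not the real SDK): start a run, capture inputs and output, and mark success or error even when the function raises:

```python
import functools

runs = []  # completed run records, newest last

def observe(agent_name):
    """Sketch of an @observe-style decorator with IO and error capture."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            run = {"agent": agent_name, "inputs": {"args": args, "kwargs": kwargs}}
            try:
                result = fn(*args, **kwargs)
                run["status"] = "success"
                run["output"] = result
                return result
            except Exception as exc:
                run["status"] = "error"
                run["error"] = str(exc)
                raise  # never swallow the caller's exception
            finally:
                runs.append(run)  # the run completes either way
        return wrapper
    return decorator

@observe(agent_name="my-chatbot")
def chat(message):
    return f"echo: {message}"

chat("hello")
```

The `try/finally` shape is the important part: the run is completed whether the wrapped function returns or raises, mirroring the "starting and completing the execution run" behavior described above.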
Async version:

```python
from openai import AsyncOpenAI

async_client = AsyncOpenAI()

@waxell.observe(agent_name="my-chatbot")
async def chat(message: str) -> str:
    response = await async_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}],
    )
    return response.choices[0].message.content
```
## Step 3: Context Manager (Advanced)

Use `WaxellContext` when you need explicit control over session IDs, user tracking, or when your agent logic spans multiple functions.

```python
import waxell_observe as waxell
from waxell_observe import WaxellContext

waxell.init()

from openai import OpenAI

client = OpenAI()

def chat_with_context(message: str, session_id: str, user_id: str) -> str:
    with WaxellContext(
        agent_name="my-chatbot",
        workflow_name="chat",
        inputs={"message": message},
        session_id=session_id,
        user_id=user_id,
    ) as ctx:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": message}],
        )
        text = response.choices[0].message.content
        # LLM call recorded automatically by init()
        # Add structured metadata
        ctx.set_tag("environment", "production")
        ctx.set_result({"output": text})
        return text

result = chat_with_context(
    message="What is the capital of France?",
    session_id="sess_abc123",
    user_id="user_42",
)
```
The context manager gives you access to:

- `ctx.record_step()` -- record execution steps
- `ctx.record_score()` -- attach quality scores
- `ctx.set_tag()` -- add searchable tags
- `ctx.set_metadata()` -- add metadata to the trace
- `ctx.set_result()` -- set the run result
- `ctx.run_id` -- the assigned run ID
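The lifecycle behind this pattern is standard Python context-manager protocol: `__enter__` starts the run and `__exit__` completes it, with the status determined by whether an exception escaped the block. The `SketchContext` below is a hedged stand-in for the described behavior, not the real `WaxellContext`:

```python
class SketchContext:
    """Minimal sketch of a run-tracking context manager."""

    def __init__(self, agent_name, inputs=None):
        self.run = {"agent": agent_name, "inputs": inputs or {}, "tags": {}}

    def __enter__(self):
        self.run["status"] = "running"  # run starts when the block is entered
        return self

    def set_tag(self, key, value):
        self.run["tags"][key] = value

    def set_result(self, result):
        self.run["result"] = result

    def __exit__(self, exc_type, exc, tb):
        # exc_type is None when the block finished cleanly
        self.run["status"] = "error" if exc_type else "success"
        return False  # never swallow exceptions

with SketchContext(agent_name="my-chatbot", inputs={"message": "hi"}) as ctx:
    ctx.set_tag("environment", "production")
    ctx.set_result({"output": "Paris"})
```

Returning `False` from `__exit__` is what lets exceptions propagate to the caller while the run is still marked as an error, the same guarantee the decorator provides.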
## Step 4: Manual Recording (Legacy / Custom Providers)

For custom LLM providers that aren't auto-instrumented, use `WaxellObserveClient` directly. This is the most verbose approach and is only recommended when the other patterns don't fit.

```python
from waxell_observe import WaxellObserveClient

client = WaxellObserveClient(
    api_url="https://acme.waxell.dev",
    api_key="wax_sk_...",
)

async def chat_manual(message: str) -> str:
    # 1. Start a run
    run_info = await client.start_run(
        agent_name="my-chatbot",
        workflow_name="chat",
        inputs={"message": message},
    )
    try:
        # 2. Make your LLM call (any provider)
        response = call_custom_llm(message)  # your custom LLM call
        # 3. Record the LLM call
        await client.record_llm_calls(
            run_id=run_info.run_id,
            calls=[
                {
                    "model": "custom-model-v2",
                    "tokens_in": 150,
                    "tokens_out": 80,
                    "cost": 0.003,
                    "task": "chat",
                }
            ],
        )
        # 4. Complete the run
        await client.complete_run(
            run_id=run_info.run_id,
            result={"output": response},
            status="success",
        )
        return response
    except Exception as e:
        await client.complete_run(
            run_id=run_info.run_id,
            status="error",
            error=str(e),
        )
        raise
```
## Compare Approaches

| Feature | Auto-Instrumentation | Decorator | Context Manager | Manual |
|---|---|---|---|---|
| Lines of code | 2 | Few | Moderate | Most |
| LLM call capture | Automatic | Automatic (with `init()`) | Automatic (with `init()`) | Manual |
| IO capture | -- | Automatic | Manual via `set_result()` | Manual |
| Session/user tracking | -- | Via params | Via params | Via metadata |
| Policy enforcement | -- | Automatic | Automatic | Call `check_policy()` yourself |
| OTel tracing | Automatic | Automatic | Automatic | Not included |
| Error handling | -- | Automatic | Automatic | Manual try/except |
| Custom providers | Supported providers only | Yes | Yes | Any provider |
When to use each:
- Auto-instrumentation -- Best for most cases. Two lines and you're done. Start here.
- Decorator -- Add when you want named agent traces, IO capture, or enrichment (tags, scores, metadata).
- Context Manager -- Advanced. Use when you need explicit session IDs, user tracking, or multi-function workflows.
- Manual -- Legacy / custom providers. Only when auto-instrumentation doesn't support your provider.
## Viewing Results in the LLM Calls Explorer
After instrumenting your code, every recorded LLM call appears in the Observability > LLM Calls explorer. You can filter by:
- Model -- See all calls to a specific model
- Agent -- Filter to a specific agent
- Cost range -- Find expensive calls
- Token range -- Find high-token calls
- Date range -- Narrow to a time window
- Search -- Full-text search across model, task, prompt, and response previews
Each LLM call record includes:
- Model name, task label
- Input/output token counts and total
- Estimated cost (using system or custom model pricing)
- Prompt and response previews (first 500 characters)
- Linked agent run and workflow
- Timestamp
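The cost estimate is a straightforward function of token counts and per-token pricing. Here is an illustrative sketch; the price table below uses placeholder values, not the system pricing the explorer actually applies:

```python
# (input, output) USD per 1M tokens -- placeholder values, not real pricing
PRICES = {
    "gpt-4o": (2.50, 10.00),
}

def estimate_cost(model, tokens_in, tokens_out):
    """Estimate call cost from token counts and per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

cost = estimate_cost("gpt-4o", tokens_in=150, tokens_out=80)
```

Custom model pricing, mentioned above, would amount to overriding entries in a table like `PRICES` for models the system doesn't know about.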
## Next Steps

- Auto-Instrumentation -- Full `init()` reference and supported libraries
- Decorator Pattern -- Full `@observe` reference
- Context Manager -- Full `WaxellContext` reference
- Track a RAG Pipeline -- End-to-end RAG observability
- Cost Optimization -- Reduce LLM spending with observability data