
Instrument OpenAI Directly

This tutorial covers four ways to add observability to OpenAI (or any LLM provider) calls, ordered from the simplest to the one that gives you the most control.

Start with Auto-Instrumentation

Most users only need Step 1 (auto-instrumentation). It captures every LLM call with zero code changes. Add decorators (Step 2) when you want named traces and enrichment.

Prerequisites

  • Python 3.10+
  • waxell-observe installed (pip install waxell-observe)
  • OpenAI API key set as OPENAI_API_KEY
  • Waxell API credentials configured

What You'll Learn

  • Four instrumentation approaches: auto-instrumentation, decorator, context manager, and manual
  • When to use each approach
  • How to add metadata, tags, and scores
  • How to view results in the LLM Calls explorer

Setup

Configure your Waxell credentials via environment variables:

export WAXELL_API_URL="https://acme.waxell.dev"
export WAXELL_API_KEY="wax_sk_..."
export OPENAI_API_KEY="sk-..."

Then initialize in your code:

import waxell_observe as waxell

waxell.init() # reads WAXELL_API_URL and WAXELL_API_KEY from env

Or pass credentials directly:

import waxell_observe as waxell

waxell.init(api_key="wax_sk_...", api_url="https://acme.waxell.dev")

Step 1: Auto-Instrumentation

The simplest approach: call init() before importing OpenAI and every call is captured automatically. No decorators, no context managers, no manual recording.

import waxell_observe as waxell

waxell.init() # must come before importing openai

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
# Automatically traced with model, tokens, cost, latency

That's it. Every OpenAI call is now captured with model name, token counts, cost estimates, and latency. LLM calls made outside any decorator or context manager are automatically buffered and flushed to auto-generated runs.

Step 2: @observe Decorator (Named Traces)

Add @observe when you want named traces with automatic IO capture, enrichment, and policy enforcement. Combined with init(), LLM calls are recorded automatically -- you just add structure and metadata.

import waxell_observe as waxell

waxell.init()

from openai import OpenAI

client = OpenAI()


@waxell.observe(agent_name="my-chatbot")
def chat(message: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}],
    )

    text = response.choices[0].message.content

    # Enrich the trace with top-level convenience functions
    waxell.tag("domain", "general-knowledge")
    waxell.score("answer_length", len(text))
    waxell.metadata("model", "gpt-4o")

    return text


# Use it -- just call your function normally
result = chat("What is the capital of France?")
print(result)

The decorator handles:

  • Starting and completing the execution run
  • Capturing function inputs and outputs
  • Checking policies before execution (set enforce_policy=False to skip)
  • Creating an OTel trace span

LLM Call Recording

With init() active, OpenAI calls are recorded automatically. You don't need waxell_ctx.record_llm_call() unless you're using a provider that isn't auto-instrumented.
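For a provider that isn't auto-instrumented, you can assemble the same call record that the recording APIs expect, matching the keys shown in the Step 4 example. A minimal sketch — the helper name and the per-1K-token prices are assumptions, not part of the waxell-observe API:

```python
def build_llm_call_record(model: str, tokens_in: int, tokens_out: int,
                          price_in_per_1k: float, price_out_per_1k: float,
                          task: str = "chat") -> dict:
    """Assemble one LLM-call entry with the keys used in Step 4.

    The cost estimate from per-1K-token prices is an assumption;
    substitute your provider's actual pricing.
    """
    cost = (tokens_in * price_in_per_1k + tokens_out * price_out_per_1k) / 1000
    return {
        "model": model,
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "cost": round(cost, 6),
        "task": task,
    }


record = build_llm_call_record("custom-model-v2", 150, 80, 0.01, 0.03)
# e.g. await client.record_llm_calls(run_id=run_info.run_id, calls=[record])
```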

Async version:

from openai import AsyncOpenAI


@waxell.observe(agent_name="my-chatbot")
async def chat(message: str) -> str:
    response = await AsyncOpenAI().chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": message}],
    )
    return response.choices[0].message.content

Step 3: Context Manager (Advanced)

Use WaxellContext when you need explicit control over session IDs, user tracking, or when your agent logic spans multiple functions.

import waxell_observe as waxell
from waxell_observe import WaxellContext

waxell.init()

from openai import OpenAI

client = OpenAI()


def chat_with_context(message: str, session_id: str, user_id: str) -> str:
    with WaxellContext(
        agent_name="my-chatbot",
        workflow_name="chat",
        inputs={"message": message},
        session_id=session_id,
        user_id=user_id,
    ) as ctx:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": message}],
        )

        text = response.choices[0].message.content

        # LLM call recorded automatically by init()
        # Add structured metadata
        ctx.set_tag("environment", "production")
        ctx.set_result({"output": text})

        return text


result = chat_with_context(
    message="What is the capital of France?",
    session_id="sess_abc123",
    user_id="user_42",
)

The context manager gives you access to:

  • ctx.record_step() -- record execution steps
  • ctx.record_score() -- attach quality scores
  • ctx.set_tag() -- add searchable tags
  • ctx.set_metadata() -- add metadata to the trace
  • ctx.set_result() -- set the run result
  • ctx.run_id -- the assigned run ID
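These methods compose naturally inside the with block. A sketch of how they fit together — the exact signatures of record_step() and record_score() are assumptions based on the list above, and the length-based score is a toy example:

```python
def grade_and_finish(ctx, question: str, answer: str) -> dict:
    """Record a step, attach a score, tag, and set the run result
    on a WaxellContext (ctx) before the with-block exits."""
    ctx.record_step("generate_answer", {"question": question})

    # Toy quality score: longer answers score higher, capped at 1.0
    score = min(len(answer) / 100, 1.0)
    ctx.record_score("answer_length", score)

    ctx.set_tag("graded", "true")
    result = {"output": answer, "score": score}
    ctx.set_result(result)
    return result
```

You would call this from inside the `with WaxellContext(...) as ctx:` block shown above, passing the ctx it yields.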

Step 4: Manual Recording (Legacy / Custom Providers)

For custom LLM providers that aren't auto-instrumented, use WaxellObserveClient directly. This is the most verbose approach and is only recommended when the other patterns don't fit.

from waxell_observe import WaxellObserveClient

client = WaxellObserveClient(
api_url="https://acme.waxell.dev",
api_key="wax_sk_...",
)

async def chat_manual(message: str) -> str:
    # 1. Start a run
    run_info = await client.start_run(
        agent_name="my-chatbot",
        workflow_name="chat",
        inputs={"message": message},
    )

    try:
        # 2. Make your LLM call (any provider)
        response = call_custom_llm(message)  # your custom LLM call

        # 3. Record the LLM call
        await client.record_llm_calls(
            run_id=run_info.run_id,
            calls=[
                {
                    "model": "custom-model-v2",
                    "tokens_in": 150,
                    "tokens_out": 80,
                    "cost": 0.003,
                    "task": "chat",
                }
            ],
        )

        # 4. Complete the run
        await client.complete_run(
            run_id=run_info.run_id,
            result={"output": response},
            status="success",
        )

        return response

    except Exception as e:
        await client.complete_run(
            run_id=run_info.run_id,
            status="error",
            error=str(e),
        )
        raise

Compare Approaches

| Feature | Auto-Instrumentation | Decorator | Context Manager | Manual |
|---|---|---|---|---|
| Lines of code | 2 | Few | Moderate | Most |
| LLM call capture | Automatic | Automatic (with init()) | Automatic (with init()) | Manual |
| IO capture | -- | Automatic | Manual via set_result() | Manual |
| Session/user tracking | -- | Via params | Via params | Via metadata |
| Policy enforcement | -- | Automatic | Automatic | Call check_policy() yourself |
| OTel tracing | Automatic | Automatic | Automatic | Not included |
| Error handling | -- | Automatic | Automatic | Manual try/except |
| Custom providers | Supported providers only | Yes | Yes | Any provider |

When to use each:

  • Auto-instrumentation -- Best for most cases. Two lines and you're done. Start here.
  • Decorator -- Add when you want named agent traces, IO capture, or enrichment (tags, scores, metadata).
  • Context Manager -- Advanced. Use when you need explicit session IDs, user tracking, or multi-function workflows.
  • Manual -- Legacy / custom providers. Only when auto-instrumentation doesn't support your provider.

Viewing Results in the LLM Calls Explorer

After instrumenting your code, every recorded LLM call appears in the Observability > LLM Calls explorer. You can filter by:

  • Model -- See all calls to a specific model
  • Agent -- Filter to a specific agent
  • Cost range -- Find expensive calls
  • Token range -- Find high-token calls
  • Date range -- Narrow to a time window
  • Search -- Full-text search across model, task, prompt, and response previews

Each LLM call record includes:

  • Model name, task label
  • Input/output token counts and total
  • Estimated cost (using system or custom model pricing)
  • Prompt and response previews (first 500 characters)
  • Linked agent run and workflow
  • Timestamp

Next Steps