
Auto-Instrumentation

The simplest way to add observability to your AI agents -- two lines of code and all your LLM calls are automatically traced.

Quick Start

import waxell_observe
waxell_observe.init(api_key="wax_sk_...", api_url="https://waxell.dev")

# Import LLM SDKs AFTER init() -- they're now auto-instrumented
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
# Automatically traced with model, tokens, cost, latency

How It Works

When you call waxell_observe.init(), the SDK:

  1. Detects installed LLM libraries (OpenAI, Anthropic, etc.)
  2. Patches their HTTP clients to capture request/response data
  3. Emits OpenTelemetry spans for each LLM call
  4. Records token counts, costs, and latencies automatically
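The patching step can be pictured with a toy sketch. `FakeHTTPClient`, `patch_llm_client`, and the span dict below are illustrative stand-ins for how a wrapper might capture this data, not the SDK's actual internals:

```python
import time
from functools import wraps

def patch_llm_client(client_cls, spans):
    """Wrap a client's `create` method to record one span per call (illustrative)."""
    original = client_cls.create

    @wraps(original)
    def wrapper(self, **kwargs):
        start = time.perf_counter()
        response = original(self, **kwargs)
        spans.append({
            "model": kwargs.get("model"),
            "tokens_in": response["usage"]["prompt_tokens"],
            "tokens_out": response["usage"]["completion_tokens"],
            "latency_s": time.perf_counter() - start,
        })
        return response

    client_cls.create = wrapper

class FakeHTTPClient:
    """Stand-in for a provider SDK's HTTP client."""
    def create(self, **kwargs):
        return {"usage": {"prompt_tokens": 5, "completion_tokens": 7}}

spans = []
patch_llm_client(FakeHTTPClient, spans)
FakeHTTPClient().create(model="gpt-4o")
print(spans[0]["model"])  # gpt-4o
```

The real SDK emits OpenTelemetry spans rather than appending to a list, but the wrap-and-measure shape is the same.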

LLM calls made outside any @observe decorator or WaxellContext are automatically buffered by a background collector and flushed to auto-generated runs (named auto:{model}). This means you get visibility into every LLM call without any additional code beyond init().
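The grouping behavior can be sketched in a few lines; `group_into_auto_runs` is a hypothetical helper for illustration, not part of the SDK:

```python
from collections import defaultdict

def group_into_auto_runs(buffered_calls):
    """Group unattributed LLM calls into auto-generated runs, one per model
    (a sketch of the documented behavior, not the SDK's internal collector)."""
    runs = defaultdict(list)
    for call in buffered_calls:
        runs[f"auto:{call['model']}"].append(call)
    return dict(runs)

calls = [
    {"model": "gpt-4o", "tokens_out": 12},
    {"model": "gpt-4o-mini", "tokens_out": 3},
    {"model": "gpt-4o", "tokens_out": 8},
]
print(sorted(group_into_auto_runs(calls)))  # ['auto:gpt-4o', 'auto:gpt-4o-mini']
```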

The init() Function

waxell_observe.init(
    api_key: str = "",                          # Waxell API key (wax_sk_...)
    api_url: str = "",                          # Waxell API URL
    capture_content: bool = False,              # Include prompt/response in traces
    instrument: list[str] | None = None,        # AI/ML library list (auto-detect if None)
    instrument_infra: bool = True,              # Auto-instrument infra (HTTP, DB, cache)
    infra_libraries: list[str] | None = None,   # Only these infra libs (None = all)
    infra_exclude: list[str] | None = None,     # Exclude these infra libs
    resource_attributes: dict | None = None,    # Custom OTel resource attributes
    debug: bool = False,                        # Enable debug logging
    prompt_guard: bool = False,                 # Enable client-side prompt guard
    prompt_guard_server: bool = False,          # Also check server-side guard (ML-powered)
    prompt_guard_action: str = "block",         # "block", "warn", or "redact"
)

See Installation & Configuration for full parameter details.

Configuration Priority

  1. Explicit arguments to init()
  2. Environment variables (WAXELL_API_KEY, WAXELL_API_URL)
  3. CLI config file (~/.waxell/config)
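The priority order can be sketched as a small resolver; `resolve_config` is an illustrative helper, and the SDK's actual resolution logic may differ in detail:

```python
import os

def resolve_config(explicit: str, env_var: str, config_file: dict, key: str) -> str:
    """Resolve one setting by the documented priority:
    explicit init() argument > environment variable > CLI config file."""
    if explicit:
        return explicit
    if os.environ.get(env_var):
        return os.environ[env_var]
    return config_file.get(key, "")

config_file = {"api_key": "wax_sk_from_file"}
os.environ["WAXELL_API_KEY"] = "wax_sk_from_env"

# Env var beats the config file; an explicit argument beats both.
print(resolve_config("", "WAXELL_API_KEY", config_file, "api_key"))                 # wax_sk_from_env
print(resolve_config("wax_sk_explicit", "WAXELL_API_KEY", config_file, "api_key"))  # wax_sk_explicit
```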

Environment Variables

export WAXELL_API_KEY="wax_sk_..."
export WAXELL_API_URL="https://waxell.dev"
export WAXELL_CAPTURE_CONTENT="true" # Include prompts in traces
export WAXELL_DEBUG="true" # Debug logging

Supported Libraries

LLM Providers

| Library | Key | Notes |
| --- | --- | --- |
| OpenAI | openai | Chat, completions, embeddings |
| Anthropic | anthropic | Messages API |
| Google Gemini | gemini | Gemini API |
| AWS Bedrock | bedrock | Bedrock runtime |
| Mistral AI | mistral | Chat, embeddings |
| Cohere | cohere | Chat, embed, rerank |
| Groq | groq | Fast inference |
| LiteLLM | litellm | Unified multi-provider API |
| Ollama | ollama | Local model serving |
| Together AI | together | Together inference API |
| Vertex AI | vertex_ai | Google Cloud AI |
| HuggingFace | huggingface | Inference API |

Vector Databases

| Library | Key | Notes |
| --- | --- | --- |
| Pinecone | pinecone | Managed vector DB |
| ChromaDB | chroma | Embedded vector DB |
| Weaviate | weaviate | Vector search engine |
| Qdrant | qdrant | Vector similarity search |
| Milvus | milvus | Distributed vector DB |
| pgvector | pgvector | PostgreSQL vector extension |
| FAISS | faiss | Facebook AI similarity search |
| LanceDB | lancedb | Serverless vector DB |

Agent Frameworks

| Library | Key | Notes |
| --- | --- | --- |
| LangChain | langchain | Chain and agent orchestration |
| CrewAI | crewai | Multi-agent collaboration |
| OpenAI Agents SDK | openai_agents | OpenAI agent framework |
| AutoGen | autogen | Multi-agent conversations |
| LlamaIndex | llamaindex | Data framework for LLMs |
| Haystack | haystack | NLP pipeline framework |
| PydanticAI | pydanticai | Type-safe AI agents |
| DSPy | dspy | Programming with foundation models |
| Google ADK | google_adk | Google Agent Development Kit |
| Claude Agent SDK | claude_agents | Anthropic agent framework |

Safety & Guardrails

| Library | Key | Notes |
| --- | --- | --- |
| Guardrails AI | guardrails_ai | Output validation |
| NeMo Guardrails | nemo_guardrails | Programmable guardrails |
| LLM Guard | llm_guard | Input/output scanning |
Info: The tables above highlight the most commonly used libraries. The SDK supports 100+ libraries in total across additional categories including embeddings/rerankers, evaluation frameworks, voice/speech, RAG frameworks, local inference engines, and more. The full registry is defined in the instrumentor source.
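Auto-detection can be pictured as a registry of keys mapped to importable module names, checked with `importlib.util.find_spec`. The `REGISTRY` slice and `detect_installed` helper below are illustrative, not the SDK's actual registry:

```python
import importlib.util

# A tiny slice of what a library registry might look like:
# instrumentation key -> importable module name (illustrative).
REGISTRY = {
    "openai": "openai",
    "anthropic": "anthropic",
    "chroma": "chromadb",
    "pgvector": "pgvector",
}

def detect_installed(registry=REGISTRY):
    """Return the registry keys whose underlying module is importable."""
    return [key for key, module in registry.items()
            if importlib.util.find_spec(module) is not None]

# Demo with stdlib modules so the result is predictable:
print(detect_installed({"json": "json", "sqlite": "sqlite3",
                        "missing": "not_a_real_module_xyz"}))
# ['json', 'sqlite']
```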

Selective Instrumentation

To instrument only specific libraries:

waxell_observe.init(
    api_key="wax_sk_...",
    api_url="https://waxell.dev",
    instrument=["openai", "anthropic"],  # Only these two
)

Drop-in Imports

Alternative to init() -- import pre-instrumented modules:

# Instead of: from openai import OpenAI
from waxell_observe.openai import openai

client = openai.OpenAI()
response = client.chat.completions.create(...)  # Auto-traced

# Instead of: import anthropic
from waxell_observe.anthropic import anthropic

client = anthropic.Anthropic()
response = client.messages.create(...) # Auto-traced

Import Order Matters

Auto-instrumentation patches LLM SDKs when they're imported. You must call init() before importing the SDK:

# CORRECT
import waxell_observe
waxell_observe.init(api_key="...")

from openai import OpenAI  # Patched!

# WRONG - OpenAI already imported, won't be patched
from openai import OpenAI

import waxell_observe
waxell_observe.init(api_key="...") # Too late!

Adding Structure to Auto-Instrumented Calls

Auto-instrumentation captures LLM calls automatically. Add structure with decorators or context managers to group calls into runs, record behaviors, and enrich traces.

The simplest way to add structure -- decorators handle run tracking and behavior recording while init() handles LLM capture:

import waxell_observe as waxell

# Auto-instrument LLM SDKs
waxell.init(api_key="wax_sk_...", api_url="https://waxell.dev")

from openai import AsyncOpenAI

client = AsyncOpenAI()

@waxell.decision(name="classify_intent", options=["question", "action", "chitchat"])
async def classify(query: str) -> dict:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Classify: {query}"}],
    )
    # LLM call auto-captured; return value recorded as decision
    return {"chosen": "question", "reasoning": response.choices[0].message.content}

@waxell.observe(agent_name="support-bot")
async def handle_query(query: str) -> str:
    # Auto-instrumented LLM calls + decorator-recorded behaviors
    classification = await classify(query)

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    answer = response.choices[0].message.content

    # Enrich with scores and tags
    waxell.score("helpfulness", 0.9)
    waxell.tag("intent", classification["chosen"])
    return answer

Context Manager + Auto-Instrumentation (Alternative)

Use the context manager when you need maximum control over recording -- for example, multiple policy checks or explicit run lifecycle management:

import waxell_observe
waxell_observe.init(api_key="wax_sk_...")

from waxell_observe import WaxellContext
from openai import OpenAI

client = OpenAI()

async def main() -> None:
    async with WaxellContext(agent_name="my-agent") as ctx:
        # LLM calls are auto-traced AND linked to this context
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Hello!"}]
        )

        # Explicit context methods for fine-grained control
        ctx.set_tag("user_type", "premium")
        ctx.record_step("process", output={"status": "complete"})
        await ctx.check_policy()  # Mid-execution policy check

Kill Switch

Disable all observability without changing code:

export WAXELL_OBSERVE="false"  # or "0" or "no"

When disabled:

  • init() becomes a no-op
  • Context managers pass through without recording
  • Decorators execute functions without wrapping
  • No network calls to Waxell servers
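The check itself is simple to sketch; `observability_enabled` is a hypothetical helper showing the assumed semantics (default on, disabled only by an explicit falsy value):

```python
import os

_FALSY = {"false", "0", "no"}

def observability_enabled(env=os.environ) -> bool:
    """Kill-switch check sketch: disabled only when WAXELL_OBSERVE is
    explicitly set to "false", "0", or "no" (case-insensitive)."""
    return env.get("WAXELL_OBSERVE", "true").strip().lower() not in _FALSY

print(observability_enabled({}))                         # True (default: enabled)
print(observability_enabled({"WAXELL_OBSERVE": "no"}))   # False
print(observability_enabled({"WAXELL_OBSERVE": "yes"}))  # True
```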

Shutdown

Gracefully flush pending traces before exit:

waxell_observe.shutdown()

This is called automatically on process exit, but explicit shutdown ensures all data is flushed in:

  • Serverless functions (Lambda, Cloud Functions)
  • Short-lived scripts
  • Test suites
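The need for explicit shutdown in short-lived processes can be sketched with a toy buffer; `TraceBuffer` is illustrative, not the SDK's exporter:

```python
import atexit

class TraceBuffer:
    """Sketch of a flush-on-exit trace exporter (illustrative)."""
    def __init__(self):
        self.pending = []
        self.exported = []
        atexit.register(self.flush)   # automatic flush at normal process exit

    def record(self, span):
        self.pending.append(span)

    def flush(self):
        # Explicit flush matters in serverless / short-lived processes,
        # where the runtime may freeze or kill the process before
        # atexit handlers get a chance to run.
        self.exported.extend(self.pending)
        self.pending.clear()

buf = TraceBuffer()
buf.record({"model": "gpt-4o"})
buf.flush()                           # what waxell_observe.shutdown() guarantees
print(len(buf.exported))  # 1
```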

Programmatic Control

Manual Instrument/Uninstrument

from waxell_observe.instrumentors import instrument_all, uninstrument_all

# Instrument all detected libraries
results = instrument_all()
# {"openai": True, "anthropic": True, ...}

# Restore original behavior
uninstrument_all()

What Gets Captured

For each LLM call, auto-instrumentation records:

| Field | Description |
| --- | --- |
| model | Model name (gpt-4o, claude-sonnet-4, etc.) |
| tokens_in | Input/prompt token count |
| tokens_out | Output/completion token count |
| cost | Estimated USD cost |
| latency | Request duration |
| provider | openai, anthropic, etc. |
| prompt_preview | First 500 chars of prompt (if capture_content=True) |
| response_preview | First 500 chars of response (if capture_content=True) |
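Cost estimation is typically tokens multiplied by a per-token unit price. The sketch below uses hypothetical per-million-token prices for illustration only; real prices vary by provider and change over time:

```python
# Hypothetical (input, output) prices in USD per million tokens -- NOT real pricing.
PRICES_PER_M = {"gpt-4o": (2.50, 10.00)}

def estimate_cost(model, tokens_in, tokens_out, prices=PRICES_PER_M):
    """Estimate USD cost the way an instrumentor might: tokens x unit price."""
    price_in, price_out = prices[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

print(round(estimate_cost("gpt-4o", 1000, 500), 6))  # 0.0075
```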

Conversation Tracking (Automatic)

When auto-instrumentation is active, waxell automatically captures:

  • User messages — extracted from the messages array sent to the LLM
  • Agent responses — the final text response (not tool-calling intermediaries)
  • Context window metrics — message count, turn count, token utilization
  • System prompt tracking — detects system prompt changes across calls

This works across all 13+ supported providers with zero code changes. User messages appear as io:user_message spans and agent responses as io:agent_response spans in the trace timeline.
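Extracting the user turn from a chat request can be sketched as below; `extract_user_message` is a hypothetical helper, and the SDK's real extraction handles more shapes (tool calls, multimodal content, etc.):

```python
def extract_user_message(messages):
    """Pull the latest user turn from a chat `messages` array (sketch)."""
    for msg in reversed(messages):
        if msg.get("role") == "user":
            return msg.get("content")
    return None

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What's my order status?"},
    {"role": "assistant", "content": "Checking..."},
    {"role": "user", "content": "Order #123"},
]
print(extract_user_message(messages))  # Order #123
```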

Deduplication

If you also call waxell.user_message() or waxell.agent_response() manually for the same content that was auto-captured, the duplicate is automatically suppressed.
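Content-based suppression can be sketched as below. `ConversationRecorder` and its seen-set are an assumed mechanism for illustration, not the SDK's actual deduplication logic:

```python
class ConversationRecorder:
    """Sketch of dedup: a manual user_message() call is suppressed when
    identical content was already captured (assumed mechanism)."""
    def __init__(self):
        self.spans = []
        self._seen = set()

    def user_message(self, content: str) -> bool:
        key = ("user", content)
        if key in self._seen:
            return False              # duplicate -> suppressed
        self._seen.add(key)
        self.spans.append({"type": "io:user_message", "content": content})
        return True

rec = ConversationRecorder()
rec.user_message("Hello!")            # auto-captured
rec.user_message("Hello!")            # manual duplicate -> suppressed
print(len(rec.spans))  # 1
```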

See Conversation Tracking for full details.

Next Steps