Common Mistakes

1. Importing LLM SDKs before `init()`

Wrong:

from openai import OpenAI          # Gets un-patched OpenAI
import waxell_observe as waxell
waxell.init()                      # Too late -- OpenAI already imported

client = OpenAI()
client.chat.completions.create(...)  # NOT auto-instrumented

Right:

import waxell_observe as waxell
waxell.init()                      # Patches OpenAI module

from openai import OpenAI          # Gets patched version
client = OpenAI()
client.chat.completions.create(...)  # Auto-instrumented

Alternative -- use drop-in imports (order doesn't matter):

from waxell_observe.openai import openai

client = openai.OpenAI()
client.chat.completions.create(...)  # Always auto-instrumented
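
Why import order matters can be illustrated with plain Python name binding. The sketch below is not how waxell_observe patches anything; it assumes, for illustration only, that init() swaps a module-level attribute. `fake_sdk`, `Client`, and this `init()` are hypothetical stand-ins.

```python
import sys
import types

# Hypothetical stand-in for an LLM SDK module (not a real package).
fake_sdk = types.ModuleType("fake_sdk")


class Client:
    def create(self):
        return "plain"


fake_sdk.Client = Client
sys.modules["fake_sdk"] = fake_sdk

# Imported BEFORE patching: this name is bound to the original class.
from fake_sdk import Client as EarlyClient


def init():
    """Hypothetical patch step: replace the module attribute with a wrapper."""

    class PatchedClient(fake_sdk.Client):
        def create(self):
            return "instrumented:" + super().create()

    fake_sdk.Client = PatchedClient


init()

# Imported AFTER patching: this name is bound to the patched class.
from fake_sdk import Client as LateClient

early = EarlyClient().create()
late = LateClient().create()
print(early, late)
```

The early import keeps pointing at the original object because `from module import name` copies the reference once, at import time.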

2. Using behavior decorators without `@observe`

Behavior decorators (@tool, @decision, @retrieval, etc.) only record data when called inside an @observe or WaxellContext scope. Without a parent scope, they're silent no-ops.

Wrong -- no trace is created:

@waxell.tool(tool_type="api")
async def search(query: str):
    return await api.search(query)

@waxell.decision(name="route")
async def route(query: str):
    return {"chosen": "search"}

# These run fine but nothing is recorded
await route("test")
await search("test")

Right -- wrap the entry point with @observe:

@waxell.tool(tool_type="api")
async def search(query: str):
    return await api.search(query)

@waxell.decision(name="route")
async def route(query: str):
    return {"chosen": "search"}

@waxell.observe(agent_name="my-agent")  # Creates the run scope
async def run_agent(query: str):
    decision = await route(query)
    results = await search(query)
    return results

3. Recording LLM calls manually when auto-instrumentation handles it

Once waxell.init() has been called, LLM calls are captured automatically. Recording the same calls manually creates duplicates.

Wrong -- double-counted:

waxell.init()  # Auto-instruments OpenAI

@waxell.observe(agent_name="my-agent")
async def my_agent(query: str, waxell_ctx=None):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )  # Already auto-captured

    # This creates a DUPLICATE record
    if waxell_ctx:
        waxell_ctx.record_llm_call(model="gpt-4o", tokens_in=100, tokens_out=50)

    return response.choices[0].message.content

Right -- let auto-instrumentation handle it:

waxell.init()

@waxell.observe(agent_name="my-agent")
async def my_agent(query: str):
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )  # Auto-captured with accurate token counts and cost

    waxell.score("quality", 0.9)  # Add enrichment, not duplicate LLM records
    return response.choices[0].message.content

Only use waxell_ctx.record_llm_call() when auto-instrumentation isn't available (e.g., custom HTTP calls to LLM endpoints, or unsupported providers).
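
For the custom-HTTP case, a manual record might look roughly like the sketch below. It assumes the endpoint returns an OpenAI-compatible JSON body with a `usage` object, and it uses only the `model`, `tokens_in`, and `tokens_out` keyword arguments shown above. `StubContext` and `record_from_raw_response` are hypothetical helpers so the example runs without the SDK.

```python
class StubContext:
    """Hypothetical stand-in for the injected waxell_ctx."""

    def __init__(self):
        self.calls = []

    def record_llm_call(self, model, tokens_in, tokens_out):
        self.calls.append(
            {"model": model, "tokens_in": tokens_in, "tokens_out": tokens_out}
        )


def record_from_raw_response(ctx, payload):
    """Record one LLM call from a raw JSON payload; return True if recorded."""
    if ctx is None:  # Outside a run scope: skip instead of crashing.
        return False
    usage = payload.get("usage", {})
    ctx.record_llm_call(
        model=payload.get("model", "unknown"),
        tokens_in=usage.get("prompt_tokens", 0),
        tokens_out=usage.get("completion_tokens", 0),
    )
    return True


# Assumed response shape from a self-hosted, OpenAI-compatible endpoint.
payload = {
    "model": "my-local-model",
    "usage": {"prompt_tokens": 12, "completion_tokens": 34},
}
ctx = StubContext()
recorded = record_from_raw_response(ctx, payload)
print(recorded, ctx.calls)
```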

4. Forgetting to guard `waxell_ctx` in unit tests

If your function uses the injected waxell_ctx, that argument is None when the function is called without @observe (e.g., in tests), so any unguarded method call on it raises an AttributeError.

Wrong -- crashes in tests:

@waxell.observe(agent_name="my-agent")
async def my_agent(query: str, waxell_ctx=None):
    response = await call_llm(query)
    waxell_ctx.record_score("quality", 0.9)  # AttributeError: 'NoneType' object has no attribute 'record_score'
    return response

Right -- guard or use convenience functions:

# Option 1: Guard with if
@waxell.observe(agent_name="my-agent")
async def my_agent(query: str, waxell_ctx=None):
    response = await call_llm(query)
    if waxell_ctx:
        waxell_ctx.record_score("quality", 0.9)
    return response

# Option 2: Use convenience functions (preferred -- they're no-ops outside a context)
@waxell.observe(agent_name="my-agent")
async def my_agent(query: str):
    response = await call_llm(query)
    waxell.score("quality", 0.9)  # Safe everywhere
    return response
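
A unit test for the guarded variant could look something like this. The function is shown undecorated so the sketch runs without the SDK; `call_llm` is a hypothetical stub, and the mock stands in for the injected context.

```python
import asyncio
from unittest.mock import Mock


async def call_llm(query):
    """Hypothetical stub for the real LLM call."""
    return f"answer to {query}"


async def my_agent(query, waxell_ctx=None):
    response = await call_llm(query)
    if waxell_ctx is not None:  # Guard: the context is absent in plain tests.
        waxell_ctx.record_score("quality", 0.9)
    return response


# Case 1: no context at all, as in a bare unit test.
plain = asyncio.run(my_agent("hi"))

# Case 2: a mock context lets the test verify what would be recorded.
fake_ctx = Mock()
scoped = asyncio.run(my_agent("hi", waxell_ctx=fake_ctx))
fake_ctx.record_score.assert_called_once_with("quality", 0.9)
print(plain, scoped)
```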

5. Blocking on `waxell.flush()` in sync code

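waxell.flush() is async. Wrapping it in asyncio.run() from sync code raises a RuntimeError whenever an event loop is already running in the current thread, which is common in notebooks, async web frameworks, and sync helpers called from async code.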

Wrong:

import asyncio

@waxell.observe(agent_name="my-agent")
def sync_agent(query: str):
    result = process(query)
    asyncio.run(waxell.flush())  # RuntimeError if event loop is running
    return result

Right -- use the sync variant:

@waxell.observe(agent_name="my-agent")
def sync_agent(query: str):
    result = process(query)
    waxell.flush_sync()  # Works in sync code
    return result

Note: you usually don't need to flush manually. The context flushes automatically on exit.
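
The failure mode is standard asyncio behavior and can be reproduced without the SDK. In this sketch `flush` is a hypothetical stand-in coroutine; the same sync helper succeeds when no loop is running and fails when called from inside one.

```python
import asyncio


async def flush():
    """Hypothetical stand-in for an async flush."""
    return "flushed"


def sync_helper():
    """Sync code that tries to drive a coroutine with asyncio.run()."""
    coro = flush()
    try:
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # Avoid a "coroutine was never awaited" warning.
        return f"RuntimeError: {exc}"


async def main():
    # Calling the sync helper while this loop is running triggers the error.
    return sync_helper()


outside = sync_helper()        # No loop running: works
inside = asyncio.run(main())   # Loop already running: RuntimeError
print(outside)
print(inside)
```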

6. Using `WaxellContext` when `@observe` would suffice

WaxellContext is powerful but verbose. For single-function agents, @observe is cleaner.

Verbose:

async def my_agent(query: str):
    async with WaxellContext(
        agent_name="my-agent",
        inputs={"query": query},
    ) as ctx:
        response = await call_llm(query)
        ctx.record_score("quality", 0.9)
        ctx.set_tag("type", "qa")
        ctx.set_result({"answer": response})
        return response

Clean:

@waxell.observe(agent_name="my-agent")
async def my_agent(query: str):
    response = await call_llm(query)
    waxell.score("quality", 0.9)
    waxell.tag("type", "qa")
    return response  # Auto-captured as result

Reserve WaxellContext for cases where decorators don't fit: multi-step orchestration, batch loops, conditional context creation, or explicit lifecycle control.

7. Nesting `@observe` decorators unintentionally

Each @observe creates a separate run. Nesting them creates parent-child runs, which may not be what you want.

Creates 2 runs per call:

@waxell.observe(agent_name="outer")
async def outer(query: str):
    return await inner(query)

@waxell.observe(agent_name="inner")
async def inner(query: str):
    return await call_llm(query)

If you want a single run with sub-steps, use @observe on the outer function and behavior decorators on inner functions.

Creates 1 run with a tool span:

@waxell.observe(agent_name="my-agent")
async def outer(query: str):
    return await inner(query)

@waxell.tool(tool_type="function")
async def inner(query: str):
    return await call_llm(query)

Nested @observe is correct for multi-agent architectures where each agent is a distinct run with its own lifecycle and policy checks.
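
The parent-child relationship can be sketched with a context variable that each scope reads before replacing. This is an illustration under that assumption, not the SDK's internals; `observe`, `_current_run`, and `runs` are hypothetical names defined locally.

```python
import asyncio
import contextvars
import functools
import itertools

_current_run = contextvars.ContextVar("current_run", default=None)
_ids = itertools.count(1)
runs = []


def observe(agent_name):
    """Hypothetical decorator: every call opens its own run, linked to any parent."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            parent = _current_run.get()
            run = {
                "id": next(_ids),
                "agent": agent_name,
                "parent_id": parent["id"] if parent else None,
            }
            runs.append(run)
            token = _current_run.set(run)
            try:
                return await fn(*args, **kwargs)
            finally:
                _current_run.reset(token)

        return wrapper

    return decorator


@observe("inner")
async def inner(query):
    return query[::-1]


@observe("outer")
async def outer(query):
    return await inner(query)


asyncio.run(outer("abc"))
print(runs)  # Two runs: the second points at the first as its parent
```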

8. Expecting `@observe` to work without configuration

@observe requires a configured client to create runs on the control plane. Without configuration, it logs a warning and runs your function without observability.

Silent failure:

# No init(), no env vars, no config file
@waxell.observe(agent_name="my-agent")
async def my_agent(query: str):
    return await call_llm(query)

await my_agent("test")
# WARNING: Client not configured, skipping run start
# Function runs fine, but no data in dashboard

Explicit setup:

import waxell_observe as waxell
waxell.init(api_key="wax_sk_...", api_url="https://acme.waxell.dev")

@waxell.observe(agent_name="my-agent")
async def my_agent(query: str):
    return await call_llm(query)

9. Mixing up `waxell.tag()` and `waxell.metadata()`

Tags are string-only and searchable in the dashboard and Grafana. Metadata accepts any JSON type but is meant for contextual information, not for searching.

Wrong -- complex values as tags:

waxell.tag("config", '{"temperature": 0.7}')  # Stored as one opaque string; the fields inside are not queryable

Right:

waxell.tag("environment", "production")        # String tag -- queryable
waxell.tag("model", "gpt-4o")                  # String tag -- queryable
waxell.metadata("config", {"temperature": 0.7})  # Structured data -- contextual
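
When attributes arrive as one mixed dictionary, a small helper can route them by type before they are passed on. `split_tags_and_metadata` is a hypothetical helper, not part of the SDK; it only applies the string-versus-structured rule described above.

```python
def split_tags_and_metadata(attrs):
    """Send string values to tags and everything else to metadata."""
    tags, metadata = {}, {}
    for key, value in attrs.items():
        if isinstance(value, str):
            tags[key] = value
        else:
            metadata[key] = value
    return tags, metadata


tags, metadata = split_tags_and_metadata(
    {
        "environment": "production",
        "model": "gpt-4o",
        "config": {"temperature": 0.7},
        "retries": 3,
    }
)
print(tags)
print(metadata)

# Inside a run scope, the two dicts would then be forwarded, e.g.:
#   for key, value in tags.items(): waxell.tag(key, value)
#   for key, value in metadata.items(): waxell.metadata(key, value)
```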

10. Not calling `handler.flush_sync()` with LangChain

The WaxellLangChainHandler buffers data during chain execution. If you don't flush, the run is never completed.

Wrong -- incomplete run:

handler = WaxellLangChainHandler(agent_name="my-chain")
result = chain.invoke(input, config={"callbacks": [handler]})
# Run is started but never completed -- shows as "running" forever in dashboard

Right:

handler = WaxellLangChainHandler(agent_name="my-chain")
result = chain.invoke(input, config={"callbacks": [handler]})
handler.flush_sync(result={"output": result.content})
# Run is completed with result
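
To make the flush hard to forget, the invoke-then-flush pair can be wrapped in one helper. The sketch below also flushes when the chain raises, which is an assumption about desirable behavior rather than something the SDK documentation states; the `{"error": ...}` payload is likewise illustrative. `StubHandler`, `StubChain`, `FailingChain`, and `invoke_and_flush` are hypothetical names so the example runs standalone.

```python
class StubHandler:
    """Hypothetical stand-in for WaxellLangChainHandler."""

    def __init__(self):
        self.flushed_with = None

    def flush_sync(self, result=None):
        self.flushed_with = result


class StubChain:
    """Hypothetical chain whose invoke() echoes its input."""

    def invoke(self, value, config=None):
        return f"echo: {value}"


class FailingChain:
    """Hypothetical chain that always raises."""

    def invoke(self, value, config=None):
        raise ValueError("boom")


def invoke_and_flush(chain, handler, value):
    """Invoke the chain and always flush the handler afterwards."""
    outcome = {"error": "chain raised before producing output"}
    try:
        result = chain.invoke(value, config={"callbacks": [handler]})
        outcome = {"output": str(result)}
        return result
    finally:
        handler.flush_sync(result=outcome)  # Runs on success and on failure.


handler = StubHandler()
output = invoke_and_flush(StubChain(), handler, "hello")

failed_handler = StubHandler()
try:
    invoke_and_flush(FailingChain(), failed_handler, "hello")
except ValueError:
    pass  # The exception still propagates; the handler was flushed first.

print(handler.flushed_with, failed_handler.flushed_with)
```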
