# Behavior Tracking

Waxell Observe tracks agent behaviors at three levels of effort:

- Auto-instrumented -- LLM calls (157 providers), tool-call decisions, and vector DB retrievals captured with zero code
- Decorators -- Wrap functions with `@tool`, `@decision`, `@retrieval`, `@reasoning`, `@retry`, or `@step` for automatic recording
- Manual -- Call `ctx.record_*()` methods or top-level convenience functions for full control
## Overview
| Behavior | Decorator | Convenience Function | Context Method |
|---|---|---|---|
| Tool calls | `@waxell.tool` | -- | `ctx.record_tool_call()` |
| Retrievals | `@waxell.retrieval` | `waxell.retrieve()` | `ctx.record_retrieval()` |
| Decisions | `@waxell.decision` | `waxell.decide()` | `ctx.record_decision()` |
| Reasoning | `@waxell.reasoning_dec` | `waxell.reason()` | `ctx.record_reasoning()` |
| Retries | `@waxell.retry_dec` | `waxell.retry()` | `ctx.record_retry()` |
| Steps | `@waxell.step_dec` | `waxell.step()` | `ctx.record_step()` |
## Tool Calls

### @tool decorator

Auto-record function calls as tool invocations with zero boilerplate:
```python
import waxell_observe as waxell

@waxell.tool(tool_type="vector_db")
def search_index(index, query_vec, k: int = 5):
    distances, indices = index.search(query_vec, k)
    return {"distances": distances, "indices": indices}

# Every call auto-records: name, inputs, output, duration_ms, status
# No-op when called outside a WaxellContext
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str \| None | None | Tool name. Defaults to the function name |
| tool_type | str | "function" | Classification: "function", "vector_db", "database", "api" |
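Setting both parameters explicitly might look like this (a sketch; the tool name and function are hypothetical):

```python
import waxell_observe as waxell

# "weather_lookup" overrides the default name (the function name),
# and tool_type="api" classifies the tool as an external API call.
@waxell.tool(name="weather_lookup", tool_type="api")
def fetch_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}
```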
### Manual recording

```python
waxell_ctx.record_tool_call(
    name="web_search",
    input={"query": query},
    output={"result_count": len(results)},
    duration_ms=250,
    status="ok",
    tool_type="api",
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | (required) | Tool name |
| input | dict \| str | "" | Tool input parameters |
| output | dict \| str | "" | Tool output/result |
| duration_ms | int \| None | None | Execution time in milliseconds |
| status | str | "ok" | "ok" or "error" |
| tool_type | str | "function" | Classification |
| error | str | "" | Error message if status is "error" |
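A failed call pairs `status="error"` with the `error` field; a hedged sketch (the tool name and error message are illustrative):

```python
# Record a tool call that failed, so the error surfaces in the run timeline.
waxell_ctx.record_tool_call(
    name="web_search",
    input={"query": "AI safety docs"},
    output="",
    duration_ms=3000,
    status="error",
    error="TimeoutError: upstream search API did not respond",
    tool_type="api",
)
```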
## Retrievals

### Auto-instrumented retrievals

When you use a supported vector database SDK, retrieval operations are captured automatically with zero code. Waxell includes instrumentors for Pinecone, Chroma, Weaviate, Qdrant, Milvus, FAISS, LanceDB, pgvector, MongoDB Atlas Vector Search, Elasticsearch, OpenSearch, Marqo, and many more.
```python
# This Pinecone query automatically records a retrieval span
results = index.query(vector=embedding, top_k=10, namespace="docs")
# Retrieval auto-recorded: source="pinecone", top_k=10, matches_count=10
```
### @retrieval decorator

Auto-record search and retrieval operations:
```python
import waxell_observe as waxell

@waxell.retrieval(source="pinecone")
async def search_documents(query: str, top_k: int = 5) -> list[dict]:
    results = await vector_store.search(query, top_k=top_k)
    return [{"id": r.id, "title": r.title, "score": r.score} for r in results]

# Auto-records: query (first string arg), documents (return value),
# scores (from doc["score"] fields), source, duration_ms
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| source | str | "" | Data source name (e.g., "faiss", "pinecone") |
| name | str \| None | None | Override name. Defaults to function name |
### Convenience function

```python
waxell.retrieve(
    query="AI safety papers",
    documents=[{"id": 1, "title": "Safety Guidelines", "score": 0.95}],
    source="faiss",
    scores=[0.95],
)
```
### Manual recording

```python
waxell_ctx.record_retrieval(
    query=query,
    documents=[{"id": d.id, "title": d.title, "score": d.score} for d in docs],
    source="pinecone",
    duration_ms=120,
    top_k=5,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | (required) | The retrieval query string |
| documents | list[dict] | (required) | Retrieved documents |
| source | str | "" | Data source name |
| scores | list[float] \| None | None | Relevance scores for each document |
| duration_ms | int \| None | None | Retrieval time in milliseconds |
| top_k | int \| None | None | Number of documents requested |
## Decisions

Waxell provides three layers of decision recording, from zero-effort to manual:

### Auto-instrumented decisions

When an LLM response contains `tool_calls` (OpenAI/Groq/Mistral) or `tool_use` blocks (Anthropic), the auto-instrumentor records the model's tool selection as a decision. No code needed.
```python
# This OpenAI call with tools automatically records a decision span
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Search for AI safety docs"}],
    tools=[{"type": "function", "function": {"name": "search", ...}}],
)
# Decision auto-recorded: name="tool_call:search", instrumentation_source="auto"
```
### @decision decorator

Wrap any classification/routing function to auto-record its return value as a decision:
```python
import waxell_observe as waxell

@waxell.decision(name="classify_query", options=["factual", "analytical", "creative"])
async def classify_query(query: str) -> dict:
    response = await client.chat.completions.create(...)
    return {"chosen": "analytical", "reasoning": "Complex multi-doc query"}

# Returns the dict AND auto-records the decision
# instrumentation_source="decorator"
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str \| None | None | Decision name. Defaults to function name |
| options | list[str] \| None | None | Available choices |
Return value handling: If the function returns a `dict`, the SDK extracts `chosen`, `reasoning`, and `confidence` fields. If it returns a `str`, the entire string is used as `chosen`.
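The string form can be sketched as follows (the function and option names are hypothetical):

```python
import waxell_observe as waxell

# Returning a plain str records the whole string as `chosen`;
# `reasoning` and `confidence` are left unset.
@waxell.decision(name="pick_model", options=["gpt-4o", "gpt-4o-mini"])
def pick_model(query: str) -> str:
    return "gpt-4o" if len(query) > 200 else "gpt-4o-mini"
```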
### waxell.decide() convenience function

For inline decisions that don't warrant a separate function:
```python
waxell.decide(
    "retrieval_strategy",
    chosen="semantic_search",
    options=["semantic_search", "keyword_search", "hybrid"],
    reasoning="Analytical query benefits from semantic similarity",
    confidence=0.88,
)
# instrumentation_source="manual"
```
### ctx.record_decision() -- full control
```python
waxell_ctx.record_decision(
    name="output_format",
    options=["brief", "detailed", "bullet_points"],
    chosen="detailed",
    reasoning="User query is analytical",
    confidence=0.85,
    metadata={"user_preference": "verbose"},
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | (required) | Decision name |
| options | list[str] | (required) | Available choices |
| chosen | str | (required) | The selected option |
| reasoning | str | "" | Why this option was chosen |
| confidence | float \| None | None | Confidence score (0.0-1.0) |
| metadata | dict \| None | None | Additional context |
### Instrumentation source tracking

Each decision span includes an `instrumentation_source` attribute indicating how it was captured:
| Source | Value | Meaning |
|---|---|---|
| Auto-instrumentor | "auto" | Detected from LLM tool_calls/tool_use response |
| @decision decorator | "decorator" | Captured by the @decision wrapper |
| waxell.decide() / ctx.record_decision() | "manual" | Explicitly recorded by user code |
## Reasoning

### @reasoning decorator

Auto-record chain-of-thought steps from a function's return value:
```python
import waxell_observe as waxell

@waxell.reasoning_dec(step="quality_check")
async def assess_quality(answer: str, sources: list) -> dict:
    return {
        "thought": "Answer covers all source material with proper citations",
        "evidence": [f"Source: {s['title']}" for s in sources],
        "conclusion": "High quality, ready to present",
    }
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| step | str \| None | None | Reasoning step name. Defaults to function name |
Return value handling: If the function returns a `dict`, extracts `thought`, `evidence`, and `conclusion`. If it returns a `str`, uses the string as `thought`.
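As a minimal sketch of the string form (the function name is hypothetical):

```python
import waxell_observe as waxell

# A plain-str return is recorded as the `thought`;
# `evidence` and `conclusion` stay empty.
@waxell.reasoning_dec(step="sanity_check")
def sanity_check(answer: str) -> str:
    return f"Answer is {len(answer)} chars; within expected bounds"
```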
### Convenience function

```python
waxell.reason(
    step="evaluate_sources",
    thought="Source A is more recent but Source B has higher authority",
    evidence=["Source A: published 2024", "Source B: cited 500 times"],
    conclusion="Use Source B as primary",
)
```
### Manual recording

```python
waxell_ctx.record_reasoning(
    step="evaluate_sources",
    thought="Source A is more recent but Source B has higher authority",
    evidence=["Source A: published 2024", "Source B: cited 500 times"],
    conclusion="Use Source B as primary, Source A as supplement",
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| step | str | (required) | Reasoning step name |
| thought | str | (required) | The reasoning text/thought process |
| evidence | list[str] \| None | None | Supporting evidence or references |
| conclusion | str | "" | Conclusion reached at this step |
## Retries

### @retry decorator

Wrap a function with retry logic AND automatic retry recording:
```python
import waxell_observe as waxell

@waxell.retry_dec(max_attempts=3, strategy="retry")
async def call_llm(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# On failure, retries up to 3 times, recording each attempt as a retry span.
# After exhausting attempts, re-raises the last exception.
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_attempts | int | 3 | Maximum number of attempts (including first) |
| strategy | str | "retry" | "retry", "fallback", or "circuit_break" |
| fallback_to | str | "" | Name of fallback target |
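Combining `strategy="fallback"` with `fallback_to` might look like this (a sketch; the function and model names are illustrative):

```python
import waxell_observe as waxell

# Hypothetical usage: retry spans are tagged with the named fallback
# target so the dashboard shows which backup the pipeline moved to.
@waxell.retry_dec(max_attempts=2, strategy="fallback", fallback_to="claude-sonnet-4")
async def call_primary_llm(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```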
### Convenience function

```python
waxell.retry(
    attempt=1,
    reason="OpenAI rate limited",
    strategy="fallback",
    original_error="429 Too Many Requests",
    fallback_to="claude-sonnet-4",
)
```
### Manual recording

```python
waxell_ctx.record_retry(
    attempt=1,
    reason="OpenAI rate limited",
    strategy="fallback",
    original_error="429 Too Many Requests",
    fallback_to="claude-sonnet-4",
    max_attempts=3,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| attempt | int | (required) | Current attempt number (1-based) |
| reason | str | (required) | Why a retry/fallback occurred |
| strategy | str | "retry" | "retry", "fallback", or "circuit_break" |
| original_error | str | "" | The error that triggered the retry |
| fallback_to | str | "" | Name of fallback target |
| max_attempts | int \| None | None | Maximum attempts configured |
## Steps

### @step decorator

Auto-record function calls as execution steps:
```python
import waxell_observe as waxell

@waxell.step_dec(name="preprocess")
async def preprocess_query(query: str) -> dict:
    cleaned = query.strip().lower()
    return {"original": query, "cleaned": cleaned}

# Auto-records: step(name="preprocess", output={"original": ..., "cleaned": ...})
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str \| None | None | Step name. Defaults to function name |
### Convenience function

```python
waxell.step("preprocess", output={"cleaned": query.strip()})
```
### Manual recording

```python
waxell_ctx.record_step("retrieve", output={"doc_count": len(docs)})
```
## How It Works

Behavior tracking methods buffer data in two ways:

- Steps -- Each call creates a step record (e.g., `tool:web_search`, `retrieval:pinecone`, `decision:route_to_agent`) that appears in the run's step list.
- Spans -- Each call creates a behavior span with structured input/output data, flushed to the server via `POST /runs/{run_id}/spans/` on context exit.

Both are sent automatically when the WaxellContext exits -- you don't need to flush manually.
## Full Example

This example uses decorators for zero-boilerplate recording:
```python
import waxell_observe as waxell

waxell.init(api_key="wax_sk_...", api_url="https://acme.waxell.dev")

@waxell.retrieval(source="pinecone")
async def search_docs(query: str, top_k: int = 10) -> list[dict]:
    return await vector_store.search(query, top_k=top_k)

@waxell.decision(name="approach", options=["summarize", "compare", "deep_dive"])
async def choose_approach(query: str) -> dict:
    return {"chosen": "deep_dive", "reasoning": "Query asks for detailed analysis"}

@waxell.tool(tool_type="api")
async def run_analysis(docs: list) -> dict:
    return await analysis_service.analyze(docs)

@waxell.reasoning_dec(step="quality_check")
async def check_quality(result: dict) -> dict:
    return {"thought": "Analysis is thorough", "conclusion": "Ready to present"}

@waxell.observe(agent_name="research-pipeline")
async def run_pipeline(query: str):
    docs = await search_docs(query, top_k=10)
    approach = await choose_approach(query)
    analysis = await run_analysis(docs)
    quality = await check_quality(analysis)
    waxell.score("quality", 0.92)
    return {"result": analysis, "approach": approach["chosen"]}
```
## Next Steps

- Decorator Pattern -- `@observe`, `@tool`, `@decision`, and more
- Context Manager -- Full reference for `WaxellContext`
- REST API Reference -- `POST /runs/{run_id}/spans/` endpoint