Behavior Tracking

Waxell Observe tracks agent behaviors at three levels of effort:

  1. Auto-instrumented -- LLM calls (157 providers), tool-call decisions, and vector DB retrievals captured with zero code
  2. Decorators -- Wrap functions with @tool, @decision, @retrieval, @reasoning_dec, @retry_dec, or @step_dec for automatic recording
  3. Manual -- Call ctx.record_*() methods or top-level convenience functions for full control

Overview

| Behavior | Decorator | Convenience Function | Context Method |
| --- | --- | --- | --- |
| Tool calls | `@waxell.tool` | -- | `ctx.record_tool_call()` |
| Retrievals | `@waxell.retrieval` | `waxell.retrieve()` | `ctx.record_retrieval()` |
| Decisions | `@waxell.decision` | `waxell.decide()` | `ctx.record_decision()` |
| Reasoning | `@waxell.reasoning_dec` | `waxell.reason()` | `ctx.record_reasoning()` |
| Retries | `@waxell.retry_dec` | `waxell.retry()` | `ctx.record_retry()` |
| Steps | `@waxell.step_dec` | `waxell.step()` | `ctx.record_step()` |

Tool Calls

@tool decorator

Auto-record function calls as tool invocations with zero boilerplate:

import waxell_observe as waxell

@waxell.tool(tool_type="vector_db")
def search_index(index, query_vec, k: int = 5):
    distances, indices = index.search(query_vec, k)
    return {"distances": distances, "indices": indices}

# Every call auto-records: name, inputs, output, duration_ms, status
# No-op when called outside a WaxellContext

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str \| None` | `None` | Tool name. Defaults to the function name |
| `tool_type` | `str` | `"function"` | Classification: `"function"`, `"vector_db"`, `"database"`, `"api"` |

Manual recording

waxell_ctx.record_tool_call(
    name="web_search",
    input={"query": query},
    output={"result_count": len(results)},
    duration_ms=250,
    status="ok",
    tool_type="api",
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | (required) | Tool name |
| `input` | `dict \| str` | `""` | Tool input parameters |
| `output` | `dict \| str` | `""` | Tool output/result |
| `duration_ms` | `int \| None` | `None` | Execution time in milliseconds |
| `status` | `str` | `"ok"` | `"ok"` or `"error"` |
| `tool_type` | `str` | `"function"` | Classification |
| `error` | `str` | `""` | Error message if status is `"error"` |

Retrievals

Auto-instrumented retrievals

When you use a supported vector database SDK, retrieval operations are captured automatically with zero code. Waxell includes instrumentors for Pinecone, Chroma, Weaviate, Qdrant, Milvus, FAISS, LanceDB, pgvector, MongoDB Atlas Vector Search, Elasticsearch, OpenSearch, Marqo, and many more.

# This Pinecone query automatically records a retrieval span
results = index.query(vector=embedding, top_k=10, namespace="docs")
# Retrieval auto-recorded: source="pinecone", top_k=10, matches_count=10

@retrieval decorator

Auto-record search and retrieval operations:

import waxell_observe as waxell

@waxell.retrieval(source="pinecone")
async def search_documents(query: str, top_k: int = 5) -> list[dict]:
    results = await vector_store.search(query, top_k=top_k)
    return [{"id": r.id, "title": r.title, "score": r.score} for r in results]

# Auto-records: query (first string arg), documents (return value),
# scores (from doc["score"] fields), source, duration_ms

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `source` | `str` | `""` | Data source name (e.g., `"faiss"`, `"pinecone"`) |
| `name` | `str \| None` | `None` | Override name. Defaults to function name |

Convenience function

waxell.retrieve(
    query="AI safety papers",
    documents=[{"id": 1, "title": "Safety Guidelines", "score": 0.95}],
    source="faiss",
    scores=[0.95],
)

Manual recording

waxell_ctx.record_retrieval(
    query=query,
    documents=[{"id": d.id, "title": d.title, "score": d.score} for d in docs],
    source="pinecone",
    duration_ms=120,
    top_k=5,
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | `str` | (required) | The retrieval query string |
| `documents` | `list[dict]` | (required) | Retrieved documents |
| `source` | `str` | `""` | Data source name |
| `scores` | `list[float] \| None` | `None` | Relevance scores for each document |
| `duration_ms` | `int \| None` | `None` | Retrieval time in milliseconds |
| `top_k` | `int \| None` | `None` | Number of documents requested |

Decisions

Waxell provides three layers of decision recording, from zero-effort to manual:

Auto-instrumented decisions

When an LLM response contains tool_calls (OpenAI/Groq/Mistral) or tool_use blocks (Anthropic), the auto-instrumentor records the model's tool selection as a decision. No code needed.

# This OpenAI call with tools automatically records a decision span
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Search for AI safety docs"}],
    tools=[{"type": "function", "function": {"name": "search", ...}}],
)
# Decision auto-recorded: name="tool_call:search", instrumentation_source="auto"

@decision decorator

Wrap any classification/routing function to auto-record its return value as a decision:

import waxell_observe as waxell

@waxell.decision(name="classify_query", options=["factual", "analytical", "creative"])
async def classify_query(query: str) -> dict:
    response = await client.chat.completions.create(...)
    return {"chosen": "analytical", "reasoning": "Complex multi-doc query"}

# Returns the dict AND auto-records the decision
# instrumentation_source="decorator"

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str \| None` | `None` | Decision name. Defaults to function name |
| `options` | `list[str] \| None` | `None` | Available choices |

Return value handling: If the function returns a dict, the SDK extracts chosen, reasoning, and confidence fields. If it returns a str, the entire string is used as chosen.

waxell.decide() convenience function

For inline decisions that don't warrant a separate function:

waxell.decide(
    "retrieval_strategy",
    chosen="semantic_search",
    options=["semantic_search", "keyword_search", "hybrid"],
    reasoning="Analytical query benefits from semantic similarity",
    confidence=0.88,
)
# instrumentation_source="manual"

ctx.record_decision() -- full control

waxell_ctx.record_decision(
    name="output_format",
    options=["brief", "detailed", "bullet_points"],
    chosen="detailed",
    reasoning="User query is analytical",
    confidence=0.85,
    metadata={"user_preference": "verbose"},
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | (required) | Decision name |
| `options` | `list[str]` | (required) | Available choices |
| `chosen` | `str` | (required) | The selected option |
| `reasoning` | `str` | `""` | Why this option was chosen |
| `confidence` | `float \| None` | `None` | Confidence score (0.0-1.0) |
| `metadata` | `dict \| None` | `None` | Additional context |

Instrumentation source tracking

Each decision span includes an instrumentation_source attribute indicating how it was captured:

| Source | Value | Meaning |
| --- | --- | --- |
| Auto-instrumentor | `"auto"` | Detected from LLM `tool_calls`/`tool_use` response |
| `@decision` decorator | `"decorator"` | Captured by the `@decision` wrapper |
| `waxell.decide()` / `ctx.record_decision()` | `"manual"` | Explicitly recorded by user code |

Reasoning

@reasoning_dec decorator

Auto-record chain-of-thought steps from a function's return value:

import waxell_observe as waxell

@waxell.reasoning_dec(step="quality_check")
async def assess_quality(answer: str, sources: list) -> dict:
    return {
        "thought": "Answer covers all source material with proper citations",
        "evidence": [f"Source: {s['title']}" for s in sources],
        "conclusion": "High quality, ready to present",
    }

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `step` | `str \| None` | `None` | Reasoning step name. Defaults to function name |

Return value handling: If the function returns a dict, extracts thought, evidence, conclusion. If it returns a str, uses the string as thought.

Convenience function

waxell.reason(
    step="evaluate_sources",
    thought="Source A is more recent but Source B has higher authority",
    evidence=["Source A: published 2024", "Source B: cited 500 times"],
    conclusion="Use Source B as primary",
)

Manual recording

waxell_ctx.record_reasoning(
    step="evaluate_sources",
    thought="Source A is more recent but Source B has higher authority",
    evidence=["Source A: published 2024", "Source B: cited 500 times"],
    conclusion="Use Source B as primary, Source A as supplement",
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `step` | `str` | (required) | Reasoning step name |
| `thought` | `str` | (required) | The reasoning text/thought process |
| `evidence` | `list[str] \| None` | `None` | Supporting evidence or references |
| `conclusion` | `str` | `""` | Conclusion reached at this step |

Retries

@retry_dec decorator

Wrap a function with retry logic AND automatic retry recording:

import waxell_observe as waxell

@waxell.retry_dec(max_attempts=3, strategy="retry")
async def call_llm(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# On failure, retries up to 3 times, recording each attempt as a retry span.
# After exhausting attempts, re-raises the last exception.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `max_attempts` | `int` | `3` | Maximum number of attempts (including first) |
| `strategy` | `str` | `"retry"` | `"retry"`, `"fallback"`, or `"circuit_break"` |
| `fallback_to` | `str` | `""` | Name of fallback target |
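The retry-and-record loop described above can be sketched as a simplified synchronous model. This is not the SDK's implementation; the `record` hook here is a hypothetical stand-in for retry-span recording:

```python
# Illustrative model of the retry loop: try up to max_attempts, record
# each failed attempt, re-raise the last exception once exhausted.
# Not the Waxell SDK's actual implementation.
import functools

def retry_dec(max_attempts=3, strategy="retry", record=lambda **kw: None):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            last_exc = None
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    last_exc = exc
                    # Record the failed attempt as a retry event
                    record(attempt=attempt, reason=str(exc),
                           strategy=strategy, original_error=str(exc),
                           max_attempts=max_attempts)
            raise last_exc  # attempts exhausted
        return inner
    return wrap
```

A successful call records nothing; only failed attempts produce retry events, matching the "on failure" wording above.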

Convenience function

waxell.retry(
    attempt=1,
    reason="OpenAI rate limited",
    strategy="fallback",
    original_error="429 Too Many Requests",
    fallback_to="claude-sonnet-4",
)

Manual recording

waxell_ctx.record_retry(
    attempt=1,
    reason="OpenAI rate limited",
    strategy="fallback",
    original_error="429 Too Many Requests",
    fallback_to="claude-sonnet-4",
    max_attempts=3,
)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `attempt` | `int` | (required) | Current attempt number (1-based) |
| `reason` | `str` | (required) | Why a retry/fallback occurred |
| `strategy` | `str` | `"retry"` | `"retry"`, `"fallback"`, or `"circuit_break"` |
| `original_error` | `str` | `""` | The error that triggered the retry |
| `fallback_to` | `str` | `""` | Name of fallback target |
| `max_attempts` | `int \| None` | `None` | Maximum attempts configured |

Steps

@step_dec decorator

Auto-record function calls as execution steps:

import waxell_observe as waxell

@waxell.step_dec(name="preprocess")
async def preprocess_query(query: str) -> dict:
    cleaned = query.strip().lower()
    return {"original": query, "cleaned": cleaned}

# Auto-records: step(name="preprocess", output={"original": ..., "cleaned": ...})

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str \| None` | `None` | Step name. Defaults to function name |

Convenience function

waxell.step("preprocess", output={"cleaned": query.strip()})

Manual recording

waxell_ctx.record_step("retrieve", output={"doc_count": len(docs)})

How It Works

Behavior tracking methods buffer data in two ways:

  1. Steps -- Each call creates a step record (e.g., tool:web_search, retrieval:pinecone, decision:route_to_agent) that appears in the run's step list.
  2. Spans -- Each call creates a behavior span with structured input/output data, flushed to the server via POST /runs/{run_id}/spans/ on context exit.

Both are sent automatically when the WaxellContext exits -- you don't need to flush manually.
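The buffer-then-flush lifecycle can be modeled in a few lines. This is an illustrative sketch only: `BufferingContext` and its `flush` parameter are hypothetical names, and the real SDK sends spans over HTTP to `/runs/{run_id}/spans/` rather than to a callable:

```python
# Illustrative model of buffer-and-flush-on-exit -- not the SDK's
# actual WaxellContext. Records buffer in memory during the run and
# are handed off once, when the context exits.
class BufferingContext:
    def __init__(self, flush):
        self._flush = flush
        self.steps = []   # step records shown in the run's step list
        self.spans = []   # behavior spans with structured input/output

    def record_step(self, name, output=None):
        self.steps.append({"name": name, "output": output})
        self.spans.append({"type": "step", "name": name, "output": output})

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self._flush(self.spans)  # sent automatically on context exit
        return False  # never swallow exceptions
```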

Full Example

This example uses decorators for zero-boilerplate recording:

import waxell_observe as waxell

waxell.init(api_key="wax_sk_...", api_url="https://acme.waxell.dev")


@waxell.retrieval(source="pinecone")
async def search_docs(query: str, top_k: int = 10) -> list[dict]:
    return await vector_store.search(query, top_k=top_k)


@waxell.decision(name="approach", options=["summarize", "compare", "deep_dive"])
async def choose_approach(query: str) -> dict:
    return {"chosen": "deep_dive", "reasoning": "Query asks for detailed analysis"}


@waxell.tool(tool_type="api")
async def run_analysis(docs: list) -> dict:
    return await analysis_service.analyze(docs)


@waxell.reasoning_dec(step="quality_check")
async def check_quality(result: dict) -> dict:
    return {"thought": "Analysis is thorough", "conclusion": "Ready to present"}


@waxell.observe(agent_name="research-pipeline")
async def run_pipeline(query: str):
    docs = await search_docs(query, top_k=10)
    approach = await choose_approach(query)
    analysis = await run_analysis(docs)
    quality = await check_quality(analysis)

    waxell.score("quality", 0.92)
    return {"result": analysis, "approach": approach["chosen"]}

Next Steps