# Behavior Tracking

Waxell Observe tracks agent behaviors at three levels of effort:

- Auto-instrumented -- LLM calls (157 providers), tool-call decisions, and vector DB retrievals captured with zero code
- Decorators -- Wrap functions with `@tool`, `@decision`, `@retrieval`, `@reasoning`, `@retry`, or `@step` for automatic recording
- Manual -- Call `ctx.record_*()` methods or top-level convenience functions for full control
## Overview
| Behavior | Decorator | Convenience Function | Context Method |
|---|---|---|---|
| Tool calls | `@waxell.tool` | -- | `ctx.record_tool_call()` |
| Retrievals | `@waxell.retrieval` | `waxell.retrieve()` | `ctx.record_retrieval()` |
| Decisions | `@waxell.decision` | `waxell.decide()` | `ctx.record_decision()` |
| Reasoning | `@waxell.reasoning_dec` | `waxell.reason()` | `ctx.record_reasoning()` |
| Retries | `@waxell.retry_dec` | `waxell.retry()` | `ctx.record_retry()` |
| Steps | `@waxell.step_dec` | `waxell.step()` | `ctx.record_step()` |
## Tool Calls

### @tool decorator

Auto-record function calls as tool invocations with zero boilerplate:
```python
import waxell_observe as waxell

@waxell.tool(tool_type="vector_db")
def search_index(index, query_vec, k: int = 5):
    distances, indices = index.search(query_vec, k)
    return {"distances": distances, "indices": indices}

# Every call auto-records: name, inputs, output, duration_ms, status
# No-op when called outside a WaxellContext
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str \| None | None | Tool name. Defaults to the function name |
| tool_type | str | "function" | Classification: "function", "vector_db", "database", "api" |
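Setting both parameters explicitly might look like this (a sketch; the tool name and function are hypothetical):

```python
import waxell_observe as waxell

# "weather_lookup" overrides the default name (the function name),
# and tool_type="api" classifies the tool as an external API call.
@waxell.tool(name="weather_lookup", tool_type="api")
def fetch_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}
```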
### Manual recording

```python
waxell_ctx.record_tool_call(
    name="web_search",
    input={"query": query},
    output={"result_count": len(results)},
    duration_ms=250,
    status="ok",
    tool_type="api",
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | (required) | Tool name |
| input | dict \| str | "" | Tool input parameters |
| output | dict \| str | "" | Tool output/result |
| duration_ms | int \| None | None | Execution time in milliseconds |
| status | str | "ok" | "ok" or "error" |
| tool_type | str | "function" | Classification |
| error | str | "" | Error message if status is "error" |
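A failed call pairs `status="error"` with the `error` field; a hedged sketch (the tool name and error message are illustrative):

```python
# Record a tool call that failed, so the error surfaces in the run timeline.
waxell_ctx.record_tool_call(
    name="web_search",
    input={"query": "AI safety docs"},
    output="",
    duration_ms=3000,
    status="error",
    error="TimeoutError: upstream search API did not respond",
    tool_type="api",
)
```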
## Retrievals

### Auto-instrumented retrievals

When you use a supported vector database SDK, retrieval operations are captured automatically with zero code. Waxell includes instrumentors for Pinecone, Chroma, Weaviate, Qdrant, Milvus, FAISS, LanceDB, pgvector, MongoDB Atlas Vector Search, Elasticsearch, OpenSearch, Marqo, and many more.
```python
# This Pinecone query automatically records a retrieval span
results = index.query(vector=embedding, top_k=10, namespace="docs")
# Retrieval auto-recorded: source="pinecone", top_k=10, matches_count=10
```
### @retrieval decorator

Auto-record search and retrieval operations:
```python
import waxell_observe as waxell

@waxell.retrieval(source="pinecone")
async def search_documents(query: str, top_k: int = 5) -> list[dict]:
    results = await vector_store.search(query, top_k=top_k)
    return [{"id": r.id, "title": r.title, "score": r.score} for r in results]

# Auto-records: query (first string arg), documents (return value),
# scores (from doc["score"] fields), source, duration_ms
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| source | str | "" | Data source name (e.g., "faiss", "pinecone") |
| name | str \| None | None | Override name. Defaults to function name |
### Convenience function

```python
waxell.retrieve(
    query="AI safety papers",
    documents=[{"id": 1, "title": "Safety Guidelines", "score": 0.95}],
    source="faiss",
    scores=[0.95],
)
```
### Manual recording

```python
waxell_ctx.record_retrieval(
    query=query,
    documents=[{"id": d.id, "title": d.title, "score": d.score} for d in docs],
    source="pinecone",
    duration_ms=120,
    top_k=5,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | str | (required) | The retrieval query string |
| documents | list[dict] | (required) | Retrieved documents |
| source | str | "" | Data source name |
| scores | list[float] \| None | None | Relevance scores for each document |
| duration_ms | int \| None | None | Retrieval time in milliseconds |
| top_k | int \| None | None | Number of documents requested |
## Decisions

Waxell provides three layers of decision recording, from zero-effort to manual:

### Auto-instrumented decisions

When an LLM response contains `tool_calls` (OpenAI/Groq/Mistral) or `tool_use` blocks (Anthropic), the auto-instrumentor records the model's tool selection as a decision. No code needed.
```python
# This OpenAI call with tools automatically records a decision span
response = await client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Search for AI safety docs"}],
    tools=[{"type": "function", "function": {"name": "search", ...}}],
)
# Decision auto-recorded: name="tool_call:search", instrumentation_source="auto"
```
### @decision decorator

Wrap any classification/routing function to auto-record its return value as a decision:
```python
import waxell_observe as waxell

@waxell.decision(name="classify_query", options=["factual", "analytical", "creative"])
async def classify_query(query: str) -> dict:
    response = await client.chat.completions.create(...)
    return {"chosen": "analytical", "reasoning": "Complex multi-doc query"}

# Returns the dict AND auto-records the decision
# instrumentation_source="decorator"
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str \| None | None | Decision name. Defaults to function name |
| options | list[str] \| None | None | Available choices |
Return value handling: If the function returns a `dict`, the SDK extracts `chosen`, `reasoning`, and `confidence` fields. If it returns a `str`, the entire string is used as `chosen`.
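The string form can be sketched as follows (the function and option names are hypothetical):

```python
import waxell_observe as waxell

# Returning a plain str records the whole string as `chosen`;
# `reasoning` and `confidence` are left unset.
@waxell.decision(name="pick_model", options=["gpt-4o", "gpt-4o-mini"])
def pick_model(query: str) -> str:
    return "gpt-4o" if len(query) > 200 else "gpt-4o-mini"
```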
### waxell.decide() convenience function

For inline decisions that don't warrant a separate function:
```python
waxell.decide(
    "retrieval_strategy",
    chosen="semantic_search",
    options=["semantic_search", "keyword_search", "hybrid"],
    reasoning="Analytical query benefits from semantic similarity",
    confidence=0.88,
)
# instrumentation_source="manual"
```
### ctx.record_decision() -- full control
```python
waxell_ctx.record_decision(
    name="output_format",
    options=["brief", "detailed", "bullet_points"],
    chosen="detailed",
    reasoning="User query is analytical",
    confidence=0.85,
    metadata={"user_preference": "verbose"},
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | (required) | Decision name |
| options | list[str] | (required) | Available choices |
| chosen | str | (required) | The selected option |
| reasoning | str | "" | Why this option was chosen |
| confidence | float \| None | None | Confidence score (0.0-1.0) |
| metadata | dict \| None | None | Additional context |
### Instrumentation source tracking

Each decision span includes an `instrumentation_source` attribute indicating how it was captured:
| Source | Value | Meaning |
|---|---|---|
| Auto-instrumentor | "auto" | Detected from LLM tool_calls/tool_use response |
| @decision decorator | "decorator" | Captured by the @decision wrapper |
| waxell.decide() / ctx.record_decision() | "manual" | Explicitly recorded by user code |
## Reasoning

### @reasoning decorator

Auto-record chain-of-thought steps from a function's return value:
```python
import waxell_observe as waxell

@waxell.reasoning_dec(step="quality_check")
async def assess_quality(answer: str, sources: list) -> dict:
    return {
        "thought": "Answer covers all source material with proper citations",
        "evidence": [f"Source: {s['title']}" for s in sources],
        "conclusion": "High quality, ready to present",
    }
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| step | str \| None | None | Reasoning step name. Defaults to function name |
Return value handling: If the function returns a `dict`, extracts `thought`, `evidence`, and `conclusion`. If it returns a `str`, uses the string as `thought`.
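As a minimal sketch of the string form (the function name is hypothetical):

```python
import waxell_observe as waxell

# A plain-str return is recorded as the `thought`;
# `evidence` and `conclusion` stay empty.
@waxell.reasoning_dec(step="sanity_check")
def sanity_check(answer: str) -> str:
    return f"Answer is {len(answer)} chars; within expected bounds"
```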
### Convenience function

```python
waxell.reason(
    step="evaluate_sources",
    thought="Source A is more recent but Source B has higher authority",
    evidence=["Source A: published 2024", "Source B: cited 500 times"],
    conclusion="Use Source B as primary",
)
```
### Manual recording

```python
waxell_ctx.record_reasoning(
    step="evaluate_sources",
    thought="Source A is more recent but Source B has higher authority",
    evidence=["Source A: published 2024", "Source B: cited 500 times"],
    conclusion="Use Source B as primary, Source A as supplement",
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| step | str | (required) | Reasoning step name |
| thought | str | (required) | The reasoning text/thought process |
| evidence | list[str] \| None | None | Supporting evidence or references |
| conclusion | str | "" | Conclusion reached at this step |
## Retries

### @retry decorator

Wrap a function with retry logic AND automatic retry recording:
```python
import waxell_observe as waxell

@waxell.retry_dec(max_attempts=3, strategy="retry")
async def call_llm(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# On failure, retries up to 3 times, recording each attempt as a retry span.
# After exhausting attempts, re-raises the last exception.
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_attempts | int | 3 | Maximum number of attempts (including first) |
| strategy | str | "retry" | "retry", "fallback", or "circuit_break" |
| fallback_to | str | "" | Name of fallback target |
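Combining `strategy="fallback"` with `fallback_to` might look like this (a sketch; the function and model names are illustrative):

```python
import waxell_observe as waxell

# Hypothetical usage: retry spans are tagged with the named fallback
# target so the dashboard shows which backup the pipeline moved to.
@waxell.retry_dec(max_attempts=2, strategy="fallback", fallback_to="claude-sonnet-4")
async def call_primary_llm(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```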
### Convenience function

```python
waxell.retry(
    attempt=1,
    reason="OpenAI rate limited",
    strategy="fallback",
    original_error="429 Too Many Requests",
    fallback_to="claude-sonnet-4",
)
```
### Manual recording

```python
waxell_ctx.record_retry(
    attempt=1,
    reason="OpenAI rate limited",
    strategy="fallback",
    original_error="429 Too Many Requests",
    fallback_to="claude-sonnet-4",
    max_attempts=3,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| attempt | int | (required) | Current attempt number (1-based) |
| reason | str | (required) | Why a retry/fallback occurred |
| strategy | str | "retry" | "retry", "fallback", or "circuit_break" |
| original_error | str | "" | The error that triggered the retry |
| fallback_to | str | "" | Name of fallback target |
| max_attempts | int \| None | None | Maximum attempts configured |
## Steps

### @step decorator

Auto-record function calls as execution steps:
```python
import waxell_observe as waxell

@waxell.step_dec(name="preprocess")
async def preprocess_query(query: str) -> dict:
    cleaned = query.strip().lower()
    return {"original": query, "cleaned": cleaned}

# Auto-records: step(name="preprocess", output={"original": ..., "cleaned": ...})
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str \| None | None | Step name. Defaults to function name |
### Convenience function

```python
waxell.step("preprocess", output={"cleaned": query.strip()})
```
### Manual recording

```python
waxell_ctx.record_step("retrieve", output={"doc_count": len(docs)})
```
## How It Works

Behavior tracking methods buffer data in two ways:

- Steps -- Each call creates a step record (e.g., `tool:web_search`, `retrieval:pinecone`, `decision:route_to_agent`) that appears in the run's step list.
- Spans -- Each call creates a behavior span with structured input/output data, flushed to the server via `POST /runs/{run_id}/spans/` on context exit.

Both are sent automatically when the WaxellContext exits -- you don't need to flush manually.
## Full Example

This example uses decorators for zero-boilerplate recording:
```python
import waxell_observe as waxell

waxell.init(api_key="wax_sk_...", api_url="https://acme.waxell.dev")

@waxell.retrieval(source="pinecone")
async def search_docs(query: str, top_k: int = 10) -> list[dict]:
    return await vector_store.search(query, top_k=top_k)

@waxell.decision(name="approach", options=["summarize", "compare", "deep_dive"])
async def choose_approach(query: str) -> dict:
    return {"chosen": "deep_dive", "reasoning": "Query asks for detailed analysis"}

@waxell.tool(tool_type="api")
async def run_analysis(docs: list) -> dict:
    return await analysis_service.analyze(docs)

@waxell.reasoning_dec(step="quality_check")
async def check_quality(result: dict) -> dict:
    return {"thought": "Analysis is thorough", "conclusion": "Ready to present"}

@waxell.observe(agent_name="research-pipeline")
async def run_pipeline(query: str):
    docs = await search_docs(query, top_k=10)
    approach = await choose_approach(query)
    analysis = await run_analysis(docs)
    quality = await check_quality(analysis)
    waxell.score("quality", 0.92)
    return {"result": analysis, "approach": approach["chosen"]}
```
## Next Steps

- Decorator Pattern -- `@observe`, `@tool`, `@decision`, and more
- Context Manager -- Full reference for `WaxellContext`
- REST API Reference -- `POST /runs/{run_id}/spans/` endpoint