Working Memory
The Waxell runtime ships with a four-tier memory primitive that addresses context-window growth in agentic loops. Tool results are auto-captured into a fast scratchpad, the LLM sees compact summaries with reference handles, and the runtime resolves those references to the full data -- so the LLM never carries the bytes.
The problem it solves
A typical agentic loop has the LLM call a tool, see the full result, then decide the next action. After 10 turns of fetching prices, dividends, research notes, and benchmarks, the conversation history is 60KB+ of JSON the LLM has to re-read every turn. You either:
- Run out of context window
- Burn tokens re-reading data you already have
- Have the LLM hallucinate data it already fetched but can't see anymore
- Truncate history and lose information
Working memory inverts this. Tool results go to a scratchpad keyed by tool_name.turn. The LLM sees a one-line summary per result and references full data by name ($ref:prices.fetch.1) when it needs to pass it to the next tool. The runtime resolves the reference before dispatching, so the actual data flows tool-to-tool without ever entering the LLM's conversation.
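As a rough illustration of the keying scheme, the sketch below stores each result under tool_name.turn and builds the one-line summary the LLM would see in place of the payload. The Scratchpad class, its capture and summary_block methods, and the summary wording are hypothetical stand-ins, not the Waxell runtime's API. It also assumes the numeric suffix is a per-tool call counter.

```python
import json
from collections import defaultdict


class Scratchpad:
    """Hypothetical in-process stand-in for a per-run scratchpad."""

    def __init__(self):
        self._entries = {}              # key -> full tool result
        self._turns = defaultdict(int)  # tool name -> captures so far (assumed counter)

    def capture(self, tool_name, result):
        """Store a result under '<tool_name>.<turn>' and return that key."""
        self._turns[tool_name] += 1
        key = f"{tool_name}.{self._turns[tool_name]}"
        self._entries[key] = result
        return key

    def get(self, key):
        """Return the full stored result for a key."""
        return self._entries[key]

    def summary_block(self):
        """One line per entry: the handle and payload size, never the payload."""
        lines = ["Available scratchpad refs:"]
        for key, value in self._entries.items():
            size = len(json.dumps(value))
            lines.append(f"- $ref:{key} ({type(value).__name__}, {size} bytes)")
        return "\n".join(lines)


pad = Scratchpad()
prices = [{"day": d, "close": 100.0 + d} for d in range(90)]
key = pad.capture("market_data.fetch_prices", prices)
print(pad.summary_block())
```

The summary grows by one short line per captured result, which is what keeps per-turn prompt size roughly flat while the stored payloads grow.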
Measured impact (finance portfolio_analyst, 6-phase pipeline):
| Metric | Without scratchpad | With scratchpad |
|---|---|---|
| Tokens per turn | 6K–28K (growing) | ~7K (flat) |
| Turns to complete | Runs out of context | 16 turns |
| Tool result re-reads | Every turn | Zero |
Four tiers at a glance
| Tier | Backend | Lifecycle | Purpose |
|---|---|---|---|
| working | Redis | Per-run, cleared on completion | Scratchpad for in-flight tool results, auto-captured |
| session | Postgres | 24h default | Conversation/plan state within a user session |
| episodic | Postgres | TTL-based | Cross-run state: conversation history, cached results |
| semantic | pgvector | Indefinite | Long-term facts with vector search |
All four tiers share the same MemorySpec API and the same tenant isolation guarantees. The tier= field selects the backend and lifecycle.
Python: declaring memory on an agent
Use the convenience constructors from waxell_sdk.core.specs.memory_spec:
from waxell_sdk import agent
from waxell_sdk.core.specs.memory_spec import (
    scratchpad,
    conversation_memory,
    semantic_memory,
)

@agent(
    name="portfolio_analyst",
    memory={
        # Working tier: auto-captures every domain tool call during a run
        "scratchpad": scratchpad(ttl="1h"),
        # Episodic tier: per-user conversation across runs
        "conversation": conversation_memory(scope="user_id", ttl="30d"),
        # Semantic tier: per-client facts the agent has learned
        "client_context": semantic_memory(
            scope=["user_id", "portfolio_id"],
            description="Facts about this advisor's relationship with this client",
        ),
    },
    # ... remaining agent options
)
class PortfolioAnalyst:
    ...
Or use MemorySpec directly when you need control:
from waxell_sdk.core.specs.memory_spec import MemorySpec

memory = {
    "scratchpad": MemorySpec(
        tier="working",
        scope=["run_id"],
        auto_capture=True,
        ttl="1h",
        max_size_mb=20,
    ),
    "cached_analysis": MemorySpec(
        tier="episodic",
        scope=["user_id", "portfolio_id"],
        type="dict",
        ttl="24h",
    ),
}
YAML: declaring memory in waxell.yaml
memory:
  scratchpad:
    tier: working
    scope: [run_id]
    auto_capture: true
    ttl: 1h
  conversation:
    tier: episodic
    scope: user_id
    type: list
    ttl: 30d
    max_items: 100
  client_knowledge:
    tier: semantic
    scope: [user_id, portfolio_id]
    searchable: true
    description: Facts about this client relationship
For all available fields and validation rules, see the waxell.yaml memory reference.
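The ttl values in the declarations above (1h, 24h, 30d) are a number followed by a unit suffix. The sketch below shows one way such strings could be turned into durations. parse_ttl is a hypothetical helper, not part of waxell_sdk, and the s and m suffixes are assumptions, since this page only shows h and d. The memory reference remains the authority on which formats are accepted.

```python
import re
from datetime import timedelta

# Assumed unit suffixes; only "h" and "d" appear in the examples on this page.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(ttl):
    """Convert a string such as '1h' or '30d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", ttl.strip())
    if match is None:
        raise ValueError(f"unsupported TTL string: {ttl!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


for example in ("1h", "24h", "30d"):
    print(example, "->", parse_ttl(example))
```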
How working memory captures tool results
When tier="working" and auto_capture=true, the runtime intercepts every successful domain call and stores the result:
- Agent calls market_data.fetch_prices(tickers=["AAPL"], days=90)
- The HTTP call returns 15KB of OHLC data
- Runtime stores the value in the scratchpad as market_data.fetch_prices.1 (turn 1)
- The LLM's next system prompt rebuild includes a summary block:
    Available scratchpad refs:
    - $ref:market_data.fetch_prices.1 — OHLC for [AAPL] (90 days, 15KB)
- The LLM sees the summary, not the 15KB blob
- When the LLM calls the next tool, it can pass the reference: calculate_performance(prices="$ref:market_data.fetch_prices.1", ...)
- Runtime resolves $ref:market_data.fetch_prices.1 to the full data before dispatch
- Next tool gets the real data; the LLM stays compact
The $ref prefix is configurable via reference_syntax on the MemorySpec. Input signals are also seeded into the scratchpad automatically (as input_<name>.1) so the LLM can pass them around the same way.
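The resolution step can be sketched as a recursive walk over the tool arguments. This is a minimal illustration under stated assumptions: resolve_refs is a hypothetical function, the scratchpad is modeled as a plain dict, and the prefix parameter stands in for whatever reference_syntax configures. The runtime's actual resolver may differ.

```python
def resolve_refs(value, scratchpad, prefix="$ref:"):
    """Return a copy of `value` with every reference string replaced by stored data."""
    if isinstance(value, str) and value.startswith(prefix):
        key = value[len(prefix):]
        if key not in scratchpad:
            raise KeyError(f"unknown scratchpad reference: {value}")
        return scratchpad[key]
    if isinstance(value, dict):
        return {k: resolve_refs(v, scratchpad, prefix) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_refs(v, scratchpad, prefix) for v in value]
    return value  # numbers, plain strings, None, etc. pass through unchanged


# Plain dict standing in for the scratchpad, including one seeded input signal.
scratchpad = {
    "market_data.fetch_prices.1": [{"close": 101.5}, {"close": 103.0}],
    "input_benchmark.1": "SPY",
}

# Arguments as the LLM emitted them: handles only, no payloads.
llm_args = {
    "prices": "$ref:market_data.fetch_prices.1",
    "options": {"benchmark": "$ref:input_benchmark.1", "window": 90},
    "extras": ["$ref:input_benchmark.1", "plain string"],
}

tool_args = resolve_refs(llm_args, scratchpad)
print(tool_args)
```

Because the walk builds a new structure, the LLM-facing arguments keep their compact handles while the tool receives the expanded copy.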
Tenant isolation
Isolation is enforced at two layers, so a bug in one is caught by the other:
Physical layer. The Redis backend prefixes every key with tenant:{tenant_id}:. The _tenant_context() context manager extracts tenant_id from the scope key and sets the ContextVar around each Redis operation -- so isolation holds even when the Celery worker has no upstream tenant context set.
Logical layer. The scope key itself always starts with tenant_id, enforced by MemoryScopeResolver regardless of what dimensions the developer declares in scope=[...].
Result: a memory write under tenant A cannot be read by tenant B even if scope keys collide, even if a worker has stale context, even if the dev forgot to include tenant_id in the scope list. tenant_id is always prepended automatically.
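To make the two layers concrete, here is a minimal sketch of how a tenant-first scope key and a tenant-prefixed storage key could be built. Only the tenant:{tenant_id}: prefix comes from the description above. The function names resolve_scope_key and redis_key, the name=value key format, and the | separator are assumptions for illustration and do not reflect the actual MemoryScopeResolver or Redis backend code.

```python
def resolve_scope_key(tenant_id, declared_scope, context):
    """Logical layer: build a scope key that always starts with the tenant."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    # Drop an explicitly declared tenant_id so it is never duplicated in the key.
    dims = [d for d in declared_scope if d != "tenant_id"]
    parts = [f"tenant_id={tenant_id}"] + [f"{d}={context[d]}" for d in dims]
    return "|".join(parts)


def redis_key(scope_key, slot):
    """Physical layer: derive the tenant prefix from the scope key itself."""
    first = scope_key.split("|", 1)[0]
    name, _, tenant_id = first.partition("=")
    if name != "tenant_id" or not tenant_id:
        raise ValueError("scope key does not start with tenant_id")
    return f"tenant:{tenant_id}:{slot}:{scope_key}"


# Same user and portfolio identifiers under two different tenants.
ctx = {"user_id": "u1", "portfolio_id": "p9"}
scope = ["user_id", "portfolio_id"]
key_a = redis_key(resolve_scope_key("acme", scope, ctx), "cached_analysis")
key_b = redis_key(resolve_scope_key("globex", scope, ctx), "cached_analysis")
print(key_a)
print(key_b)
```

In this sketch, the tenant is taken from the scope key instead of ambient worker state, so the prefix is correct even when no upstream tenant context is set.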
Scope dimensions
Multi-dimensional scoping composes on top of tenant_id:
| Dimension | Source | Use case |
|---|---|---|
| tenant_id | Always (automatic) | Outermost boundary |
| agent | Agent name | Per-agent within tenant |
| agent_version | Agent version | Per-version state |
| user_id | Sub-user identity | Per end-user |
| user_group | Sub-user identity | Per user group |
| session_id | Conversation session | Per conversation |
| workflow | Workflow name | Per workflow within agent |
| run_id | Execution run | Per execution (ephemeral) |
| channel_id | Slack/chat | Per chat channel |
| thread_ts | Slack | Per thread |
Combine them as needed:
# Per-(advisor, client) cached analysis
MemorySpec(
    tier="episodic",
    scope=["user_id", "portfolio_id"],
    type="dict",
    ttl="24h",
)
When to use which tier
| Need | Tier | Why |
|---|---|---|
| Pass tool results between LLM turns without burning tokens | working | Auto-captures, refs resolve before dispatch |
| Remember the last 100 messages per user | episodic (conversation_memory()) | Postgres-backed, TTL-eviction |
| Cache an expensive calculation per (advisor, client) for 24h | episodic | Multi-dimensional scope, TTL |
| Conversation/plan state within a session that spans multiple workflow runs | session | First-class tier with 24h default TTL |
| Learn facts about a client over time and recall by similarity | semantic (semantic_memory()) | Vector search, indefinite retention |
Typed memory schemas
Memory slots can declare a Pydantic schema for read/write validation. The runtime stamps a schema_version on every stored value so migrations can pick the right deserializer when the schema evolves:
from pydantic import BaseModel
from waxell_sdk.core.specs.memory_spec import MemorySpec

class ClientNote(BaseModel):
    summary: str
    confidence: float
    last_updated: str

memory = {
    "notes": MemorySpec(
        tier="semantic",
        scope=["user_id", "portfolio_id"],
        schema=ClientNote,
        schema_version=2,
    ),
}
On write, the runtime validates against the schema and serializes the model. On read, it rehydrates back to a ClientNote instance. A validation failure on read raises ValueError so the agent can catch and migrate.
Schemas are optional -- omit schema= for pre-3.4 dict-like behavior.
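The version stamp is what lets a reader choose the right deserializer. The sketch below illustrates that idea with the standard library only: it swaps Pydantic for a plain dataclass to stay dependency-free, so it checks field presence but not field types. The envelope layout, the dump_note and load_note helpers, and the v1-to-v2 migration are all hypothetical and are not how the runtime necessarily stores values.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ClientNote:
    summary: str
    confidence: float
    last_updated: str


SCHEMA_VERSION = 2


def dump_note(note):
    """Serialize the note inside an envelope stamped with the schema version."""
    return json.dumps({"schema_version": SCHEMA_VERSION, "data": asdict(note)})


def migrate_v1(data):
    """Hypothetical v1 -> v2 migration: assume v1 entries had no confidence field."""
    return {**data, "confidence": 0.0}


def load_note(raw):
    """Pick a deserializer by the stamped version, then rebuild the typed object."""
    envelope = json.loads(raw)
    version, data = envelope["schema_version"], envelope["data"]
    if version == 1:
        data = migrate_v1(data)
    elif version != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {version}")
    try:
        return ClientNote(**data)
    except TypeError as exc:  # missing or unexpected fields
        raise ValueError(f"stored value does not match ClientNote: {exc}") from exc


note = ClientNote("Prefers dividend stocks", 0.8, "2024-01-01")
restored = load_note(dump_note(note))
print(restored)
```

Raising ValueError on a mismatch mirrors the read-side behavior described above, giving the caller a single exception type to catch before attempting a migration.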
Common gotchas
- tenant_id is implicit. Don't include it in your scope=[...] list -- the runtime always prepends it. Listing it explicitly works but is redundant.
- Working tier ignores type. Scratchpad entries are always ScratchpadEntry dataclass values; type="dict" / type="list" only applies to episodic/session tiers.
- auto_capture is per-spec, not global. If you declare two tier="working" slots, you can opt one in and one out.
- $ref resolution is recursive. Pass $ref:foo.1 as a value, a nested dict field, or inside a list -- the runtime walks the structure and resolves all refs.
- Local dev uses in-memory backends. InMemoryWorkingMemory mimics Redis for tests; the Redis backend takes over in production via the infra setup.py ready hook.
Reference
- waxell.yaml memory field reference -- field-by-field YAML schema
- Execution Context -- how ctx.scratchpad is plumbed through agent runs
- Workflow Envelope -- run lifecycle that drives scratchpad TTL