# Letta (MemGPT)
A Letta-style stateful conversation pipeline with long-term memory management across three agents. The runner loads agent state, reviews conversation history, searches archival memory, and sends messages. The evaluator synthesizes the response, assesses its quality against archival sources, and updates agent memory.
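The end-to-end flow can be sketched without the Waxell decorators. Everything below (mock data, agent id, helper name) is invented for illustration and is not part of the demo:

```python
# Minimal sketch of the runner -> evaluator flow, using made-up mock data.
MOCK_ARCHIVAL = [
    {"text": "Deployments use blue-green rollouts.", "score": 0.91},
    {"text": "Agents persist memory between sessions.", "score": 0.84},
]

def run_pipeline(query: str) -> dict:
    # Runner: load agent state, search archival memory, build a response
    agent = {"agent_id": "agent-001", "name": "memgpt-demo"}
    hits = [h for h in MOCK_ARCHIVAL if h["score"] > 0.5]
    answer = f"Based on {len(hits)} archival entries: {hits[0]['text']}"
    # Evaluator: a crude grounding check against the retrieved entries
    grounded = any(h["text"].split()[0].lower() in answer.lower() for h in hits)
    return {"agent": agent["name"], "answer": answer, "grounded": grounded}

result = run_pipeline("Tell me about deployment patterns")
```

In the real demo each of these steps is additionally traced (tool calls, retrieval, reasoning, scores), as shown in the sections below.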
## Environment variables

This example requires `OPENAI_API_KEY`, `WAXELL_API_KEY`, and `WAXELL_API_URL`. Use `--dry-run` to run without any API keys.
## Architecture
## Key Code
### Runner with Letta API operations as `@tool` and archival `@retrieval`

Each Letta API call is recorded as a tool call, with the archival memory search recorded as a retrieval.
```python
@waxell.tool(tool_type="agent_framework")
def letta_get_agent(agent_id: str) -> dict:
    """Load a Letta agent by ID."""
    return {"agent_id": agent_id, "name": MOCK_AGENT_NAME,
            "memory_state": MOCK_MEMORY_STATE}

@waxell.tool(tool_type="agent_framework")
def letta_send_message(agent_id: str, message: str) -> dict:
    """Send a message to a Letta agent and get its response."""
    response_messages = MOCK_SEND_RESPONSE["messages"]
    return {"response_messages_count": len(response_messages),
            "function_calls": len([m for m in response_messages
                                   if m["message_type"] == "function_call"])}

@waxell.retrieval(source="letta_archival")
def search_archival_memory(query: str, agent_id: str) -> list[dict]:
    """Search a Letta agent's archival memory for relevant entries."""
    return [{"text": entry["text"], "score": entry["score"],
             "source": "archival_memory"}
            for entry in MOCK_ARCHIVAL_RESULTS]
```
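As a sanity check, the mocked retrieval can be exercised directly. The version below drops the decorator and substitutes an invented stand-in for the demo's `MOCK_ARCHIVAL_RESULTS`:

```python
# Stand-in for the demo's MOCK_ARCHIVAL_RESULTS (values invented for illustration)
MOCK_ARCHIVAL_RESULTS = [
    {"text": "Core memory holds persona and human blocks.", "score": 0.92},
    {"text": "Archival memory is searched on demand.", "score": 0.81},
]

def search_archival_memory(query: str, agent_id: str) -> list[dict]:
    """Undecorated copy of the retrieval above, for a quick local check."""
    return [{"text": e["text"], "score": e["score"], "source": "archival_memory"}
            for e in MOCK_ARCHIVAL_RESULTS]

hits = search_archival_memory("memory layout", "agent-001")
top = max(hits, key=lambda h: h["score"])  # highest-scoring archival entry
```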
### Evaluator with `@reasoning_dec`, memory update, and scores

The evaluator synthesizes the response, assesses its quality, and updates archival memory.
```python
@waxell.reasoning_dec(step="response_quality_assessment")
async def evaluate_response_quality(answer: str, archival_results: list,
                                    history_count: int) -> dict:
    coverage = len([r for r in archival_results
                    if any(word in answer.lower()
                           for word in r["text"].lower().split()[:3])])
    return {
        "thought": f"Evaluating against {len(archival_results)} archival sources.",
        "evidence": [f"Archival: {r['text'][:60]}... (score: {r['score']})"
                     for r in archival_results],
        "conclusion": "Response well-grounded in archival memory"
                      if coverage > 0 else "Response may lack archival grounding",
    }

# Update agent memory
letta_archival_memory_insert(agent_id=agent_id,
                             text=f"User asked about: {query[:100]}")

waxell.score("response_quality", 0.88, comment="against archival memory")
waxell.score("memory_coherence", True, data_type="boolean",
             comment="consistent with stored state")
```
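The grounding check above is a simple word-overlap heuristic: an answer "covers" an archival entry if any of the entry's first three words appears in it. A self-contained version (the sample results and answer below are invented):

```python
def archival_coverage(answer: str, archival_results: list[dict]) -> int:
    """Count entries whose first three words overlap the answer (heuristic)."""
    return len([r for r in archival_results
                if any(word in answer.lower()
                       for word in r["text"].lower().split()[:3])])

results = [{"text": "Blue-green deployment reduces downtime", "score": 0.9},
           {"text": "Recall buffer stores recent turns", "score": 0.7}]
answer = "We recommend blue-green deployment for safer releases."
coverage = archival_coverage(answer, results)  # first entry matches on "blue-green"
```

The heuristic is deliberately cheap; it only signals whether a response plausibly draws on archival memory, not whether it is faithful to it.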
## What this demonstrates
- `@waxell.observe` -- parent-child agent hierarchy with automatic lineage
- `@waxell.step_dec` -- pipeline initialization recorded as a step
- `@waxell.tool` -- Letta API operations with `tool_type="agent_framework"`
- `@waxell.retrieval` -- archival memory search with `source="letta_archival"`
- `@waxell.decision` -- memory strategy selection (archival, recall, or hybrid)
- `waxell.decide()` -- manual agent routing decision
- `@waxell.reasoning_dec` -- response quality assessment against archival sources
- `waxell.score()` -- response quality and memory coherence scores
- Auto-instrumented LLM calls -- synthesis call captured automatically
- Letta pattern -- stateful agent with archival memory, recall buffer, and function calls
## Run it
```bash
# Dry-run (no API keys needed)
cd dev/waxell-dev
python -m app.demos.letta_agent --dry-run

# Live (real OpenAI)
export OPENAI_API_KEY="sk-..."
python -m app.demos.letta_agent

# Custom query
python -m app.demos.letta_agent --query "Tell me about deployment patterns"
```