Prompt Management
Waxell Observe includes a full prompt management system. Version your prompts, assign deployment labels like "production" and "staging", retrieve them at runtime via the SDK, and test variants in the playground -- all with content hashing that links prompts to their LLM call traces.
Core Concepts
Versions
Each prompt has a sequential version history (v1, v2, v3, ...). When you update a prompt's content, a new version is created. Old versions are preserved for auditing and rollback.
Labels
Labels are named pointers to specific versions. Common labels:
| Label | Purpose |
|---|---|
| production | The version currently served to users |
| staging | The version being tested before promotion |
| latest | The most recently created version |
Labels can be moved between versions at any time. For example, promoting staging to production is a single label update.
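Conceptually, a prompt's labels form a small name-to-version pointer map; a toy illustration of the promotion step (the dict here is purely explanatory, not an SDK structure):

```python
# Each prompt's labels are a name -> version pointer map.
labels = {"production": 2, "staging": 3, "latest": 3}

# Promoting staging to production is a single pointer update:
labels["production"] = labels["staging"]
```

No prompt content moves or changes; only the pointer does, which is what makes promotion and rollback instant.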
Content Types
| Type | Content Format | Use Case |
|---|---|---|
| text | Plain string with {{variable}} placeholders | System prompts, simple templates |
| chat | Array of {role, content} messages | Multi-message conversation templates |
Content Hashing
Every prompt version gets a SHA-256 hash of its content. When the SDK retrieves a prompt and uses it in an LLM call, the prompt_hash field on the LlmCallRecord links back to the exact version used. This gives you full traceability from production call to prompt version.
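A minimal sketch of how such a hash can be computed (illustrative only — the exact bytes Waxell hashes, e.g. how chat-message arrays are canonicalized, are not specified here):

```python
import hashlib

def content_hash(content: str) -> str:
    # SHA-256 hex digest of the raw prompt content.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

content_hash("You are a helpful assistant for {{user_name}}.")
```

Because the digest is deterministic, any two calls made with byte-identical prompt content share a hash, which is what makes the join from LlmCallRecord back to a prompt version reliable.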
SDK Usage
Retrieving Prompts
Use the client to fetch a prompt by name, optionally specifying a label or version:
```python
from waxell_observe import WaxellObserveClient

client = WaxellObserveClient()

# Fetch the "production" label (recommended for production code)
prompt = await client.get_prompt("welcome-message", label="production")

# Fetch a specific version
prompt = await client.get_prompt("welcome-message", version=3)

# Fetch the latest version (default when no label or version specified)
prompt = await client.get_prompt("welcome-message")
```
Synchronous version:
```python
prompt = client.get_prompt_sync(name="welcome-message", label="production")
```
Compiling Templates
The returned PromptInfo object has a compile() method that substitutes {{variable}} placeholders:
```python
prompt = await client.get_prompt("welcome-message", label="production")

# Text prompt: returns a string
rendered = prompt.compile(user_name="Alice", company="Acme Corp")
# "Hello Alice! Welcome to Acme Corp."

# Chat prompt: returns a list of messages
rendered = prompt.compile(user_name="Alice")
# [{"role": "system", "content": "You are a helpful assistant for Alice."},
#  {"role": "user", "content": "Hello!"}]
```
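The substitution behavior for text prompts can be approximated with a small regex helper — a sketch of the semantics, not the SDK's actual implementation (`compile_template` is a hypothetical name):

```python
import re

def compile_template(template: str, **variables) -> str:
    # Replace each {{name}} placeholder with the matching keyword argument.
    def replace(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"missing template variable: {key}")
        return str(variables[key])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)

compile_template("Hello {{user_name}}! Welcome to {{company}}.",
                 user_name="Alice", company="Acme Corp")
# "Hello Alice! Welcome to Acme Corp."
```

Raising on a missing variable (rather than silently leaving the placeholder in place) is the safer default, since a literal {{user_name}} leaking into a production prompt is easy to miss.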
Full Example
```python
from waxell_observe import WaxellContext, WaxellObserveClient

client = WaxellObserveClient()

async with WaxellContext(agent_name="chat-agent", client=client) as ctx:
    # Fetch the production prompt
    prompt = await client.get_prompt("chat-system-prompt", label="production")

    # Compile with variables
    system_message = prompt.compile(
        user_name=user.display_name,
        context=relevant_docs,
    )

    # Use in LLM call
    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_query},
        ],
    )

    ctx.record_llm_call(
        model="gpt-4o",
        tokens_in=response.usage.prompt_tokens,
        tokens_out=response.usage.completion_tokens,
    )
```
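In production code it is worth guarding the fetch so a transient prompt-service failure does not take the agent down. A sketch of the pattern — the `get_system_prompt` wrapper and the fallback string are illustrative, not part of the SDK:

```python
# Baked-in fallback used only when the prompt service is unreachable.
FALLBACK_SYSTEM_PROMPT = "You are a helpful assistant."

async def get_system_prompt(client, **variables):
    """Fetch and compile the production prompt, falling back to a
    local default if retrieval fails for any reason."""
    try:
        prompt = await client.get_prompt("chat-system-prompt", label="production")
        return prompt.compile(**variables)
    except Exception:
        # Transient API failure: degrade gracefully rather than crash.
        return FALLBACK_SYSTEM_PROMPT
```

Teams often also cache the last successfully fetched prompt in memory, so the fallback only applies on a cold start.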
PromptInfo Object
The get_prompt methods return a PromptInfo dataclass:
| Field | Type | Description |
|---|---|---|
| name | str | Prompt name |
| version | int | Version number |
| prompt_type | str | "text" or "chat" |
| content | str \| list | Raw content (string for text, message list for chat) |
| config | dict | Associated configuration (model, temperature, etc.) |
| labels | list[str] | Labels pointing to this version |
REST API
All prompt management endpoints require session authentication; they are intended to be called from the UI (or with a valid session cookie).
Prompts CRUD
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/prompts/ | GET | List all prompts with latest version info and labels |
| /api/v1/prompts/ | POST | Create a prompt with an initial version |
| /api/v1/prompts/{id}/ | GET | Prompt detail with all versions and labels |
| /api/v1/prompts/{id}/ | PUT | Update prompt metadata (name, description, tags) |
| /api/v1/prompts/{id}/ | DELETE | Delete the prompt and all of its versions and labels |
Create a Prompt
```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "name": "chat-system-prompt",
    "description": "System prompt for the chat agent",
    "prompt_type": "text",
    "content": "You are a helpful assistant for {{user_name}}. Answer questions about {{topic}}.",
    "config": {"model": "gpt-4o", "temperature": 0.7},
    "tags": ["chat", "production"],
    "commit_message": "Initial version"
  }'
```
Versions
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/prompts/{id}/versions/ | GET | List all versions |
| /api/v1/prompts/{id}/versions/ | POST | Create a new version |
| /api/v1/prompts/{id}/versions/{num}/ | GET | Get a specific version with full content |
Create a new version:
```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/versions/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "You are a helpful assistant for {{user_name}}. Answer questions about {{topic}} concisely.",
    "config": {"model": "gpt-4o", "temperature": 0.5},
    "commit_message": "Added conciseness instruction, lowered temperature"
  }'
```
Labels
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/prompts/{id}/labels/{label}/ | PUT | Set or move a label to a version |
| /api/v1/prompts/{id}/labels/{label}/ | DELETE | Remove a label |
Set the "production" label to version 3:
```bash
curl -X PUT "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/labels/production/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{"version": 3}'
```
Promote staging to production (two calls):
# Get the version that "staging" points to
STAGING_VERSION=$(curl -s ".../api/v1/prompts/{id}/" -H "Cookie: sessionid=..." \
| jq '.labels[] | select(.label=="staging") | .version')
# Move "production" to that version
curl -X PUT ".../api/v1/prompts/{id}/labels/production/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d "{\"version\": $STAGING_VERSION}"
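The same promotion can be scripted in Python. A sketch assuming a requests.Session-like object that already carries the session cookie; the helper name and the shape of the labels array (inferred from the jq filter above) are assumptions:

```python
def promote_staging_to_production(session, base_url, prompt_id):
    """Move the "production" label to the version "staging" points at."""
    # Read the prompt detail to find where "staging" currently points.
    detail = session.get(f"{base_url}/api/v1/prompts/{prompt_id}/").json()
    staging_version = next(
        entry["version"]
        for entry in detail["labels"]
        if entry["label"] == "staging"
    )
    # Move "production" to that version.
    session.put(
        f"{base_url}/api/v1/prompts/{prompt_id}/labels/production/",
        json={"version": staging_version},
    )
    return staging_version
```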
Playground
Test prompts with variable substitution and compare variants side by side.
Execute a single prompt:
POST /api/v1/prompts/playground/
```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Summarize the following in one sentence: {{text}}",
    "config": {"model": "gpt-4o-mini", "temperature": 0.3, "max_tokens": 256},
    "variables": {"text": "Waxell is an observability platform for AI agents..."}
  }'
```
Response:
```json
{
  "output": "Waxell provides observability and governance for AI agents.",
  "model": "gpt-4o-mini",
  "tokens_in": 42,
  "tokens_out": 12,
  "cost": 0.0001,
  "latency_ms": 340
}
```
Compare multiple variants:
POST /api/v1/prompts/playground/compare/
```bash
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/compare/" \
  -H "Cookie: sessionid=..." \
  -H "Content-Type: application/json" \
  -d '{
    "variants": [
      {
        "content": "Summarize: {{text}}",
        "config": {"model": "gpt-4o-mini", "temperature": 0.3}
      },
      {
        "content": "Give a one-sentence summary of: {{text}}",
        "config": {"model": "gpt-4o-mini", "temperature": 0.7}
      },
      {
        "content": "Summarize: {{text}}",
        "config": {"model": "gpt-4o", "temperature": 0.3}
      }
    ]
  }'
```
Up to 10 variants can be compared in a single request. Each result includes output, token counts, cost, and latency for direct comparison.
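Once the comparison comes back, picking a winner can be automated. A sketch that ranks results by cost, then latency — the field names mirror the single-prompt response above, and `rank_variants` is an illustrative helper, not an API feature:

```python
def rank_variants(results):
    # Cheapest first; break ties by latency.
    return sorted(results, key=lambda r: (r["cost"], r["latency_ms"]))

results = [
    {"model": "gpt-4o", "cost": 0.0009, "latency_ms": 520},
    {"model": "gpt-4o-mini", "cost": 0.0001, "latency_ms": 340},
    {"model": "gpt-4o-mini", "cost": 0.0001, "latency_ms": 290},
]
best = rank_variants(results)[0]
```

Cost and latency are only half the picture; the output field of each result still needs a human (or scorer) to judge quality.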
Prompt Metrics
GET /api/v1/prompts/{id}/metrics/
Shows usage metrics per version, linked via content hash to LlmCallRecord:
```json
{
  "prompt_name": "chat-system-prompt",
  "totals": {
    "call_count": 1520,
    "total_tokens": 456000,
    "total_cost": 2.345678
  },
  "versions": [
    {
      "version": 3,
      "content_hash": "a1b2c3...",
      "call_count": 1200,
      "total_tokens": 360000,
      "total_cost": 1.845678
    },
    {
      "version": 2,
      "content_hash": "d4e5f6...",
      "call_count": 320,
      "total_tokens": 96000,
      "total_cost": 0.500000
    }
  ]
}
```
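A small helper can turn this payload into per-version unit economics, e.g. average cost per call (illustrative post-processing, not an API feature; the numbers are the sample values above):

```python
def cost_per_call(metrics):
    # Average cost per LLM call, keyed by prompt version.
    return {
        v["version"]: v["total_cost"] / v["call_count"]
        for v in metrics["versions"]
        if v["call_count"]
    }

metrics = {
    "versions": [
        {"version": 3, "call_count": 1200, "total_cost": 1.845678},
        {"version": 2, "call_count": 320, "total_cost": 0.5},
    ]
}
cost_per_call(metrics)
```

Comparing versions on a per-call basis, rather than on raw totals, avoids the newer version looking "more expensive" simply because it serves more traffic.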
Workflow: Prompt Lifecycle
- Create a prompt with an initial version
- Test in the playground with different variables and configurations
- Label the tested version as staging
- Deploy by pointing production at the staging version
- Monitor via prompt metrics to compare version performance
- Iterate by creating new versions and repeating the cycle
Next Steps
- LLM Call Tracking -- See how prompts map to production LLM calls via content hashing
- Scoring -- Attach quality scores to runs that use versioned prompts