Prompt Management
Waxell Observe includes a full prompt management system. Version your prompts, assign deployment labels like "production" and "staging", retrieve them at runtime via the SDK, and test variants in the playground -- all with content hashing that links prompts to their LLM call traces.
Core Concepts
Versions
Each prompt has a sequential version history (v1, v2, v3, ...). When you update a prompt's content, a new version is created. Old versions are preserved for auditing and rollback.
Labels
Labels are named pointers to specific versions. Common labels:
| Label | Purpose |
|---|---|
production | The version currently served to users |
staging | The version being tested before promotion |
latest | The most recently created version |
Labels can be moved between versions at any time. For example, promoting staging to production is a single label update.
Content Types
| Type | Content Format | Use Case |
|---|---|---|
text | Plain string with {{variable}} placeholders | System prompts, simple templates |
chat | Array of {role, content} messages | Multi-message conversation templates |
Content Hashing
Every prompt version gets a SHA-256 hash of its content. When the SDK retrieves a prompt and uses it in an LLM call, the prompt_hash field on the LlmCallRecord links back to the exact version used. This gives you full traceability from production call to prompt version.
SDK Usage
Retrieving Prompts
Use the client to fetch a prompt by name, optionally specifying a label or version:
from waxell_observe import WaxellObserveClient
client = WaxellObserveClient()
# Fetch the "production" label (recommended for production code)
prompt = await client.get_prompt("welcome-message", label="production")
# Fetch a specific version
prompt = await client.get_prompt("welcome-message", version=3)
# Fetch the latest version (default when no label or version specified)
prompt = await client.get_prompt("welcome-message")
Synchronous version:
prompt = client.get_prompt_sync(name="welcome-message", label="production")
Compiling Templates
The returned PromptInfo object has a compile() method that substitutes {{variable}} placeholders:
prompt = await client.get_prompt("welcome-message", label="production")
# Text prompt: returns a string
rendered = prompt.compile(user_name="Alice", company="Acme Corp")
# "Hello Alice! Welcome to Acme Corp."
# Chat prompt: returns a list of messages
rendered = prompt.compile(user_name="Alice")
# [{"role": "system", "content": "You are a helpful assistant for Alice."},
# {"role": "user", "content": "Hello!"}]
Full Example
Use @observe for the run and let init()'s auto-instrumentation capture the LLM call. Fetch the prompt inside the function and use it normally -- the SDK links the LLM call to the prompt version via content hashing.
import waxell_observe as waxell
from waxell_observe import WaxellObserveClient
waxell.init() # BEFORE importing the LLM SDK -- enables auto-capture
import openai
openai_client = openai.OpenAI()
prompt_client = WaxellObserveClient()
@waxell.observe(agent_name="chat-agent")
async def chat(user_query: str, user_display_name: str, relevant_docs: str) -> str:
# Fetch the production prompt
prompt = await prompt_client.get_prompt("chat-system-prompt", label="production")
# Compile with variables
system_message = prompt.compile(
user_name=user_display_name,
context=relevant_docs,
)
# LLM call -- auto-captured by init(), linked to prompt version via content hash
response = openai_client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": system_message},
{"role": "user", "content": user_query},
],
)
return response.choices[0].message.content
PromptInfo Object
The get_prompt methods return a PromptInfo dataclass:
| Field | Type | Description |
|---|---|---|
name | str | Prompt name |
version | int | Version number |
prompt_type | str | "text" or "chat" |
content | str | list | Raw content (string for text, message list for chat) |
config | dict | Associated configuration (model, temperature, etc.) |
labels | list[str] | Labels pointing to this version |
REST API
All prompt management endpoints require session authentication (UI).
Prompts CRUD
| Endpoint | Method | Description |
|---|---|---|
/api/v1/prompts/ | GET | List all prompts with latest version info and labels |
/api/v1/prompts/ | POST | Create a prompt with initial version |
/api/v1/prompts/{id}/ | GET | Prompt detail with all versions and labels |
/api/v1/prompts/{id}/ | PUT | Update prompt metadata (name, description, tags) |
/api/v1/prompts/{id}/ | DELETE | Delete prompt and all versions/labels |
Create a Prompt
curl -X POST "https://acme.waxell.dev/api/v1/prompts/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{
"name": "chat-system-prompt",
"description": "System prompt for the chat agent",
"prompt_type": "text",
"content": "You are a helpful assistant for {{user_name}}. Answer questions about {{topic}}.",
"config": {"model": "gpt-4o", "temperature": 0.7},
"tags": ["chat", "production"],
"commit_message": "Initial version"
}'
Versions
| Endpoint | Method | Description |
|---|---|---|
/api/v1/prompts/{id}/versions/ | GET | List all versions |
/api/v1/prompts/{id}/versions/ | POST | Create a new version |
/api/v1/prompts/{id}/versions/{num}/ | GET | Get specific version with full content |
Create a new version:
curl -X POST "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/versions/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{
"content": "You are a helpful assistant for {{user_name}}. Answer questions about {{topic}} concisely.",
"config": {"model": "gpt-4o", "temperature": 0.5},
"commit_message": "Added conciseness instruction, lowered temperature"
}'
Labels
| Endpoint | Method | Description |
|---|---|---|
/api/v1/prompts/{id}/labels/{label}/ | PUT | Set or move a label to a version |
/api/v1/prompts/{id}/labels/{label}/ | DELETE | Remove a label |
Set the "production" label to version 3:
curl -X PUT "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/labels/production/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{"version": 3}'
Promote staging to production (two calls):
# Get the version that "staging" points to
STAGING_VERSION=$(curl -s ".../api/v1/prompts/{id}/" -H "Cookie: sessionid=..." \
| jq '.labels[] | select(.label=="staging") | .version')
# Move "production" to that version
curl -X PUT ".../api/v1/prompts/{id}/labels/production/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d "{\"version\": $STAGING_VERSION}"
Playground
Test prompts with variable substitution and compare variants side by side.
Execute a single prompt:
POST /api/v1/prompts/playground/
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{
"content": "Summarize the following in one sentence: {{text}}",
"config": {"model": "gpt-4o-mini", "temperature": 0.3, "max_tokens": 256},
"variables": {"text": "Waxell is an observability platform for AI agents..."}
}'
Response:
{
"output": "Waxell provides observability and governance for AI agents.",
"model": "gpt-4o-mini",
"tokens_in": 42,
"tokens_out": 12,
"cost": 0.0001,
"latency_ms": 340
}
Compare multiple variants:
POST /api/v1/prompts/playground/compare/
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/compare/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{
"variants": [
{
"content": "Summarize: {{text}}",
"config": {"model": "gpt-4o-mini", "temperature": 0.3}
},
{
"content": "Give a one-sentence summary of: {{text}}",
"config": {"model": "gpt-4o-mini", "temperature": 0.7}
},
{
"content": "Summarize: {{text}}",
"config": {"model": "gpt-4o", "temperature": 0.3}
}
]
}'
Up to 10 variants can be compared in a single request. Each result includes output, token counts, cost, and latency for direct comparison.
Prompt Metrics
GET /api/v1/prompts/{id}/metrics/
Shows usage metrics per version, linked via content hash to LlmCallRecord:
{
"prompt_name": "chat-system-prompt",
"totals": {
"call_count": 1520,
"total_tokens": 456000,
"total_cost": 2.345678
},
"versions": [
{
"version": 3,
"content_hash": "a1b2c3...",
"call_count": 1200,
"total_tokens": 360000,
"total_cost": 1.845678
},
{
"version": 2,
"content_hash": "d4e5f6...",
"call_count": 320,
"total_tokens": 96000,
"total_cost": 0.500000
}
]
}
Label Cascade (Fallback to Latest)
By default, requesting a label that does not exist raises an error. You can opt into fallback behavior where a missing label resolves to the latest version instead:
from waxell_observe import WaxellObserveClient
client = WaxellObserveClient()
# Strict mode (default) -- raises if "staging" label doesn't exist
prompt = await client.get_prompt("welcome-message", label="staging")
# Cascade mode -- falls back to latest if "staging" is missing
prompt = await client.get_prompt(
"welcome-message",
label="staging",
fallback_to_latest=True,
)
Cascade rules:
- Triggers only when
fallback_to_latest=True, a label is specified, and the label does not exist on this prompt - Pinned-version lookups (
version=3) never cascade -- they fail loudly so you notice the version is gone - A warning is logged when the fallback fires:
label 'staging' missing for 'welcome-message' -- falling back to latest - The result is cached under the original label key for the standard 30-second TTL
Do not enable fallback_to_latest=True as a global default. It silently weakens label guarantees. Use it only in environments where "any version is better than none" (e.g., local development, demo instances).
Rollback
Labels have an audit trail that records every move (e.g., production moved from v2 to v3). The rollback endpoint lets you revert a label to the version it was on before a specific move, with a reason for the audit log:
# Rollback the most recent label move
curl -X POST "https://acme.waxell.dev/api/v1/prompts/{prompt_id}/label-history/{event_id}/rollback/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{"reason": "incident-12: prod regression"}'
Response:
{
"label": "production",
"version": 2,
"rolled_back_from_event_id": "...",
"prior_version": 3,
"created": false
}
The rollback creates a new history entry so the full chain of moves remains auditable. Rollback is refused in these cases:
| Status | Reason |
|---|---|
| 400 | The event is a CREATED event (no prior version to revert to) |
| 403 | The label is protected -- use the normal label PUT flow which routes through approval |
| 410 | The target version was deleted |
Prompt Discovery
Discovery surfaces recurring unregistered prompts -- LLM calls that aren't linked to any prompt in the registry. Waxell clusters them by a SHA-256 fingerprint of the prompt content.
Viewing Discovered Prompts
# List discovered prompt clusters
curl "https://acme.waxell.dev/api/v1/prompts/discover/?days=7" \
-H "Cookie: sessionid=..."
Each cluster shows the fingerprint, a content preview, the agents that used it, and a call count.
Registering Discovered Prompts
Register clusters one at a time or in batch. Batch registration auto-names prompts from the agent name and handles collisions by appending _2, _3, etc.:
# Register all discovered clusters in one call
curl -X POST "https://acme.waxell.dev/api/v1/prompts/discover/register-batch/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{
"items": [
{"fingerprint": "a1b2c3d4e5f6a7b8"},
{"fingerprint": "f8e7d6c5b4a39281", "name": "custom-name"}
],
"default_label": "production",
"default_tags": ["discovered"]
}'
Response:
{
"registered": [
{"fingerprint": "a1b2c3d4e5f6a7b8", "name": "support-bot_system", "prompt_id": "...", "version": 1},
{"fingerprint": "f8e7d6c5b4a39281", "name": "custom-name", "prompt_id": "...", "version": 1}
],
"skipped": []
}
Ignoring Clusters
Dismiss noise (test data, one-off scripts) so it stops appearing in the discover view:
# Ignore a cluster for 30 days
curl -X POST "https://acme.waxell.dev/api/v1/prompts/discover/ignored/" \
-H "Cookie: sessionid=..." \
-H "Content-Type: application/json" \
-d '{"fingerprint": "a1b2c3d4e5f6a7b8", "days": 30, "note": "PII test data"}'
Ignored clusters automatically resurface when the ignore window expires. You can also remove an ignore early with DELETE /api/v1/prompts/discover/ignored/{fingerprint}/.
CLI
The wax CLI provides the same discover and register workflow:
# View unregistered prompt clusters
wax prompt discover
# Register all clusters (with dry-run first)
wax prompt register-all-clusters --dry-run
wax prompt register-all-clusters --label production --tags discovered
Workflow: Prompt Lifecycle
- Create a prompt with an initial version
- Test in the playground with different variables and configurations
- Label the tested version as
staging - Deploy by pointing
productionto the staging version - Monitor via prompt metrics to compare version performance
- Rollback if a regression is detected -- revert the label with an audit trail
- Discover unregistered prompts in production and bring them into the registry
- Iterate by creating new versions and repeating the cycle
Next Steps
- LLM Call Tracking -- See how prompts map to production LLM calls via content hashing
- Scoring -- Attach quality scores to runs that use versioned prompts
- Evaluators -- Attach automated quality checks to prompt versions
- Datasets & Experiments -- Compare prompt versions systematically