Manage Prompts in Production
Manage your prompts as versioned, labeled artifacts that you can update, test, and deploy without changing code or redeploying your application.
Prerequisites
- Python 3.10+
- waxell-observe installed and configured with an API key
- A running Waxell instance
What You'll Learn
- Create and version prompts via the REST API
- Use labels (production, staging) for safe deployments
- Fetch and compile prompts in your application code
- Test prompts in the playground before deploying
- Promote prompts from staging to production
Step 1: Create a Prompt
Create a new prompt definition via the REST API:
curl -X POST "https://acme.waxell.dev/api/v1/prompts/" \
-H "X-Wax-Key: wax_sk_..." \
-H "Content-Type: application/json" \
-d '{
"name": "support-responder",
"description": "Generates customer support responses",
"type": "text"
}'
This creates a prompt container. Next, add a version with actual content.
Step 2: Add Version 1
Add the first version of the prompt content. Use {{variable}} syntax for template placeholders:
curl -X POST "https://acme.waxell.dev/api/v1/prompts/support-responder/versions/" \
-H "X-Wax-Key: wax_sk_..." \
-H "Content-Type: application/json" \
-d '{
"content": "You are a helpful customer support agent for {{company_name}}.\n\nCustomer question: {{question}}\n\nContext from knowledge base:\n{{context}}\n\nProvide a clear, friendly response. If you are unsure, say so honestly.",
"config": {
"model": "gpt-4o",
"temperature": 0.7,
"max_tokens": 1024
}
}'
Response:
{
"version": 1,
"content": "You are a helpful customer support agent for {{company_name}}...",
"config": {
"model": "gpt-4o",
"temperature": 0.7,
"max_tokens": 1024
},
"created_at": "2025-01-15T10:00:00Z"
}
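Compilation fills each {{variable}} placeholder with a value you supply at runtime. Conceptually it is a simple substitution; this sketch (compile_template is an illustrative helper, not part of the SDK) shows the idea:

```python
import re

def compile_template(content: str, **variables: str) -> str:
    # Replace each {{name}} placeholder with the matching keyword argument.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: variables[m.group(1)], content)

compiled = compile_template(
    "Customer question: {{question}}",
    question="How do I reset my password?",
)
```

The SDK's compile() method, shown in Step 4, does this for you.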
Step 3: Label for Production
Labels point to specific versions. The production label tells your application which version to use:
curl -X POST "https://acme.waxell.dev/api/v1/prompts/support-responder/labels/" \
-H "X-Wax-Key: wax_sk_..." \
-H "Content-Type: application/json" \
-d '{
"label": "production",
"version": 1
}'
Labels are mutable pointers. You can reassign production to a different version at any time, and all applications fetching that label will pick up the new version on their next request.
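A minimal mental model, in illustrative Python (not the SDK): versions are immutable entries, and labels are mutable pointers into them.

```python
# Illustrative model only: versions never change; labels are reassignable.
versions = {1: "version 1 content", 2: "version 2 content"}
labels = {"production": 1}

def resolve(label: str) -> str:
    # What a fetch by label effectively does: label -> version -> content.
    return versions[labels[label]]

labels["production"] = 2  # promote: the next resolve() returns version 2
```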
Step 4: Fetch the Prompt in Your Application
Use the SDK to fetch the production prompt and compile it with your variables:
from waxell_observe import WaxellObserveClient
client = WaxellObserveClient()
# Fetch the production version
prompt = client.get_prompt_sync(
name="support-responder",
label="production",
)
# Compile with template variables
compiled = prompt.compile(
company_name="Acme Corp",
question="How do I reset my password?",
context="Password reset is available in Settings > Security > Reset Password.",
)
print(compiled)
# Output:
# You are a helpful customer support agent for Acme Corp.
#
# Customer question: How do I reset my password?
#
# Context from knowledge base:
# Password reset is available in Settings > Security > Reset Password.
#
# Provide a clear, friendly response. If you are unsure, say so honestly.
The prompt object also carries the config dict, so you can use it for model parameters:
import openai
oai = openai.OpenAI()
response = oai.chat.completions.create(
model=prompt.config.get("model", "gpt-4o"),
messages=[{"role": "user", "content": compiled}],
temperature=prompt.config.get("temperature", 0.7),
max_tokens=prompt.config.get("max_tokens", 1024),
)
Step 5: Iterate with Version 2
Improve the prompt based on evaluation results or user feedback. Create a new version:
curl -X POST "https://acme.waxell.dev/api/v1/prompts/support-responder/versions/" \
-H "X-Wax-Key: wax_sk_..." \
-H "Content-Type: application/json" \
-d '{
"content": "You are a helpful, empathetic customer support agent for {{company_name}}.\n\nCustomer question: {{question}}\n\nRelevant knowledge base articles:\n{{context}}\n\nInstructions:\n1. Address the customer by acknowledging their concern\n2. Provide a clear, step-by-step solution\n3. If you are unsure, offer to escalate to a human agent\n4. Keep your response under 200 words",
"config": {
"model": "gpt-4o",
"temperature": 0.5,
"max_tokens": 512
}
}'
Label it as staging for testing:
curl -X POST "https://acme.waxell.dev/api/v1/prompts/support-responder/labels/" \
-H "X-Wax-Key: wax_sk_..." \
-H "Content-Type: application/json" \
-d '{
"label": "staging",
"version": 2
}'
Now your production application continues to use version 1 while you test version 2.
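One common pattern (an assumption here, not a Waxell requirement) is to drive the label from an environment variable, so the same code path reads staging in test deployments and production everywhere else:

```python
import os

def prompt_label() -> str:
    # Hypothetical convention: set PROMPT_LABEL=staging in test
    # environments; leave it unset everywhere else.
    return os.environ.get("PROMPT_LABEL", "production")
```

You would then pass label=prompt_label() to get_prompt_sync() instead of a hardcoded string.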
Step 6: Test in the Playground
Test your new prompt version before promoting it:
curl -X POST "https://acme.waxell.dev/api/v1/prompts/playground/" \
-H "X-Wax-Key: wax_sk_..." \
-H "Content-Type: application/json" \
-d '{
"content": "You are a helpful, empathetic customer support agent for {{company_name}}.\n\nCustomer question: {{question}}\n\nRelevant knowledge base articles:\n{{context}}\n\nInstructions:\n1. Address the customer by acknowledging their concern\n2. Provide a clear, step-by-step solution\n3. If you are unsure, offer to escalate to a human agent\n4. Keep your response under 200 words",
"variables": {
"company_name": "Acme Corp",
"question": "How do I reset my password?",
"context": "Password reset is available in Settings > Security > Reset Password."
}
}'
Or fetch the staging version in a test environment:
staging_prompt = client.get_prompt_sync(
name="support-responder",
label="staging",
)
compiled = staging_prompt.compile(
company_name="Acme Corp",
question="How do I reset my password?",
context="Password reset is available in Settings > Security > Reset Password.",
)
print(f"Version: {staging_prompt.version}")
print(compiled)
Step 7: Promote to Production
Once you are satisfied with version 2, update the production label to point to it:
curl -X PUT "https://acme.waxell.dev/api/v1/prompts/support-responder/labels/production/" \
-H "X-Wax-Key: wax_sk_..." \
-H "Content-Type: application/json" \
-d '{
"version": 2
}'
All applications fetching label="production" will now receive version 2 on their next prompt fetch. No code changes. No redeployment.
To roll back, simply reassign the production label back to version 1. Previous versions are never deleted.
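Because every fetch resolves the label, applications pick up promotions and rollbacks without restarting. If fetching on every request is too chatty for your traffic, a short-lived cache is a common compromise; this is a sketch of the pattern, not an SDK feature:

```python
import time

_cache: dict[str, tuple[float, object]] = {}
TTL_SECONDS = 60.0  # label reassignments are picked up within a minute

def cached_fetch(key: str, fetch, now=time.monotonic):
    # Serve from cache while fresh; otherwise call fetch() and store the result.
    entry = _cache.get(key)
    if entry is None or now() - entry[0] > TTL_SECONDS:
        entry = (now(), fetch())
        _cache[key] = entry
    return entry[1]
```

Usage would look like cached_fetch("support-responder:production", lambda: client.get_prompt_sync(name="support-responder", label="production")). The tradeoff: a lower TTL means faster pickup of promotions and rollbacks; a higher TTL means fewer API calls.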
Step 8: Track Prompt Usage
LLM calls that use prompts fetched through the SDK are automatically linked to the prompt version via a content hash. This means you can:
- See which prompt version produced which runs in the dashboard
- Compare performance across versions -- Did version 2 get better helpfulness scores than version 1?
- Identify drift -- If someone hardcodes a prompt instead of fetching it, the content hash will not match any known version
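The exact hashing scheme is internal to Waxell, but the linkage works conceptually like this sketch: fingerprint the raw template, then look the fingerprint up against known versions; anything that misses is a drifted, hardcoded prompt.

```python
import hashlib

def content_hash(template: str) -> str:
    # Fingerprint the raw (uncompiled) template text.
    return hashlib.sha256(template.encode("utf-8")).hexdigest()

# Illustrative registry mapping known hashes to version numbers.
known = {content_hash("You are a helpful customer support agent for {{company_name}}."): 1}

def identify_version(template: str):
    # None means the prompt does not match any managed version (drift).
    return known.get(content_hash(template))
```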
Complete Integration Example
Here is a full example combining prompt management with the WaxellContext:
import asyncio
import openai
from waxell_observe import WaxellContext, WaxellObserveClient
client = WaxellObserveClient()
oai = openai.OpenAI()
async def support_agent(question: str, context: str) -> str:
# Fetch production prompt
prompt = client.get_prompt_sync(
name="support-responder",
label="production",
)
compiled = prompt.compile(
company_name="Acme Corp",
question=question,
context=context,
)
async with WaxellContext(
agent_name="support-bot",
inputs={"question": question},
) as ctx:
response = oai.chat.completions.create(
model=prompt.config.get("model", "gpt-4o"),
messages=[{"role": "user", "content": compiled}],
temperature=prompt.config.get("temperature", 0.7),
)
answer = response.choices[0].message.content
usage = response.usage
ctx.record_llm_call(
model=prompt.config.get("model", "gpt-4o"),
tokens_in=usage.prompt_tokens,
tokens_out=usage.completion_tokens,
task="support_response",
prompt_preview=compiled[:200],
response_preview=answer[:200],
)
ctx.set_metadata("prompt_version", prompt.version)
ctx.set_result({"answer": answer})
return answer
async def main():
answer = await support_agent(
question="How do I reset my password?",
context="Password reset: Settings > Security > Reset Password.",
)
print(answer)
if __name__ == "__main__":
asyncio.run(main())
Next Steps
- Track a RAG Pipeline -- Use managed prompts in a RAG pipeline
- Scoring -- Attach quality scores to runs using versioned prompts
- Decorator Pattern -- Use @observe with managed prompts