Meta Llama
A comprehensive Meta Llama ecosystem demo exercising two instrumentor paths: the direct meta_llama_instrumentor (wrapping Inference.chat_completion) and the llama_stack_instrumentor (wrapping InferenceResource.chat_completion, completion, and embeddings, plus AgentsResource.create and SessionResource.create). The orchestrator dispatches a direct-inference child agent and a Llama Stack child agent, compares the two paths with a @reasoning step, and synthesizes the results with an auto-instrumented OpenAI call.
This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
Architecture
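The trace shape, as implied by the description above (a sketch, not generated output):

orchestrator (@waxell.observe)
├── preprocess query (@waxell.step_dec)
├── select inference path (@waxell.decision)
├── direct inference child agent -- meta_llama_instrumentor -> Inference.chat_completion
├── Llama Stack child agent -- llama_stack_instrumentor -> chat_completion / completion / embeddings
├── compare_llama_paths (@waxell.reasoning_dec)
└── synthesis -- auto-instrumented OpenAI call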
Key Code
Tool-decorated LLM and embedding operations
Each Llama operation is wrapped with @tool for automatic span creation. Both llm and embedding tool types are used.
@waxell.tool(tool_type="llm")
def llama_chat_completion(query: str, model: str = "meta-llama/Llama-3.2-3B-Instruct") -> dict:
    """Call Meta Llama chat completion via the meta_llama_instrumentor path."""
    response = _meta_client.inference.chat_completion(
        model=model, messages=[{"role": "user", "content": query}],
    )
    return {
        "content": response.completion_message.content,
        "model": response.model,
        "tokens_in": response.usage.prompt_tokens,
        "tokens_out": response.usage.completion_tokens,
    }
@waxell.tool(tool_type="embedding")
def stack_embeddings(texts: list, model: str = "meta-llama/Llama-3.2-3B-Instruct") -> dict:
    """Generate embeddings via Llama Stack inference.embeddings."""
    response = _stack_client.inference.embeddings(model=model, contents=texts)
    return {
        "embedding_count": len(response.embeddings),
        "dimensions": len(response.embeddings[0]) if response.embeddings else 0,
    }
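Called directly, each tool returns a plain dict while the decorator records the span. An illustrative invocation (the query and texts here are made up; client setup is elided in this excerpt):

chat = llama_chat_completion("Summarize the Llama 3.2 release in one sentence.")
vectors = stack_embeddings(["llama stack", "meta llama"])
print(chat["tokens_in"], chat["tokens_out"], vectors["dimensions"])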
Path comparison with @reasoning
The reasoning decorator compares response characteristics across the direct Meta Llama path and both Llama Stack operations (chat and completion).
@waxell.reasoning_dec(step="compare_llama_paths")
def compare_llama_paths(meta_text: str, stack_text: str, completion_text: str) -> dict:
    meta_len = len(meta_text)
    stack_len = len(stack_text)
    completion_len = len(completion_text)
    return {
        "thought": f"Meta Llama: {meta_len} chars. Stack chat: {stack_len} chars. Stack completion: {completion_len} chars.",
        "evidence": [
            f"Meta Llama: {meta_len} chars, focuses on model capabilities",
            f"Stack chat: {stack_len} chars, focuses on standardized interface",
            f"Stack completion: {completion_len} chars, focuses on text generation",
        ],
        "conclusion": "Both paths successfully generated responses. Stack provides additional operations beyond chat.",
    }
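Feeding the three response texts through the comparison yields the thought/evidence/conclusion record the decorator attaches to the trace. A usage sketch (the stack_chat and stack_completion variables are illustrative, not names from the demo):

verdict = compare_llama_paths(
    meta_text=chat["content"],
    stack_text=stack_chat["content"],
    completion_text=stack_completion["content"],
)
print(verdict["conclusion"])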
What this demonstrates
- @waxell.observe -- parent orchestrator with 2 child agents
- @waxell.step_dec -- query preprocessing
- @waxell.decision -- inference path selection (meta_llama/llama_stack/both); sketched after this list
- @waxell.tool(llm) -- LLM call operations (3 different functions)
- @waxell.tool(embedding) -- embedding generation
- @waxell.reasoning_dec -- cross-path comparison
- waxell.tag() -- provider, ecosystem, and agent role tagging; sketched after this list
- waxell.score() -- path and operations coverage scores
- Auto-instrumented OpenAI -- synthesis call traced automatically
- Two instrumentor paths -- meta_llama_instrumentor and llama_stack_instrumentor
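A minimal sketch of how the decision, tagging, and scoring pieces from this list might be wired. The @waxell.decision, waxell.tag(), and waxell.score() signatures below are assumptions for illustration, not the demo's actual code:

@waxell.decision(name="select_inference_path")  # hypothetical signature
def select_inference_path(query: str) -> str:
    # Route to one or both instrumentor paths based on the query.
    return "both" if "compare" in query.lower() else "meta_llama"

waxell.tag(provider="meta", ecosystem="llama", agent_role="orchestrator")  # assumed kwargs
waxell.score(name="path_coverage", value=1.0)  # assumed signature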
Run it
cd dev/waxell-dev
python -m app.demos.meta_llama_agent --dry-run