
Meta Llama

A comprehensive Meta Llama ecosystem demo exercising two instrumentor paths: the direct meta_llama_instrumentor (wrapping Inference.chat_completion) and the llama_stack_instrumentor (wrapping InferenceResource.chat_completion, completion, and embeddings, plus AgentsResource.create and SessionResource.create). The orchestrator dispatches a direct-inference child agent and a Llama Stack child agent, then compares both paths with @waxell.reasoning_dec and synthesizes the results with an auto-instrumented OpenAI call.
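The dispatch-compare-synthesize flow can be sketched in plain Python. The child-agent functions below are illustrative stubs standing in for the demo's real instrumented calls, not its actual helpers:

```python
# Minimal sketch of the orchestration flow; the child-agent functions
# are illustrative stubs, not the demo's real implementations.

def run_meta_llama_agent(query: str) -> str:
    # In the demo: Inference.chat_completion via meta_llama_instrumentor.
    return f"meta response to: {query}"

def run_llama_stack_agent(query: str) -> str:
    # In the demo: InferenceResource.chat_completion via llama_stack_instrumentor.
    return f"stack response to: {query}"

def orchestrate(query: str) -> dict:
    meta_text = run_meta_llama_agent(query)    # child agent 1: direct inference
    stack_text = run_llama_stack_agent(query)  # child agent 2: Llama Stack
    # Compare the two paths (the demo does this under @waxell.reasoning_dec).
    comparison = {"meta_len": len(meta_text), "stack_len": len(stack_text)}
    # The demo then synthesizes both answers with an auto-instrumented OpenAI call.
    synthesis = f"synthesized {comparison['meta_len']} + {comparison['stack_len']} chars"
    return {"comparison": comparison, "synthesis": synthesis}
```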

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
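A typical setup exports the three variables before launching (placeholder values shown; substitute your real keys and endpoint):

```shell
# Placeholder values -- replace with your actual credentials and endpoint.
export OPENAI_API_KEY="sk-placeholder"
export WAXELL_API_KEY="wx-placeholder"
export WAXELL_API_URL="https://waxell.example.com"

# With no keys at hand, skip the exports and pass --dry-run instead:
#   python -m app.demos.meta_llama_agent --dry-run
```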

Architecture

Key Code

Tool-decorated LLM and embedding operations

Each Llama operation is wrapped with @tool for automatic span creation. Both llm and embedding tool types are used.

@waxell.tool(tool_type="llm")
def llama_chat_completion(query: str, model: str = "meta-llama/Llama-3.2-3B-Instruct") -> dict:
    """Call Meta Llama chat completion via meta_llama_instrumentor path."""
    response = _meta_client.inference.chat_completion(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return {
        "content": response.completion_message.content,
        "model": response.model,
        "tokens_in": response.usage.prompt_tokens,
        "tokens_out": response.usage.completion_tokens,
    }


@waxell.tool(tool_type="embedding")
def stack_embeddings(texts: list, model: str = "meta-llama/Llama-3.2-3B-Instruct") -> dict:
    """Generate embeddings via Llama Stack inference.embeddings."""
    response = _stack_client.inference.embeddings(model=model, contents=texts)
    return {
        "embedding_count": len(response.embeddings),
        "dimensions": len(response.embeddings[0]) if response.embeddings else 0,
    }

Path comparison with @reasoning

The reasoning decorator compares response characteristics across the Meta Llama path and both Llama Stack operations (chat and text completion).

@waxell.reasoning_dec(step="compare_llama_paths")
def compare_llama_paths(meta_text: str, stack_text: str, completion_text: str) -> dict:
    meta_len = len(meta_text)
    stack_len = len(stack_text)
    completion_len = len(completion_text)
    return {
        "thought": f"Meta Llama: {meta_len} chars. Stack chat: {stack_len} chars. Stack completion: {completion_len} chars.",
        "evidence": [
            f"Meta Llama: {meta_len} chars, focuses on model capabilities",
            f"Stack chat: {stack_len} chars, focuses on standardized interface",
            f"Stack completion: {completion_len} chars, focuses on text generation",
        ],
        "conclusion": "Both paths successfully generated responses. Stack provides additional operations beyond chat.",
    }
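Decorator aside, the comparison function is pure string accounting, so its output shape is easy to verify offline. The sketch below replaces @waxell.reasoning_dec with a no-op stand-in purely for illustration (the real decorator presumably also records a reasoning span):

```python
# No-op stand-in for @waxell.reasoning_dec so the comparison logic
# can be exercised without the waxell SDK (illustrative only).
def reasoning_dec(step=None):
    def wrap(fn):
        return fn
    return wrap

@reasoning_dec(step="compare_llama_paths")
def compare_llama_paths(meta_text: str, stack_text: str, completion_text: str) -> dict:
    meta_len = len(meta_text)
    stack_len = len(stack_text)
    completion_len = len(completion_text)
    return {
        "thought": f"Meta Llama: {meta_len} chars. Stack chat: {stack_len} chars. Stack completion: {completion_len} chars.",
        "evidence": [
            f"Meta Llama: {meta_len} chars, focuses on model capabilities",
            f"Stack chat: {stack_len} chars, focuses on standardized interface",
            f"Stack completion: {completion_len} chars, focuses on text generation",
        ],
        "conclusion": "Both paths successfully generated responses.",
    }

result = compare_llama_paths("alpha", "beta!", "gamma ray")
# result["thought"] reports 5, 5, and 9 chars respectively.
```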

What this demonstrates

  • @waxell.observe -- parent orchestrator with 2 child agents
  • @waxell.step_dec -- query preprocessing
  • @waxell.decision -- inference path selection (meta_llama/llama_stack/both)
  • @waxell.tool(llm) -- LLM call operations (3 different functions)
  • @waxell.tool(embedding) -- embedding generation
  • @waxell.reasoning_dec -- cross-path comparison
  • waxell.tag() -- provider, ecosystem, and agent role tagging
  • waxell.score() -- path and operations coverage scores
  • Auto-instrumented OpenAI -- synthesis call traced automatically
  • Two instrumentor paths -- meta_llama_instrumentor and llama_stack_instrumentor

Run it

cd dev/waxell-dev
python -m app.demos.meta_llama_agent --dry-run

Source

dev/waxell-dev/app/demos/meta_llama_agent.py