
Meta Llama

A comprehensive Meta Llama ecosystem demo exercising two instrumentor paths: the direct meta_llama_instrumentor (wrapping Inference.chat_completion) and the llama_stack_instrumentor (wrapping InferenceResource.chat_completion, completion, and embeddings, plus AgentsResource.create and SessionResource.create). The orchestrator dispatches a direct-inference child agent and a Llama Stack child agent, then compares both paths with @waxell.reasoning_dec and synthesizes the results with an auto-instrumented OpenAI call.
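The dispatch-compare-synthesize flow can be sketched in plain Python. The child-agent functions below are illustrative stubs standing in for the demo's real instrumented calls, not its actual helpers:

```python
# Minimal sketch of the orchestration flow; the child-agent functions
# are illustrative stubs, not the demo's real implementations.

def run_meta_llama_agent(query: str) -> str:
    # In the demo: Inference.chat_completion via meta_llama_instrumentor.
    return f"meta response to: {query}"

def run_llama_stack_agent(query: str) -> str:
    # In the demo: InferenceResource.chat_completion via llama_stack_instrumentor.
    return f"stack response to: {query}"

def orchestrate(query: str) -> dict:
    meta_text = run_meta_llama_agent(query)    # child agent 1: direct inference
    stack_text = run_llama_stack_agent(query)  # child agent 2: Llama Stack
    # Compare the two paths (the demo does this under @waxell.reasoning_dec).
    comparison = {"meta_len": len(meta_text), "stack_len": len(stack_text)}
    # The demo then synthesizes both answers with an auto-instrumented OpenAI call.
    synthesis = f"synthesized {comparison['meta_len']} + {comparison['stack_len']} chars"
    return {"comparison": comparison, "synthesis": synthesis}
```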

Environment variables

This example requires OPENAI_API_KEY, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.
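A typical setup exports the three variables before launching (placeholder values shown; substitute your real keys and endpoint):

```shell
# Placeholder values -- replace with your actual credentials and endpoint.
export OPENAI_API_KEY="sk-placeholder"
export WAXELL_API_KEY="wx-placeholder"
export WAXELL_API_URL="https://waxell.example.com"

# With no keys at hand, skip the exports and pass --dry-run instead:
#   python -m app.demos.meta_llama_agent --dry-run
```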

Architecture

Key Code

Tool-decorated LLM and embedding operations

Each Llama operation is wrapped with @tool for automatic span creation. Both llm and embedding tool types are used.

@waxell.tool(tool_type="llm")
def llama_chat_completion(query: str, model: str = "meta-llama/Llama-3.2-3B-Instruct") -> dict:
    """Call Meta Llama chat completion via meta_llama_instrumentor path."""
    response = _meta_client.inference.chat_completion(
        model=model,
        messages=[{"role": "user", "content": query}],
    )
    return {
        "content": response.completion_message.content,
        "model": response.model,
        "tokens_in": response.usage.prompt_tokens,
        "tokens_out": response.usage.completion_tokens,
    }


@waxell.tool(tool_type="embedding")
def stack_embeddings(texts: list, model: str = "meta-llama/Llama-3.2-3B-Instruct") -> dict:
    """Generate embeddings via Llama Stack inference.embeddings."""
    response = _stack_client.inference.embeddings(model=model, contents=texts)
    return {
        "embedding_count": len(response.embeddings),
        "dimensions": len(response.embeddings[0]) if response.embeddings else 0,
    }

Path comparison with @reasoning

The reasoning decorator compares response characteristics across the Meta Llama path and both Llama Stack operations (chat and text completion).

@waxell.reasoning_dec(step="compare_llama_paths")
def compare_llama_paths(meta_text: str, stack_text: str, completion_text: str) -> dict:
    meta_len = len(meta_text)
    stack_len = len(stack_text)
    completion_len = len(completion_text)
    return {
        "thought": f"Meta Llama: {meta_len} chars. Stack chat: {stack_len} chars. Stack completion: {completion_len} chars.",
        "evidence": [
            f"Meta Llama: {meta_len} chars, focuses on model capabilities",
            f"Stack chat: {stack_len} chars, focuses on standardized interface",
            f"Stack completion: {completion_len} chars, focuses on text generation",
        ],
        "conclusion": "Both paths successfully generated responses. Stack provides additional operations beyond chat.",
    }
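Decorator aside, the comparison function is pure string accounting, so its output shape is easy to verify offline. The sketch below replaces @waxell.reasoning_dec with a no-op stand-in purely for illustration (the real decorator presumably also records a reasoning span):

```python
# No-op stand-in for @waxell.reasoning_dec so the comparison logic
# can be exercised without the waxell SDK (illustrative only).
def reasoning_dec(step=None):
    def wrap(fn):
        return fn
    return wrap

@reasoning_dec(step="compare_llama_paths")
def compare_llama_paths(meta_text: str, stack_text: str, completion_text: str) -> dict:
    meta_len = len(meta_text)
    stack_len = len(stack_text)
    completion_len = len(completion_text)
    return {
        "thought": f"Meta Llama: {meta_len} chars. Stack chat: {stack_len} chars. Stack completion: {completion_len} chars.",
        "evidence": [
            f"Meta Llama: {meta_len} chars, focuses on model capabilities",
            f"Stack chat: {stack_len} chars, focuses on standardized interface",
            f"Stack completion: {completion_len} chars, focuses on text generation",
        ],
        "conclusion": "Both paths successfully generated responses.",
    }

result = compare_llama_paths("alpha", "beta!", "gamma ray")
# result["thought"] reports 5, 5, and 9 chars respectively.
```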

What this demonstrates

  • @waxell.observe -- parent orchestrator with 2 child agents
  • @waxell.step_dec -- query preprocessing
  • @waxell.decision -- inference path selection (meta_llama/llama_stack/both)
  • @waxell.tool(llm) -- LLM call operations (3 different functions)
  • @waxell.tool(embedding) -- embedding generation
  • @waxell.reasoning_dec -- cross-path comparison
  • waxell.tag() -- provider, ecosystem, and agent role tagging
  • waxell.score() -- path and operations coverage scores
  • Auto-instrumented OpenAI -- synthesis call traced automatically
  • Two instrumentor paths -- meta_llama_instrumentor and llama_stack_instrumentor

Run it

cd dev/waxell-dev
python -m app.demos.meta_llama_agent --dry-run

Source

dev/waxell-dev/app/demos/meta_llama_agent.py