HuggingFace

A multi-agent summarize-and-explain pipeline using HuggingFace's Inference API with the meta-llama/Llama-3.2-3B-Instruct model. The orchestrator dispatches a summarizer child agent for concise topic summaries and an explainer child agent that evaluates the summary quality before generating a detailed technical explanation.

Environment variables

This example requires HF_TOKEN, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.

Architecture

Key Code

HuggingFace Inference API integration

The child agents use HuggingFace's text_generation API, which takes a prompt string and returns generated text directly.

@waxell.observe(agent_name="hf-summarizer", workflow_name="hf-summarization", capture_io=True)
async def run_summarizer(query: str, client, *, dry_run=False, waxell_ctx=None) -> dict:
    waxell.tag("task", "summarization")
    waxell.tag("model", "meta-llama/Llama-3.2-3B-Instruct")

    prompt = f"Summarize the following topic in 2-3 sentences: {query}"
    summary = client.text_generation(prompt, max_new_tokens=150)

    waxell.score("summary_quality", 0.80, comment="HF summary quality")
    return {"summary": summary, "model": "meta-llama/Llama-3.2-3B-Instruct"}
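Since the example supports --dry-run without API keys, the summarizer presumably short-circuits before reaching the network. A minimal sketch of that pattern, assuming the `InferenceClient` from huggingface_hub (the `summarize` helper and placeholder text below are illustrative, not the demo's actual code):

```python
MODEL = "meta-llama/Llama-3.2-3B-Instruct"

def summarize(query: str, token=None, *, dry_run: bool = False) -> str:
    """Summarize a topic, returning canned text when dry_run is set."""
    prompt = f"Summarize the following topic in 2-3 sentences: {query}"
    if dry_run:
        # No API call: a deterministic placeholder for offline runs.
        return f"[dry-run] summary of: {query}"
    # Imported lazily so dry runs need no huggingface_hub dependency.
    from huggingface_hub import InferenceClient
    client = InferenceClient(model=MODEL, token=token)
    # text_generation takes a prompt string and returns the generated text.
    return client.text_generation(prompt, max_new_tokens=150)
```

Keeping the dry-run branch before client construction means the example stays runnable with none of HF_TOKEN, WAXELL_API_KEY, or WAXELL_API_URL set.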

Summary quality assessment before explanation

The explainer evaluates the summary's quality before generating a detailed explanation, checking for key technical terms and conciseness.

@waxell.reasoning_dec(step="assess_summary_quality")
def assess_summary_quality(summary: str) -> dict:
    word_count = len(summary.split())
    has_key_terms = any(w in summary.lower() for w in ["mechanism", "layer", "weight", "query", "key", "value"])
    is_concise = 20 < word_count < 100
    quality = 0.5 + (0.2 if has_key_terms else 0) + (0.15 if is_concise else 0)
    return {
        "word_count": word_count,
        "has_key_terms": has_key_terms,
        "quality_score": round(min(quality, 1.0), 2),
    }
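To see how the heuristic scores a summary, here is the same logic as a standalone function (the `@waxell.reasoning_dec` decorator is omitted so the snippet runs on its own; the sample summary text is invented for illustration):

```python
KEY_TERMS = ("mechanism", "layer", "weight", "query", "key", "value")

def assess_summary_quality(summary: str) -> dict:
    word_count = len(summary.split())
    has_key_terms = any(w in summary.lower() for w in KEY_TERMS)
    is_concise = 20 < word_count < 100  # neither a fragment nor an essay
    quality = 0.5 + (0.2 if has_key_terms else 0) + (0.15 if is_concise else 0)
    return {
        "word_count": word_count,
        "has_key_terms": has_key_terms,
        "quality_score": round(min(quality, 1.0), 2),
    }

result = assess_summary_quality(
    "The attention mechanism computes query, key, and value projections for each "
    "token, then uses softmax-normalized dot products to weight values, letting "
    "every layer mix information across positions in the sequence."
)
print(result["quality_score"])  # 0.85: key terms present (+0.2) and concise (+0.15)
```

A summary that misses the key terms or falls outside the 20-100 word window keeps the 0.5 baseline for the missing criterion, so the score bounds are 0.5 to 0.85.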

What this demonstrates

  • @waxell.observe -- parent orchestrator with 2 child agents
  • @waxell.step_dec -- query preprocessing with technical topic detection
  • @waxell.decision -- model size selection (small, medium, large)
  • @waxell.reasoning_dec -- summary quality evaluation before next step
  • waxell.tag() -- task and model tagging
  • waxell.score() -- summary and explanation quality scores
  • waxell.metadata() -- model hub and model metadata
  • HuggingFace Inference API -- text_generation with max_new_tokens
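The @waxell.decision step picks a model size (small, medium, large). The demo's actual selection criterion is not shown on this page, so the sketch below is a hypothetical heuristic based on query length; the function name and thresholds are assumptions:

```python
def select_model_size(query: str) -> str:
    """Illustrative model-size decision keyed on query length (thresholds invented)."""
    words = len(query.split())
    if words < 10:
        return "small"   # short factual queries
    if words < 40:
        return "medium"  # typical summarize-and-explain topics
    return "large"       # long, multi-part prompts

print(select_model_size("What is attention?"))  # small
```

Whatever the real criterion is, wrapping it in @waxell.decision makes the chosen branch visible in the trace alongside the tags and scores listed above.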

Run it

cd dev/waxell-dev
python -m app.demos.huggingface_agent --dry-run

Source

dev/waxell-dev/app/demos/huggingface_agent.py