HuggingFace

A multi-agent summarize-and-explain pipeline using HuggingFace's Inference API with the meta-llama/Llama-3.2-3B-Instruct model. The orchestrator dispatches a summarizer child agent for concise topic summaries and an explainer child agent that evaluates the summary quality before generating a detailed technical explanation.

Environment variables

This example requires HF_TOKEN, WAXELL_API_KEY, and WAXELL_API_URL. Use --dry-run to run without any API keys.

Architecture

Key Code

HuggingFace Inference API integration

The child agents use HuggingFace's text_generation API, which takes a prompt string and returns generated text directly.

@waxell.observe(agent_name="hf-summarizer", workflow_name="hf-summarization", capture_io=True)
async def run_summarizer(query: str, client, *, dry_run=False, waxell_ctx=None) -> dict:
    waxell.tag("task", "summarization")
    waxell.tag("model", "meta-llama/Llama-3.2-3B-Instruct")

    prompt = f"Summarize the following topic in 2-3 sentences: {query}"
    summary = client.text_generation(prompt, max_new_tokens=150)

    waxell.score("summary_quality", 0.80, comment="HF summary quality")
    return {"summary": summary, "model": "meta-llama/Llama-3.2-3B-Instruct"}
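Since the example supports --dry-run without API keys, the summarizer presumably short-circuits before reaching the network. A minimal sketch of that pattern, assuming the `InferenceClient` from huggingface_hub (the `summarize` helper and placeholder text below are illustrative, not the demo's actual code):

```python
MODEL = "meta-llama/Llama-3.2-3B-Instruct"

def summarize(query: str, token=None, *, dry_run: bool = False) -> str:
    """Summarize a topic, returning canned text when dry_run is set."""
    prompt = f"Summarize the following topic in 2-3 sentences: {query}"
    if dry_run:
        # No API call: a deterministic placeholder for offline runs.
        return f"[dry-run] summary of: {query}"
    # Imported lazily so dry runs need no huggingface_hub dependency.
    from huggingface_hub import InferenceClient
    client = InferenceClient(model=MODEL, token=token)
    # text_generation takes a prompt string and returns the generated text.
    return client.text_generation(prompt, max_new_tokens=150)
```

Keeping the dry-run branch before client construction means the example stays runnable with none of HF_TOKEN, WAXELL_API_KEY, or WAXELL_API_URL set.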

Summary quality assessment before explanation

The explainer evaluates the summary's quality before generating a detailed explanation, checking for key technical terms and conciseness.

@waxell.reasoning_dec(step="assess_summary_quality")
def assess_summary_quality(summary: str) -> dict:
    word_count = len(summary.split())
    has_key_terms = any(w in summary.lower() for w in ["mechanism", "layer", "weight", "query", "key", "value"])
    is_concise = 20 < word_count < 100
    quality = 0.5 + (0.2 if has_key_terms else 0) + (0.15 if is_concise else 0)
    return {
        "word_count": word_count,
        "has_key_terms": has_key_terms,
        "quality_score": round(min(quality, 1.0), 2),
    }
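To see how the heuristic scores a summary, here is the same logic as a standalone function (the `@waxell.reasoning_dec` decorator is omitted so the snippet runs on its own; the sample summary text is invented for illustration):

```python
KEY_TERMS = ("mechanism", "layer", "weight", "query", "key", "value")

def assess_summary_quality(summary: str) -> dict:
    word_count = len(summary.split())
    has_key_terms = any(w in summary.lower() for w in KEY_TERMS)
    is_concise = 20 < word_count < 100  # neither a fragment nor an essay
    quality = 0.5 + (0.2 if has_key_terms else 0) + (0.15 if is_concise else 0)
    return {
        "word_count": word_count,
        "has_key_terms": has_key_terms,
        "quality_score": round(min(quality, 1.0), 2),
    }

result = assess_summary_quality(
    "The attention mechanism computes query, key, and value projections for each "
    "token, then uses softmax-normalized dot products to weight values, letting "
    "every layer mix information across positions in the sequence."
)
print(result["quality_score"])  # 0.85: key terms present (+0.2) and concise (+0.15)
```

A summary that misses the key terms or falls outside the 20-100 word window keeps the 0.5 baseline for the missing criterion, so the score bounds are 0.5 to 0.85.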

What this demonstrates

  • @waxell.observe -- parent orchestrator with 2 child agents
  • @waxell.step_dec -- query preprocessing with technical topic detection
  • @waxell.decision -- model size selection (small, medium, large)
  • @waxell.reasoning_dec -- summary quality evaluation before next step
  • waxell.tag() -- task and model tagging
  • waxell.score() -- summary and explanation quality scores
  • waxell.metadata() -- model hub and model metadata
  • HuggingFace Inference API -- text_generation with max_new_tokens
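The @waxell.decision step picks a model size (small, medium, large). The demo's actual selection criterion is not shown on this page, so the sketch below is a hypothetical heuristic based on query length; the function name and thresholds are assumptions:

```python
def select_model_size(query: str) -> str:
    """Illustrative model-size decision keyed on query length (thresholds invented)."""
    words = len(query.split())
    if words < 10:
        return "small"   # short factual queries
    if words < 40:
        return "medium"  # typical summarize-and-explain topics
    return "large"       # long, multi-part prompts

print(select_model_size("What is attention?"))  # small
```

Whatever the real criterion is, wrapping it in @waxell.decision makes the chosen branch visible in the trace alongside the tags and scores listed above.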

Run it

cd dev/waxell-dev
python -m app.demos.huggingface_agent --dry-run

Source

dev/waxell-dev/app/demos/huggingface_agent.py