Logan Kelly
Gartner projects that 40% of enterprise apps will integrate AI agents by the end of 2026. The teams that fail ask one question too late: who controls them in production?

In March 2026, Simon Willison published "Agentic Engineering Patterns," a guide to getting the best results out of coding agents like Claude Code and Codex. A Hacker News discussion followed quickly. One practitioner comment captured something that applies far beyond coding agents: "Test harness is everything. If you don't have a way of validating the work, the loop will go stray."
The instinct is correct. The diagnosis is incomplete.
The problem — runaway agent loops, unchecked outputs, agents exceeding authorized scope — isn't fundamentally about test harnesses. It's about where control lives in the architecture. In most agentic systems today, control lives entirely inside the agent code. The engineer who builds the agent is the same person who defines what it can do, who it can call, and when it stops. When that agent reaches production, the operator who runs it has no independent lever to pull — not without a new deployment.
That's not a testing gap. It's a structural gap. And with Gartner projecting that 40% of enterprise applications will integrate task-specific AI agents by the end of 2026, up from less than 5% in 2025, it's a structural gap at scale.
The Single-Layer Problem in Agentic Architecture
Most agentic systems today have one authority layer: the developer. The agent's behavior, constraints, allowed tools, cost limits, and exit conditions are encoded in the agent's own logic. Guardrails live in the prompt or in code. Approval gates are hardwired into the workflow.
This is a natural consequence of how agents get built. The developer understands the task, designs the tool calls, and adds the safety checks they can anticipate. It works well in development and survives early production.
It breaks down when organizations scale to multiple agents, or when operators — compliance teams, security teams, product owners — need to update constraints without triggering an engineering cycle.
A realistic scenario: a financial services firm runs an AI agent that queries customer data and drafts communications. The developer built a guardrail limiting external API calls to an approved vendor list. Three months post-deployment, the compliance team changes the approved vendors. To update the list, engineering modifies the agent code, tests the change, and deploys it. The agent is offline during that window. Every future policy update follows the same path.
This is the single-authority-layer architecture in practice: every policy change is a code change. Code changes have lead times, review cycles, and deployment risk. In a world where compliance requirements evolve continuously, this isn't a sustainable design.
Developer Authority and Operator Authority Are Different Things
The structural fix isn't to make governance "part of the developer's job." The developer is already responsible for what the agent can do — the task logic, tool integrations, reasoning loop, output format. That's developer authority: the decisions that determine an agent's capabilities.
Operator authority is different. It covers what the agent is allowed to do in a specific deployment context: which data it can access, how much it can spend per session, when a human must approve an action, what output patterns are blocked. These constraints are deployment-specific, compliance-driven, and owned by business and operations teams — not engineering teams.
The architectural principle that resolves this is familiar from software engineering generally: separation of concerns. Developer concerns belong in the agent. Operator concerns belong in a separate layer, above the agent, that can be updated independently.
In agentic governance architecture, this separate layer is the governance plane — a control surface that sits between agents and the systems they act on, enforcing operator-defined policies without requiring changes to agent code.
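As a minimal sketch of that separation, assume a hypothetical in-memory `PolicyStore` owned by the operator and a `governed_call` wrapper that sits between the agent and the outbound call. None of these names come from Waxell or any real library; they only illustrate how the approved vendor list from the earlier scenario could change without touching agent code.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    """Hypothetical operator-owned policy store (in-memory for illustration)."""
    approved_vendors: set[str] = field(default_factory=set)


class PolicyViolation(Exception):
    """Raised when an action falls outside operator policy."""


def governed_call(store: PolicyStore, vendor: str, payload: dict) -> dict:
    """Check operator policy before the agent's call is forwarded."""
    if vendor not in store.approved_vendors:
        raise PolicyViolation(f"vendor not approved: {vendor}")
    # A real system would forward the request here; this sketch just echoes it.
    return {"vendor": vendor, "payload": payload, "status": "forwarded"}


def agent_step(store: PolicyStore, vendor: str) -> dict:
    """Developer-layer code: knows how to call a vendor, not which are allowed."""
    return governed_call(store, vendor, {"action": "draft_email"})
```

When compliance changes the vendor list, the operator reassigns `store.approved_vendors`; `agent_step` is unchanged and nothing is redeployed.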
Why Observability Doesn't Solve This
The current market response to production agent risk has been observability. Arize, LangSmith, Helicone, and Braintrust provide visibility into what agents are doing: traces, evaluation scores, token usage, response latency. These are valuable tools. They let operators see what's happening.
Seeing isn't controlling.
An observability platform can tell you that an agent exceeded its token budget by 3,000 tokens on Tuesday. It cannot prevent that from happening again on Wednesday. An evaluation framework can score an output as low-confidence. It cannot require human approval before the agent acts on that low-confidence reasoning.
This isn't a criticism of observability tooling; it's a note about architectural scope. The governance plane is an execution layer, not an observation layer. It intercepts agent actions before they reach production systems, applies operator-defined policies, and permits, modifies, or blocks each action in real time. That's a distinct architectural component. The observability vendors don't provide it and weren't designed to.
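To make the permit, modify, or block distinction concrete, here is a rough sketch of a token-budget check that runs before an action executes. The `Action`, `Decision`, and `enforce_token_budget` names are hypothetical, and capping a request to the remaining budget is just one possible reading of "modify".

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PERMIT = "permit"
    MODIFY = "modify"
    BLOCK = "block"


@dataclass
class Action:
    """Hypothetical description of what the agent wants to do next."""
    session_tokens_used: int
    requested_tokens: int


@dataclass
class Decision:
    verdict: Verdict
    granted_tokens: int
    reason: str


def enforce_token_budget(action: Action, session_budget: int) -> Decision:
    """Decide before execution, rather than reporting after the fact."""
    remaining = session_budget - action.session_tokens_used
    if remaining <= 0:
        return Decision(Verdict.BLOCK, 0, "session budget exhausted")
    if action.requested_tokens > remaining:
        # Modify: cap the request so the session cannot overrun the budget.
        return Decision(Verdict.MODIFY, remaining, "request capped to remaining budget")
    return Decision(Verdict.PERMIT, action.requested_tokens, "within budget")
```

An observability tool would record the overrun; a check like this one prevents it, because the decision is made before the call goes out.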
The gap matters especially because Arize's own agent architecture guidance — one of the more thorough treatments of the topic available — explicitly frames governance as something developers embed in agent code: "Incorporating domain and business heuristics into the agent's guidance system" and "being explicit about action intentions." Both of these are developer-layer solutions. Neither provides operator-layer control that survives a code freeze.
The Two-Layer Architecture in Practice
A well-structured agentic system has a clear boundary between developer scope and operator scope.
Developer layer: The agent's task, available tools, reasoning loop, internal logic, and output format. These live in agent code or configuration. Changing them requires an engineering cycle — and that's appropriate, because they define capability.
Operator layer: Which data sources the agent can access, spending limits, which action categories require human approval, what output content is blocked, which versions are active in each environment. These live in the governance plane, not in agent code. Changing them does not require a deployment — and that's the point.
This separation has an important consequence for external agents. Third-party integrations, vendor automations, and MCP-native agents arrive without embedded governance. The operator has no access to code they didn't write. A governance plane that operates at the protocol level — intercepting calls before they reach production systems — is the only practical approach for governing agents the operator didn't build.
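One way to picture protocol-level interception: MCP tool invocations travel as JSON-RPC 2.0 requests with the method `tools/call`, so a proxy can inspect the message without seeing the agent's code. The sketch below assumes a hypothetical `intercept` function and an operator-supplied set of blocked tool names; the error code and message are choices made for illustration, and this is not a description of how Waxell Connect is implemented.

```python
import json
from typing import Optional


def intercept(raw_message: str, blocked_tools: set[str]) -> Optional[str]:
    """Return a JSON-RPC error response if the call is blocked, else None.

    None means "forward unchanged". The agent's code is never inspected.
    """
    message = json.loads(raw_message)
    if message.get("method") != "tools/call":
        return None  # not a tool invocation; out of scope for this sketch
    tool_name = message.get("params", {}).get("name")
    if tool_name in blocked_tools:
        return json.dumps({
            "jsonrpc": "2.0",
            "id": message.get("id"),
            # -32000 sits in JSON-RPC's implementation-defined error range;
            # the specific code and message are assumptions of this sketch.
            "error": {
                "code": -32000,
                "message": f"blocked by operator policy: {tool_name}",
            },
        })
    return None
```

Because the check reads only the wire format, it applies equally to vendor agents and in-house agents, which is the property the operator needs when code access is unavailable.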
Waxell Connect addresses exactly this scenario: it governs the agents an operator didn't build — vendor agents, third-party integrations, MCP-native agents — with no SDK and no code changes required on the agent side. The operator defines policies in the governance plane; Connect enforces them before actions execute.
Waxell Runtime enforces the operator layer for agents teams do build, applying 26 policy categories across inputs, outputs, tool calls, and execution state — without modifying agent code.
The Registry Is What Makes This Systematic
Neither authority layer works without a system of record for what's running.
Developer authority requires knowing which version of which agent is deployed. Operator authority requires knowing which agent should follow which policies. Without an agent registry — a structured catalog that maps agent identity, version, assigned policies, and deployment context — the governance plane enforces nothing consistently. Policies exist but aren't applied reliably because there's no authoritative mapping of which agents are active and which operator rules govern each one.
This gap becomes visible at scale. At five agents, teams manage it manually or with a shared document. At fifty agents across multiple departments, manual tracking breaks down. An agent gets updated without triggering a policy review. A new deployment goes live before compliance has reviewed the data access scope. Incidents get attributed to the wrong version because nobody knows which version was running when.
The registry is what makes operator authority systematic rather than episodic.
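A registry of this kind can be sketched as a small data structure. The `AgentRecord` fields and the `AgentRegistry` methods below are hypothetical and in-memory only; the rule that a record with no policies is rejected is an assumption added to illustrate how a registry could stop a deployment from going live before review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRecord:
    agent_id: str
    version: str
    environment: str
    policy_ids: tuple[str, ...]


class AgentRegistry:
    """Hypothetical in-memory registry keyed by (agent_id, environment)."""

    def __init__(self) -> None:
        self._records: dict[tuple[str, str], AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        # Refuse records with no assigned policies, so nothing is registered
        # before an operator has attached rules to it.
        if not record.policy_ids:
            raise ValueError(f"{record.agent_id} has no assigned policies")
        self._records[(record.agent_id, record.environment)] = record

    def policies_for(self, agent_id: str, environment: str) -> tuple[str, ...]:
        record = self._records.get((agent_id, environment))
        if record is None:
            raise KeyError(f"unregistered agent: {agent_id} in {environment}")
        return record.policy_ids

    def active_version(self, agent_id: str, environment: str) -> str:
        return self._records[(agent_id, environment)].version
```

With a lookup like `policies_for`, the governance plane has one place to ask which rules apply, and `active_version` answers the incident question of which version was running in a given environment.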
Controlled Data Interfaces Close the Loop
The developer/operator separation surfaces a third architectural question: data access design.
Most agentic systems give agents relatively open access to data — a database connection with broad permissions, a file system, an internal API that returns far more than any given task requires. This works at the prototype stage and becomes a liability when agents run autonomously in production.
The Signal and Domain pattern addresses this at the architecture level. The Signal layer is a controlled read interface — agents receive validated, typed representations of data without direct access to raw production systems. The Domain layer is the write boundary — agents express intent through a structured interface rather than directly mutating state. This isn't only a security measure; it's an architectural decision that makes operator authority enforceable.
When agents read from controlled interfaces and write through validated boundaries, the governance plane has clear interception points. When agents have raw database access, governance has to be applied piecemeal at the application layer, which means it is applied inconsistently and sometimes missed.
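The read and write boundaries described above might look something like the following. Everything here is illustrative: the `CustomerSignal` and `SendMessageIntent` types, the raw row schema, and the 500-character limit are all assumptions, chosen to show a narrowed, typed read and a write expressed as intent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerSignal:
    """Typed, read-only view: only the fields this task needs."""
    customer_id: str
    first_name: str
    open_tickets: int


@dataclass(frozen=True)
class SendMessageIntent:
    """A write expressed as intent, not as a direct mutation."""
    customer_id: str
    body: str


# Stand-in for a raw production row (hypothetical schema).
_RAW_ROWS = {"c-1": {"first_name": "Ada", "ssn": "000-00-0000", "open_tickets": "2"}}


def read_customer_signal(customer_id: str) -> CustomerSignal:
    """Signal layer: validate and narrow raw data before the agent sees it."""
    row = _RAW_ROWS[customer_id]
    return CustomerSignal(customer_id, row["first_name"], int(row["open_tickets"]))


def submit_intent(intent: SendMessageIntent, max_body_chars: int = 500) -> str:
    """Domain layer: the single write boundary where policy can be checked."""
    if intent.customer_id not in _RAW_ROWS:
        return "rejected: unknown customer"
    if len(intent.body) > max_body_chars:
        return "rejected: body too long"
    return "accepted"
```

The agent never receives the `ssn` field and never writes to the store directly, so both functions are natural places for the governance plane to intercept.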
How Waxell Implements the Two-Layer Model
Waxell's architecture directly reflects the developer/operator separation:
Waxell Observe (2 lines of code to initialize, 200+ libraries auto-instrumented) instruments the developer layer — giving engineers visibility into actual agent behavior, traces, and output quality, so developer-side improvements are grounded in production data rather than assumptions.
Waxell Runtime enforces the operator layer at execution time — 26 policy categories applied across inputs, outputs, tool calls, and execution state, without requiring rebuilds of the agent itself.
Waxell Connect extends operator authority to agents the team didn't build — vendor integrations, MCP-native agents, third-party automations — governed through a protocol-level control plane with no SDK and no agent code changes required.
The agent registry ties all three together: a persistent system of record that links each agent to its identity, version history, and active policies, so operator authority is systematic rather than dependent on someone remembering to update a spreadsheet.
The resulting architecture separates what agents are capable of (developer authority, in agent code) from what agents are allowed to do in a given deployment (operator authority, in the governance plane), with controlled data interfaces at the boundary and a registry as the connective tissue.
For teams scaling from five agents to fifty, this separation isn't an optimization. It's the only architecture that makes the jump without every policy change becoming a deployment event.
Get access to Waxell at waxell.ai/get-access.
FAQ
What is the governance plane in agentic architecture?
The governance plane is a control layer above agent code that enforces operator-defined policies at execution time. It intercepts agent actions — tool calls, data requests, outputs — before they reach production systems and applies rules that can be changed without modifying agent code. It is architecturally separate from both the agent logic and the observability stack.
Why shouldn't governance logic live inside the agent?
When governance lives in agent code, it can only be changed through an engineering deployment. This makes compliance updates, policy changes, and incident responses dependent on engineering cycle times — which can be days or weeks. It also means agents an operator didn't build — vendor agents, third-party integrations — cannot be governed at all. A separate governance plane solves both problems.
What is the difference between developer authority and operator authority in agentic systems?
Developer authority covers what an agent is capable of doing: its task logic, tool integrations, reasoning loop, and output format. Operator authority covers what an agent is permitted to do in a specific deployment: data access scope, spending limits, human approval requirements, and output restrictions. The two should be managed independently, in separate architectural layers with separate update cycles.
How does an agent registry support the two-layer model?
The registry maps each agent to its identity, version, deployment context, and assigned policies. Without it, the governance plane can't apply policies consistently, because there's no authoritative record of which agents are active and which operator rules apply to each. At scale, this becomes an incident waiting to happen.
What is the Signal and Domain pattern and how does it fit this architecture?
Signal and Domain is a data interface design pattern for agentic systems. The Signal layer gives agents validated, typed reads from production data without raw system access. The Domain layer mediates writes — agents express intent through a structured interface rather than directly modifying production state. Together, they give the governance plane clear interception points and reduce the blast radius of ungoverned agent actions.
Does the two-layer architecture apply to agents built on external frameworks?
Yes, and it matters most there. Agents built on LangGraph, CrewAI, or vendor platforms arrive with no embedded governance the operator can update. The governance plane operates independently of the framework — intercepting calls at the protocol level before they reach production systems — so operator authority doesn't depend on framework-level controls or code access.
Sources
Simon Willison, "Agentic Engineering Patterns" (2026). Cited for the March 2026 publication and the practitioner hook. URL: https://simonwillison.net/guides/agentic-engineering-patterns/
Hacker News, "Agentic Engineering Patterns" thread (March 2026, item 47243272). Practitioner comment on test harness and loop control. URL: https://news.ycombinator.com/item?id=47243272
Gartner 2025 forecast: 40% of enterprise apps to integrate AI agents by end of 2026. Widely cited; secondary reference via NextAgile and IBM's 2026 AI Agent guide. Original Gartner report not directly accessed. URL: https://nextagile.ai/blogs/gen-ai/agentic-ai-architecture-framework-enterprises/
Arize AI, "Agent Architectures" (August 2024, updated). Cited as competitive reference; confirms observability-layer framing of agent governance. URL: https://arize.com/blog-course/llm-agent-how-to-set-up/agent-architecture/