Logan Kelly

When Your AI Agent Can Find Zero-Days, Who Decides What It Does Next?

Google confirmed the first AI-built zero-day exploit on May 11, 2026. When AI agents can autonomously discover vulnerabilities, governance policy — not observability — determines what happens next.

On May 11, 2026, Google's Threat Intelligence Group published a finding that reframed the conversation about AI agents and security: according to Bloomberg and SecurityWeek, a threat actor had used AI to develop a working zero-day exploit — a two-factor authentication (2FA) bypass — with plans to deploy it in a mass exploitation event. Google detected it before it could be used.

The defensive side of this story matters. But the question it raises for any team running AI agents is more uncomfortable: if attackers can now instruct AI to autonomously find and weaponize unknown vulnerabilities, what does that same capability look like inside your own stack — and what governance do you have in place for when your AI agent discovers something it wasn't supposed to find?

AI agent security governance is the set of policies, enforcement mechanisms, and boundary definitions that determine what systems an AI agent is authorized to interact with, what actions it may take autonomously, and what conditions trigger immediate termination of a session. In the context of autonomous security research, it is the difference between an AI agent that identifies a vulnerability in a scoped target and one that continues probing adjacent systems because no policy told it to stop. Governance is distinct from observability: observability records what the agent did; governance determines what the agent is permitted to do before it acts.

What did Google actually detect, and why does it matter for enterprise AI?

Google's Threat Intelligence Group (GTIG) confirmed in May 2026 that a threat actor used generative AI to develop a working zero-day exploit targeting two-factor authentication — the first publicly documented case of AI being used to discover and weaponize a previously unknown vulnerability for offensive use. GTIG's chief analyst John Hultquist described it as "a taste of what's to come" (per a New York Times interview) and "the tip of the iceberg."

This is not the same story as Big Sleep, Google's own AI agent developed by DeepMind and Project Zero, which has been autonomously hunting for vulnerabilities in third-party software since late 2024 — including finding a real-world SQLite flaw that would otherwise have remained unknown. Big Sleep operates defensively: find the bug first, disclose it, get it patched. The May 2026 GTIG finding is about the adversarial mirror of that capability: attackers pointing the same kind of autonomous reasoning at production systems to find exploitable weaknesses.

Both stories are, at their core, about the same underlying shift: AI agents can now do autonomously what took skilled human security researchers days or weeks. The acceleration cuts both ways.

For enterprise teams, the relevant question is not whether your organization will be attacked by AI-built exploits. The relevant question is whether the AI agents you've already deployed — your automated code analyzers, your vulnerability scanners, your documentation crawlers with broad tool access — have governance boundaries that prevent them from doing something analogous in a direction you didn't intend.

Is the same capability your security agent uses also running in your production stack without you knowing?

Most organizations running AI agents in 2026 are not running AI security agents. They're running agents that automate support tickets, synthesize documentation, draft code, and query internal databases. Those agents were not designed to discover vulnerabilities.

But many of them have the access required to do so inadvertently. An agent with read access to source code repositories, the ability to make API calls, and a sufficiently broad system prompt is structurally capable of identifying security weaknesses — not intentionally, but as a side effect of doing the task it was given. The capability doesn't require intent.

This is the specific failure mode that governance is designed to prevent. Not the dramatic scenario of a rogue AI agent deliberately exploiting production systems. The mundane scenario: an agent doing legitimate work that, because its signal-domain boundary was never defined, wanders into systems or actions its operators never authorized.

A joint April 2026 advisory from NSA, CISA, FBI, and Five Eyes partner agencies on agentic AI adoption made exactly this point: governance controls for AI agents should be harmonized with Zero Trust principles, meaning no agent should be granted permissions beyond what it needs for its defined task, and every action against sensitive systems should be validated against a policy before execution — not logged after the fact.

The difference between those two framings — validated before versus logged after — is the difference between governance and observability. Observability tells you the agent queried a system it shouldn't have. Governance stops it from completing the query.
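The contrast is easier to see in code. Here is a minimal sketch, assuming a plain Python tool-calling loop; the function names and the allowlist-of-tool-names policy are illustrative assumptions, not any particular framework's API.

```python
from typing import Callable

audit_log: list[dict] = []

# Observability: the call runs first, then gets recorded. Useful for forensics,
# but it is not control -- the unauthorized query still completed.
def logged_call(tool: Callable, **kwargs):
    result = tool(**kwargs)
    audit_log.append({"tool": tool.__name__, "kwargs": kwargs})
    return result

# Governance: the call is checked against policy first. If the check fails,
# the tool never runs at all.
def governed_call(tool: Callable, allowed_tools: set[str], **kwargs):
    if tool.__name__ not in allowed_tools:
        raise PermissionError(f"blocked before execution: {tool.__name__}")
    return tool(**kwargs)
```

The first function can only tell you what happened; the second decides whether it happens.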

What does governing an AI security agent actually require?

The answer depends on how you categorize the agent's intended scope, but three policy types apply regardless:

Signal-domain boundaries define the systems an agent is authorized to interact with. For a code analysis agent, this might be a specific repository or a scoped set of API endpoints. For a security research agent, it might be a sandboxed environment with no production access. The boundary is not enforced by the agent's instructions — instructions can be overridden by prompt injection, misunderstood by the model, or simply ignored in edge cases. The boundary is enforced by a governance layer that sits above the agent and validates tool calls before they execute.
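A minimal sketch of that external validation layer, assuming tool calls carry a target URL; the policy shape, host names, and function names are illustrative assumptions, not any specific product's schema.

```python
from urllib.parse import urlparse

# Illustrative signal-domain policy: the hosts this agent is authorized to reach.
# In practice this would be loaded from a policy store, not hard-coded.
SIGNAL_DOMAIN_POLICY = {
    "agent": "code-analysis-agent",
    "allowed_hosts": {"git.internal.example.com", "scanner.internal.example.com"},
}

class SignalDomainViolation(Exception):
    """Raised when a tool call targets a host outside the agent's authorized scope."""

def enforce_signal_domain(tool_call: dict) -> dict:
    """Validate a tool call against the signal-domain policy before it executes."""
    host = urlparse(tool_call["url"]).hostname
    if host not in SIGNAL_DOMAIN_POLICY["allowed_hosts"]:
        # The call is blocked here, above the agent -- its instructions never get a vote.
        raise SignalDomainViolation(
            f"{SIGNAL_DOMAIN_POLICY['agent']} attempted unauthorized host: {host}"
        )
    return tool_call  # in scope; hand off to the real tool executor
```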

Control policies determine which actions require human approval before proceeding. An agent that identifies a potential vulnerability might be authorized to log the finding autonomously, but not to attempt to verify it by probing the affected system further. A control policy catches the second action — the verification — and routes it to a human approver before allowing it to proceed. This is human-in-the-loop governance applied to a specific class of high-risk actions rather than to every session interaction.
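As a sketch, assuming actions arrive tagged with a kind field and that an out-of-band approval queue exists; the action class names here are made up for illustration.

```python
import queue

# Action classes that may run autonomously vs. those that wait for sign-off.
# These class names are illustrative, not a standard taxonomy.
AUTONOMOUS = {"log_finding", "read_source_file"}
NEEDS_APPROVAL = {"verify_vulnerability", "probe_endpoint"}

approval_queue: "queue.Queue[dict]" = queue.Queue()

def apply_control_policy(action: dict) -> str:
    """Route an action: execute it, park it for a human approver, or deny it."""
    kind = action["kind"]
    if kind in AUTONOMOUS:
        return "execute"            # low-risk: proceed without review
    if kind in NEEDS_APPROVAL:
        approval_queue.put(action)  # high-risk: a human approves before it runs
        return "pending_approval"
    return "blocked"                # unrecognized action class: default-deny
```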

Kill policies define the conditions under which a session terminates immediately, without waiting for human review. An agent that begins making API calls to systems outside its authorized scope, or that exceeds a defined threshold of external probe attempts, should not wait for a human to notice and intervene. A kill policy triggers automatic termination when the defined condition is met.
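A sketch of the triggering logic, assuming a session object that exposes a terminate() call; the threshold value and the violation counter are illustrative assumptions.

```python
class KillPolicy:
    """Terminate the session the moment a defined condition is met.

    The condition here -- a count of probe attempts outside the agent's scope --
    and the threshold of 3 are illustrative, not prescriptive.
    """

    def __init__(self, max_out_of_scope_probes: int = 3):
        self.max_probes = max_out_of_scope_probes
        self.count = 0

    def on_out_of_scope_probe(self, session) -> None:
        self.count += 1
        if self.count >= self.max_probes:
            # No human review, no waiting for the next log scrape:
            # the session ends as soon as the condition is met.
            session.terminate(reason="kill policy: out-of-scope probe threshold exceeded")
```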

OWASP's Top 10 for Agentic Applications (2026) identifies "tool misuse" and "rogue agents" as two of the ten primary risk categories for deployed AI agents — both of which describe scenarios where an agent's legitimate capability is exercised outside its authorized scope. Tool misuse, in OWASP's framing, is not about malicious intent: it's about capability without constraint.

How Waxell handles this

Waxell Runtime applies pre-execution governance to AI agents across any framework without requiring a rebuild. Before an agent executes a tool call — before it makes an API request, queries an external system, or returns an output — Waxell evaluates the action against the active policy set. If the action violates a Control policy (unauthorized system access), a Kill policy (defined termination condition), or a signal-domain boundary (scope constraint), the action is blocked. The agent never completes the call.

Enforcement happens at sub-millisecond latency, at the governance layer, not in the agent's own instruction set. That distinction matters: instructions are soft constraints. Governance policies are hard constraints, enforced externally regardless of what the model decides to do next.

Waxell Runtime supports 26 policy categories spanning cost, content, control, quality, and kill conditions. Two specific policy types are directly relevant to the scenario the Google GTIG report describes:

  • Signal-domain policies define the authorized scope of external system interaction. An agent operating on source code repositories cannot make API calls to production infrastructure; an agent doing documentation synthesis cannot query authentication endpoints.

  • Kill policies define automatic termination conditions. An agent that makes a threshold number of probe attempts to systems outside its defined scope triggers an automatic session kill — no human review required, no waiting for the next log scrape.

Waxell installs with two lines of init and supports 200+ agent libraries. No architecture changes. No rebuilds. The governance layer is external to the agent, which is the only configuration in which governance is durable — if it lives inside the agent, the agent can ignore it.

To add pre-execution enforcement to your agent stack before the next autonomous security finding surprises your team: request early access to Waxell Runtime.

Frequently Asked Questions

What is the governance challenge posed by AI-generated zero-day exploits?
The challenge is not primarily defensive — it's internal. If attackers can now use AI agents to autonomously discover and weaponize unknown vulnerabilities, the same autonomous discovery capability exists in any AI agent with broad system access. The governance question for enterprise teams is: what policies prevent your legitimate AI agents from probing systems outside their authorized scope, either intentionally or as a side effect of doing their assigned task? Without explicit signal-domain boundaries and control policies enforced by a governance layer, the answer is often "nothing."

Is observability enough to govern AI security agents?
No. Observability records what an agent did after it did it. Governance enforces what an agent is permitted to do before it acts. For AI agents with access to sensitive systems, post-hoc logging does not constitute control — it constitutes a forensics capability for after an incident. Pre-execution policy enforcement, which blocks unauthorized actions before they complete, is the correct governance mechanism.

What is a signal-domain boundary for an AI agent?
A signal-domain boundary is a governance-layer definition of the external systems and data sources an agent is authorized to interact with. It is distinct from the agent's system prompt or tool list: those are soft constraints that the model interprets. A signal-domain boundary is enforced externally, before tool calls execute, regardless of what the model decided to do. An agent authorized to query a documentation database cannot make calls to production APIs if a signal-domain policy prohibits it, regardless of what instructions it received.

What is the NSA/CISA guidance on agentic AI adoption?
In April 2026, NSA, CISA, FBI, and Five Eyes partner agencies jointly published "Careful Adoption of Agentic AI Services," which recommended aligning AI agent governance controls with Zero Trust principles: agents should be granted permissions only for their defined task scope, and all actions against sensitive systems should be validated against a policy before execution. The guidance reflects the same principle as pre-execution governance: logging what agents do is not a substitute for controlling what they are permitted to do.

How does Waxell Runtime differ from agent observability platforms like LangSmith or Arize?
LangSmith and Arize are observability platforms: they record what agents do, surface traces, and help diagnose failures after they occur. Waxell Runtime enforces governance policies before actions execute. The distinction is the same as the difference between logging a file write and a filesystem permission: one records the action, the other prevents it if unauthorized. Waxell Runtime's 26 policy categories cover cost, content, control, quality, and kill conditions, enforced at sub-millisecond latency with no changes to your agent's existing architecture.

What triggered Google's detection of the first AI-built zero-day?
According to Google's Threat Intelligence Group (GTIG), threat actors in May 2026 used generative AI to develop a working zero-day exploit targeting two-factor authentication, planning a mass exploitation event. Google detected the exploit through threat intelligence work before it could be deployed, finding artifacts in the exploit code that were inconsistent with human authorship, including highly annotated Python code and a hallucinated CVSS score. (Big Sleep, Google's AI vulnerability-hunting agent, is a separate capability that operates proactively to find bugs in software before attackers do; it was not the detection mechanism in the May 2026 incident.)

Waxell

Waxell provides observability and governance for AI agents in production. Bring your own framework.

© 2026 Waxell. All rights reserved.

Patent Pending.
