The Attack Surface of Agentic Systems

The security community spent 2025 realizing that AI systems aren’t just new software. They’re a new category of attack surface. An application that reads instructions from untrusted sources and executes them against trusted resources has a fundamentally different threat model than a traditional API or CLI tool.

This is a working taxonomy of the main ways agents get compromised. Not exhaustive, but useful.

Prompt injection

The foundational problem. A prompt injection happens when an agent processes content from an untrusted source and that content changes the agent’s behavior in ways the operator didn’t intend.

The straightforward version is a user typing malicious instructions directly. More interesting is indirect injection: the agent fetches a web page, reads a document, processes an API response, and that content contains embedded instructions. The agent can’t reliably distinguish “data I was asked to process” from “instructions I should follow.”

The Clinejection attack (named after the Cline agent) is a clean example. A threat actor creates a GitHub issue with a title containing hidden instructions. When the agent parses the issue as part of a coding task, it executes them. The attacker never touched the target machine. They just got text in front of the model at the right moment.

This is why OS-layer enforcement matters. You can’t fix prompt injection at the model layer reliably. You can make dangerous actions impossible to execute even if the agent gets manipulated into attempting them.

Tool poisoning

Agents operate through tools, functions that take actions in the world. Tool poisoning attacks target the tools themselves rather than the agent’s reasoning.

An MCP server that looks legitimate but returns malicious tool descriptions is the current version of this. The agent trusts the tool’s schema. The schema lies about what the tool does, or what data it returns, or what side effects it has. By the time the agent notices something is wrong, if it notices at all, the damage is done.

This is going to get worse as agent ecosystems grow and MCP servers proliferate. The mental model of “trust the tool registry” is the same mistake we made with npm, and we’ve been paying for that one for a decade.

Supply chain attacks on agent workflows

Agents are increasingly composed. One agent orchestrates others. Shared memory stores let agents communicate across tasks. Tool results from one step feed into the next. Each of those composition points is an injection surface.

A compromised agent in an orchestration chain can poison every downstream step. Shared memory that one agent writes and another reads is an indirect injection vector. This is what makes supply chain attacks on agent infrastructure particularly effective: you don’t need to compromise the agent itself. You compromise something the agent trusts.

The LiteLLM backdoor in March 2026 was a clean example of this. The malware ran before any user code and harvested credentials silently in the background. The agents running in that environment weren’t compromised. The environment they ran in was.

Credential and secret exposure

Agents need credentials to do useful work. Those credentials end up in context windows, tool call parameters, log files, and sometimes in plaintext in memory during execution. Any of those paths can become an exposure point.

The useful thing about canary tokens here is that they reveal when agent execution is touching credentials unexpectedly, which often means something is wrong before anything is obviously broken. A well-placed fake AWS key in ~/.aws/credentials tells you immediately if an agent (or malware running in the same environment) tried to use it.

Capability drift

Agents authorized for narrow tasks sometimes accumulate broader permissions through legitimate-looking workflows. Read access becomes write access for a specific task. A restricted API key gets replaced with an admin key temporarily. The scope of what the agent can do expands incrementally until it’s much larger than what was originally intended.

This is a policy failure more than a technical one, but it’s extremely common. The fix is explicit, enumerated policies enforced at a layer the agent can’t modify, enforced consistently rather than when someone remembers to check.

The field is moving fast and this taxonomy will look incomplete in a year. What matters is the underlying shift in mental model: in agentic systems, the attack surface is the application itself. Defense has to happen at multiple layers simultaneously, not at any single one.