← All posts

Why Your AI Agent Needs a Firewall

Claude Code runs as you. One malicious prompt is all it takes to wipe your files or exfiltrate your secrets. Here's why OS-layer interception is the only real defense.

· 3 min read

When you run Claude Code with --dangerously-skip-permissions, you’re handing an AI system full, unsupervised access to everything your user account can touch. Your SSH keys, your .env files, your AWS credentials, your codebase. That’s not a bug or an edge case. That’s the design. Claude Code is built to be autonomous.

The question is what happens when that autonomy gets pointed somewhere it shouldn’t.

The threat model

AI coding agents follow instructions. They parse prompts, form plans, and execute them, including shell commands, file operations, and network requests. The instructions they follow come from many places: your chat input, repository content, documentation, GitHub issues, web pages they fetch while working.

Any of those surfaces can carry a malicious payload.

In early 2026, Adnan Khan documented what got called the Clinejection attack against the Cline repository. Cline had deployed a Claude-powered bot to triage GitHub issues automatically. A threat actor opened an issue with a title containing embedded prompt injection instructions. The bot processed the issue, followed the injected instructions, and poisoned a GitHub Actions cache. That gave the attacker npm publishing credentials, which they used to push a malicious cline@2.3.0 to roughly 4,000 developer machines.

The attacker never touched a developer machine directly. They filed a GitHub issue and let the agent do the rest.

Prompt injection is the same category as SQL injection or XSS. Untrusted data crossing a trust boundary and being interpreted as code. The patterns are old. The context is new.

Why sandboxing doesn’t solve it

The obvious answer is “just sandbox it.” Container, VM, restricted filesystem. That works for some cases but falls apart wherever the agent needs real access, which is usually the whole point. You can’t sandbox your way out of legitimate permissions.

You could also rely on the model refusing dangerous commands. Claude does sometimes. But model-level defenses are probabilistic. They get bypassed by rephrasing, by multi-step instructions, by framing the request in a context the model considers safe. A firewall that works most of the time isn’t a firewall.

OS-layer interception

The approach we took with Rampart is to intercept at the OS layer, below the application, before execution happens. When an agent attempts a network request, a subprocess call, or a write to a sensitive path, the policy engine sees it regardless of how the agent got there. If the action matches a rule, it gets blocked and logged. The agent has no visibility into this and can’t reason around it.

A concrete example: if your policy restricts outbound HTTP to your known API endpoints, and an injected instruction tries to exfiltrate your .env to an external server, the call fails. The agent sees a network error. Nothing leaves.

This doesn’t prevent the injection itself. The agent still gets manipulated, still forms the intent. What it prevents is the intent becoming an action. That’s the job.

What this doesn’t cover

There are categories of attack that OS-layer enforcement doesn’t address. If the agent is instructed to exfiltrate data through an allowed channel, like committing secrets to a repo or emailing them through an authorized integration, the firewall won’t catch it. Allowed is allowed.

The right approach is layered: carefully scoped permissions, OS-level enforcement, and a human in the loop for high-consequence actions. None of these is sufficient alone. The combination makes it substantially harder to do damage without being noticed.

We’re still early in building security tooling for AI agents. This is the first version of these problems, not the last.