A reported vulnerability in Context7's MCP server exposes a pattern that is going to repeat across the agent ecosystem. When developer agents consume context from MCP servers, that context becomes an execution channel. Documentation, tickets, and knowledge bases become supply chain attack surfaces. The fix isn't better prompting. It's enforcement at the tool-call layer.

This matters because MCP adoption is accelerating. Cursor, Claude Code, Windsurf, and other developer agents use MCP to pull context from external sources. Every one of them inherits the same vulnerability class when the context they trust turns out to be adversarial.

What Happened

Researchers reported that Context7's MCP server could deliver hidden or malicious instructions through metadata fields such as "custom rules" for a library. Developer agents fetch that context through MCP and execute actions on the developer's machine using the permissions already granted to the agent: file system access, shell access, network access.

This is tool-mediated prompt injection. It's worse than standard prompt injection for a specific reason: the payload arrives from a source the agent is designed to trust. When a user types a malicious prompt, the agent at least has a chance to apply input filtering. When an MCP server delivers malicious instructions embedded in what appears to be legitimate documentation, the agent treats it as developer intent and acts on it.

The practical impact is concrete. An attacker who controls content served through an MCP server can instruct the agent to read secrets from environment files, exfiltrate credentials to an external endpoint, modify source code, or push changes to a repository. All of this happens within the permissions the developer already granted the agent, using the trust relationship the developer already established with the MCP server.

Why This Works

Three properties make this attack class reliable, and all three are present in every MCP-connected developer agent today.

The first is the trusted channel assumption. MCP servers provide context that the agent treats as authoritative. The protocol was designed for this purpose: give the agent relevant information so it can make better decisions. The agent doesn't distinguish between "documentation facts" and "embedded instructions" because both arrive through the same channel in the same format. There's no content-type boundary that separates data from commands.

The second is high privilege. Developer agents need broad permissions to be useful. File system access for reading and writing code. Shell access for running builds and tests. Network access for pulling dependencies and pushing changes. Git access for commits and pushes. These permissions are necessary for the agent's legitimate function, but they also mean that any instruction the agent follows has significant blast radius.

The third is mixed content. The MCP response contains legitimate documentation alongside malicious instructions. The agent processes the entire bundle as context. There's no mechanism to separate "this is a fact about the library's API" from "this is an instruction to exfiltrate the .env file." Both are text. Both arrive in the same response. The agent's language model processes them identically.

This combination means that anyone who can influence what an MCP server returns to the agent can effectively execute commands on the developer's machine with the developer's permissions. The attack surface isn't the prompt. It's the supply chain of context that feeds into the agent's decision-making.

Why "Better Prompting" Doesn't Fix This

The instinctive response to prompt injection is to harden the system prompt. Add instructions telling the agent to ignore commands embedded in retrieved content. Reinforce that the agent should only follow explicit user instructions.

This doesn't work reliably for three reasons. Language models are probabilistic systems that don't enforce strict boundaries between "data" and "instructions" in their context window. System prompt instructions can be overridden by sufficiently crafted payloads, especially when those payloads arrive through a trusted channel the agent is designed to defer to. And the attack surface is combinatorial: the number of possible injection payloads is effectively infinite, while the system prompt is finite and static.

Prompt hardening is input validation. It reduces attack success rates. It does not eliminate them. And for high-consequence actions like credential exfiltration or code modification, a reduced attack success rate isn't good enough. You need enforcement that operates independently of whether the agent was tricked.

The Controls That Actually Work

Solving this requires enforcement at the tool-call layer, not the prompt layer. The agent can be manipulated into wanting to take a harmful action. The enforcement layer prevents the action from executing regardless of why the agent wanted to take it.

Pre-execution policy checks on tool calls. Before any tool call executes (write file, run shell command, git push, network request), evaluate it against organizational policies. Check the tool name, parameters, destination, and session context. A policy that says "block shell commands that curl to external domains not on the allowlist" catches exfiltration regardless of whether the instruction came from a malicious MCP response, a compromised knowledge base, or a direct prompt injection.

Default-deny for exfiltration paths. Most developer agents don't need to post data to arbitrary external URLs. Restrict outbound network actions to allowlisted domains. If the agent needs to push to GitHub, allow GitHub. If it needs to pull npm packages, allow the npm registry. Everything else is blocked by default. This single control eliminates the most dangerous outcome of MCP injection: sending credentials or source code to attacker-controlled endpoints.

Secrets-aware file access controls. Block the agent from reading .env files, cloud credential files, SSH keys, and other sensitive artifacts unless the task explicitly requires it and has been approved. Most legitimate coding tasks don't need access to production credentials. An agent that can't read the secrets can't exfiltrate them, even if it's been instructed to.

Human approval gates for irreversible actions. Require explicit developer approval before the agent pushes to main branches, deletes files, runs package installs that modify lockfiles, or uploads files to external services. This creates a checkpoint where a human reviews what the agent is about to do before the action has consequences that can't be undone.

Full decision chain logging. Record the complete chain: what the MCP server returned, what instructions the agent extracted, what tool calls it attempted, what the policy evaluation decided, and what the outcome was. If you can't replay the decision chain from MCP response to tool execution, you can't investigate an incident and you can't prove to an auditor that your controls were operating.

The Threat Model for Internal Teams

If you need to communicate this risk to your security team or engineering leadership, here's the simple threat model.

The adversary is anyone who can publish content that becomes agent context. That includes maintainers of documentation sites, contributors to knowledge bases, authors of library readmes, and anyone who can create or modify tickets in project management tools. The attack surface is broader than most teams realize because MCP servers can pull from many sources.

The asset at risk is the developer's machine, credentials, repositories, and anything accessible through the permissions granted to the agent.

The entry point is content injection through an MCP server response. The attacker doesn't need to compromise the MCP server itself. They need to influence the content that the MCP server indexes and returns.

The impact ranges from secrets exfiltration and repository compromise to lateral movement into production systems if the developer's credentials have broader access.

The Pattern Will Repeat

Context7 is the first widely reported MCP injection incident. It won't be the last. Every MCP server that ingests content from sources that aren't fully controlled by the organization creates the same attack surface. Documentation aggregators, ticket system integrations, knowledge base connectors, and code search tools all potentially serve content that originated from untrusted sources.

As organizations connect more MCP servers to their agent workflows, the supply chain of context grows. Each new connection is a potential injection vector. The organizations that treat MCP context as untrusted input and enforce what tools can do regardless of what the context says will be secure. The ones that trust the context because "it comes from our MCP server" will eventually learn the same lesson McKinsey learned with Lilli: trust in the source doesn't guarantee safety of the content.

The fix is architectural, not behavioral. Enforce policies on what agents do, not just what they're told.

We're building Aguardic to enforce organizational policies across agent tool calls, code, documents, and AI outputs. When an agent is tricked by poisoned context, policy enforcement catches the action before it executes. If you're thinking about securing MCP-connected agent workflows, take a look.

MCP Prompt Injection Is a Supply Chain Problem, Not a Prompt Problem

What Happened

Why This Works

Why "Better Prompting" Doesn't Fix This

The Controls That Actually Work

The Threat Model for Internal Teams

The Pattern Will Repeat

Answer the AI questions with controls Aguardic enforces

Related Posts

Lab Tests Show AI Agents Leaking Passwords and Disabling Antivirus. Here's the Real Lesson.

What AI Agent Governance Actually Looks Like

The EU AI Act Was Written for Models. Your Agents Need Runtime Compliance.

Enjoyed this post?

Ready to Govern Your AI?