In the past week, Microsoft announced Agent 365 — a unified control plane for observing, governing, and securing AI agents across the enterprise — and Palo Alto Networks published research showing how their contextual red teaming approach uncovered a $440,000 financial manipulation vulnerability that standard security testing completely missed. Both announcements matter. Together, they reveal both where agent security is heading and where significant gaps remain.
The short version: Microsoft is solving agent visibility and identity. Palo Alto is solving agent vulnerability discovery. Neither is solving organizational policy enforcement. And that's the layer that regulated industries actually need most.
What Microsoft Built
Agent 365, generally available May 1 at $15 per user per month, is Microsoft's answer to a problem every enterprise is quietly panicking about: they have no idea how many AI agents are running in their environment. Microsoft's own internal deployment found over 500,000 agents across the company. More than 80% of Fortune 500 companies are already using active AI agents built with low-code and no-code tools, which means agents are being created by people who have never thought about security governance.
Agent 365 provides a unified control plane where IT, security, and business teams can see which agents exist, understand how they behave, manage who has access to them, and identify security risks. It extends Microsoft's existing security infrastructure — Entra for identity, Defender for threat protection, Purview for data governance — to cover non-human actors operating at scale.
The framing Microsoft uses is telling: AI agents should be held to the same standards as employees or service accounts. Zero Trust principles — least privilege access, explicit verification, assume compromise — applied to autonomous systems. This is the right conceptual model. An agent that can query databases, call APIs, send emails, and modify records needs the same identity controls you'd apply to a new hire. Agent 365 is essentially HR onboarding for AI systems.
What this solves is real. Shadow AI — agents running without IT knowledge — is a genuine risk. Agent inventory is a prerequisite for governance. You can't govern what you can't see. Microsoft is closing the visibility gap, and they're doing it at platform scale.
What Palo Alto Proved
Palo Alto's contextual red teaming research is the more technically interesting announcement, and it contains a lesson every organization deploying agents should internalize.
They tested an internal AI financial assistant — a representative agent that authenticates users, manages wallet balances, and provides investment guidance. First, they ran a standard attack library scan: thousands of generic jailbreak prompts, content safety tests, and prompt injection attempts. The result was a risk score of 11 out of 100. Low risk. Safety-class attacks achieved a 0% bypass rate. By conventional standards, this agent was secure.
Then they ran contextual red teaming. Instead of generic attacks, their profiling agent first discovered what the target could actually do: which tools it could invoke, what data it could access, what authorization dependencies existed between tools. Armed with that context, the red team crafted a targeted attack using a movie roleplay scenario that granted fictional authorization for portfolio rebalancing. On the fifth attempt, the agent moved $440,000 across 88 wallets.
No code access. No infrastructure compromise. No malware. Just conversational manipulation combined with tool authority.
The standard library had no knowledge of the withdraw_funds tool, the database schema, or the permissive SQL query scope. It tested pattern resistance. It didn't validate authorization boundaries. For agentic AI, that gap is the difference between measuring risk and missing it entirely.
This is a critical insight: agent security testing that doesn't understand what the agent can do is security theater. Generic jailbreak libraries catch generic risks. The real vulnerabilities are contextual — specific to the agent's tools, permissions, and operational environment. Palo Alto's Prisma AIRS approach treats every agent as a unique attack surface that requires profiling before testing.
What Neither Announcement Covers
Microsoft gives you visibility into which agents exist and what they can access. Palo Alto gives you the ability to discover vulnerabilities before attackers do. Both are necessary. Neither addresses the most common governance failure in production: an agent doing something that is technically authorized but violates organizational policy.
The $440,000 attack Palo Alto demonstrated was a security vulnerability — the agent shouldn't have been able to execute that transaction. But most real-world governance failures aren't security breaches. They're policy violations by agents operating within their authorized scope.
A healthcare agent that has legitimate access to patient records and legitimate access to email sends a referral summary to a physician who isn't authorized to receive that specific patient's information. The agent had access to both systems. The action wasn't a security breach. It was a HIPAA violation.
A financial advisory agent that is authorized to generate client communications sends a response containing language that implies guaranteed investment returns. The agent had access to the communication channel. The content wasn't toxic or unsafe by any generic safety standard. It violated SEC compliance requirements specific to that organization.
An AI coding assistant with full repository access generates a pull request that includes a test fixture containing production customer data. The commit passed all security scans. No secrets were detected. But the organization's data governance policy prohibits customer data in test environments.
These aren't attacks. They're agents doing their jobs without awareness of organizational rules. Microsoft's identity controls won't catch them because the agent was operating within its authorized scope. Palo Alto's red teaming won't catch them because they aren't security vulnerabilities — they're compliance violations specific to rules that exist in that organization's policy documents, not in any generic safety framework.
The Missing Layer: Organizational Policy Enforcement
What's missing between "this agent is authorized" (Microsoft) and "this agent is secure" (Palo Alto) is "this agent's actions comply with our specific organizational rules."
This layer requires three capabilities that neither platform provides today.
First, organization-specific policy definition. The rules that govern an agent's behavior in a healthcare company are different from a financial services company, a legal firm, or a SaaS vendor. These rules come from HIPAA compliance documents, SEC regulations, internal brand guidelines, customer contracts, and industry-specific standards. They can't be pre-built by a security vendor because they're unique to each organization. The governance system needs to ingest an organization's own documents and extract enforceable rules from them.
Second, session-aware evaluation across actions. Agent governance failures emerge from sequences, not individual actions. An agent reading patient data is fine. The same agent sending that data externally three steps later might be a violation. Evaluating individual actions against policies catches obvious violations. Evaluating the full session — what data was accessed, what tools were used, what actions followed — catches the contextual violations that are far more common and far more costly.
Third, multi-surface enforcement beyond the agent itself. An agent doesn't operate in isolation. The code it generates gets committed to repositories. The documents it creates get shared through storage platforms. The emails it sends go through email systems. The messages it posts appear in Slack channels. Governing the agent's tool calls is necessary, but the content the agent produces flows across surfaces that each need their own enforcement. A single policy — "no PII in external communications" — needs to work whether the communication is an agent's API call, an email, a Slack message, or a shared document.
How the Pieces Fit Together
The right way to think about this isn't "which product is the answer?" It's "which layers does a complete governance stack need?"
Agent Inventory and Identity (Microsoft Agent 365): Know which agents exist, manage their permissions, apply Zero Trust principles to non-human identities. This is the foundation. Without it, everything else operates on incomplete information.
Vulnerability Discovery (Palo Alto Prisma AIRS): Continuously red-team agents to find security vulnerabilities before attackers do. Contextual testing that understands each agent's specific tools and permissions. This catches technical weaknesses in the agent's design.
Organizational Policy Enforcement: Evaluate every agent action — and every piece of content the agent produces across every surface — against the organization's specific rules. Session-aware evaluation that tracks data access across multi-step workflows. Graduated enforcement (block, warn, monitor) based on violation severity. Full audit trail generated automatically as a byproduct of enforcement.
The first two layers tell you what agents exist and whether they're technically secure. The third layer tells you whether what they're doing complies with your actual business rules. For regulated industries, the third layer is the one the auditor asks about.
The Market Signal
Microsoft pricing Agent 365 at $15 per user per month tells you they see this as a core enterprise infrastructure layer, not a niche security product. Palo Alto publishing a detailed case study showing a six-figure financial manipulation missed by standard testing tells you the threat model is real and growing. Both of these companies are investing heavily in agent governance because the market is demanding it.
But their solutions operate at the infrastructure and security layers. The compliance and policy enforcement layer — the layer that answers "does this action comply with our specific organizational rules?" — is a different product category. It sits above identity management and security testing, consuming their outputs while applying organizational context that neither platform has access to.
This is the architectural gap that will define the next wave of AI governance tooling. Visibility plus security testing plus organizational policy enforcement is the complete stack. We're watching the first two layers mature in real time. The third is where the opportunity and the urgency is greatest.
We're building Aguardic as the organizational policy enforcement layer. Extract rules from your compliance docs, enforce them across every surface where AI-generated content flows, and generate audit-ready evidence automatically. If you're thinking about where policy enforcement fits in your agent governance stack, take a look.



