Everyone is talking about AI agent governance. NIST just launched an initiative to develop standards for it. Enterprise security teams are adding "how do you govern your AI agents?" to vendor questionnaires. Gartner predicts that by 2028, a third of enterprise software will include agentic AI — up from less than one percent today.
But when you look for concrete answers about what agent governance actually involves — what gets checked, when it gets checked, how enforcement works in a real system — you find almost nothing. The conversation is stuck at the level of principles: "agents should be transparent," "agents need human oversight," "agents must operate within boundaries." These are correct and completely insufficient. They're the equivalent of saying "software should be secure" without explaining authentication, authorization, or encryption.
This post is about the mechanics. What does it actually look like to govern an AI agent — not in theory, not in a framework document, but in a running system where an agent is about to take an action and something needs to decide whether that action is allowed?
Why Agents Are Different
The governance model that works for LLM outputs doesn't work for agents. With an LLM, you're evaluating a single request-response pair. The user sends a prompt, the model generates a response, you check the response against your policies, and you either allow it or block it. It's a stateless evaluation. Each check is independent.
Agents don't work this way. An agent receives a goal, breaks it into steps, calls tools, reads results, makes decisions based on those results, calls more tools, and eventually produces an outcome. The individual steps might be perfectly fine in isolation. The sequence might be a policy violation.
Consider a healthcare AI agent that handles patient intake. Step one: the agent reads the patient's medical history from the EHR system. Allowed — the agent needs this information to do its job. Step two: the agent drafts a summary of the patient's conditions. Allowed — that's the task. Step three: the agent emails the summary to the referring physician. This is where it gets complicated. Did the agent just send protected health information over an unsecured channel? Did the referring physician have authorization to receive this specific patient's records? Was the content minimized to only what's necessary?
No single step in that chain is obviously wrong. The violation emerges from the combination — what data the agent accessed, what it did with that data, and where it sent the result. Governing agents means evaluating actions in context, not in isolation.
This is the fundamental architectural difference. LLM governance is stateless. Agent governance is stateful. And that distinction drives everything about how you design the system.
The Four Layers of Agent Governance
After building enforcement for AI agents across real-world deployments, we've found that governance breaks down into four distinct layers. Each layer answers a different question, operates at a different point in the agent's execution, and catches a different category of risk.
Layer 1: Pre-Execution Policy Checks
The most basic form of agent governance: before the agent takes an action, check whether that action is allowed.
This is the agent equivalent of what we already do for LLM outputs and pull requests — evaluate the action against a set of policies and return a decision. The difference is that the "action" isn't a text response or a code change. It's a tool call, an API request, a database query, or an external communication.
The mechanics are straightforward. The agent's orchestration framework — whether it's LangChain, CrewAI, AutoGen, a custom framework, or an MCP-connected tool — makes a call to the governance engine before executing each action. The call includes the action type, the target system, any parameters, and contextual metadata. The governance engine evaluates the action against the organization's policies and returns one of three results: allow, warn, or block.
Agent: "I want to call the Stripe API to process a refund of $1,200"
Evaluation:
→ Policy: "Financial transactions over $500 require human approval"
→ Action type: api_call
→ Target: stripe.refunds.create
→ Amount: $1,200
→ Result: BLOCK — approval required
The agent receives the block decision and handles it — escalating to a human, retrying with different parameters, or aborting the workflow. The key point is that the unsafe action never executes. The evaluation happens before the side effect, not after.
For this to work in practice, the governance check needs to be fast. If every agent action waits three seconds for a policy evaluation, the agent becomes unusable. This is why a layered evaluation engine matters. Deterministic rules — pattern matching, allowlists, numeric thresholds — evaluate in under ten milliseconds. They handle the majority of checks: is this tool in the approved list? Is this amount under the threshold? Is this target system in the allowed scope? Only the rules that require semantic understanding — intent analysis, context evaluation — invoke the slower AI-powered evaluation. The result is that most agent actions are governed with negligible latency.
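A minimal sketch of that layered evaluation, in Python. The `Action` and `Rule` types, and the `llm_check` hook, are hypothetical stand-ins — the real engine's API will differ — but the control flow is the point: deterministic rules run first and short-circuit, and only semantic rules fall through to the slower AI-powered check.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Action:
    tool: str
    params: dict

@dataclass
class Rule:
    name: str
    check: Callable[[Action], bool]  # True means the rule is violated

def evaluate(action: Action, deterministic: list, semantic: list,
             llm_check: Callable) -> dict:
    # Deterministic rules: pattern matching, allowlists, thresholds.
    # Fast, and they handle the majority of checks.
    for rule in deterministic:
        if rule.check(action):
            return {"result": "BLOCK", "rule": rule.name}
    # Semantic rules: only these invoke the slower AI-powered evaluation.
    for rule in semantic:
        if llm_check(rule, action):
            return {"result": "BLOCK", "rule": rule.name}
    return {"result": "ALLOW"}

# The refund example from above, as a deterministic threshold rule.
approval_rule = Rule(
    "Financial transactions over $500 require human approval",
    lambda a: a.tool == "stripe.refunds.create"
    and a.params.get("amount", 0) > 500,
)
decision = evaluate(Action("stripe.refunds.create", {"amount": 1200}),
                    [approval_rule], [], lambda r, a: False)
# decision["result"] == "BLOCK"
```

The $1,200 refund never reaches the `llm_check` path at all — the threshold rule blocks it in the deterministic pass.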
Layer 2: Tool-Level Access Control
Pre-execution checks evaluate individual actions. Tool-level access control operates at a higher level: which tools is this agent allowed to use at all?
Every agent has a set of capabilities defined by the tools it can access. A customer support agent might have access to the ticketing system, the knowledge base, and the CRM. A data analysis agent might have access to the data warehouse and a visualization tool. A code review agent might have access to the repository and the CI pipeline.
The governance question isn't just "is this specific API call safe?" — it's "should this agent have access to this API in the first place?"
Tool-level access control defines the boundary. You specify which tools each agent is authorized to use, and any tool call outside that boundary is blocked before the pre-execution check even runs. Think of it as the IAM layer for AI agents — identity-based access control applied to autonomous systems.
Agent: "customer-support-bot"
Allowed tools:
✅ zendesk.tickets.read
✅ zendesk.tickets.reply
✅ knowledge_base.search
❌ stripe.refunds.create
❌ crm.contacts.delete
❌ email.send_external
Agent attempts: stripe.refunds.create
→ BLOCKED — tool not in allowed scope for this agent
This catches an entire class of risk that per-action evaluation would miss. If an agent's prompt is manipulated — through injection, hallucination, or misconfiguration — and it attempts to call a tool outside its scope, the tool-level check blocks it regardless of whether the specific parameters look reasonable. The agent simply doesn't have the permission.
In Aguardic, this maps to our entity system. Each agent is registered as an entity with a type of AGENT. The entity profile defines the agent's purpose, its allowed tools, and its operating environment. Policies are derived from and bound to the entity — so the governance rules are specific to each agent's identity and role.
This registry also solves the visibility problem. Before you can govern agents, you need to know what agents exist. When a new agent identity appears in evaluation requests that doesn't match any registered entity, the system flags it. Unregistered agents are visible immediately, not discovered months later during an audit.
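As a sketch of the registry-backed check, with an in-memory dict standing in for the entity system — the agent name and tools echo the example above, and the unregistered-agent flag mirrors the visibility behavior just described:

```python
# Hypothetical in-memory registry; the real entity system persists
# agent profiles, purposes, and operating environments.
REGISTERED_AGENTS = {
    "customer-support-bot": {
        "zendesk.tickets.read",
        "zendesk.tickets.reply",
        "knowledge_base.search",
    },
}

def check_tool_access(agent_id: str, tool: str) -> str:
    allowed = REGISTERED_AGENTS.get(agent_id)
    if allowed is None:
        # Unknown agent identity: flag it instead of silently allowing.
        return "FLAG_UNREGISTERED"
    if tool not in allowed:
        # Blocked before per-action policy evaluation even runs.
        return "BLOCK"
    return "PROCEED"

check_tool_access("customer-support-bot", "stripe.refunds.create")  # → "BLOCK"
check_tool_access("unknown-agent", "email.send_external")  # → "FLAG_UNREGISTERED"
```

Because the check keys off the agent's identity, a manipulated prompt that asks for an out-of-scope tool fails here regardless of how reasonable the call's parameters look.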
Layer 3: Session-Aware Evaluation
This is where agent governance diverges most sharply from LLM governance. Individual actions that are each allowed in isolation can combine into a policy violation. Catching this requires evaluating actions in the context of what happened earlier in the same session.
An evaluation session tracks the full chain of actions an agent takes while working toward a goal. Each action is evaluated individually against the relevant policies — that's Layer 1. But each action is also evaluated in the context of the session — what tools have been called, what data has been accessed, what actions have already been taken.
The session accumulates context as the agent works. When the agent reads a customer's medical records, the session records that PHI has been accessed. When the agent later attempts to send an email to an external address, the session context — "PHI was accessed in this session" — becomes part of the evaluation input. A policy rule that says "block external communications in sessions where PHI has been accessed" now has the information it needs to enforce the rule.
Session: session-abc-123
Entity: patient-intake-agent
Action 1: ehr.patient.read(patient_id: "P-1234")
→ Session context updated: data_tags = ["PHI", "medical_history"]
→ Policy check: ALLOW
Action 2: document.draft(content: "Patient summary...")
→ Session context: data_tags = ["PHI", "medical_history"]
→ Policy check: ALLOW
Action 3: email.send(to: "external@hospital.org", body: "...")
→ Session context: data_tags = ["PHI", "medical_history"]
→ Policy: "Block external communication when PHI accessed in session"
→ Policy check: BLOCK
→ Reason: "Agent accessed PHI in this session and is attempting
to send content to an external recipient"
The implementation requires a session store that persists across evaluation calls. In Aguardic, this is the evaluation session module. An external caller — the agent framework — creates a session at the start of a workflow. Each subsequent evaluation call includes the session ID. The session accumulates metadata: tools used, data tags accessed, action count, and the full action chain. Policy rules can reference session context directly — fields.session.dataTags CONTAINS "PHI" — making cross-action rules as straightforward to define as single-action rules.
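A toy version of that session store, with an in-memory dict and one hard-coded cross-action rule — the real module supports arbitrary rules over session fields, but the accumulation-then-reference pattern is the same:

```python
# In-memory session store; field names are illustrative.
SESSIONS = {}

def create_session(session_id, entity):
    SESSIONS[session_id] = {"entity": entity, "data_tags": set(),
                            "action_chain": []}

def evaluate_in_session(session_id, tool, data_tags=(), external=False):
    session = SESSIONS[session_id]
    # Cross-action rule: block external communication when PHI was
    # accessed earlier in this session.
    if external and "PHI" in session["data_tags"]:
        result = "BLOCK"
    else:
        result = "ALLOW"
        # The session accumulates context as the agent works.
        session["data_tags"].update(data_tags)
    session["action_chain"].append({"tool": tool, "result": result})
    return result

create_session("session-abc-123", "patient-intake-agent")
evaluate_in_session("session-abc-123", "ehr.patient.read",
                    data_tags={"PHI", "medical_history"})  # → "ALLOW"
evaluate_in_session("session-abc-123", "email.send",
                    external=True)                         # → "BLOCK"
```

Each call in isolation would pass a stateless check; only the session's accumulated `data_tags` give the second evaluation the information it needs to block.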
Sessions have a lifecycle. They start as active, and they end as completed (the agent finished its task), expired (a timeout was reached), or terminated (a policy or human ended the session early). The session's action chain — every action, every evaluation result, every policy decision — is preserved as an immutable audit trail.
This is the capability that matters most for regulated industries. When a regulator asks "what did the agent do and why was it allowed?", the session provides the complete answer: every action in sequence, every policy evaluated, every decision made, and the accumulated context that informed each decision.
Layer 4: Decision Chain Auditability
The first three layers are about prevention — stopping bad actions before they happen. The fourth layer is about evidence — proving that governance was applied, consistently, to every action the agent took.
This might sound like an afterthought. It's not. For any organization deploying AI agents in a regulated context — healthcare, financial services, government — the ability to reconstruct what an agent did and why it was allowed is not a nice-to-have. It's a regulatory requirement.
Decision chain auditability means that every agent session produces a complete, immutable record of:
- What the agent was asked to do — the initial goal or trigger
- What actions it took — every tool call, API request, and data access
- What policies were evaluated — which rules were checked at each step
- What the results were — allow, warn, or block for each evaluation
- What context informed the decision — the session state at the time of each evaluation
- What the final outcome was — whether the task completed, was blocked, or required human intervention
This record isn't assembled after the fact from scattered logs. It's generated as a natural output of the enforcement pipeline. Every evaluation writes to the session's action chain. Every policy decision is recorded with the policy version, the rule that triggered, the evaluation result, and the explanation. The audit trail is a byproduct of enforcement, not a separate process.
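As a sketch, the recording step can live inside the enforcement path itself, so the chain entry is written at decision time rather than reassembled later. All field names here are illustrative:

```python
from datetime import datetime, timezone

# Sketch: the audit entry is a byproduct of enforcement, not a
# separate logging process. Field names are illustrative.
def record_decision(session, action, policy, result, explanation):
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "policy": policy["name"],
        "policy_version": policy["version"],  # versioned policies
        "result": result,
        "explanation": explanation,
        # Session state captured at the moment of the decision.
        "session_state": dict(session["context"]),
    }
    session["action_chain"].append(entry)  # append-only chain
    return entry

session = {"context": {"data_tags": ["PHI"]}, "action_chain": []}
record_decision(
    session, "email.send",
    {"name": "phi-external-comms", "version": "v3"},
    "BLOCK", "PHI accessed in session; external recipient",
)
```

Snapshotting the session state into each entry is what lets a later reviewer see not just what was decided, but what context the decision was made against.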
When something goes wrong — and with autonomous systems, something eventually will — the decision chain provides the evidence needed for incident investigation, regulatory response, and root cause analysis. When everything goes right, the same decision chain provides the evidence needed for compliance reporting, audit reviews, and enterprise customer assurance.
How This Works in Practice
The four layers operate together, not independently. Here's the full flow for a single agent action:
Agent: patient-intake-agent (registered entity)
Session: session-abc-123 (active, 3 prior actions)
Action: email.send(to: "dr.smith@external.org", body: "Patient P-1234 summary...")
Step 1 — Tool Access Check (Layer 2):
→ Is email.send in this agent's allowed tool list?
→ Result: YES — proceed to policy evaluation
Step 2 — Session Context (Layer 3):
→ Load session state: data_tags = ["PHI", "medications", "diagnosis"]
→ Prior actions: ehr.read, document.draft, document.review
Step 3 — Policy Evaluation (Layer 1):
→ Deterministic rule: "external email recipients must be on approved list"
→ dr.smith@external.org: NOT on approved list
→ Result: VIOLATION (HIGH)
→ Session-aware rule: "block external comms when PHI accessed in session"
→ Session data_tags contain "PHI"
→ Result: VIOLATION (CRITICAL)
→ Semantic rule: "email content must not exceed minimum necessary PHI"
→ LLM evaluates email body against minimum necessary standard
→ Result: VIOLATION (MEDIUM) — body contains full medication list
Step 4 — Enforcement:
→ Highest severity: CRITICAL
→ Enforcement action: BLOCK
→ Agent receives: { status: "BLOCK", violations: [...], sessionId: "..." }
Step 5 — Audit Trail (Layer 4):
→ Action recorded in session chain with all three violations
→ Session state updated
→ Violation records created with full context
→ Timeline entries logged
The agent never sends the email. Three independent policy violations were caught — one by a deterministic recipient-allowlist rule, one by a session-aware cross-action rule, and one by semantic evaluation of the content. The entire decision is recorded in the session's action chain for audit purposes.
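The flow above can be compressed into a single enforcement function. This is a deliberately simplified sketch — the helper structures are illustrative, and semantic rules are folded in with the deterministic ones — but it shows how the four layers compose for one action:

```python
def enforce(agent, session, action):
    # Layer 2: is the tool in this agent's allowed scope?
    if action["tool"] not in agent["allowed_tools"]:
        return record(session, action, "BLOCK", ["tool not in scope"])
    # Layers 1 + 3: evaluate rules against the action and the
    # accumulated session context.
    violations = [r["name"] for r in agent["rules"]
                  if r["check"](action, session)]
    status = "BLOCK" if violations else "ALLOW"
    # Layer 4: every decision lands in the session's action chain.
    return record(session, action, status, violations)

def record(session, action, status, violations):
    entry = {"action": action["tool"], "status": status,
             "violations": violations}
    session["action_chain"].append(entry)
    if status == "ALLOW":
        session["data_tags"].update(action.get("data_tags", ()))
    return entry

agent = {
    "allowed_tools": {"ehr.patient.read", "email.send"},
    "rules": [{
        "name": "phi-external",
        "check": lambda a, s: a.get("external") and "PHI" in s["data_tags"],
    }],
}
session = {"data_tags": set(), "action_chain": []}
enforce(agent, session, {"tool": "ehr.patient.read",
                         "data_tags": {"PHI"}})  # ALLOW
enforce(agent, session, {"tool": "email.send",
                         "external": True})      # BLOCK
```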
The Integration Model
For agent governance to work, the governance engine needs to sit in the execution path — not alongside it, not after it, but before each action. This requires integration at the agent framework level.
There are two primary integration patterns:
REST API. The agent framework makes an HTTP call to the governance engine before each action. The request includes the action type, target, parameters, session ID, and any relevant context. The response includes the evaluation result and any violations. This works with any agent framework — LangChain, CrewAI, AutoGen, custom implementations — because it's a simple HTTP call that can be inserted into the tool execution pipeline.
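A sketch of that pre-execution hook as it might look inside a tool execution pipeline. The endpoint URL and payload fields here are hypothetical, not Aguardic's actual API; the `evaluate` parameter is injectable so the HTTP transport can be swapped out:

```python
import json
import urllib.request

GOVERNANCE_URL = "https://governance.example.internal/v1/evaluate"  # hypothetical

def http_evaluate(payload):
    """POST the proposed action to the governance engine; return its decision."""
    req = urllib.request.Request(
        GOVERNANCE_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def governed_tool_call(session_id, agent_id, tool, params, execute,
                       evaluate=http_evaluate):
    """Check with the governance engine before executing a tool."""
    decision = evaluate({
        "sessionId": session_id,
        "entityId": agent_id,
        "actionType": "tool_call",
        "target": tool,
        "parameters": params,
    })
    if decision["status"] == "BLOCK":
        # The unsafe action never executes; the agent handles the block.
        return {"blocked": True,
                "violations": decision.get("violations", [])}
    return {"blocked": False, "result": execute(tool, params)}
```

Because the hook wraps the `execute` callable rather than the framework itself, the same function can be inserted into LangChain, CrewAI, AutoGen, or a custom loop wherever tools are dispatched.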
MCP Server. For agents built on the Model Context Protocol, Aguardic operates as an MCP tool that the agent queries natively. The agent doesn't make a separate HTTP call — it uses the governance engine the same way it uses any other tool. "Check whether this action is allowed" becomes a tool call in the agent's native workflow.
The MCP integration is worth expanding on because it changes the relationship between the agent and the governance system. In the REST API model, governance is an external check — the orchestration framework calls the governance API before executing a tool. In the MCP model, governance is a tool available to the agent itself. The agent can proactively check whether an action is allowed before attempting it, request information about what policies apply to it, and adjust its behavior based on governance responses.
This isn't a subtle distinction. An agent that can query its own governance constraints is fundamentally more capable than one that simply gets blocked when it tries something unauthorized. The agent can plan around restrictions, choose alternative approaches that comply with policies, and explain to users why certain actions aren't available — all without hitting a block and failing.
Both patterns support evaluation sessions. The external caller creates a session at the start of the agent's workflow and includes the session ID in every subsequent evaluation call. The governance engine accumulates session context across calls, enabling the cross-action policy rules described above.
What This Doesn't Solve
Agent governance through policy enforcement has real limitations, and being honest about them matters more than overpromising.
Emergent behavior from multi-agent systems. When multiple agents collaborate — passing results between each other, triggering each other's workflows — the governance model described here governs each agent individually within its own session. Cross-agent governance — evaluating the collective behavior of a group of agents working together — is a harder problem that requires a different architectural approach. Session linking and multi-agent coordination policies are on our roadmap, not in production.
Intent verification. Policy evaluation can check what an agent does and what data it accesses. It cannot reliably determine why the agent chose a particular action. If an agent's reasoning is flawed but its actions happen to comply with policies, the governance system won't catch it. This is a fundamental limitation of external governance — you're evaluating observable behavior, not internal reasoning.
Novel attack vectors. Prompt injection, adversarial tool manipulation, and other attacks on agent systems are evolving rapidly. Policy-based governance catches known patterns and enforces defined boundaries. It doesn't detect novel attacks that operate within those boundaries. Defense in depth — combining policy governance with model-level safety, application-level security, and human oversight — remains essential.
Latency-sensitive workflows. Adding a governance check before every agent action introduces latency. For most enterprise workflows, the overhead is negligible — deterministic rule evaluation adds under ten milliseconds per check. But for agents operating in high-frequency or real-time contexts, the per-action governance model may need to be supplemented with batch evaluation or pre-approved action scopes.
These limitations don't diminish the value of what policy-based governance does solve. They define the boundary of what it solves, which matters for organizations making deployment decisions.
The Compliance Case
For regulated industries, the compliance case for agent governance is straightforward: if your AI agents are taking actions that affect patients, customers, borrowers, or citizens, regulators will ask how you control those actions. "We trained the model to be safe" is not an acceptable answer. "We have logs" is better but insufficient. What regulators want to see is evidence of enforceable controls applied consistently to every agent action.
The four-layer model produces this evidence as a natural byproduct. Every session generates a decision chain. Every action is evaluated against versioned policies. Every violation is recorded with full context. When the auditor asks "what controls do you have on your AI agents?", the answer is a report — not a story.
This is also the enterprise sales case. When your customer's security team asks "how do you govern your AI agents?", the same evidence applies. The evaluation sessions, the policy registry, the violation records — these are the artifacts that get attached to security review responses, SOC 2 evidence packages, and vendor qualification questionnaires.
The organizations deploying AI agents without governance infrastructure aren't saving time — they're accumulating a compliance debt that compounds with every agent action taken without enforceable rules and auditable evidence. The question isn't whether agent governance is worth the investment. It's whether you build the infrastructure before or after the regulator asks for it.
AI agents are the most powerful and most dangerous capability organizations have adopted since they connected their systems to the internet. They operate continuously, autonomously, and at machine speed. They make decisions that used to require human judgment. And unlike the humans they're augmenting, they don't get tired, don't second-guess themselves, and don't stop to ask whether what they're about to do is allowed.
Governing them requires more than principles and frameworks. It requires enforceable rules evaluated at every decision point, session-aware context that catches cross-action violations, tool-level access control that limits agent capabilities to their intended scope, and complete decision chain auditability that proves governance was applied.
This is what agent governance actually looks like. Not a checklist. Not a monitoring dashboard. A policy enforcement layer that sits between the agent and every action it takes — evaluating, enforcing, and recording, continuously, with evidence.



