
Healthcare AI Programs Don't Fail at Policy. They Fail at Enforcement.

Every healthcare organization running AI has a governance document. Almost none have enforcement that runs where AI runs. Operational enforcement is the missing layer between NIST AI RMF frameworks and day-to-day compliance.

Aguardic Team·April 4, 2026·9 min read

Every healthcare organization running AI has a binder. Sometimes it is a SharePoint folder. Sometimes it is a 40-page PDF titled "AI Governance Framework" that three people have read. The binder describes principles. It references NIST. It mentions responsible use. And none of it touches the systems where AI actually runs.

A recent HIT Consultant piece by Marty Barrack, CISO and Chief Legal and Compliance Officer at XiFin, makes a useful argument: healthcare enterprises should stop treating AI adoption as a series of disconnected pilots and start building governance that spans procurement, risk management, and operations. The recommended approach is to use NIST AI RMF as the operating framework for risk and trustworthiness, and layer ISO 42001 on top as a certifiable management system.

That advice is directionally right. The frameworks are sound. The problem is what happens after the frameworks are selected.

The Gap Between Frameworks and Enforcement

Frameworks describe what good looks like. They define categories of risk, outline governance functions, and establish the vocabulary for managing AI responsibly. What they do not do is prevent an AI chatbot from disclosing a patient's medication list in an unsecured channel at 2 a.m. on a Tuesday.

This is the gap that healthcare AI programs keep falling into. The governance document says "ensure appropriate safeguards for PHI." The clinical support tool runs with no runtime check against HIPAA disclosure rules. The compliance team discovers the exposure during a quarterly review, three months after the first violation.

The missing layer is enforcement. Not principles, not risk categories, not management system clauses. Executable checks that run where AI work happens, in real time, continuously.

A Three-Layer Stack for Healthcare AI Governance

Think about the relationship between NIST AI RMF, ISO 42001, and daily operations as three layers that must connect or nothing works.

The first layer is framework intent. This is what NIST and ISO define: trustworthiness characteristics, risk functions (Govern, Map, Measure, Manage), management system requirements, and continuous improvement obligations. It answers the question "what does responsible AI look like for our organization?"

The second layer is operational policy. This is where framework language becomes specific to your environment. "Ensure transparency" becomes "every AI-generated patient communication must include a disclosure that the content was AI-assisted." "Manage data governance" becomes "no model may be trained on PHI without a signed data use agreement and BAA." These are the rules your organization commits to following.

The third layer is enforcement. This is where rules become checks that actually run against AI outputs, agent actions, code commits, and document generation. A policy that says "no diagnosis language unless explicitly authorized" must translate into a runtime evaluation that flags or blocks an AI response containing diagnostic terminology when the use case does not permit it.
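As an illustration of what that translation can look like, here is a minimal sketch of a runtime output check. The term list and use-case names are assumptions for the example; a production system would use a maintained clinical terminology list or a classifier, not a hard-coded regex.

```python
import re

# Hypothetical pattern; real deployments would use a curated clinical
# vocabulary or an NLP classifier rather than a hand-written regex.
DIAGNOSTIC_TERMS = re.compile(
    r"\b(diagnos\w+|you (likely )?have|consistent with)\b", re.IGNORECASE
)

def check_output(response: str, use_case: str) -> dict:
    """Flag an AI response containing diagnostic language when the
    use case is not classified as clinical decision support."""
    authorized = use_case == "clinical_decision_support"
    flagged = bool(DIAGNOSTIC_TERMS.search(response)) and not authorized
    return {
        "allowed": not flagged,
        "reason": "diagnostic language in unauthorized use case" if flagged else None,
    }
```

A check like this runs on every response before it reaches the patient-facing channel, producing a log entry either way.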

Most healthcare organizations have the first layer. Many have started on the second. Almost none have the third.

Inventory Is the Control Plane

Both NIST AI RMF and ISO 42001 emphasize inventorying AI systems. In healthcare, that inventory must go deeper than a spreadsheet of model names and vendors.

A meaningful AI inventory tracks use cases and their risk classification (clinical decision support vs. operational scheduling vs. patient-facing communication), the data sources each system touches (PHI, claims data, imaging, clinical notes), vendors and subcontractors with their contractual obligations, integration surfaces where AI connects to production systems (EHR, patient portals, call centers, email, billing), and the specific permissions each agent or tool holds (can it write orders, send messages to patients, modify billing codes).

If you cannot answer "which AI system touched this patient's data, when, and what action did it take," you cannot meet ISO 42001's governance expectations or HIPAA's audit requirements. The inventory is not a compliance checkbox. It is the control plane for everything that follows.
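The inventory fields described above can be sketched as a typed record. The field names here are illustrative assumptions, not a prescribed schema; the point is that the inventory becomes queryable data rather than a spreadsheet.

```python
from dataclasses import dataclass

# Field names are illustrative; adapt them to your own inventory schema.
@dataclass
class AISystemRecord:
    name: str
    use_case: str                    # e.g. "patient_communication"
    risk_class: str                  # e.g. "clinical", "operational"
    data_sources: list[str]          # e.g. ["PHI", "claims"]
    vendor: str
    integration_surfaces: list[str]  # e.g. ["EHR", "patient_portal"]
    permissions: list[str]           # e.g. ["draft_messages"]

def touches_phi(record: AISystemRecord) -> bool:
    """A queryable inventory answers questions like this directly."""
    return "PHI" in record.data_sources
```

Once the inventory is structured data, questions like "which systems touch PHI and hold write permissions?" become one-line queries instead of email threads.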

Procurement as Testable Requirements

Barrack's article rightly emphasizes that governance must extend to procurement and contracting. The practical translation is to stop treating vendor contracts as one-time questionnaires and start treating contractual claims as continuously testable requirements.

When a vendor says "we provide complete audit logging," that becomes a verification target: does the integration actually emit structured logs for every AI-generated action? When a contract specifies "customer data will not be used for model training," that becomes a monitoring requirement: is there evidence that the training exclusion is being enforced? When the agreement includes a 72-hour incident notification timeline, that becomes an SLA you can measure against.

The pattern is consistent. Take the contractual language, extract the testable claim, define the evidence that proves compliance, and check it on an ongoing basis rather than once during procurement review.
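Taking the 72-hour incident notification clause as an example, the pattern can be sketched as a measurable check. The clause and timestamps here are hypothetical; the point is that the contract term becomes a function you can run against real incident records.

```python
from datetime import datetime, timedelta

# The 72-hour window comes from the hypothetical contract clause above.
NOTIFICATION_SLA = timedelta(hours=72)

def sla_met(incident_detected: datetime, vendor_notified: datetime) -> bool:
    """Check whether the vendor's incident notification arrived
    within the contractual 72-hour window."""
    return vendor_notified - incident_detected <= NOTIFICATION_SLA
```

Run against every incident record, this turns a one-time procurement questionnaire answer into a continuously measured SLA.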

Controls That Matter in Production

Healthcare AI governance gets concrete at the point where an AI system takes an action that affects a patient, a record, or a financial transaction. These are the controls that matter in real deployments.

Human approval gates belong on any irreversible action: sending a message to a patient, placing an order, modifying a billing code, changing a treatment plan. The AI system can draft, recommend, and prepare. A qualified human confirms before the action executes.

Context constraints define where an AI system can look. A clinical summarization tool should retrieve from the patient's own record and approved reference sources. It should not pull from other patients' records, external databases without a BAA, or training data that contains PHI from a different institution.

Output constraints define what an AI system can say. No diagnosis language unless the use case is explicitly classified as clinical decision support with appropriate oversight. Citation requirements for any clinical content. Disclosure language on all patient-facing AI-generated communications.

Access constraints enforce least privilege at the tool level. An agent that schedules appointments should not have write access to clinical notes. An agent that drafts billing summaries should not be able to modify payment records. Every permission should be justified by the use case and revocable when the use case changes.
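Least-privilege access at the tool level can be sketched as a deny-by-default scope lookup. The agent names and permission strings are assumptions for the example.

```python
# Illustrative permission scopes, one set per documented agent use case.
# An unknown agent resolves to the empty set, so access is denied by default.
AGENT_SCOPES: dict[str, set[str]] = {
    "scheduler": {"read_calendar", "write_calendar"},
    "billing_drafter": {"read_billing", "draft_summary"},
}

def authorize(agent: str, permission: str) -> bool:
    """Grant a permission only if the agent's documented use case includes it."""
    return permission in AGENT_SCOPES.get(agent, set())
```

Because the scope table is data, revoking a permission when a use case changes is a one-line edit with an audit trail, not a code change.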

Continuous Evaluation Is the ISO 42001 Differentiator

ISO 42001's value over a standalone NIST AI RMF implementation is the management system structure: defined ownership, change control, corrective actions, and evidence of continuous improvement. For AI, that structure must translate into operational practices that go beyond periodic reviews.

Revalidation should trigger whenever a prompt changes, a retrieval corpus is updated, a tool permission is added, or a model version changes. Any of these can alter the behavior of an AI system in ways that existing policy checks may not catch. Automated regression testing should verify that clinical content style, safety constraints, and disclosure requirements still hold after changes. This is the AI equivalent of running your test suite after a code deploy, except the "code" is prompts, retrieval sources, and model weights.
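A regression suite for AI behavior can be sketched in the same style as ordinary unit tests. Here `generate` is a stub standing in for the real model call; the assertions mirror the disclosure and citation requirements described earlier.

```python
def generate(prompt: str) -> str:
    """Stub for the real model call; replace with your inference client."""
    return "This summary is AI-assisted. Source: clinic note, 2024-11-02."

def test_disclosure_present():
    """Disclosure language must survive prompt and model changes."""
    out = generate("Summarize the visit for the patient.")
    assert "AI-assisted" in out

def test_citation_present():
    """Clinical content must still cite a source record after changes."""
    out = generate("Summarize the visit for the patient.")
    assert "Source:" in out
```

Wiring a suite like this into the deployment pipeline means a prompt edit or model version bump cannot reach production without re-proving the safety and disclosure constraints.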

Drift monitoring should track changes in retrieval patterns and tool usage over time, not only output text. An agent that starts accessing a data source it was not originally configured to use is a governance event even if the outputs look normal. ISO 42001 asks for evidence that you are managing change. Continuous evaluation produces that evidence automatically.
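A minimal version of that drift check compares recent tool and data-source usage against a baseline window. The source names are illustrative.

```python
def tool_drift(baseline: list[str], recent: list[str]) -> set[str]:
    """Return tools or data sources seen in the recent window that never
    appeared in the baseline — a governance event even if outputs look normal."""
    return set(recent) - set(baseline)
```

Real monitoring would also track frequency shifts, not just new entries, but even this set difference surfaces the case described above: an agent quietly starting to query a source it was never configured to use.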

Ten Policies Every Healthcare AI Program Should Enforce

Governance frameworks become real when you can point to specific, enforceable rules. Here are ten that map directly to NIST AI RMF trustworthiness characteristics and ISO 42001 management system requirements.

1. All AI-generated patient communications must include disclosure language identifying the content as AI-assisted.
2. No AI system may generate diagnostic language unless it is classified as clinical decision support with documented physician oversight.
3. PHI may be processed only by AI systems with a current BAA and a documented data use agreement.
4. AI-generated clinical summaries must cite the source record for every factual claim.
5. Any AI action that modifies a patient record, billing code, or treatment plan requires human approval before execution.
6. AI agents must operate under least-privilege access, scoped to the minimum permissions required by their documented use case.
7. Model or prompt changes to production AI systems require documented review and revalidation before deployment.
8. AI systems must log every input, output, and action with sufficient detail to satisfy HIPAA audit requirements.
9. Retrieval sources for clinical AI must be restricted to approved, validated reference materials and the patient's own record.
10. Any AI system processing PHI must undergo risk assessment and classification before connecting to production data.

These are not aspirational principles. Each one translates to a check that can run against an AI system's behavior in real time, producing evidence of compliance or flagging a violation the moment it occurs.
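One way to make that concrete is a policy registry: each rule becomes a check function over an event describing an AI action. The policy IDs and event fields below are illustrative assumptions, and only two of the ten rules are shown.

```python
# Each check returns True if the event complies with the policy.
def p1_disclosure(event: dict) -> bool:
    """Policy 1: patient communications must carry disclosure language."""
    return event["type"] != "patient_communication" or "AI-assisted" in event["text"]

def p5_approval(event: dict) -> bool:
    """Policy 5: record/billing modifications require a named human approver."""
    return event["action"] not in {"modify_record", "modify_billing"} \
        or event.get("approved_by") is not None

POLICIES = {"P1": p1_disclosure, "P5": p5_approval}

def evaluate(event: dict) -> list[str]:
    """Return the IDs of policies the event violates."""
    return [pid for pid, check in POLICIES.items() if not check(event)]
```

Every evaluated event produces either evidence of compliance or a timestamped violation, which is precisely the audit trail the frameworks ask for but do not generate on their own.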

From Framework Compliance to Engineering Practice

The HIT Consultant article concludes that healthcare organizations need to become "AI-ready" through framework adoption. That is the right starting point. The next step is recognizing that frameworks do not enforce themselves.

The fastest path from NIST AI RMF guidance and ISO 42001 certification requirements to operational governance is to treat policies as executable checks that run across the surfaces where AI work happens: runtime API calls, agent tool use, code commits, document generation, and patient-facing communications. That is how "framework compliance" stops being a binder on a shelf and becomes part of routine engineering practice.

Governance that only exists in documents is policy theater. Governance that runs where AI runs is operational compliance. The frameworks tell you what to build. The enforcement layer is what makes it real.


We're building Aguardic to turn governance frameworks into enforceable policy checks across AI outputs, agent actions, code, and documents. If you're working on AI governance in healthcare, try extracting policies from your existing compliance documents and see what enforceable rules are already hiding in your binder.
