FDA and EMA Perspectives on AI in Pharmaceutical Manufacturing
The regulatory landscape for AI in pharmaceutical manufacturing is moving faster than most quality professionals realise. In the span of a few years, both the FDA and the European Medicines Agency have shifted from cautious observation to active guidance development, and the details matter enormously for organisations deciding how to deploy AI in GMP-regulated environments.
This post gives a clear picture of where each agency stands today, what the key documents say, and what the practical compliance implications are for pharmaceutical manufacturers.
Two regulators, two frameworks, one shared concern
The FDA and EMA approach AI governance from different regulatory traditions, but they share a fundamental concern: AI systems in GMP environments must not compromise patient safety, product quality, or data integrity. Everything else flows from that.
What differs is the mechanism by which each regulator has chosen to address AI, and the specificity of what each has said so far.
The FDA position: existing frameworks, expanding engagement
The FDA has not yet published a dedicated GMP-specific AI guidance document in the way the EMA has with draft Annex 22. Instead, the FDA's position on AI in pharmaceutical manufacturing is built on a foundation of existing cGMP regulations, supplemented by broader AI/ML guidance and active engagement programmes.
The cGMP foundation
For pharmaceutical manufacturers, the relevant FDA regulations are well established:
21 CFR Part 211 governs current good manufacturing practice for finished pharmaceuticals. Section 211.68 permits the use of computers and automated data processing systems provided they perform satisfactorily and are subject to a written programme for routine calibration and checking, with records maintained. This provision has always applied to AI systems. The regulation does not carve out an exception, and it does not require explicit AI-specific validation approaches beyond what applies to any computerised system.
Section 211.192 requires the quality unit to review and approve production and control records before batch release, and mandates thorough investigation of any unexplained discrepancy or failure to meet specification. This is the provision that frames the human accountability requirement for batch release. An AI system can assemble evidence, but the quality unit responsibility for review and disposition does not transfer to the model.
21 CFR Part 11 governs electronic records and electronic signatures. Any AI system that creates, modifies, or maintains records subject to predicate rule requirements must satisfy Part 11 controls: audit trails, access controls, system controls for record integrity, and electronic signature requirements where applicable. This applies regardless of whether the system uses AI.
FDA data integrity guidance
FDA's data integrity Q&A guidance defines data integrity in cGMP terms as completeness, consistency, and accuracy of data across the lifecycle, aligned to the ALCOA principles: attributable, legible, contemporaneous, original, and accurate. For AI systems, this creates specific requirements around provenance: records generated or modified by an AI system must be attributable both to the system and to the human who reviewed and approved the output, must be legible in a human-readable format, and the original AI output (before any human modification) should be preserved alongside the approved version.
The audit trail requirement is particularly important for agentic AI systems that take multiple automated steps across different data sources. If only the final output is logged, the audit trail is incomplete. FDA's data integrity framing expects that the full sequence of events can be reconstructed.
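As a concrete sketch of what that reconstruction requires, the hypothetical Python logger below records every step an AI-assisted workflow takes, not just the final output. The class and field names are illustrative assumptions, not terms from any regulation: each event is attributable to an actor, timestamped contemporaneously, and hashed so the original AI output can be preserved and verified alongside the human-approved version.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AuditEvent:
    """One step in an AI-assisted workflow, recorded for later reconstruction."""
    actor: str            # system identifier or human user ID (attributable)
    action: str           # e.g. "model_inference", "human_approval"
    payload_sha256: str   # hash of the exact input/output, so tampering is detectable
    timestamp: str        # contemporaneous UTC timestamp

class AuditTrail:
    """Append-only log: every step is kept, including the original AI output
    as it existed before any human modification."""
    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, actor: str, action: str, payload: dict) -> AuditEvent:
        event = AuditEvent(
            actor=actor,
            action=action,
            payload_sha256=hashlib.sha256(
                json.dumps(payload, sort_keys=True).encode()).hexdigest(),
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self._events.append(event)
        return event

    def reconstruct(self) -> list[dict]:
        """Return the full sequence of events, not just the final output."""
        return [asdict(e) for e in self._events]
```

In practice each model inference, each intermediate data retrieval, and the final human approval would be recorded as separate events, so an inspector can reconstruct the full sequence.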
FDA's AI/ML framework for Software as a Medical Device
For pharmaceutical manufacturers, the FDA's most developed thinking on AI governance is actually in the Software as a Medical Device space, where the FDA has published an action plan for AI/ML-based software functions. While this framework is not directly applicable to GMP manufacturing software, it introduces concepts that are informative: the distinction between locked and adaptive algorithms, the concept of predetermined change control plans, and the emphasis on performance monitoring over the product lifecycle.
These concepts translate meaningfully to pharmaceutical manufacturing. The idea of a predetermined change control plan, pre-specifying in advance what kinds of model changes are acceptable without full resubmission, and what kind of performance monitoring will detect when the model has drifted outside acceptable parameters, maps closely to what good AI governance in a GMP environment should look like.
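One way to make a predetermined change control plan concrete is a table of pre-specified change types, each mapped to the checks it triggers, with anything not pre-specified falling back to full revalidation. The change categories and check names below are illustrative assumptions for a GMP context, not terms from any FDA document:

```python
# Hypothetical pre-specified change control plan: which model changes are
# pre-authorised (with defined retesting) and which trigger full revalidation.
PREDETERMINED_CHANGE_PLAN = {
    "retrain_same_architecture": {"requires_full_revalidation": False,
                                  "required_checks": ["acceptance_test_suite"]},
    "threshold_tuning":          {"requires_full_revalidation": False,
                                  "required_checks": ["acceptance_test_suite",
                                                      "impact_assessment"]},
    "architecture_change":       {"requires_full_revalidation": True,
                                  "required_checks": ["full_validation"]},
}

def evaluate_change(change_type: str) -> dict:
    """Look up a proposed change; anything not pre-specified defaults to
    full revalidation (the conservative path)."""
    return PREDETERMINED_CHANGE_PLAN.get(
        change_type,
        {"requires_full_revalidation": True,
         "required_checks": ["full_validation"]})
```

The design choice that matters is the default: an unanticipated change type should never silently skip revalidation.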
FDA's Process Analytical Technology (PAT) guidance
The FDA PAT framework, published in 2004, remains one of the clearest regulatory signals that the FDA supports the use of advanced analytical and process control technologies in pharmaceutical manufacturing. PAT encourages science-based, risk-based approaches to process understanding and real-time process monitoring. ML algorithms coupled with PAT sensors (NIR, Raman, UV-Vis) fall squarely within this intent, provided the system is designed, validated, and monitored appropriately.
The PAT framework is often underappreciated as an AI governance anchor. For in-process monitoring and anomaly detection use cases, it provides a well-established regulatory reference that predates the current AI conversation by two decades.
FDA's engagement posture
The FDA has been actively engaging with industry on AI through several mechanisms: the Emerging Technology Program allows companies to meet with FDA staff to discuss novel manufacturing technologies before regulatory submission; the Digital Health Center of Excellence provides guidance and engagement pathways for digital technologies; and the FDA has participated in ICH working groups that are developing AI-relevant guidance.
The overall FDA posture can be characterised as: apply existing frameworks rigorously, engage early for novel applications, and watch this space for more specific guidance.
The EMA position: more prescriptive, and moving quickly
The EMA's approach has been more prescriptive, moving from the existing Annex 11 framework toward AI-specific guidance with the draft EU GMP Annex 22 published for consultation in July 2025.
EU GMP Annex 11: the existing foundation
Before Annex 22, every computerised system used in GMP activities in the EU was subject to Annex 11. This remains true. Annex 22 is positioned as additional guidance layered on top of Annex 11, not a replacement.
Annex 11's requirements for AI systems are substantial even before Annex 22 is considered:
- Validation of the application and qualification of IT infrastructure, with risk management applied across the lifecycle
- Supplier oversight: formal agreements with suppliers of GxP-relevant systems, with supplier audit information available to inspectors
- Audit trails: for any GMP-relevant record creation, modification, or deletion
- Security and access control: system and data protection, with access restricted to authorised individuals
- Periodic evaluation: confirming the system remains in a validated state over time
- Batch release controls: only a Qualified Person can certify and release batches, via electronic signature with appropriate controls
Organisations that have established a mature Annex 11 compliance programme have already built the scaffolding onto which AI-specific requirements attach.
Draft EU GMP Annex 22: what it actually says
The draft Annex 22 is specific enough to be actionable, and specific enough to create real constraints. Several aspects of it are frequently misunderstood.
Scope: what it covers and what it excludes
Draft Annex 22 explicitly limits its scope to static AI models, i.e. models whose parameters are fixed after training and do not update during deployment. It covers use cases like classification, prediction, and process monitoring where the model produces deterministic or controlled outputs.
Dynamic and continuously learning models, probabilistic-output models, and generative AI / large language models are noted as being outside the scope of draft Annex 22 for critical GMP applications, and the guidance states they should not be used in such applications.
This is the clause that has generated the most discussion. A common misreading treats it as a blanket prohibition on LLMs in GMP. It is not: out-of-scope does not mean forbidden. What it means is that LLMs fall back to the existing GMP framework (Annex 11, data integrity requirements) and must be governed under those rules. The practical implication is that LLMs are realistically suited to non-critical GMP support roles (drafting, summarising, retrieving) with documented human-in-the-loop (HITL) review, not to autonomous decision-making in quality-critical processes.
Intended use: the discipline Annex 22 demands
The draft Annex 22 places significant emphasis on intended use definition. The intended use must be described in detail, including characterisation of the input sample space, limitations, rare variations, and erroneous or biased inputs, with subject matter expert responsibility and approval before acceptance testing begins.
This is a higher bar than many organisations are used to for computerised system validation. It requires thinking carefully about what the model is being asked to do, what data it will receive in practice, what happens when the input is atypical, and who is accountable for those design decisions.
Acceptance testing: metrics, independence, and documentation
For critical GMP AI applications, the draft requires test metrics (confusion matrix, sensitivity, specificity, accuracy, precision, F1) with pre-defined acceptance criteria. The key constraint is that the acceptance criteria must be set at least as high as the performance of the process being replaced: AI cannot be introduced if it performs worse than the existing manual process.
Test data independence is explicitly required, with technical and procedural controls and audited access. The guidance requires recording what data was used for testing, when, and how many times, preventing data leakage that would inflate apparent performance.
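A minimal sketch of those metrics and the pre-defined acceptance check, assuming a binary pass/fail classifier summarised as a 2x2 confusion matrix. The threshold values in the usage below are placeholders; real criteria would be derived from the measured performance of the manual process being replaced:

```python
def acceptance_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard classification metrics from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}

def passes_acceptance(metrics: dict, criteria: dict) -> bool:
    """Pre-defined criteria: every named metric must meet or exceed its
    threshold, with thresholds set at least as high as the measured
    performance of the manual process being replaced."""
    return all(metrics[name] >= threshold
               for name, threshold in criteria.items())
```

For example, `passes_acceptance(acceptance_metrics(90, 5, 10, 95), {"sensitivity": 0.85, "specificity": 0.90})` checks a candidate model against two pre-defined thresholds.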
Explainability, confidence, and “undecided” states
The draft introduces an explicit requirement for explainability elements and confidence scoring with defined thresholds. Critically, it requires the ability to flag outcomes as “undecided” when confidence falls below a threshold, rather than forcing a classification that may be unreliable.
This is an important practical design requirement. A validated anomaly detection model should not simply produce binary pass/fail outputs. It should communicate confidence, and it should have a defined behaviour when confidence is low.
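A minimal sketch of that behaviour, assuming a model that emits a score in [0, 1]. The threshold and margin values here are illustrative; in practice they would be set and justified during validation:

```python
from typing import Literal

Outcome = Literal["pass", "fail", "undecided"]

def classify_with_confidence(score: float,
                             decision_threshold: float = 0.5,
                             confidence_margin: float = 0.15) -> Outcome:
    """Map a model score in [0, 1] to pass/fail, flagging 'undecided' when
    the score sits inside the low-confidence band around the threshold."""
    if abs(score - decision_threshold) < confidence_margin:
        return "undecided"   # route to human review rather than force a call
    return "pass" if score > decision_threshold else "fail"
```

The "undecided" branch is the point: a score of 0.55 is not a confident pass, and the system should say so rather than guess.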
Change control: comprehensive and explicit
The draft Annex 22's change control requirements are broad. Any change to the model, the system, the process it automates or assists, or the physical objects used as input must be evaluated for whether retesting is needed. Decisions not to retest must be justified and approved. This has direct implications for cloud-hosted AI services, where model versions may change or be discontinued without explicit notification.
Drift monitoring: a lifecycle expectation
The draft expects ongoing performance monitoring and drift detection, not just a one-time validation at deployment. This requires defining what drift means for a given use case, establishing thresholds that trigger review, and having a plan for what happens when drift is detected.
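One common way to quantify input drift is the population stability index (PSI) over binned feature or score distributions. The sketch below uses widely cited rule-of-thumb PSI bands; the actual thresholds, and what "drift" means at all, would need to be defined and justified per use case during validation:

```python
import math

def population_stability_index(baseline: list[float],
                               current: list[float]) -> float:
    """PSI between two binned distributions, each given as per-bin
    proportions summing to 1. Assumes no empty bins; production code
    would smooth zero-count bins before taking logs."""
    return sum((c - b) * math.log(c / b)
               for b, c in zip(baseline, current))

def drift_status(psi: float) -> str:
    """Rule-of-thumb PSI bands (assumed, to be set per use case):
    < 0.10 stable, 0.10-0.25 review, > 0.25 action required."""
    if psi < 0.10:
        return "stable"
    if psi < 0.25:
        return "review"
    return "action_required"
```

A "review" or "action_required" status would then trigger the pre-defined response plan: investigation, possible retesting, and potentially taking the model out of service.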
The EU AI Act: the horizontal layer
For EU-based pharmaceutical manufacturers, the EU Artificial Intelligence Act (Regulation (EU) 2024/1689) adds a horizontal regulatory layer above the GMP-specific guidance. The Act entered into force in August 2024 and becomes fully applicable in August 2026.
The AI Act classifies AI systems by risk. AI systems used in regulated products (including pharmaceuticals) may be subject to high-risk AI system requirements under the Act, which include conformity assessments, technical documentation, and registration obligations. The interaction between the AI Act's requirements and GMP-specific guidance is still being worked through, but manufacturers deploying AI in the EU need to track both frameworks.
Where FDA and EMA converge
Despite the different approaches, the FDA and EMA are aligned on the substantive requirements that matter most:
Validation is non-negotiable. Both regulators expect AI systems to be validated as fit for intended use before deployment in GMP-relevant applications. Neither accepts that AI is inherently different in kind from other computerised systems in a way that excuses it from validation obligations.
Human accountability cannot be delegated to a model. Both regulators are clear that AI can support decisions but cannot replace the qualified human who is accountable for them. Batch release decisions, root cause determinations, and disposition decisions require human review and documented approval.
Audit trails must support full reconstruction of events. Both frameworks require that AI-generated or AI-modified records be auditable. The source data, the transformation, and the human approval should all be traceable.
Change control applies to models as well as code. Both regulators expect that changes to AI systems, including model version updates, are subject to the same change management discipline as any other change to a validated GxP system.
Ongoing monitoring is a lifecycle expectation. Neither regulator accepts that validation at deployment is sufficient. AI systems must be monitored in production, and their performance must be periodically evaluated.
For both jurisdictions, a two-lane architecture is the most defensible approach: static, validated models with full lifecycle governance for critical GMP applications; LLM-based tools with documented human review and process validation for non-critical support. This architecture is consistent with both regulatory frameworks and can be evolved as guidance matures.
GMP Bench evaluates AI models on realistic pharmaceutical GMP tasks, from regulatory knowledge questions to document generation challenges. If you're evaluating which models are actually capable in a regulated manufacturing context, explore the test cases and leaderboard.