
15 January 2025

FDA issues Artificial Intelligence-Enabled Device Software Functions draft guidance

The US Food and Drug Administration (FDA) issued its draft guidance, Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations, on January 7, 2025. The draft guidance provides insight into how FDA plans to apply total product life cycle (TPLC)[1] principles, which have historically applied to traditional hardware medical devices, to artificial intelligence (AI)-enabled device software functions.

What is in the guidance?

The draft guidance highlights FDA’s continued focus on transparency, AI bias, data quality, human factors, change management, and cybersecurity as important criteria for premarket submissions and overall safety and effectiveness. It also includes several detailed appendices with key considerations and recommended formats for premarket submission topics such as performance validation, usability evaluations, and labeling, including appropriate formats for AI model cards.

While the draft guidance is a helpful roadmap for aspects of quality and performance management that FDA deems important to demonstrate the safety and effectiveness of AI-enabled device software functions, it also leaves open to interpretation several practical and operational issues that are increasingly important for the development of AI-enabled devices. The draft guidance does not directly address, for example, issues such as the use and integration of third-party or open-source foundation models for AI-enabled devices, the practical limitations of data provenance disclosures that increasingly influence the development of AI-enabled devices, or unique considerations for applying quality system requirements to generative AI.

The draft guidance affirms FDA’s expectation that developers and sponsors apply comprehensive quality system and post-market elements of TPLC to AI-enabled software, but it does not provide detailed guidance on how sponsors should address issues such as supplier controls, quality agreements, complaint handling, or adverse event reporting for these products. These and other issues present an opportunity to seek further clarification from the Agency by submitting comments to the draft guidance.

Overall, however, the draft guidance provides helpful references for new entrants and experienced developers alike regarding the content of premarket submissions for AI-enabled software devices. We highlight a few of the key takeaways below.

Transparency and explainability matter

FDA emphasized that transparency and explainability are important factors in demonstrating and ensuring safety and effectiveness. The Agency wants to ensure that multiple stakeholders, beyond end users, understand the device architecture, logic, and how to interpret and use outputs. With respect to transparency, the guidance focuses on three primary areas: (1) information about the model and logic behind the device, (2) user experience and workflows, and (3) labeling and other disclosures.

FDA encourages submitters to include detailed information about the technical characteristics of the model and its algorithms, as well as the methods used to develop the model. Specifically, the Agency requests information about model inputs (including quality control criteria or algorithms) and outputs, the model architecture, a description of the features and the feature-selection process, the loss functions used for model design and optimization, the model parameters, and the methods applied to the input or output data (eg, pre-processing, post-processing, data augmentation, or synthesis). Where there are user-customizable features, submitters are advised to flag them.
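
For illustration only, this model-description information might be organized along the following lines. Every field name and value in this sketch is hypothetical; the draft guidance recommends its own model card format, and nothing below is drawn from it.

```python
# Hypothetical sketch of the model-description elements discussed above;
# all field names and values are illustrative, not an FDA-prescribed format.
model_description = {
    "inputs": {
        "data": "chest X-ray (DICOM)",
        "quality_control": "reject images below a pre-specified resolution",
    },
    "outputs": {"type": "probability of finding", "range": [0.0, 1.0]},
    "architecture": "convolutional neural network",
    "feature_selection": "learned image features; no hand-engineered features",
    "loss_function": "binary cross-entropy",
    "preprocessing": ["resizing", "intensity normalization"],
    "postprocessing": ["probability calibration"],
    "user_customizable": ["operating threshold"],  # flagged, per the draft guidance
}
```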

The Agency also encourages submitters to find novel ways to help FDA reviewers understand the technology and the user experience. To that end, the draft guidance encourages sponsors to incorporate graphics, screen captures or wireframes, or video demonstrations into premarket submissions to help FDA visualize the device in operation. While FDA does not explicitly address open-source or third-party models, the Agency states that submitters should include an explanation of any pre-trained models used, including the datasets used for training and how the pre-trained model was obtained. However, it remains unclear how to present this information if it is unavailable, or how much information about these models would be sufficient.

The draft guidance also addresses transparency and explainability for users and operators. FDA emphasizes the need for developers to accurately describe the user workflow, adjacent or connected systems or applications, and the user interface in premarket submissions. These considerations should also be a focus of human factors testing and other device testing. A key portion of the guidance addresses the inclusion of a “Risk Management File,” which details a risk management plan and risk assessments throughout the TPLC so that FDA may understand risks associated with device installation, device performance over time, and user interpretation of the results. To that end, developers may need to adapt human factors testing and development processes to consider knowledge management, training needs, and instructions for operators and installers, back-end technical support, or helpdesk personnel.

Another area of transparency and explainability is labeling and disclosures. The draft guidance states that sponsors should include clear disclosures in end-user labeling describing the model, its performance characteristics, how it is integrated into the device, and the information made available to the user throughout the course of use. For novel user interfaces, including multiple connected applications with separate user interfaces or tasks, it will be important for labeling to reflect the risk controls associated with specific user tasks. Risk control disclosures should also appropriately address the limitations or risks associated with interpreting software outputs.

The draft guidance also clarifies that unique device identifier (UDI) labeling requirements apply to AI-enabled software devices. A new UDI is necessary when there is a new version or model of a device. For AI-enabled devices, the guidance states that the software version history should include the model version tested and the differences between it and the released version.

“Validation” might mean something different for AI-enabled software devices

For hardware devices, validation is a critical element of safety and quality. Typical device validation processes involve testing specifically designed to confirm that the particular requirements for a specific intended use can be consistently fulfilled. This may involve a series of controlled mechanical, chemical, biocompatibility, or other tests to confirm that a device has the properties identified in the design specifications, or that it can perform the required function or calculation within pre-determined statistical ranges for accuracy.

However, the Agency acknowledges that the AI community uses the term “validation” to refer to “[d]ata curation or model tuning that can be combined with the model training phase to optimize the model selection.” Data curation is the selection, management, and assessment of the independent and dependent attributes or labels of data sets. Model tuning is the phase of development during which a model is tuned or optimized.
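
The terminology difference is easiest to see in code. The following minimal sketch shows “validation” in the machine learning sense: a held-out data split used only to tune and select a model, not a confirmation that the finished device meets its intended-use requirements. The library (scikit-learn), the synthetic data, and the candidate parameters are all our own assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in development data; in practice this would be the curated data set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hold out a "validation" split used only for model tuning and selection.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best_auc, best_c = 0.0, None
for c in (0.01, 0.1, 1.0, 10.0):  # candidate regularization strengths
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    if auc > best_auc:
        best_auc, best_c = auc, c  # "validation" here means model selection

print(f"selected C={best_c} with validation AUC={best_auc:.3f}")
```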

Therefore, the Agency emphasizes that the key issues for AI-enabled device validation are change control and data drift (ie, a device’s potential sensitivity to differences between the input data used during development and the data encountered in actual deployment). Data drift may also occur over the lifecycle of an AI-enabled device in ways that the user may not detect. FDA therefore refers sponsors to its Marketing Submission Recommendations for a Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions guidance, which the Agency encourages sponsors to use to prospectively identify potential changes and seek clearance for intended modifications without needing additional marketing submissions. The Agency also notes that validation methods should be pre-specified, not defined post hoc.
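
As a purely illustrative example of what monitoring for data drift can look like in practice, the sketch below compares the distribution of a single input feature between development and deployment using a two-sample Kolmogorov-Smirnov test. The feature, the simulated shift, and the alert threshold are all assumptions of ours, not recommendations from the guidance.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
dev_feature = rng.normal(loc=0.0, scale=1.0, size=5000)       # development data
deployed_feature = rng.normal(loc=0.4, scale=1.0, size=5000)  # shifted in deployment

# Two-sample Kolmogorov-Smirnov test on one input feature's distribution.
stat, p_value = ks_2samp(dev_feature, deployed_feature)
if p_value < 0.01:  # illustrative threshold; a real plan would pre-specify this
    print(f"Possible input drift detected (KS statistic = {stat:.3f})")
```

Consistent with FDA’s caution against post hoc methods, any such checks and thresholds would need to be pre-specified in the monitoring plan.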

In addition to developing comprehensive risk analysis programs and documentation to manage risks related to unintended or undesired changes in device performance, the Agency indicates that developers should consider proactively monitoring, identifying, and addressing changes in device performance, changes to device inputs, and changes in the context of use that could affect performance. While the guidance does not specify how these risk analysis programs, documentation, or monitoring could differ from those for traditional devices, FDA notes that details on the submitter’s post-market quality system regulation compliance plan (eg, performance monitoring plans) may be appropriate to ensure adequate ongoing performance. Additional guidance from the Agency on its expectations for post-market quality system characteristics appropriate for AI-enabled devices is anticipated.

FDA also highlights that when evaluating the performance of AI-enabled devices, it is important to understand the performance of a “human in the loop,” that is, how a human interprets the AI output and ultimately makes clinical decisions, rather than merely the performance of the model in isolation. In other words, understanding the impact of human factors is important to performance validation. Notably, in describing the device design, submitters should include information about the degree of automation the device provides compared to the current standard of care, a description of the configurable elements of the AI-enabled device, and the potential impact of those configurable elements on the end user’s decision-making.

How reliable is your data, and where did it come from?

Having emphasized that data are a critical element of validating AI-enabled software device functions, the Agency doubles down on the importance of data management and data fitness for model development as key indicia of safety and effectiveness. With respect to data management, FDA states that “[f]or an AI-enabled device, the model is a part of the mechanism of action,” and that “a clear explanation of the data management . . . and characterization of data used in the development and validation of the AI-enabled device is critical for FDA to understand how the device was developed and validated.”

Important aspects of data management include data provenance and data collection. FDA details extensively the type of information submitters should include about the data underlying AI-enabled device software, including data provenance, a description of the quality assurance processes applied to the data, and information on how the data generalize to the broadest subpopulations of the device’s intended users. Submitters should also describe differences in data management and data characteristics between the development and validation phases.

Also related to data collection is the significance FDA places on bias, which includes the representativeness of the data and performance across subgroups of the intended use population. For example, a model may be overtrained, placing too much emphasis on unique attributes that are not generalizable, or may overfit subgroups that are underrepresented within the data set (eg, the model fits the idiosyncrasies of the training data too closely). This may be especially important when a model is trained on populations outside the US, which submitters should disclose to FDA, along with an explanation of how and whether the training data reflect or can be generalized to the US population (eg, based on demographics or the clinical standard of care). The Agency does not, however, provide guidance on specific methods or thresholds for bias testing.
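
To make the subgroup-performance concern concrete, the sketch below computes a model’s AUC separately for each subgroup rather than only overall. The data are fabricated, and the subgroup labels, metric, and simulated weakness in one subgroup are all our own assumptions; this is the kind of disparity FDA’s discussion of bias is aimed at surfacing, not a prescribed method.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "y_true": rng.integers(0, 2, n),
    "subgroup": rng.choice(["A", "B", "C"], n),  # eg, site, sex, or age band
})
# Simulated model scores that are noisier (ie, weaker) for subgroup "C".
noise = np.where(df["subgroup"] == "C", 0.9, 0.4)
df["y_score"] = df["y_true"] + rng.normal(0.0, noise)

# Report performance separately for each subgroup, not just overall.
for name, grp in df.groupby("subgroup"):
    auc = roc_auc_score(grp["y_true"], grp["y_score"])
    print(f"subgroup {name}: AUC = {auc:.3f} (n = {len(grp)})")
```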

The draft guidance also clarifies FDA’s expectation that sponsors include information demonstrating that the data powering the AI-enabled device are fit for each purpose for which they are used throughout the product lifecycle. Two key areas of focus are the processes and controls around selecting data, and ensuring that the data align with the reference standard[2] selected as the ground truth for clinical tasks and functions. With regard to processes and controls, FDA emphasizes fitness for purpose and quality processes and controls around data collection, referring back to its Use of Real-World Evidence to Support Regulatory Decision-Making for Medical Devices guidance. Companies should therefore revisit their protocols and practices for vetting third-party data sources and data aggregators to ensure alignment with FDA expectations, and data aggregators should likewise revisit their processes to ensure alignment with FDA and customer expectations.

Regarding clinical performance and validation against an appropriate reference standard, the draft guidance focuses on the importance of using the right data sources to measure device performance and accuracy. When choosing reference standards, FDA states that the standard should reflect the clinical task. FDA requests that submitters describe how the standard was established, any inherent uncertainty in the standard, and the strategy for addressing cases where results are equivocal or missing. Where the standard is based on clinician evaluations, submitters should also include the grading protocol; how the evaluations are collected and adjudicated (including the blinding protocol and the number of participating clinicians and their qualifications); an assessment of variability both for each clinician and among clinicians; and whether this variability is commonly accepted for the task. These inputs are intended to help FDA evaluate whether the chosen reference standard is, in fact, reliable, verifiable, and relevant for the particular clinical task.
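
For illustration, one common way to quantify the inter-clinician variability FDA asks about is a pairwise agreement statistic such as Cohen’s kappa, paired with a documented adjudication rule. The sketch below uses fabricated gradings and a simple majority vote; both are our own assumptions rather than anything prescribed by the draft guidance.

```python
from sklearn.metrics import cohen_kappa_score

# Fabricated gradings of the same ten cases by three clinicians (1 = finding present).
reader_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
reader_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
reader_c = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]

# Pairwise Cohen's kappa as one common measure of inter-reader variability.
pairs = {"A vs B": (reader_a, reader_b),
         "A vs C": (reader_a, reader_c),
         "B vs C": (reader_b, reader_c)}
for label, (r1, r2) in pairs.items():
    print(f"{label}: kappa = {cohen_kappa_score(r1, r2):.2f}")

# A simple majority-vote adjudication to form the reference standard.
reference = [int(a + b + c >= 2) for a, b, c in zip(reader_a, reader_b, reader_c)]
print(f"adjudicated reference standard: {reference}")
```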

All of these data issues raise questions around the use of open-source or third-party models, where submitters may not have full visibility into these issues. Regulatory mechanisms such as technical master files or equivalent product dossiers, which have been employed by general-use hardware and software developers for years, may be a viable approach for distributors or marketers of general-purpose foundation models or data aggregators to consider. These technical files provide a mechanism by which a party may provide critical technical or operational information about its technology to FDA in a confidential or proprietary file, which may be referenced by medical device sponsors who wish to use the third-party data or foundation models for medical device development purposes.

This may not completely obviate the need for sponsors and submitters to obtain information about third-party models, but it may present a possible pathway to satisfying both FDA requirements and the sponsors’ regulatory obligations.

Recognized consensus standards that matter

Although the draft guidance does not provide a comprehensive roadmap for applying the quality system regulation (QSR) and good manufacturing practice requirements to AI-enabled software functions, the Agency identifies relevant recognized consensus standards that provide a framework for important quality management functions, such as risk management. FDA identifies ANSI/AAMI/ISO 14971, Medical devices – Application of risk management to medical devices, and AAMI CR34971, Guidance on the Application of ISO 14971 to Artificial Intelligence and Machine Learning, as relevant resources for developing risk management plans for AI-enabled devices. These FDA-recognized voluntary consensus standards specify processes for developers to identify the hazards associated with medical devices and, in particular, AI-enabled devices (eg, energy, biological, chemical, information, functional).

Cybersecurity

Finally, the draft guidance refers back to FDA’s Cybersecurity in Medical Devices: Quality System Considerations and Content of Premarket Submissions guidance for recommendations on what to include regarding cybersecurity controls and security risk management. FDA also provides several examples of cybersecurity threats that may be specific to AI, including: data poisoning (deliberately injecting inauthentic or maliciously modified data), model inversion or stealing (intentional use of forged or altered data to infer details from, or replicate, models), model evasion (intentionally crafting or modifying input samples to deceive models), data leakage (accessing sensitive training or inference data), overfitting (deliberately exposing the AI components to adversarial attacks that result in skewed outcomes), model bias (manipulating training data to introduce or amplify biases), and performance drift (changing the underlying data distribution to degrade model performance).
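
As a toy illustration of the first threat on this list, the sketch below flips a fraction of training labels to mimic a crude data poisoning attack and compares model accuracy on clean test data. The library, synthetic data, and 20 percent flip rate are our own constructions, intended only to show the mechanism.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clean_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# An attacker flips 20 percent of the training labels, which can degrade
# performance on clean test data.
rng = np.random.default_rng(0)
poisoned = y_tr.copy()
flip = rng.choice(len(poisoned), size=len(poisoned) // 5, replace=False)
poisoned[flip] = 1 - poisoned[flip]
poisoned_acc = LogisticRegression(max_iter=1000).fit(X_tr, poisoned).score(X_te, y_te)

print(f"clean training data:    accuracy = {clean_acc:.3f}")
print(f"poisoned training data: accuracy = {poisoned_acc:.3f}")
```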

Key takeaways

The draft guidance is primarily a technical document reflecting FDA’s expectations for sponsors to meet threshold safety and effectiveness criteria in premarket submissions. It does not address the increasingly complex and multi-faceted development processes for AI-enabled devices, which often involve multiple parties, organizations, and collaborative relationships that do not fit neatly into existing product development and quality management frameworks for hardware device component suppliers and incoming product acceptance activities.

The draft guidance does not directly address special considerations for generative AI-enabled devices. However, it clarifies FDA’s expectation that the traditional rules for hardware devices apply to AI-enabled device software functions, albeit with some allowances for the special and constantly evolving nature of the technology. There is an opportunity to further refine and clarify FDA’s interpretation of these requirements by submitting comments on the draft guidance through April 7, 2025.

For more information, please contact the authors.

[1] FDA uses the device TPLC framework to evaluate the safety and effectiveness of medical devices at all stages of the product life cycle, from conception to retirement.
[2] A reference standard is the best-available representative or ground truth, ie, a reliable and verifiable reference point against which a model’s accuracy can be measured.
