Alexander Thomas

Medical device classification for software and AI: EU, UK and FDA basics

Medical device classification for software and AI rests on two fundamental questions: Is the software a medical device at all (qualification), and if so, which risk class applies (classification)?

Unlike physical hardware, software and AI risk is driven primarily by the medical purpose of the output and its impact on clinical decisions, rather than physical invasiveness.

Under the EU MDR, classification for medical device software is commonly determined via Rule 11: software intended to provide information used to take diagnostic or therapeutic decisions is Class IIa unless those decisions may cause serious deterioration/surgical intervention (Class IIb) or death/irreversible deterioration (Class III).

In the US, FDA class (I, II, or III) depends on the specific device type and classification regulation, so it cannot be generalised from clinical "seriousness" alone.

Many founders discover too late that their "wellness app" is actually a regulated medical device, and it is often a harsh realization. You may believe you are building a consumer tech product, only to find yourself facing a notified body audit and complex clinical evaluation requirements.

This guide aims to navigate the complexities of qualification and classification without getting lost in legal jargon. While this is not legal advice, it is based on the current regulatory text and common pitfalls observed in the industry.

We will focus on the EU MDR and IVDR first, as they represent comprehensive modern frameworks, followed by the FDA context for the US market. While the underlying logic is often similar across regions, the vocabulary differs, which can be a source of frustration for global companies.

Qualification vs classification (why it matters for software)

A critical distinction that teams often overlook is the difference between qualification and classification. Qualification asks "is it a device?", whereas classification asks "how risky is it?". You cannot classify a product until it has been qualified.

Debating whether a product is Class I or Class IIa is futile if you have not yet established that it is a medical device. It is akin to arguing over speed limits before deciding if you are driving a car or riding a bicycle.

Qualification is a binary step. Under the EU MDR 2017/745, software qualifies as a medical device when the manufacturer intends it to be used for one or more specific medical purposes. These include diagnosis, prevention, monitoring, prediction, prognosis, treatment, or alleviation of disease.

The inclusion of "prediction and prognosis" in the MDR was a significant shift, effectively pulling a generation of AI predictive tools into the regulatory net. If your software claims to predict the onset of sepsis within four hours, that is a medical purpose.

Once you accept the device status, you move to classification, which determines your regulatory burden. The EU uses classes I, IIa, IIb, and III, while the US uses Class I, II, and III. A higher class correlates with a higher evidence requirement.

A Class I device may only require a self-declaration and basic documentation, whereas a Class III device demands extensive clinical data and an in-depth premarket review.

Misclassification can be fatal for a startup; classifying too high wastes capital on unnecessary trials, while classifying too low risks enforcement action by authorities.

EU guidance is explicit that qualification and classification are separate steps for software. You must first apply the MDR or IVDR definitions, and only then apply the classification rules. Mixing these steps leads to confusion.

While some consultants may complicate this process, the core logic is sound and centered entirely on the "intended purpose."

Your intended purpose is defined by what you claim the device does, not just what the code is capable of. It is determined by your marketing, labeling, and instructions for use.

If the software has no intended medical purpose, it may fall outside the MDR definition of a medical device; if it is intended for a medical purpose, even basic functions (including display) can be within scope.

EU approach: MDR + MDCG software guidance

The European regulatory landscape shifted significantly with the introduction of the MDR. Under the previous MDD, most standalone software was Class I. Those days are over. Relying on outdated advice from the pre-MDR era is a significant risk.

The central component of the new framework is Rule 11 in Annex VIII of the MDR.

Rule 11 is the specific classification rule for software. While complex, the core principle is that software providing information used for diagnostic or therapeutic decisions is classified based on the risk associated with that decision.

If the decision could cause death or irreversible deterioration of health, the software is Class III. If it could cause a "serious deterioration" or lead to surgical intervention, it is Class IIb. If it could cause a non-serious deterioration, it is Class IIa.

Consequently, software performing decision-support functions often falls into Class IIa at a minimum. If your software informs a decision concerning a medical condition, it is difficult to argue that an error would not cause at least a "non-serious deterioration."

Therefore, many digital health applications in Europe now require a notified body assessment, preventing self-certification.

It is essential to consult MDCG 2019-11. This guidance document is the authority on the qualification and classification of software. It explains how to interpret Rule 11 with specific examples.

It clarifies that software intended to monitor physiological processes is Class IIa, unless it monitors vital parameters where variations could result in immediate danger, in which case it elevates to Class IIb.
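To make the logic easier to reason about internally, here is a minimal Python sketch of the Rule 11 decision tree. The enum values and parameter names are my own paraphrase, not regulatory text; treat it as a thinking aid, not a substitute for Annex VIII and MDCG 2019-11.

```python
from enum import Enum

class DecisionImpact(Enum):
    """Worst-case consequence of a decision the software output informs."""
    DEATH_OR_IRREVERSIBLE = "death or irreversible deterioration of health"
    SERIOUS_OR_SURGICAL = "serious deterioration or surgical intervention"
    NON_SERIOUS = "non-serious deterioration"

def classify_rule_11(informs_decisions: bool,
                     decision_impact: DecisionImpact | None = None,
                     monitors_physiology: bool = False,
                     vital_params_immediate_danger: bool = False) -> str:
    """Rough paraphrase of MDR Annex VIII Rule 11 for standalone software."""
    if informs_decisions:
        if decision_impact is DecisionImpact.DEATH_OR_IRREVERSIBLE:
            return "Class III"
        if decision_impact is DecisionImpact.SERIOUS_OR_SURGICAL:
            return "Class IIb"
        return "Class IIa"
    if monitors_physiology:
        # Monitoring vital parameters where variation could put the patient
        # in immediate danger bumps the class from IIa to IIb.
        return "Class IIb" if vital_params_immediate_danger else "Class IIa"
    return "Class I"  # "all other software" under Rule 11

# Decision support whose errors could lead to surgical intervention:
print(classify_rule_11(True, DecisionImpact.SERIOUS_OR_SURGICAL))  # Class IIb
```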

Common software claims that trigger medical purpose

Specific vocabulary can trigger regulatory scrutiny. Verbs that imply agency, intelligence, or definitive action are usually indicators of a medical claim.

"Detects" is a major trigger (e.g., "Detects atrial fibrillation from wrist data"). This is a diagnostic claim. "Diagnoses" is explicitly medical.

"Calculates" can be nuanced; calculating BMI is generally acceptable, but calculating a proprietary heart failure risk score classifies the software as a device.

"Predicts" is increasingly scrutinized. A claim like "Predicts 30-day readmission risk" is a prognosis, which constitutes a medical purpose.

Another critical term is "Recommends" (e.g., "Recommends insulin dose" or "Recommends triage priority"). If software instructs a human on clinical management, it is considered high risk.

Even with a disclaimer stating "for information only," regulators assess the functional reality. If the user is expected to rely on the output, it is a medical device.
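Some teams run a lightweight check over user-facing copy before release to catch this kind of language early. A hypothetical sketch (the keyword list and function are mine, and a match only flags text for regulatory review, it does not make a determination):

```python
import re

# Illustrative, non-exhaustive list of verb stems that often signal a medical claim.
TRIGGER_STEMS = ["detect", "diagnos", "predict", "recommend", "treat",
                 "monitor", "screen", "prognos", "alleviat"]

def flag_medical_claims(copy_text: str) -> list[str]:
    """Return trigger stems found in marketing or UI copy.

    A hit does not make the product a device; it flags the sentence for
    review against the intended-purpose definition.
    """
    lowered = copy_text.lower()
    return [stem for stem in TRIGGER_STEMS if re.search(rf"\b{stem}", lowered)]

print(flag_medical_claims("Detects atrial fibrillation and predicts readmission risk"))
# ['detect', 'predict']
```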

Risk thinking for SaMD (IMDRF concepts)

The International Medical Device Regulators Forum (IMDRF) is a collaborative group of global regulators. They developed a framework for "Software as a Medical Device" (SaMD) that is instrumental in understanding risk. Both the EU and US regulations borrow heavily from this conceptual framework.

The IMDRF framework utilizes a two-axis matrix to categorize risk, forcing manufacturers to be honest about their product's function.

The first axis is the Significance of the Information (what the output is used for):

  1. Treat or Diagnose (Highest risk)
  2. Drive Clinical Management
  3. Inform Clinical Management (Lowest risk)

The second axis is the State of the Healthcare Situation (the patient's condition):

  1. Critical (Life-threatening)
  2. Serious
  3. Non-serious

Combining these axes results in a category from I to IV.

Category IV represents the highest risk, such as software that treats or diagnoses a critical condition (e.g., AI that identifies a stroke on a CT scan to guide surgery).

Category I represents the lowest risk, such as software informing clinical management for a non-serious condition (e.g., an app tracking physical therapy exercises for a minor injury).
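The full matrix is small enough to encode as a lookup table. A sketch based on the IMDRF categorization table, with shorthand keys of my own:

```python
# IMDRF SaMD category lookup keyed by (state of healthcare situation,
# significance of the information). Shorthand keys are illustrative; the
# table itself comes from the IMDRF SaMD risk categorization framework.
SAMD_CATEGORY = {
    ("critical",    "treat_or_diagnose"): "IV",
    ("critical",    "drive"):             "III",
    ("critical",    "inform"):            "II",
    ("serious",     "treat_or_diagnose"): "III",
    ("serious",     "drive"):             "II",
    ("serious",     "inform"):            "I",
    ("non_serious", "treat_or_diagnose"): "II",
    ("non_serious", "drive"):             "I",
    ("non_serious", "inform"):            "I",
}

# Stroke detection on CT that guides surgery: treat/diagnose a critical condition.
print(SAMD_CATEGORY[("critical", "treat_or_diagnose")])   # IV

# Physical therapy tracker for a minor injury: inform, non-serious.
print(SAMD_CATEGORY[("non_serious", "inform")])           # I
```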

This matrix is an excellent tool for internal strategy. If a product manager claims a tool is "low risk," this chart can verify whether the software drives the management of a serious condition.

If it does, it is not low risk; it is likely Category II or III. In the EU, this translates to Class IIa or IIb. In the US, it likely indicates a Class II device requiring clinical data.

FDA classification overview for software (high level)

The FDA operates differently than the EU. Rather than a single rule like Rule 11, the FDA maintains a comprehensive database of "product codes" and regulations. Classification involves identifying the code that matches your device.

Class I represents the lowest risk. These devices are subject to "General Controls," including company registration, device listing, adverse event reporting, and quality system regulations (transitioning to the Quality Management System Regulation, QMSR, effective February 2026).

Most Class I devices are exempt from premarket notification, meaning you do not need prior permission to sell.

However, very few AI products fit this category, with exceptions for simple image viewers or calculators for standard medical formulas.

Class II covers the majority of digital health and AI software. These are moderate-risk devices subject to General Controls plus "Special Controls," which may include performance standards, labeling requirements, or specific guidance documents.

Most Class II devices require a 510(k) premarket notification, where you must prove "substantial equivalence" to a predicate device already on the market.

Finding a predicate for novel AI can be challenging. If a device has no predicate but is low-to-moderate risk, a De Novo request may be necessary. This process creates a new classification regulation. It requires more effort than a 510(k) but is less burdensome than a PMA.

Class III is reserved for the highest risk devices—those that sustain life, support life, or present a high risk of illness or injury. These require Premarket Approval (PMA), involving a full scientific review that takes years and costs millions.

Few standalone software products are Class III unless they are integral to a high-risk hardware system, such as a pacemaker or artificial pancreas.
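Putting the pathway logic together, a deliberately simplified sketch (my own paraphrase, ignoring exemptions, combination products, and other special cases):

```python
def fda_premarket_route(risk_level: str, has_predicate: bool, is_exempt: bool) -> str:
    """Very rough premarket pathway heuristic.

    Real determinations hinge on the specific product code and classification
    regulation; risk_level here is just 'low', 'moderate', or 'high'.
    """
    if risk_level == "high":
        return "PMA (Premarket Approval)"
    if is_exempt:
        return "No premarket submission (General Controls still apply)"
    if has_predicate:
        return "510(k) premarket notification"
    return "De Novo classification request"

# Novel moderate-risk AI with no predicate on the market:
print(fda_premarket_route("moderate", has_predicate=False, is_exempt=False))
# De Novo classification request
```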

The FDA has shown progressiveness through its Digital Health Center of Excellence. Guidance on Clinical Decision Support (CDS) attempts to exempt certain software from regulation if the physician can independently review the basis of the recommendation.

However, for "black box" AI where the algorithm's logic is opaque, this exemption rarely applies.

When IVDR becomes relevant (IVD-related software)

The In Vitro Diagnostic Regulation (IVDR) is frequently overlooked. As the sibling of the MDR, it covers tests performed on biological samples such as blood, urine, or tissue. If software processes data derived from an IVD, it may itself be classified as IVD software.

This is particularly relevant in genomics and precision medicine. An AI analyzing genetic sequencing data to predict cancer risk is likely an IVD medical device, not an MDR device.

The IVDR uses a different classification system: Classes A, B, C, and D, with Class A being the lowest and Class D the highest.

Under the previous directive, most IVDs were self-certified. Under the IVDR, the vast majority now require a notified body.

Software providing information for diagnosis or treatment based on IVD data will typically fall into Class C (high risk). Screening for transmissible diseases like HIV pushes this to Class D.
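A hedged sketch of that coarse logic, with parameter names of my own and none of the nuance of the actual IVDR Annex VIII rules:

```python
def ivdr_class_sketch(uses_ivd_specimen_data: bool,
                      informs_diagnosis_or_treatment: bool,
                      screens_high_risk_transmissible: bool) -> str:
    """Illustrative shorthand only; real classification walks the IVDR rules."""
    if not uses_ivd_specimen_data:
        return "Likely outside IVDR scope (assess under MDR instead)"
    if screens_high_risk_transmissible:
        return "Class D"
    if informs_diagnosis_or_treatment:
        return "Class C"
    return "Class A or B (depends on the specific rule)"

# AI predicting cancer risk from genetic sequencing data:
print(ivdr_class_sketch(True, informs_diagnosis_or_treatment=True,
                        screens_high_risk_transmissible=False))  # Class C
```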

Complexity arises when software utilizes both IVD data and other physiological data—for example, a risk calculator using blood pressure (MDR) and cholesterol levels (IVD).

MDCG guidance suggests looking at the principal intended purpose, but manufacturers often must satisfy the stricter of the two regulations.

Specialized consultancy is highly recommended for IVDR, as general MDR expertise may be insufficient.

Evidence and documentation (intended purpose, clinical association, validation)

Regardless of the jurisdiction (EU vs. US) or regulation (MDR vs. IVDR), evidence is mandatory. The depth of evidence required correlates with the device classification.

First, you must establish Scientific Validity (or Clinical Association). Does the underlying concept make scientific sense? Is there a proven link between the input data and the clinical condition?

For example, detecting diabetes from voice patterns requires strong peer-reviewed literature proving that voice patterns change specifically due to diabetes. Without this, the algorithm lacks a foundation.

Next comes Analytical Performance, or technical validation. Does the software accurately and reliably process input data?

For AI, this includes testing for bias, robustness across demographics, and performance on different hardware configurations.
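In practice, this often means reporting the same metrics stratified by subgroup. A minimal sketch using pandas and scikit-learn, where the column names are hypothetical:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def performance_by_subgroup(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Report AUROC per subgroup to surface performance gaps.

    Assumes the frame has 'y_true' (ground-truth labels) and 'y_score'
    (model outputs); both column names are illustrative.
    """
    rows = []
    for group, sub in df.groupby(group_col):
        rows.append({
            group_col: group,
            "n": len(sub),
            "auroc": roc_auc_score(sub["y_true"], sub["y_score"]),
        })
    return pd.DataFrame(rows)

# e.g. report = performance_by_subgroup(validation_df, group_col="age_band")
```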

Finally, you must prove Clinical Performance. Does the software output actually aid the patient or yield a correct diagnosis in a real-world setting?

For Class IIa/IIb (EU) or Class II (FDA), this usually necessitates a clinical performance study, which may be retrospective (using historical data) or prospective.

Documentation is a critical failure point for startups. You require a Quality Management System (QMS), typically compliant with ISO 13485; software lifecycle documentation compliant with IEC 62304; and risk management compliant with ISO 14971.

These cannot be retroactively generated at the end of a project. "Backfilling" a design history file is rarely successful and is easily detected by auditors. Standards must be integrated from day one.

Common mistakes and pitfalls

Avoiding common errors can save significant time and resources. The following are frequent missteps in the industry.

The "Wellness" Trap: Founders often claim to be a "wellness tool" to avoid regulation, yet pitch investors on "revolutionizing healthcare." You cannot have it both ways.

If the product claims to manage a disease, it is a medical device. Diluting claims to remain in the wellness category often results in a product with limited utility that customers are unwilling to pay for.

Advice vs. Information: There is a critical distinction between providing information and giving advice.

A prompt saying "Go to the ER now" is advice—a command with high risk. A display stating "Heart rate is 140 bpm" is information.

User interface copy is often too prescriptive. Phrases like "Take 5mg of this drug" are dangerous for non-regulated apps; "Consult your doctor about your medication" is the compliant alternative.

Underestimating Timelines: In Europe, securing a slot with a notified body and completing the review process is time-intensive.

Recent manufacturer surveys indicate that technical documentation assessment alone can take around 18 to 22 months on average. Launch plans must account for this regulatory lag to ensure sufficient financial runway.

Post-Market Surveillance (PMS): Certification is not the finish line. Manufacturers must monitor the device throughout its lifecycle.

For AI models, this includes monitoring for data drift. If a model's performance degrades because the patient population changes, the manufacturer must detect and correct it.
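A common lightweight approach is to compare live input distributions against the validation baseline, for example with a two-sample Kolmogorov-Smirnov test. A sketch assuming numeric features and an arbitrary alert threshold:

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_alerts(baseline: dict[str, np.ndarray],
                 live: dict[str, np.ndarray],
                 p_threshold: float = 0.01) -> list[str]:
    """Flag features whose live distribution differs from the validation baseline.

    The KS test and the 0.01 threshold are illustrative choices; a real PMS plan
    would define the method, monitoring window, and escalation path up front.
    """
    drifted = []
    for feature, base_values in baseline.items():
        _, p_value = ks_2samp(base_values, live[feature])
        if p_value < p_threshold:
            drifted.append(feature)
    return drifted

# Synthetic example: the live systolic blood pressure distribution has shifted.
rng = np.random.default_rng(0)
baseline = {"systolic_bp": rng.normal(120, 15, 5000)}
live = {"systolic_bp": rng.normal(135, 15, 5000)}
print(drift_alerts(baseline, live))  # ['systolic_bp']
```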

Regulators are increasingly focused on this aspect of AI lifecycle management.

Recent developments and trends

The regulatory field is evolving rapidly. The most significant trend is Generative AI and Large Language Models (LLMs) in healthcare.

These non-deterministic models pose a challenge for regulators because they can hallucinate. Determining classification for these tools still relies on the intended purpose and existing rules, though validating their safety remains a complex, evolving area.

Post-Brexit, the UK is developing its own framework via the MHRA. They are aiming for a more agile, pro-innovation approach, exploring concepts like "AI Airlocks" for controlled device testing.

The UK currently accepts CE marked devices, with dates extending to June 2028 or 2030 depending on the device type and certification status, though a UKCA mark will eventually be required.

Globally, there is a convergence of standards. The FDA, Health Canada, and the UK are collaborating on "Good Machine Learning Practice" (GMLP).

This suggests that following best practices for data management and validation will position companies well globally, even if a single global approval remains unlikely.

Final Thoughts

Medical device classification for software can feel overwhelming, dense, and bureaucratic. However, these regulations exist to prevent patient harm.

Software errors can be lethal; an algorithm missing a cancer diagnosis carries risks comparable to physical surgical failures.

Success in this space requires respecting the rules rather than attempting to circumvent them.

Embrace the classification process. A Class IIa certification is a significant asset that validates the product, proves it works, and differentiates it from unregulated competitors. It serves as a competitive moat.

The best approach is to study MDCG 2019-11 thoroughly and build compliance into the company's DNA from the start.
