James Whitfield

Predetermined change-control plans for AI/ML SaMD — how to make them audit-proof

I’ve spent the last three years defending algorithm updates to notified bodies and answering the same auditor question: “Show me how you control changes to this model.” For Class II software-as-a-medical-device (SaMD) where models will keep evolving, a predetermined change-control plan (PCCP) isn’t optional — it’s the practical way to show auditors you treated change control as governance, not theater.

Below are the concrete patterns we used to build PCCPs and validation artifacts that survive ISO 13485 and FDA 21 CFR 820 inspections and real-world use. I’ll note where things are specific to a Class II workflow, and where the approach scales up to higher-risk devices.

What a PCCP actually needs to prove

Regulators are not asking you to stop improving models. They want evidence that:

  • You pre-specified what kinds of changes are allowed without a full redesign review.
  • You defined objective acceptance criteria and test artifacts for each change type.
  • You have a controlled, traceable pipeline for making, testing, and deploying changes.
  • You monitor model performance in the field and have CAPA triggers tied to that monitoring.

Link these to the standards auditors will quote: ISO 13485 clause 7.3.9 (control of design and development changes) and FDA 21 CFR 820.30(i) (design changes). If you can map your PCCP artifacts to those clauses, you’re speaking the auditor’s language.

A practical PCCP structure (the pieces we deliver)

Treat the PCCP like a small design control bundle. Ours includes:

  • Scope and change taxonomy
    • What component(s) the PCCP covers (weights, retraining pipeline, pre/post-processing).
    • Change categories: A (parameters/configs), B (retraining on new labeled data), C (architecture changes).
  • Preconditions / guarded inputs
    • Data provenance checks, labeling consistency rules, and minimum sample-size rules.
  • Acceptance criteria (numeric + clinical context)
    • Performance metrics (AUC, sensitivity at fixed specificity, calibration) with pass/fail thresholds (see the sketch after this list).
    • Clinical-impact checks: false-negative reduction target, no clinically meaningful increase in false positives.
  • Validation artifacts to produce
    • Fixed validation dataset (frozen holdout), independent test set, synthetic stress tests.
    • Reproducible training logs, seed control, container image ID.
  • Deployment controls
    • Canary/rollout plan, rollout percent, rollback criteria.
  • Monitoring & post-market checks
    • Drift detection rules, periodic re-eval cadence, real-world performance thresholds.
  • Governance & traceability
    • Required approvals, CAPA triggers, trace matrix linking risk controls to requirements.
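
To make the acceptance-criteria item concrete, here’s a minimal Python sketch of encoding pass/fail checks so CI can score a candidate model against the frozen holdout. The metric names and thresholds are placeholders, not our released values:

```python
# Sketch: PCCP acceptance criteria as executable checks.
# Thresholds below are illustrative placeholders.
from dataclasses import dataclass

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


@dataclass
class AcceptanceCriteria:
    min_auc: float = 0.90
    min_sensitivity: float = 0.85
    fixed_specificity: float = 0.90  # operating point pre-specified in the PCCP


def sensitivity_at_specificity(y_true, y_score, target_specificity):
    """Interpolate sensitivity at a fixed specificity from the ROC curve."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    specificity = 1.0 - fpr
    # ROC points come out with specificity decreasing; interpolate on reversed arrays.
    return float(np.interp(target_specificity, specificity[::-1], tpr[::-1]))


def evaluate_candidate(y_true, y_score, criteria: AcceptanceCriteria) -> dict:
    auc = roc_auc_score(y_true, y_score)
    sens = sensitivity_at_specificity(y_true, y_score, criteria.fixed_specificity)
    return {
        "auc": auc,
        "sensitivity_at_fixed_specificity": sens,
        "pass": auc >= criteria.min_auc and sens >= criteria.min_sensitivity,
    }
```

The point is that the thresholds live in one reviewed place and the pass/fail verdict is computed, not asserted.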

How we make validation reproducible

Auditors will ask to re-run your validation or at least to see that it could be re-run. That means reproducible environments and frozen datasets:

  • Keep a "golden" holdout test set that never touches retraining. If you need to expand it, document why and treat it as a design change.
  • Store container images and a deterministic training script (hash the repo/commit + container ID).
  • Capture random seeds, preprocessing versions, and third-party library versions (don’t rely on "latest").
  • Produce a validation report template that includes: dataset stats, metric results with confidence intervals, failure-mode analysis, and clinical-meaning commentary.

We check these programmatically in CI so the document is generated, not manually assembled.
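
A minimal sketch of that capture step, assuming a git checkout and a container digest already pinned in CI (paths, package names, and the output file are illustrative):

```python
# Sketch of a CI step that records the provenance needed to re-run a validation.
# Paths, environment details, and file names are placeholders.
import hashlib
import json
import platform
import subprocess
from importlib import metadata


def sha256_of_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(dataset_path: str, container_image_id: str, seed: int) -> dict:
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    return {
        "git_commit": commit,
        "container_image_id": container_image_id,  # the digest pinned in CI
        "random_seed": seed,
        "python_version": platform.python_version(),
        "library_versions": {
            pkg: metadata.version(pkg) for pkg in ("numpy", "scikit-learn")
        },
        "frozen_holdout_sha256": sha256_of_file(dataset_path),
    }


if __name__ == "__main__":
    manifest = build_manifest("data/holdout_v1.parquet", "sha256:abc123", seed=42)
    with open("validation_manifest.json", "w") as f:
        json.dump(manifest, f, indent=2)
```

The manifest gets bundled with the generated validation report so an auditor can see exactly what would need to be restored to reproduce the run.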

Risk-staged changes and when to escalate

Not every model tweak needs a full design-review workflow. Use a clear staging rule:

  • Category A (minor): hyperparameter tuning, threshold change within pre-specified range.
    • Controls: automated unit tests, automated metric checks against frozen validation set, engineering sign-off + QA review.
  • Category B (moderate): retraining on new labeled data that meets provenance rules.
    • Controls: full validation report, clinical reviewer sign-off, QA + RA approval, limited rollout.
  • Category C (major): architecture changes, new input modalities, label schema changes.
    • Controls: design history file update, full risk assessment (per ISO 14971), formal change control board (CCB) review.

Document the escalation path in the PCCP and automate it wherever possible.
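
Here’s a rough sketch of how that staging rule can be encoded so the pipeline routes a proposed change to the right approvers. The roles and field names are illustrative, not a prescription:

```python
# Sketch: the A/B/C staging rule as data the pipeline can act on.
# Category definitions mirror the list above; sign-off roles are illustrative.
from enum import Enum


class ChangeCategory(Enum):
    A = "minor"     # hyperparameter/threshold change within pre-specified range
    B = "moderate"  # retraining on new labeled data meeting provenance rules
    C = "major"     # architecture, input modality, or label schema change


REQUIRED_APPROVALS = {
    ChangeCategory.A: ["engineering", "qa"],
    ChangeCategory.B: ["engineering", "qa", "regulatory", "clinical_reviewer"],
    ChangeCategory.C: ["change_control_board"],  # full design review path
}


def categorize(change: dict) -> ChangeCategory:
    """Tiny routing rule; real logic would read the PCCP change taxonomy."""
    if change.get("architecture_changed") or change.get("new_input_modality"):
        return ChangeCategory.C
    if change.get("retrained_on_new_data"):
        return ChangeCategory.B
    return ChangeCategory.A


def approvals_for(change: dict) -> list[str]:
    return REQUIRED_APPROVALS[categorize(change)]
```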

Operational tooling that helps (CI, monitoring, QMS integration)

You don’t need exotic tools — you need reliable links between engineering and your QMS:

  • CI pipelines that run validation and produce an artifact bundle (report, hashes, container IDs).
  • Model-monitoring that emits drift and performance alerts as events (webhooks).
  • Configured webhooks or API calls that create a change-control record in the QMS when a Category B/C change is proposed.
  • Traceability matrix stored with artifact IDs linking to the Design History File (DHF).

On the tooling side: we run Greenlight Guru for controlled docs and link CI artifacts via a webhook that creates a change record. Whatever QMS you use, ensure it captures the artifact IDs — auditors want the chain of custody.
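
As a sketch of that webhook step: the endpoint, environment variables, and payload schema below are placeholders, not a real Greenlight Guru API; substitute whatever your QMS actually exposes.

```python
# Hypothetical CI hook: open a change-control record when a Category B/C change
# is proposed. Endpoint URL and payload schema are placeholders for your QMS.
import json
import os
import urllib.request


def create_change_record(category: str, artifact_bundle: dict) -> None:
    payload = {
        "title": f"Model change ({category}) awaiting review",
        "category": category,
        "artifact_ids": artifact_bundle,  # report hash, container ID, commit, etc.
    }
    req = urllib.request.Request(
        os.environ["QMS_WEBHOOK_URL"],  # configured per environment
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['QMS_API_TOKEN']}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()  # non-2xx responses raise HTTPError in CI


if __name__ == "__main__":
    create_change_record(
        "B",
        {
            "validation_report_sha256": "<report hash>",
            "container_image_id": "sha256:abc123",
            "git_commit": "<commit hash>",
        },
    )
```

Whatever the transport, the record that lands in the QMS must carry the same artifact IDs the CI bundle carries; that is the chain of custody auditors walk.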

Post-market and CAPA integration

Define explicit CAPA triggers tied to monitoring metrics:

  • Trigger examples:
    • Population drift metric exceeds threshold for two reporting periods.
    • Sensitivity drops below X for consecutive weeks.
    • Clinician complaint count related to misclassification crosses threshold.
  • When a trigger fires, your PCCP should define:
    • Immediate mitigation (rollback or restrict use).
    • Root-cause workflow (data drift vs label noise vs concept shift).
    • A timeboxed plan for corrective action and re-validation.

Make sure CAPA activities link back to the PCCP change record — that traceability is a frequent auditor question.
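
A minimal sketch of how those triggers can be evaluated over monitoring history; the thresholds and window lengths are illustrative, not our released values:

```python
# Sketch: CAPA trigger evaluation over recent monitoring periods.
# Thresholds and window lengths are illustrative placeholders.
from dataclasses import dataclass


@dataclass
class MonitoringPeriod:
    drift_metric: float
    sensitivity: float
    misclassification_complaints: int


def capa_triggers(history: list[MonitoringPeriod]) -> list[str]:
    """Return the names of any CAPA triggers that fired on recent periods."""
    fired = []
    if len(history) >= 2 and all(p.drift_metric > 0.2 for p in history[-2:]):
        fired.append("population_drift_two_periods")
    if len(history) >= 3 and all(p.sensitivity < 0.85 for p in history[-3:]):
        fired.append("sensitivity_below_target_consecutive_weeks")
    if history and history[-1].misclassification_complaints >= 5:
        fired.append("complaint_threshold_exceeded")
    return fired
```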

My pragmatic rules that saved time in audits

  • Freeze an immutable test set and treat changes to that set as a design change.
  • Automate as many checks as possible — a generated validation report beats a hand-assembled one every time.
  • Keep acceptance criteria clinical, not just statistical. Auditors like seeing clinical rationale.
  • Be explicit about who can approve what. “Engineering OK” is not enough.

Closing (and one question)

A PCCP is as much about the governance steps you pre-agree on as it is about metrics. If you can answer “what would we do if model sensitivity dropped 5% tomorrow?” in a 3-slide packet with reproducible artifacts, inspectors tend to relax.

How are you tying automated model-monitoring events into your change-control/QMS workflows today — webhooks into a change record, manual ticketing, or something else?
