EU AI Act compliance for ML engineers — what Annex IV means for your training pipeline

#euaiact #compliance #regulation

If you're training or deploying ML models that touch EU users, the EU AI Act's August 2026 deadline is getting close. I've been digging into the Annex IV technical documentation requirements and wanted to share what's actually relevant for ML engineers (vs. the legal jargon).

What Annex IV requires you to document about your ML system:

Training data provenance: where your datasets came from, how they were collected, preprocessing steps, labeling methodology. Not just "we used CommonCrawl."
Model architecture and design choices: why you chose your approach, what alternatives you considered, and how the architecture relates to the intended purpose.
Validation and testing: metrics used, test datasets, performance across subgroups (bias testing), and the statistical methodology behind your evaluation.
Risk management: what can go wrong, what you did to mitigate it, and residual risks you've accepted.
Data governance: Article 10 requires training, validation, and test data to be relevant, sufficiently representative, and, as far as possible, free of errors and complete, and you need to document how you assessed that. If you've ever shipped a model with "good enough" data, this is the part that'll sting.
Post-market monitoring: how you'll track performance drift, feedback loops, and incidents after deployment.

The kicker: this applies to non-EU companies too if your model's output is used in the EU (Article 2). The threshold is lower than GDPR's: it doesn't require you to be "targeting" EU users, just that the output ends up being used there.

What counts as high-risk is set out in Article 6 and Annex III: hiring tools, credit scoring, biometrics, and safety components of critical infrastructure are all listed there, while AI in already-regulated products such as medical devices is typically pulled in through Annex I. If your model doesn't fall into these categories, you're likely fine with minimal obligations.
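
To make the training data provenance item above a bit more concrete, here is a minimal sketch of a per-dataset record you could generate alongside each training run. Everything in it is an assumption on my part: the `DatasetRecord` schema, its field names, and the `build_record` helper are hypothetical, and Annex IV does not prescribe any particular format. The only idea it illustrates is pairing the human-written description (source, collection, labeling, preprocessing) with a content hash so the record can be tied to the exact bytes that were used.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    # Hypothetical schema: field names are illustrative, not mandated by Annex IV.
    name: str
    source: str
    collection_method: str
    labeling_method: str
    preprocessing_steps: List[str] = field(default_factory=list)
    sha256: str = ""


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw dataset bytes."""
    return hashlib.sha256(data).hexdigest()


def build_record(name, source, collection_method, labeling_method,
                 preprocessing_steps, data: bytes) -> DatasetRecord:
    """Combine the descriptive fields with a hash of the dataset contents."""
    return DatasetRecord(
        name=name,
        source=source,
        collection_method=collection_method,
        labeling_method=labeling_method,
        preprocessing_steps=list(preprocessing_steps),
        sha256=fingerprint(data),
    )


def to_json(record: DatasetRecord) -> str:
    """Serialize the record with sorted keys so diffs between runs stay readable."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    # All values below are made up for illustration.
    record = build_record(
        name="support-tickets-v3",
        source="internal helpdesk export",
        collection_method="nightly database dump, opted-in customers only",
        labeling_method="two annotators per ticket, disagreements adjudicated",
        preprocessing_steps=["strip email signatures", "deduplicate", "mask PII"],
        data=b"id,text,label\n1,hello,greeting\n",
    )
    print(to_json(record))
```

For large-scale datasets you would probably hash shard by shard rather than load everything into memory, and keep the JSON in version control next to the training config.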

For those who want to go deeper, I wrote a detailed breakdown of the extraterritorial scope here: https://annexa.eu/blog/eu-ai-act-extraterritorial/

Has anyone here started working on Annex IV documentation for their models? Curious what approaches people are taking, especially around documenting training data provenance for large-scale datasets.
