What Is an Annex IV Technical File? EU AI Act Technical Documentation for Engineers
If your AI system is high-risk under Annex III, producing an Annex IV Technical File is a legal precondition for CE marking — not a post-launch deliverable. This guide breaks down every required section, the specific engineering artefacts each one demands, and what compliance teams consistently get wrong.
Definition
An Annex IV technical file is the mandatory documentation package that providers of high-risk AI systems must prepare before placing their system on the EU market, under Article 11 of the EU AI Act (Regulation 2024/1689). It consists of 9 sections covering system description, development process, data governance, performance metrics, risk management, lifecycle changes, applied standards, the EU Declaration of Conformity, and the post-market monitoring plan. The file serves as the primary evidence of compliance during regulatory assessment by market surveillance authorities or notified bodies.
What Article 11 Requires and Why It Matters
Article 11 is the legal basis; Annex IV is the technical specification it references. The Article sets out the obligation — providers must draw up technical documentation before placing a high-risk AI system on the market or putting it into service. Annex IV enumerates exactly what that documentation must contain.
Two timing points are frequently misunderstood. First, the technical file must be drawn up before market placement — it is a precondition for CE marking, not a post-launch deliverable. CE marking without a completed Annex IV file is unlawful. Second, under Art. 18, the file must be retained for 10 years after the system is placed on the market, not just maintained until launch.
The obligation falls on providers only. Deployers are not required to produce a technical file, though they have separate obligations under Article 26. For guidance on which role applies to your organisation, see provider obligations under the EU AI Act. Note also that a deployer who substantially modifies a third-party system or deploys it outside its intended purpose may be reclassified as a provider — see Article 25 reclassification scenarios.
The 9 Sections of an Annex IV Technical File
Annex IV specifies nine mandatory sections. Every high-risk AI system's technical file must address all nine — there is no optional set. The sections below are the exact categories defined in the regulation, mapped to the concrete engineering artefacts each one requires.
- General description: intended purpose, version history, deployment context
- Development process: architecture, data, training, validation, cybersecurity
- Monitoring and control: accuracy metrics by subgroup, human oversight
- Performance metric justification: why chosen metrics fit the use case
- Risk management system: hazards, mitigations, residual risk sign-off
- Lifecycle changes: pre-determined changes, substantial modification log
- Applied standards: harmonised standards used or alternative common specs
- EU Declaration of Conformity per Annex V
- Post-market monitoring plan: drift, incidents, escalation runbook
| # | Section | What it covers | Key engineering artefacts | Legal basis |
|---|---|---|---|---|
| 1 | General description | Intended purpose, version history, deployment context | System card, model card, use-case spec | Art. 11, Annex IV §1 |
| 2 | Development process | Architecture, data governance, training, validation, testing, cybersecurity | Datasheets, architecture diagrams, test reports, SBOM, adversarial test results | Art. 10, 15, Annex IV §2 |
| 3 | Monitoring and control | Accuracy metrics by subgroup, human oversight mechanisms, output types | Disaggregated benchmark results, override logging spec | Art. 13, 14, Annex IV §3 |
| 4 | Metric justification | Why chosen metrics are appropriate for the use case | Metric selection rationale doc | Annex IV §4 |
| 5 | Risk management | Identified hazards, mitigations, residual risk acceptance | Risk register, FMEA/FMEA-ML, residual risk sign-off | Art. 9 |
| 6 | Lifecycle changes | Pre-determined changes, substantial modification log | Change classification matrix, changelog | Art. 16(d), Annex IV §6 |
| 7 | Applied standards | Harmonised standards used or alternative common spec | Standards register, gap analysis | Art. 40–41, Annex IV §7 |
| 8 | EU Declaration of Conformity | Formal legal statement of compliance | DoC per Annex V | Art. 47 |
| 9 | Post-market monitoring plan | Active monitoring of deployed system performance and incidents | Monitoring spec, incident escalation runbook | Art. 72 |
How to Build Your Technical File from an ML Pipeline
The most efficient path is integrating documentation generation into your MLOps pipeline from project inception, not reconstructing it retroactively before a deadline. Industry estimates put retroactive documentation at 2–3× the effort of concurrent documentation.
1. Start at project kick-off, not before launch. Create a documentation repo alongside your model repo on day one. Recommended: a `docs/compliance/` directory version-controlled in Git alongside your `src/`.
2. Write your system card (Section 1). Document intended purpose, deployment context, known limitations, and target populations. Commit it as `system-card.md`. This doubles as your model card for external communication.
3. Generate dataset datasheets (Section 2d). One datasheet per dataset (training, validation, test). Cover provenance, collection method, known biases, preprocessing steps, and class distribution. Reference: Google's Datasheets for Datasets format, adapted for Art. 10 compliance.
4. Capture architecture and design decisions (Section 2a–c). Document model architecture, hyperparameter choices, training infrastructure, and software dependencies (SBOM). Suggested tools: MLflow and DVC for data lineage; Architecture Decision Records (ADRs) for design rationale.
5. Run and record validation testing (Section 2f–g). Record disaggregated benchmark results by protected-attribute group. Document acceptance criteria before running tests — an Art. 9(8) requirement. Tools: Fairlearn, AI Fairness 360, Weights & Biases eval tables.
6. Build and version your risk register (Section 5). Enumerate failure modes using FMEA-ML or a structured hazard analysis. Score probability × severity. Map each risk to a design control or deployer safeguard. Track mitigation evidence.
7. Configure your logging pipeline (Section 2g / Art. 12). Per-inference structured logging to an append-only store. Capture: input hash, prediction, confidence, model version, timestamp, latency. Minimum 6-month retention.
8. Define your post-market monitoring spec (Section 9). Specify drift detection thresholds, the subgroup metric monitoring schedule, the incident escalation path, and serious-incident reporting triggers per Art. 73.
9. Assemble and sign the Declaration of Conformity. Reference all applicable Annex III use cases and harmonised standards used. Must be signed by an authorised representative.
10. Set up a document control process. Version-control the entire file. Log every substantial modification. Set calendar reminders for periodic review — recommended every 12 months or on any model retrain.
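Steps 1 and 10 lend themselves to automation: a CI script that fails the build when a required artefact is missing from `docs/compliance/`. A minimal sketch — the file names and section mapping are illustrative, not mandated by the regulation:

```python
from pathlib import Path

# Illustrative artefact set mapped to Annex IV sections -- adapt to your layout.
REQUIRED_ARTEFACTS = {
    "system-card.md": "Section 1 - general description",
    "datasheets/training.md": "Section 2d - training data datasheet",
    "risk-register.md": "Section 5 - risk management",
    "monitoring-spec.md": "Section 9 - post-market monitoring plan",
}

def missing_artefacts(compliance_dir: str) -> list[str]:
    """Return Annex IV artefacts not yet present under compliance_dir."""
    root = Path(compliance_dir)
    return [
        f"{path} ({section})"
        for path, section in REQUIRED_ARTEFACTS.items()
        if not (root / path).is_file()
    ]
```

Wired into CI (e.g. exiting non-zero when the list is non-empty), this makes documentation gaps visible on every commit rather than at audit time.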
Risk Management Documentation That Satisfies Article 9
A compliant Article 9 risk management system requires maintaining a version-controlled risk register that maps each identified hazard to probability × severity scores, tested against pre-defined probabilistic thresholds (e.g., FPR ≤ 0.01 for safety-critical classifications), with mandatory re-evaluation whenever post-market monitoring surfaces new failure modes or data drift. Engineering teams must produce three concrete artefacts at each lifecycle stage: a risk analysis matrix covering health, safety, fundamental rights, and discrimination categories; a mitigation traceability record linking each risk to its design-level control or deployer-facing safeguard; and signed, dated test reports demonstrating residual risk acceptability — including subgroup-disaggregated results for persons under 18 and other vulnerable populations per Art. 9(9). The process must be iterative and documented as a living system, not a one-time pre-launch gate review.
For fundamental rights risk categories, providers should produce a Fundamental Rights Impact Assessment (FRIA) alongside the Article 9 risk register. These are related but distinct obligations.
Two underspecified requirements engineers miss:
- Art. 9(8): Acceptance criteria must be pre-defined before testing — not derived post-hoc from results. Define your pass/fail thresholds in the risk register before any test run begins, and document the date.
- Art. 9(2)(b): Must assess reasonably foreseeable misuse, not just intended use. Your risk register must enumerate plausible off-label deployment scenarios and show they were considered.
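One way to make the Art. 9(8) requirement auditable is to encode the pass/fail thresholds as data committed before any test run, then evaluate results against them mechanically. A dependency-free sketch — the metric names, threshold values, and dates are illustrative, not regulatory values:

```python
from datetime import date

# Acceptance criteria committed to the risk register BEFORE testing (Art. 9(8)).
# Values are illustrative; yours come from your own risk analysis.
ACCEPTANCE_CRITERIA = {
    "fpr_overall": {"max": 0.01, "defined_on": date(2026, 1, 15)},
    "fpr_worst_subgroup": {"max": 0.02, "defined_on": date(2026, 1, 15)},
}

def evaluate(results: dict) -> dict:
    """Compare measured metrics against the pre-defined thresholds."""
    return {
        name: results[name] <= spec["max"]
        for name, spec in ACCEPTANCE_CRITERIA.items()
    }

# A failing subgroup metric triggers residual-risk review, not post-hoc
# loosening of the threshold.
verdict = evaluate({"fpr_overall": 0.008, "fpr_worst_subgroup": 0.025})
```

Because the criteria dict carries its own `defined_on` date and lives in version control, the commit history itself evidences that thresholds predate the test results.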
Data Governance Evidence a Notified Body Expects (Art. 10)
Article 10 requires producing dataset datasheets documenting provenance, collection purpose, labelling procedures, and cleaning methods for every training, validation, and test split. Bias analysis must go beyond aggregate accuracy: you need disaggregated fairness metrics (e.g., equalized odds, demographic parity) across all protected attributes relevant to your deployment population, with explicit documentation of feedback-loop amplification risk per Art. 10(2)(f). For a notified body review, the minimum evidentiary bar is a representativeness report comparing your training distribution against the intended deployment population's demographics, a quantified data error rate from systematic validation sampling, and a documented gap analysis with remediation plan — the "to the best extent possible" qualifier in Art. 10(3) means demonstrated process rigour, not zero defects.
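Disaggregation itself needs no special tooling — Fairlearn's `MetricFrame` automates it — but the core computation is just per-group metrics plus a worst-case gap. A dependency-free sketch with toy data (the groups and labels are illustrative):

```python
from collections import defaultdict

def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy per protected-attribute group, plus the worst-case gap."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] += 1
        hits[g] += int(t == p)
    per_group = {g: hits[g] / totals[g] for g in totals}
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap

# Illustrative toy data: two groups, group "b" underperforms.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
per_group, gap = subgroup_accuracy(y_true, y_pred, groups)
```

The per-group table and the gap figure are exactly what belongs in the Section 3 disaggregated benchmark results; the same function applied to live traffic feeds the Section 9 monitoring schedule.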
Logging, Audit Trails, and the 6-Month Retention Rule (Art. 12)
Article 12 requires high-risk AI systems to implement automatic event logging covering three regulatory purposes: risk detection (Art. 79), post-market monitoring (Art. 72), and deployer oversight (Art. 26(5)), with a minimum 6-month retention per log entry under Art. 19(1) and Art. 26(6), and 10-year retention for the broader technical file under Art. 18. In practice, build an append-only structured log pipeline — immutable event store or write-ahead log with hash chains — capturing per-inference records (input hashes, output predictions, confidence scores, model version, feature values, latency) plus system-level events like model deployments, config changes, and human overrides. The forthcoming ISO/IEC DIS 24970 and CEN-CENELEC EN 18229-1 standards will define the normative logging framework.
| Log per inference | Log per system event |
|---|---|
| Input hash | Event type (deploy / config-change / human-override / retrain) |
| Output prediction | Timestamp (UTC) |
| Confidence / probability score | Actor (human or automated) |
| Model version ID | Previous state |
| Timestamp (UTC) | New state |
| Latency (ms) | Approver ID |
| Deployment environment | |
| User / session ID hash |
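The hash-chain idea can be sketched in a few lines: each record stores the hash of its predecessor, so any retroactive edit or reordering breaks verification. Field names follow the table above; the in-memory list here stands in for your append-only backend, and the helper names are illustrative:

```python
import hashlib
import json
import time

def _digest(record: dict) -> str:
    # Canonical JSON (sorted keys) so the hash is stable across serialisations.
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def append_inference(log: list, input_hash: str, prediction, confidence: float,
                     model_version: str) -> dict:
    """Append one per-inference record, chained to the previous entry."""
    record = {
        "input_hash": input_hash,
        "prediction": prediction,
        "confidence": confidence,
        "model_version": model_version,
        "ts_utc": time.time(),
        "prev_hash": log[-1]["hash"] if log else "genesis",
    }
    record["hash"] = _digest(record)
    log.append(record)
    return record

def verify_chain(log: list) -> bool:
    """Detect tampering or reordering after the fact."""
    prev = "genesis"
    for rec in log:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["prev_hash"] != prev or _digest(body) != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```

In production the same pattern applies over a write-ahead log or object store with a retention policy of at least six months; periodic `verify_chain` runs give you standing evidence of log integrity for a notified body.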
Post-Market Monitoring Beyond SRE Dashboards (Art. 72)
Article 72 post-market monitoring demands an active data collection pipeline that goes well beyond SRE dashboards: you must systematically track compliance-relevant signals across all Chapter III requirements — including data drift detection against the original Art. 10 training distribution baseline, subgroup-disaggregated fairness metric monitoring, human override rates, and interaction effects with other AI systems in the deployment stack. When monitoring detects a serious incident — defined as death, serious health harm, critical infrastructure disruption, fundamental rights infringement, or serious environmental damage under Art. 3(49) — this triggers tiered reporting obligations under Art. 73 with deadlines of 2–15 days depending on severity, creating a direct engineering requirement for automated anomaly detection with low-latency alerting to your compliance function.
| Severity | Definition | Reporting deadline |
|---|---|---|
| Critical | Widespread fundamental rights infringement or critical infrastructure disruption | ≤2 business days |
| Severe | Death or serious health harm | ≤10 business days |
| General | All other serious incidents | ≤15 business days |
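Drift detection against the Art. 10 training baseline can start as simply as a population stability index (PSI) per feature, alerting when it crosses a threshold. A minimal sketch — the 0.2 cut-off is a common industry rule of thumb, not a regulatory value, and the sample data is illustrative:

```python
import math

def psi(expected, actual, bins: int = 10) -> float:
    """Population stability index between a training-time sample (expected)
    and a live sample (actual) of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1
        # Small epsilon avoids log(0) for empty bins.
        return [(c + 1e-6) / (len(values) + 1e-6 * bins) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [i / 100 for i in range(100)]          # baseline: uniform on [0, 1)
stable = [i / 100 for i in range(100)]         # live sample, same distribution
shifted = [0.5 + i / 200 for i in range(100)]  # live sample, shifted upward
```

Running `psi(train, shifted)` on the shifted sample yields a value well above the 0.2 alert threshold, while the stable sample scores near zero; in a deployed system, crossing the threshold should page the compliance function, not just the on-call engineer.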
Harmonised Standards Are Not Ready — What to Use Instead
The following standards are in active development under CEN/CENELEC JTC 21:
| Standard | Maps to | Status |
|---|---|---|
| prEN 18286 — QMS for AI Act | Art. 17, Annex IV overall | Public enquiry closed Jan 2026. Comment resolution underway. First AI harmonised standard. Target: Q4 2026. |
| prEN 18228 — AI Risk Management | Art. 9, Annex IV §5 | Entering public enquiry imminently |
| prEN 18284 — Data Quality | Art. 10, Annex IV §2(d) | Active development |
| prEN ISO/IEC 24970 — AI Logging | Art. 12, Annex IV §2(g) | Internal ballot complete. Comment resolution in progress. |
Already published — use now:
- ISO/IEC 5259 Parts 1–4 (data quality for ML) — published 2024
- ISO/IEC 23894:2023 (AI risk management guidance) — operational
- ISO/IEC 22989 (AI terminology), ISO/IEC 23053 (ML framework) — foundational
- NIST AI RMF — complementary risk management methodology
- ISO/IEC 42001:2023 (AI management system) — useful operational framework, not being harmonised
The Digital Omnibus proposal may delay Annex III obligations to December 2027, but trilogue has not begun as of late March 2026. Build as if August 2, 2026 is still the deadline. Track live standard development at ai-act-standards.com.
For GPAI and foundation model obligations under Annex XI/XII (separate from Annex IV), see GPAI and foundation model obligations.
Annex IV Technical File Checklist
Use this checklist to track coverage against all 9 Annex IV sections. Every item is a required artefact or process for high-risk AI system providers.
Section 1 — General Description
- System name, version, and release date documented
- Intended purpose and use cases defined (including foreseeable misuse)
- Deployment context and target user population specified
- Interaction with other hardware/software documented
- EU responsible person / authorised representative identified
Section 2 — Development Process
- Model architecture documented with design rationale
- Training data datasheets completed (one per dataset)
- Validation and test data datasheets completed
- Bias/representativeness analysis conducted and documented
- Feedback-loop risk (Art. 10(2)(f)) assessed and documented
- Pre-defined acceptance criteria documented before testing
- Subgroup-disaggregated test results recorded
- Adversarial testing and robustness tests documented
- Software Bill of Materials (SBOM) generated
- Cybersecurity measures documented
Section 3 — Monitoring and Control
- Performance metrics disaggregated by relevant subgroups
- Human oversight mechanisms specified
- Output types and confidence/uncertainty measures documented
Section 4 — Metric Justification
- Selected accuracy metrics justified for the use case
- Metric limitations and appropriate-use caveats noted
Section 5 — Risk Management (Art. 9)
- Risk register with probability × severity scoring
- All health, safety, fundamental rights, and discrimination risks enumerated
- Reasonably foreseeable misuse scenarios documented
- Mitigation traceability record (risk → control → evidence)
- Residual risk acceptability sign-off with dated approval
- Vulnerable population considerations (Art. 9(9)) documented
- Ongoing risk management process defined (not one-time)
Section 6 — Lifecycle Changes
- Change classification matrix defined (minor / substantial modification)
- Pre-determined changes documented
- Changelog maintained from v1.0
Section 7 — Applied Standards
- Standards register listing all harmonised standards applied
- For each standard: scope, version, coverage notes
- Gap analysis for areas not covered by harmonised standards
- Alternative common specifications (Art. 41) documented if used
Section 8 — EU Declaration of Conformity
- DoC drafted per Annex V template
- All applicable Annex III use cases referenced
- Signed by authorised signatory
- Dated and version-controlled
Section 9 — Post-Market Monitoring Plan (Art. 72)
- Monitoring scope covering all Chapter III requirements
- Data drift detection thresholds defined
- Subgroup fairness metric monitoring schedule set
- Human override rate monitoring configured
- Cross-system interaction monitoring (Art. 72(2))
- Serious incident definition and detection automated
- Escalation runbook with Art. 73 reporting timelines
- Feedback loop from monitoring to technical file updates
Document Control
- Document control process in place (versioning, review schedule)
- Retention policy configured for 10-year minimum (Art. 18)
- Logging pipeline configured (6-month retention per Art. 19/26)
Compliance Deadlines and What Triggers a File Update
| Obligation | Deadline | Notes |
|---|---|---|
| Annex III high-risk AI systems (employment, credit, education, biometrics, law enforcement) | August 2, 2026 | Subject to Digital Omnibus — may shift to Dec 2, 2027 if adopted |
| AI in Annex I products (medical devices, machinery, vehicles) | August 2, 2027 | |
| GPAI models | August 2, 2025 — already in force | Separate Annex XI/XII obligations |
| Technical file retention | 10 years from market placement | Art. 18 |
| Log retention (deployer and provider) | 6 months minimum per entry | Art. 19(1), Art. 26(6) |
What triggers a mandatory technical file update:
- Model retrain with new or modified training data
- Architectural changes to the model
- Changes to deployment context or intended purpose
- Post-market monitoring discovers systematic performance degradation
- Serious incident reported under Art. 73
- New failure modes discovered not in the original risk register
- Any change classified as a "substantial modification" under Art. 83(5)
What does NOT require a new conformity assessment:
- Pre-determined changes documented in the original technical file
- Bug fixes that do not affect AI system performance
- UI changes that don't affect model inputs or outputs
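The two lists above amount to a decision table, which is worth encoding so that every change request gets a consistent, logged classification. A sketch under stated assumptions — the category names and mapping are illustrative, and the legal determination of "substantial modification" stays with your compliance function:

```python
# Illustrative mapping of change types to documentation consequences.
# "file_update": technical file must be updated (may also need reassessment);
# "log_only": record in the changelog, no file update required;
# "escalate": unknown change type, route to compliance for a human decision.
CHANGE_MATRIX = {
    "model_retrain_new_data": "file_update",
    "architecture_change": "file_update",
    "intended_purpose_change": "file_update",
    "pre_determined_change": "log_only",
    "bug_fix_no_perf_impact": "log_only",
    "ui_change_no_model_io": "log_only",
}

def classify(change_type: str) -> str:
    """Classify a proposed change; unrecognised types are escalated, never ignored."""
    return CHANGE_MATRIX.get(change_type, "escalate")
```

Defaulting unknown change types to escalation is the important design choice: it prevents a novel modification from silently bypassing the technical file update obligation.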