The EU AI Act's high-risk AI system requirements take effect on August 2, 2026. If you're building AI agents that make decisions affecting people -- purchasing, customer service, hiring, content moderation -- this applies to you.
Fines: up to EUR 35 million or 7% of global annual turnover, whichever is higher.
I'm not a lawyer, but I've read the regulation and built tooling around it. Here's what developers actually need to do, with code examples.
## What Article 14 Requires (Plain English)
Article 14 covers human oversight: high-risk AI systems must be designed so that humans can effectively oversee them. In summary:
| Requirement | What It Means for Developers |
|---|---|
| Understand capabilities and limitations | Log what the agent can and can't do |
| Monitor operation and detect anomalies | Record every decision, detect failures |
| Interpret outputs correctly | Show why the agent made each decision |
| Decide not to use or override | Allow humans to block actions |
| Intervene or interrupt | Detect and flag instruction changes |
The common thread: you need a record of what your agent decided, why, and whether anything went wrong.
## The Checklist
### 1. Record Every Decision Point
Not just inputs and outputs -- record why the agent chose each action.
```python
# [BAD] Insufficient -- no decision context
logger.info(f"Agent called tool: {tool_name}")
```

[GOOD] What auditors want to see:

```json
{
  "timestamp": "2026-03-29T10:15:32Z",
  "event_type": "decision",
  "action": "purchase_product",
  "input": {"product": "Logitech M750", "price": 45.00},
  "reasoning": "Cheapest option matching user's 'wireless mouse' query",
  "agent_id": "shopping-agent",
  "session_id": "order-123"
}
```
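A minimal sketch of a helper that emits records in this shape (the `log_decision` function and its print-based sink are illustrative, not from any specific library; in production you would write to durable storage):

```python
import json
from datetime import datetime, timezone

def log_decision(action, input_data, reasoning, agent_id, session_id):
    """Build and emit a structured decision record."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": "decision",
        "action": action,
        "input": input_data,
        "reasoning": reasoning,
        "agent_id": agent_id,
        "session_id": session_id,
    }
    print(json.dumps(event))  # replace with a write to your log sink
    return event

event = log_decision(
    action="purchase_product",
    input_data={"product": "Logitech M750", "price": 45.00},
    reasoning="Cheapest option matching user's 'wireless mouse' query",
    agent_id="shopping-agent",
    session_id="order-123",
)
```

The key design choice is that `reasoning` is a first-class field, captured at the moment of the decision, not reconstructed later.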
### 2. Track Which External Data Influenced Decisions
If your agent uses RAG, memory, or retrieved documents, log which documents were used and how relevant they were.
```json
{
  "event_type": "context_injection",
  "source": "vector_db",
  "content": {
    "document": "refund_policy_v2.md",
    "similarity_score": 0.92
  },
  "reasoning": "Retrieved refund policy for customer question"
}
```
This creates a chain: "this decision was influenced by this specific document."
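One way to make that chain explicit is to give decision events a back-reference to the injected documents. A sketch, assuming an in-memory trail and an `influenced_by` field of my own invention (neither is part of any standard):

```python
events = []

def record(event):
    """Append an event to the session's audit trail (in-memory here)."""
    events.append(event)
    return event

record({
    "event_type": "context_injection",
    "source": "vector_db",
    "content": {"document": "refund_policy_v2.md", "similarity_score": 0.92},
    "reasoning": "Retrieved refund policy for customer question",
})

record({
    "event_type": "decision",
    "action": "answer_refund_question",
    # Explicit back-reference: which injected documents shaped this decision
    "influenced_by": ["refund_policy_v2.md"],
    "reasoning": "Answer grounded in the retrieved refund policy",
})
```

An auditor reading the trail can now walk from any decision back to the exact documents that influenced it.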
### 3. Detect Instruction Changes (Prompt Drift)
If your system prompt changes between agent steps -- config updates, middleware injections, A/B tests -- you need to detect and log it.
```python
import difflib

# Record the system prompt at each agent step
prompt_v1 = "You are a helpful shopping assistant."
prompt_v2 = "You are a helpful shopping assistant. Prioritize conversion rate."

# If the prompts differ, flag the change as prompt drift and log the diff
if prompt_v1 != prompt_v2:
    diff = "\n".join(difflib.unified_diff([prompt_v1], [prompt_v2], lineterm=""))
    log_event("prompt_drift", diff=diff)  # log_event: your structured logger
```
### 4. Add Approval Checkpoints for Critical Actions
Financial transactions, data deletion, external communications -- these need explicit guardrails.
Before any critical action, record the approval or denial (use `"event_type": "guardrail_block"` with `"allowed": false` when the action is denied):

```json
{
  "event_type": "guardrail_pass",
  "intent": "user asked to check refund status",
  "action": "process_refund",
  "allowed": true,
  "reason": "Refund amount ($45) within auto-approval limit"
}
```
If an auditor asks "why did the agent process this refund?", you have the answer.
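A minimal checkpoint might look like this (the threshold and the `check_refund_guardrail` helper are illustrative assumptions, not part of any library):

```python
AUTO_APPROVAL_LIMIT = 50.00  # illustrative policy threshold

def check_refund_guardrail(amount, intent):
    """Return a guardrail event; block anything over the auto-approval limit."""
    allowed = amount <= AUTO_APPROVAL_LIMIT
    return {
        "event_type": "guardrail_pass" if allowed else "guardrail_block",
        "intent": intent,
        "action": "process_refund",
        "allowed": allowed,
        "reason": (
            f"Refund amount (${amount:.2f}) within auto-approval limit"
            if allowed
            else f"Refund amount (${amount:.2f}) exceeds limit; needs human approval"
        ),
    }

event = check_refund_guardrail(45.00, "user asked to check refund status")
```

The same pattern extends to any critical action: evaluate a policy, emit the event either way, and route `guardrail_block` events to a human queue.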
### 5. Generate Audit-Ready Reports
You need to produce reports that non-technical people (compliance officers, legal) can read. A JSON log dump won't work.
A good forensic report includes:
- Timeline -- chronological record of all actions
- Decision chain -- each decision with its reasoning
- Incident analysis -- what went wrong and why
- Causal chain -- how one failure led to the next
- Statistics -- how many decisions, errors, guardrail checks
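The timeline portion of such a report can be rendered directly from the structured events. A sketch, with formatting choices that are mine rather than any standard:

```python
def render_timeline(events):
    """Render a chronological markdown timeline from structured log events."""
    lines = ["## Timeline", ""]
    for e in sorted(events, key=lambda e: e.get("timestamp", "")):
        lines.append(
            f"- **{e.get('timestamp', '?')}** `{e['event_type']}`: "
            f"{e.get('reasoning', e.get('reason', ''))}"
        )
    return "\n".join(lines)

events = [
    {"timestamp": "2026-03-29T10:15:32Z", "event_type": "decision",
     "reasoning": "Cheapest option matching query"},
    {"timestamp": "2026-03-29T10:15:30Z", "event_type": "context_injection",
     "reasoning": "Retrieved refund policy"},
]
print(render_timeline(events))
```

Because the events already carry timestamps and reasoning, the report is a pure transformation of the log, so it stays trustworthy: nothing is added after the fact.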
### 6. Analyze Failure Patterns Across Sessions
One session's failure is a bug. The same failure across 50 sessions is a systemic risk. Track patterns:
- How often does the agent ignore tool errors?
- How often are critical actions taken without approval?
- Is prompt drift correlated with incorrect decisions?
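Pattern tracking can start as simply as counting failure event types across sessions (the event names here are illustrative):

```python
from collections import Counter

FAILURE_TYPES = ("guardrail_block", "prompt_drift", "tool_error_ignored")

def failure_patterns(sessions):
    """Count failure event types across many sessions to surface systemic risks."""
    counts = Counter()
    for session_events in sessions:
        for e in session_events:
            if e["event_type"] in FAILURE_TYPES:
                counts[e["event_type"]] += 1
    return counts

sessions = [
    [{"event_type": "prompt_drift"}, {"event_type": "decision"}],
    [{"event_type": "prompt_drift"}, {"event_type": "guardrail_block"}],
]
print(failure_patterns(sessions))  # prompt_drift recurring across sessions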
## Timeline
| Date | What Happens |
|---|---|
| Aug 1, 2024 | EU AI Act entered into force |
| Feb 2, 2025 | Prohibited practices apply |
| Aug 2, 2025 | General-purpose AI obligations apply |
| Aug 2, 2026 | High-risk AI system requirements apply |
You have ~4 months. If your agents handle anything high-risk, start logging now -- retrofitting decision traceability into a production system is much harder than building it in from day one.
## Tools
I built Agent Forensics as an open-source tool that handles all 6 checklist items above. One-line integration for LangChain, OpenAI Agents SDK, and CrewAI:
```python
from agent_forensics import Forensics

f = Forensics(session="order-123")
agent.invoke({"input": "..."}, config={"callbacks": [f.langchain()]})

# Generates a compliance-ready report
f.save_markdown()

# Auto-classifies 6 failure patterns
failures = f.classify()
```
But regardless of which tool you use, the important thing is to start recording now. Every session that runs untracked is a session you can never audit.
What's your team's plan for EU AI Act compliance? Are you tracking agent decisions today?