The EU AI Act's high-risk AI system requirements take effect on August 2, 2026. If you're building AI agents that make decisions affecting people -- purchasing, customer service, hiring, content moderation -- this applies to you.
Fines: up to EUR 35 million or 7% of global annual turnover, whichever is higher.
I'm not a lawyer, but I've read the regulation and built tooling around it. Here's what developers actually need to do, with code examples.
## What Article 14 Requires (Plain English)
Article 14 covers human oversight: high-risk AI systems must be designed so that humans can effectively oversee them. In summary:
| Requirement | What It Means for Developers |
|---|---|
| Understand capabilities and limitations | Log what the agent can and can't do |
| Monitor operation and detect anomalies | Record every decision, detect failures |
| Interpret outputs correctly | Show why the agent made each decision |
| Decide not to use or override | Allow humans to block actions |
| Intervene or interrupt | Detect and flag instruction changes |
The common thread: you need a record of what your agent decided, why, and whether anything went wrong.
## The Checklist
### 1. Record Every Decision Point
Not just inputs and outputs -- record why the agent chose each action.
```python
# [BAD] Insufficient -- no decision context
logger.info(f"Agent called tool: {tool_name}")
```

[GOOD] What auditors want to see:

```json
{
  "timestamp": "2026-03-29T10:15:32Z",
  "event_type": "decision",
  "action": "purchase_product",
  "input": {"product": "Logitech M750", "price": 45.00},
  "reasoning": "Cheapest option matching user's 'wireless mouse' query",
  "agent_id": "shopping-agent",
  "session_id": "order-123"
}
```
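A minimal sketch of a helper that emits records in this shape (the `log_decision` function and its print-based sink are illustrative, not from any specific library; in production you would write to durable storage):

```python
import json
from datetime import datetime, timezone

def log_decision(action, input_data, reasoning, agent_id, session_id):
    """Build and emit a structured decision record."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": "decision",
        "action": action,
        "input": input_data,
        "reasoning": reasoning,
        "agent_id": agent_id,
        "session_id": session_id,
    }
    print(json.dumps(event))  # replace with a write to your log sink
    return event

event = log_decision(
    action="purchase_product",
    input_data={"product": "Logitech M750", "price": 45.00},
    reasoning="Cheapest option matching user's 'wireless mouse' query",
    agent_id="shopping-agent",
    session_id="order-123",
)
```

The key design choice is that `reasoning` is a first-class field, captured at the moment of the decision, not reconstructed later.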
### 2. Track Which External Data Influenced Decisions
If your agent uses RAG, memory, or retrieved documents, log which documents were used and how relevant they were.
```json
{
  "event_type": "context_injection",
  "source": "vector_db",
  "content": {
    "document": "refund_policy_v2.md",
    "similarity_score": 0.92
  },
  "reasoning": "Retrieved refund policy for customer question"
}
```
This creates a chain: "this decision was influenced by this specific document."
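One way to make that chain explicit is to give decision events a back-reference to the injected documents. A sketch, assuming an in-memory trail and an `influenced_by` field of my own invention (neither is part of any standard):

```python
events = []

def record(event):
    """Append an event to the session's audit trail (in-memory here)."""
    events.append(event)
    return event

record({
    "event_type": "context_injection",
    "source": "vector_db",
    "content": {"document": "refund_policy_v2.md", "similarity_score": 0.92},
    "reasoning": "Retrieved refund policy for customer question",
})

record({
    "event_type": "decision",
    "action": "answer_refund_question",
    # Explicit back-reference: which injected documents shaped this decision
    "influenced_by": ["refund_policy_v2.md"],
    "reasoning": "Answer grounded in the retrieved refund policy",
})
```

An auditor reading the trail can now walk from any decision back to the exact documents that influenced it.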
### 3. Detect Instruction Changes (Prompt Drift)
If your system prompt changes between agent steps -- config updates, middleware injections, A/B tests -- you need to detect and log it.
```python
import difflib

# Record the system prompt at each agent step
prompt_v1 = "You are a helpful shopping assistant."
prompt_v2 = "You are a helpful shopping assistant. Prioritize conversion rate."

# If the prompts differ, flag the change as prompt drift and log the diff
if prompt_v1 != prompt_v2:
    diff = "\n".join(difflib.unified_diff([prompt_v1], [prompt_v2], lineterm=""))
    log_event("prompt_drift", diff=diff)  # log_event: your structured logger
```
### 4. Add Approval Checkpoints for Critical Actions
Financial transactions, data deletion, external communications -- these need explicit guardrails.
Before any critical action, record the approval or denial (use `"event_type": "guardrail_block"` with `"allowed": false` when the action is denied):

```json
{
  "event_type": "guardrail_pass",
  "intent": "user asked to check refund status",
  "action": "process_refund",
  "allowed": true,
  "reason": "Refund amount ($45) within auto-approval limit"
}
```
If an auditor asks "why did the agent process this refund?", you have the answer.
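A minimal checkpoint might look like this (the threshold and the `check_refund_guardrail` helper are illustrative assumptions, not part of any library):

```python
AUTO_APPROVAL_LIMIT = 50.00  # illustrative policy threshold

def check_refund_guardrail(amount, intent):
    """Return a guardrail event; block anything over the auto-approval limit."""
    allowed = amount <= AUTO_APPROVAL_LIMIT
    return {
        "event_type": "guardrail_pass" if allowed else "guardrail_block",
        "intent": intent,
        "action": "process_refund",
        "allowed": allowed,
        "reason": (
            f"Refund amount (${amount:.2f}) within auto-approval limit"
            if allowed
            else f"Refund amount (${amount:.2f}) exceeds limit; needs human approval"
        ),
    }

event = check_refund_guardrail(45.00, "user asked to check refund status")
```

The same pattern extends to any critical action: evaluate a policy, emit the event either way, and route `guardrail_block` events to a human queue.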
### 5. Generate Audit-Ready Reports
You need to produce reports that non-technical people (compliance officers, legal) can read. A JSON log dump won't work.
A good forensic report includes:
- Timeline -- chronological record of all actions
- Decision chain -- each decision with its reasoning
- Incident analysis -- what went wrong and why
- Causal chain -- how one failure led to the next
- Statistics -- how many decisions, errors, guardrail checks
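The timeline portion of such a report can be rendered directly from the structured events. A sketch, with formatting choices that are mine rather than any standard:

```python
def render_timeline(events):
    """Render a chronological markdown timeline from structured log events."""
    lines = ["## Timeline", ""]
    for e in sorted(events, key=lambda e: e.get("timestamp", "")):
        lines.append(
            f"- **{e.get('timestamp', '?')}** `{e['event_type']}`: "
            f"{e.get('reasoning', e.get('reason', ''))}"
        )
    return "\n".join(lines)

events = [
    {"timestamp": "2026-03-29T10:15:32Z", "event_type": "decision",
     "reasoning": "Cheapest option matching query"},
    {"timestamp": "2026-03-29T10:15:30Z", "event_type": "context_injection",
     "reasoning": "Retrieved refund policy"},
]
print(render_timeline(events))
```

Because the events already carry timestamps and reasoning, the report is a pure transformation of the log, so it stays trustworthy: nothing is added after the fact.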
### 6. Analyze Failure Patterns Across Sessions
One session's failure is a bug. The same failure across 50 sessions is a systemic risk. Track patterns:
- How often does the agent ignore tool errors?
- How often are critical actions taken without approval?
- Is prompt drift correlated with incorrect decisions?
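Pattern tracking can start as simply as counting failure event types across sessions (the event names here are illustrative):

```python
from collections import Counter

FAILURE_TYPES = ("guardrail_block", "prompt_drift", "tool_error_ignored")

def failure_patterns(sessions):
    """Count failure event types across many sessions to surface systemic risks."""
    counts = Counter()
    for session_events in sessions:
        for e in session_events:
            if e["event_type"] in FAILURE_TYPES:
                counts[e["event_type"]] += 1
    return counts

sessions = [
    [{"event_type": "prompt_drift"}, {"event_type": "decision"}],
    [{"event_type": "prompt_drift"}, {"event_type": "guardrail_block"}],
]
print(failure_patterns(sessions))  # prompt_drift recurring across sessions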
## Timeline
| Date | What Happens |
|---|---|
| Aug 1, 2024 | EU AI Act entered into force |
| Feb 2, 2025 | Prohibited practices apply |
| Aug 2, 2025 | General-purpose AI obligations apply |
| Aug 2, 2026 | High-risk AI system requirements apply |
You have ~4 months. If your agents handle anything high-risk, start logging now -- retrofitting decision traceability into a production system is much harder than building it in from day one.
## Tools
I built Agent Forensics as an open-source tool that handles all 6 checklist items above. One-line integration for LangChain, OpenAI Agents SDK, and CrewAI:
```python
from agent_forensics import Forensics

f = Forensics(session="order-123")
agent.invoke({"input": "..."}, config={"callbacks": [f.langchain()]})

# Generates a compliance-ready report
f.save_markdown()

# Auto-classifies 6 failure patterns
failures = f.classify()
```
But regardless of which tool you use, the important thing is to start recording now. Every session that runs untracked is a session you can never audit.
What's your team's plan for EU AI Act compliance? Are you tracking agent decisions today?