Ramon
Your AI agent just took an action. Do you know what it did?

A few months ago, a fintech company's accounts payable agent approved and triggered a $47,000 payment to a vendor that had been flagged for fraud two weeks earlier. The flag was in the system. The agent never saw it. By the time anyone noticed, the money was gone.

The company had logs. Technically. They had server logs, database logs, error logs. What they didn't have was a clear record of what the agent saw, what it decided, and why it sent that payment. When their auditors asked, the engineering team spent three days piecing together a timeline from scattered log files that were never designed to answer that question.

This is not an edge case. This is what happens when you put AI agents into production without thinking about accountability.


Agents are different from software

Traditional software is deterministic. If a bug causes a wrong transaction, you look at the code, find the bug, fix it. The behavior is reproducible and the cause is traceable.

AI agents don't work like that. They reason. They make judgment calls. Two identical inputs can produce different outputs depending on context, model temperature, and what happened in previous steps. When something goes wrong, "look at the code" doesn't give you answers. You need to know what the agent actually did, step by step, in that specific run.

This is a fundamentally new problem. And regulators are starting to notice.


What the law says now

The EU AI Act

The EU AI Act became enforceable in stages through 2025 and 2026. The full weight of it lands on August 2, 2026.

For anyone deploying AI in high-risk categories, Article 19 is the one to know. It requires providers of high-risk AI systems to maintain automatically generated logs for a minimum of six months. Longer in some sectors. The logs must be detailed enough to reconstruct what the system did and why.

High-risk categories include: employment and HR decisions, credit and financial services, healthcare, education, law enforcement, and critical infrastructure. If your AI agent touches any of those areas, these logging obligations apply to you. Article 19 is addressed to providers, but deployers have a parallel duty under Article 26 to keep the logs under their control for at least six months.

The fines for non-compliance go up to 35 million euros or 7% of global annual revenue, whichever is higher, for the most serious violations; breaches of high-risk obligations like Article 19 carry fines up to 15 million euros or 3%. These are not theoretical numbers. The EU has shown with GDPR that it will enforce fines at this scale.

The US picture

The US has no single federal AI law yet. But the regulatory pressure is real and it comes from multiple directions at once.

SOC 2 is the de facto standard for B2B SaaS security. If you're selling to enterprise customers, they will ask for your SOC 2 report. Auditors evaluating SOC 2 compliance specifically look for activity logs that show who or what accessed what, when, and what they did. An AI agent that sends emails or triggers payments on your behalf is a system that SOC 2 auditors will want to see logs for.

HIPAA applies to any system handling protected health information. If your agent reads patient records, schedules appointments, or processes healthcare data in any form, HIPAA requires six-year retention of activity logs. Six years. Most teams think about HIPAA in terms of data encryption and access controls, but the logging requirement is just as strict.

SOX and SEC rules govern financial reporting and trading. If your agents are involved in expense approvals, transaction processing, or financial data handling, you need to be able to prove they followed the rules. Not just that the rules existed, but that they were followed, step by step, in each specific instance.

State laws are filling the federal gap. Colorado's AI Act takes effect in 2026, requiring reasonable care to prevent algorithmic discrimination and documentation to prove it. California has multiple overlapping AI transparency requirements now in effect. Texas's TRAIGA took effect on January 1, 2026. These laws are moving fast and the trend is clearly toward more documentation, not less.

The common thread

Across all of these frameworks, the requirement is the same: prove what your AI did. Not in general. In the specific instance your auditor is asking about.

"Our agent follows these rules" is not an answer. "Here is a timestamped, immutable record of every action the agent took on March 14 at 2:47pm, here is what it saw, here is what it decided, and here is why" is an answer.


The problem with existing tools

Most teams are using one of three approaches to deal with this.

Application logs. Standard server logs capture requests and responses but not reasoning. They tell you the agent made a call. They don't tell you what it was thinking. When something goes wrong, you're reconstructing a timeline from logs that were never designed to answer compliance questions.

LLM observability tools like Langfuse or LangSmith are genuinely useful for debugging. They capture traces, spans, token counts, and latency. They're built for engineers who want to understand why a prompt failed or why costs spiked. They are not built for the compliance officer asking what your agent did on Tuesday.

Nothing. More common than people admit. Teams move fast, get agents into production, and assume logging can be sorted out later. Later is when the auditor arrives.

The gap isn't technical. The tools to capture logs exist. The gap is that nobody is building for the people who need to read those logs.


What a real audit trail actually needs

When regulators or auditors ask what your agent did, they need specific things.

A complete timeline. Every action in sequence. Not just the LLM calls but the tool calls, the decisions, the data accessed, the outputs produced.

The reasoning, not just the result. Why did the agent approve that payment? What criteria did it apply? What did it see that led it to that conclusion?

Human review steps. If a person signed off before the agent proceeded, that needs to be in the record too. The full chain of accountability, not just the automated parts.

Immutability. A log you can edit is not an audit trail. The record needs to be append-only with cryptographic proof that nothing was changed after the fact.

Readability. Your compliance team is not going to read JSON traces. The record needs to be something a non-technical person can actually understand.

Retention. Six months minimum for EU AI Act. Six years for HIPAA. The record needs to exist when someone asks for it, not just when it's convenient.
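To make the immutability requirement concrete, here is a minimal sketch of an append-only, hash-chained log in Python. The `AuditLog` class and its field names are illustrative, not a reference to any particular product: each record embeds the SHA-256 hash of the previous record, so editing any past entry invalidates everything after it.

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only log where each record includes the hash of the
    previous record, so after-the-fact edits break the chain."""

    GENESIS = "0" * 64  # placeholder hash before the first record

    def __init__(self):
        self.records = []
        self.last_hash = self.GENESIS

    def append(self, event: dict) -> dict:
        record = {
            "timestamp": time.time(),
            "event": event,
            "prev_hash": self.last_hash,
        }
        # Canonical serialization (sorted keys) so the hash is reproducible.
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self.records.append(record)
        self.last_hash = record["hash"]
        return record

    def verify(self) -> bool:
        """Recompute every hash; returns False if anything was altered."""
        prev = self.GENESIS
        for record in self.records:
            if record["prev_hash"] != prev:
                return False
            body = {k: v for k, v in record.items() if k != "hash"}
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != record["hash"]:
                return False
            prev = record["hash"]
        return True

log = AuditLog()
log.append({"action": "payment.approve", "amount_usd": 47000})
log.append({"action": "email.send", "to": "vendor@example.com"})
print(log.verify())   # True: chain is intact
log.records[0]["event"]["amount_usd"] = 1   # tamper with history
print(log.verify())   # False: the edit broke the chain
```

Real systems add more on top of this, such as anchoring the latest hash somewhere outside the writer's control, but the core idea is just this: a record you cannot silently rewrite.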


The window is closing

The EU AI Act enforcement deadline is August 2026. That is not far away. Companies that have been running agents in production without audit trails are going to face a choice: retrofit compliance into systems that were never designed for it, or get ahead of it now.

Getting ahead of it now is much cheaper than scrambling in July 2026 with an auditor waiting.

The companies that take compliance seriously from the start will close enterprise deals faster. They will pass security reviews without delays. They will have answers when auditors ask questions. And when something goes wrong, they will know exactly what happened.

The companies that wait will be doing log archaeology at the worst possible time.


In a follow-up post, I'll cover how I built AgentReceipt to solve this problem, including how hash chaining works, why we anchor receipts to a public transparency log, and how three lines of code give your agent a tamper-proof audit trail.
