Your LangSmith Traces Are Not an Audit Trail

#ai #security #programming #webdev

You have LangSmith set up. You can see every prompt, every token, every span. Your traces are clean, your latency charts are green, and you can replay any run from the last 30 days.

When your compliance officer asks what your agent did on March 14th, you send them a LangSmith link.

They come back with more questions. And you realise you answered the wrong question.

Observability tools are built for engineers

LangSmith, Langfuse, Arize, Helicone. These are genuinely useful tools. They exist to help you debug prompts, track costs, measure latency, and understand why a chain returned something unexpected.

They are built for the question: why did this not work the way I expected?

That is an engineering question. It gets asked during development, during an incident, during a postmortem. The audience is you and your team. The output is a fix.

Audit trails are built for auditors

An audit trail exists to answer a different question: can you prove what your agent did, and that it was authorised to do it?

That question gets asked by a compliance officer, a regulator, a customer's legal team, or an external auditor. It might get asked six months from now. The audience is not your engineering team. The output is evidence.

The distinction matters because the two kinds of tool are built with completely different constraints in mind.

Observability tools are optimised for developer experience. Fast search, good visualisation, easy filtering. Retention is typically short because storage is expensive and the main use case is recent debugging. The data typically lives in a database the vendor controls, or one your own team can write to if you self-host. If you delete a trace, it is gone.

An audit trail needs to be the opposite. Long retention by default. Immutable records that cannot be edited or deleted after the fact. Cryptographic proof that what you are showing today is exactly what was recorded at the time. Readable by a non-technical person. Something you can hand to an auditor without a 20-minute explanation of what a span is.

The specific things that are missing

Immutability. LangSmith stores your traces in a database. That database can be written to. Records can be deleted. There is no cryptographic proof that a trace you show an auditor today is identical to what was recorded six months ago. A mutable log is not evidence. It is an assertion.
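
To make "cryptographic proof" concrete, here is a minimal sketch of a hash-chained log in Python. It is an illustration of the general technique, not how LangSmith or any particular product works, and the names (`append_record`, `verify_chain`, `GENESIS_HASH`) are hypothetical. Each record's hash covers its own payload plus the hash of the record before it, so changing an old record invalidates everything recorded after it.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # stand-in "previous hash" for the first record


def record_hash(prev_hash: str, payload: dict) -> str:
    """Hash a payload together with the hash of the record before it."""
    # Canonical JSON (sorted keys, fixed separators) so the same payload
    # always serialises to the same bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: list, payload: dict) -> dict:
    """Append a payload to the log, linking it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": record_hash(prev_hash, payload),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every link; edits, reordering and mid-chain deletions fail."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != record_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_record(log, {"action": "refund.create", "order": 1042})
append_record(log, {"action": "email.send", "to": "customer"})
print(verify_chain(log))   # the untouched chain verifies

log[0]["payload"]["order"] = 9999
print(verify_chain(log))   # the edited chain no longer verifies
```

A chain like this is tamper-evident rather than tamper-proof: whoever holds the log could still drop the most recent records or rebuild the whole chain from scratch. Detecting that requires keeping the latest hash somewhere the log's operator cannot rewrite.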

Chain of custody. Observability traces show you what the LLM did. They typically do not capture the full sequence of tool calls, external API calls, human approval steps, and downstream effects that make up an agent's actual action in the world. An auditor does not care about your token counts. They care about what your agent did to real data and real systems.
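
As a rough sketch of what an action-level record might hold, the Python below models one entry per thing the agent did to a real system, rather than one entry per LLM call. Every field name here is an assumption chosen for illustration, not a standard schema, and the two helper functions are hypothetical examples of the checks an auditor's question implies: is the sequence complete, and was every sensitive action approved?

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class ActionRecord:
    """One action taken by the agent (all field names are illustrative)."""
    session_id: str
    sequence: int               # position within the session, starting at 1
    timestamp: str              # ISO 8601 with a UTC offset
    kind: str                   # e.g. "tool_call", "api_call", "human_approval"
    action: str                 # e.g. "refund.create"
    target: str                 # the data or system that was acted on
    authorised_by: str          # policy or role that permitted the action
    approved_by: Optional[str]  # human approver, if there was one
    outcome: str                # e.g. "succeeded", "failed", "denied"


def sequence_gaps(records: List[ActionRecord]) -> List[int]:
    """Return sequence numbers that are missing from a session's records."""
    seen = {r.sequence for r in records}
    if not seen:
        return []
    return [n for n in range(1, max(seen) + 1) if n not in seen]


def unapproved(records: List[ActionRecord],
               needs_approval: Set[str]) -> List[ActionRecord]:
    """Return records for sensitive actions that have no human approver."""
    return [
        r for r in records
        if r.action in needs_approval and r.approved_by is None
    ]
```

The `frozen=True` flag only stops accidental mutation inside the Python process. It says nothing about whether the stored record can be changed later, which is a separate storage problem.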

Retention guarantees. The EU AI Act requires logs from high-risk systems to be kept for at least six months. HIPAA requires covered documentation to be kept for six years. Most observability tools default to 30 or 90 days. You can pay for longer, but a longer retention window does not by itself turn a trace into an archived, legally defensible record.

Non-technical readability. Traces are structured for developers. They are full of span IDs, model names, raw JSON, and timing data. If your compliance team needs to understand what your agent did, they cannot read a LangSmith trace without help. An audit trail needs to be legible to the person asking the question.
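
One way to picture the difference is a small rendering step that turns a structured record into a sentence. The Python sketch below assumes a hypothetical record shape (`timestamp`, `agent`, `summary`, `approved_by`, `outcome`) and assumes timestamps carry a UTC offset. The example values are made up.

```python
from datetime import datetime, timezone


def describe(record: dict) -> str:
    """Render one audit record as a sentence a non-engineer can read."""
    # Normalise to UTC so the "UTC" label in the output is always accurate.
    moment = datetime.fromisoformat(record["timestamp"]).astimezone(timezone.utc)
    when = moment.strftime("%d %B %Y at %H:%M UTC")
    if record.get("approved_by"):
        approval = f"approved by {record['approved_by']}"
    else:
        approval = "with no human approval recorded"
    return (
        f"On {when}, the agent '{record['agent']}' {record['summary']}, "
        f"{approval}. Outcome: {record['outcome']}."
    )


example = {
    "timestamp": "2025-03-14T09:30:00+00:00",
    "agent": "refund-agent",
    "summary": "issued a refund of 120.00 EUR for order 1042",
    "approved_by": "j.doe",
    "outcome": "succeeded",
}
print(describe(example))
```

The sentence is a view over the record, not a replacement for it. The structured record remains the thing that is stored and verified.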

You probably need both

This is not an argument to stop using observability tools. Use them. They are the right tool for debugging and performance monitoring.

But they should not be your answer when someone asks you to prove what your agent did. They were never designed to be that answer, and treating them as one creates a compliance gap that will surface at the worst possible time.

The question to ask about your current setup: if an auditor asked you right now to produce a tamper-proof record of every action your agent took in a specific session three months ago, could you do it?

If the answer is "we would pull the LangSmith traces," you have observability. You do not have an audit trail.

AgentReceipt gives your AI agents a tamper-proof audit trail with hash-chained records anchored to a public transparency log. Three lines of code. No infrastructure to manage. agentreceipt.co