<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: White Fang</title>
    <description>The latest articles on DEV Community by White Fang (@numbers_white_fang).</description>
    <link>https://dev.to/numbers_white_fang</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3958294%2F1ca9c603-dbe5-4bcc-8f8a-c0c5a19fa7b9.jpg</url>
      <title>DEV Community: White Fang</title>
      <link>https://dev.to/numbers_white_fang</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/numbers_white_fang"/>
    <language>en</language>
    <item>
      <title>We Built PR Auto-Review as an Auditable AI Agent, Not a Faster Code Reviewer</title>
      <dc:creator>White Fang</dc:creator>
      <pubDate>Mon, 01 Jun 2026 14:51:42 +0000</pubDate>
      <link>https://dev.to/numbers_white_fang/we-built-pr-auto-review-as-an-auditable-ai-agent-not-a-faster-code-reviewer-egc</link>
      <guid>https://dev.to/numbers_white_fang/we-built-pr-auto-review-as-an-auditable-ai-agent-not-a-faster-code-reviewer-egc</guid>
      <description>&lt;p&gt;Most AI code review demos focus on speed.&lt;/p&gt;

&lt;p&gt;How fast can the agent read a pull request? How many comments can it generate? How many small issues can it catch before a human reviewer opens the diff?&lt;/p&gt;

&lt;p&gt;Those questions matter. But for enterprise engineering teams, they are not the hardest questions.&lt;/p&gt;

&lt;p&gt;The harder question is this:&lt;/p&gt;

&lt;p&gt;If an AI agent reviews code and recommends an action, can the team reconstruct what it saw, why it made that recommendation, and when it decided to hand control back to a human?&lt;/p&gt;

&lt;p&gt;That is the part we have been working on at Numbers Protocol / Omni AI.&lt;/p&gt;

&lt;p&gt;Our first concrete demo is PR Auto-Review. The agent reads a pull request, evaluates the change, records a verdict, and stores the reasoning in an append-only audit trail. In our current internal test suite, the reviewer records 13 verdict types. The 7 tests in the merger module all pass in 0.38 seconds locally.&lt;/p&gt;
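
&lt;p&gt;To make "append-only audit trail" concrete, here is a minimal sketch in Python. It is not our production code. The AuditTrail class, the hash chaining, and the example verdict names are all assumptions made for illustration; the idea is only that entries are added, never edited, and that tampering can be detected afterwards.&lt;/p&gt;

```python
import hashlib
import json


class AuditTrail:
    """Hypothetical in-memory append-only log (illustration only).

    Each entry stores the hash of the previous entry, so editing or
    reordering an old entry breaks the chain.
    """

    GENESIS = "0" * 64  # placeholder hash for the first entry

    def __init__(self):
        self._entries = []

    def append(self, record):
        """Add a record; existing entries are never modified or removed."""
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        entry = {"prev_hash": prev_hash, "payload": payload, "hash": digest}
        self._entries.append(entry)
        return dict(entry)

    def entries(self):
        """Return copies so callers cannot edit the stored entries."""
        return [dict(e) for e in self._entries]

    def verify(self, entries=None):
        """Recompute every hash; return False if any entry was altered."""
        entries = self._entries if entries is None else entries
        prev_hash = self.GENESIS
        for entry in entries:
            expected = hashlib.sha256(
                (prev_hash + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


# Example verdict names below are made up; they are not the 13 real types.
trail = AuditTrail()
trail.append({"pr": "example-pr", "verdict": "needs_human_review", "reason": "tests missing"})
trail.append({"pr": "example-pr", "verdict": "approved_by_human", "reason": "tests added"})
print(trail.verify())
```

&lt;p&gt;A real system would persist entries somewhere durable. The sketch keeps them in memory so the chaining logic stays visible.&lt;/p&gt;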

&lt;p&gt;Those are small numbers. That is intentional.&lt;/p&gt;

&lt;p&gt;The point is not to claim that the agent has solved code review. The point is to show the shape of an auditable AI workflow.&lt;/p&gt;

&lt;p&gt;Why Logging Is Becoming a Product Requirement&lt;/p&gt;

&lt;p&gt;This is not only a software architecture preference.&lt;/p&gt;

&lt;p&gt;The EU AI Act already treats logging and traceability as part of the control design for high-risk AI systems. The European Commission describes high-risk systems as subject to obligations including logging of activity, technical documentation, human oversight, robustness, cybersecurity, and accuracy. The AI Act Service Desk's Article 12 summary says high-risk AI systems must technically allow automatic recording of events over the lifetime of the system, and that those logs support traceability, post-market monitoring, and oversight.&lt;/p&gt;

&lt;p&gt;The timeline deserves careful treatment. The AI Act becomes broadly applicable on 2 August 2026, while the later implementation dates for high-risk AI depend on the system category and on the EU's simplification package. I would not use that date as a generic panic deadline for every product. But I would use it as a practical signal: auditability is moving from a "nice to have" demo feature into the vocabulary of procurement, legal, and engineering review.&lt;/p&gt;

&lt;p&gt;That matters for developers.&lt;/p&gt;

&lt;p&gt;If an agent takes action inside a workflow, the record behind that action has to be designed before the workflow becomes business-critical. It is much harder to bolt on an audit trail after teams already depend on the automation.&lt;/p&gt;

&lt;p&gt;The Review Is Only Half the Product&lt;/p&gt;

&lt;p&gt;When an AI reviewer comments on a pull request, the visible output is the comment.&lt;/p&gt;

&lt;p&gt;But in a business workflow, the visible comment is only half the product. The other half is the record behind it:&lt;/p&gt;

&lt;p&gt;What files changed?&lt;br&gt;
What risk did the agent identify?&lt;br&gt;
Which rule or policy did it apply?&lt;br&gt;
Did it recommend merge, block, request changes, or human review?&lt;br&gt;
Was the final action automated or handed back to a person?&lt;/p&gt;

&lt;p&gt;Without that record, the team can still move quickly, but it cannot explain itself later.&lt;/p&gt;
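
&lt;p&gt;One way to picture that record is as a small immutable data structure with one field per question above. This is a sketch under assumptions, not our actual schema: the DecisionRecord class and its field names are hypothetical, and only the four recommendation values come from the list above.&lt;/p&gt;

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Recommendation(Enum):
    """The four outcomes named in the questions above."""

    MERGE = "merge"
    BLOCK = "block"
    REQUEST_CHANGES = "request_changes"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)  # frozen: a record cannot be edited after creation
class DecisionRecord:
    """Hypothetical record shape; field names are illustrative."""

    files_changed: tuple          # what files changed
    risk_identified: str          # what risk the agent identified
    policy_applied: str           # which rule or policy it applied
    recommendation: Recommendation
    final_action_by: str          # "agent" or "human"

    def to_json(self):
        """Serialize to a stable JSON string suitable for logging."""
        data = asdict(self)
        data["files_changed"] = list(self.files_changed)
        data["recommendation"] = self.recommendation.value
        return json.dumps(data, sort_keys=True)


record = DecisionRecord(
    files_changed=("src/example.py",),
    risk_identified="no tests cover the changed function",
    policy_applied="example-policy: changes need tests",
    recommendation=Recommendation.HUMAN_REVIEW,
    final_action_by="human",
)
print(record.to_json())
```

&lt;p&gt;Freezing the dataclass is a deliberate choice in this sketch: if a decision changes later, that should be a new record, not an edit to the old one.&lt;/p&gt;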

&lt;p&gt;That becomes a problem the moment AI agents move from experiments into operational systems. Engineering managers need to know why a change was approved. Security teams need to know why a risk was ignored or escalated. Compliance teams need evidence that automation is not bypassing the normal control process.&lt;/p&gt;

&lt;p&gt;In other words, the value is not only automation.&lt;/p&gt;

&lt;p&gt;The value is automation that can be inspected.&lt;/p&gt;

&lt;p&gt;Human Handoff Is a Feature&lt;/p&gt;

&lt;p&gt;One of the easiest traps in agent design is to over-optimize for autonomy.&lt;/p&gt;

&lt;p&gt;It is tempting to describe every workflow as if the best agent is the one that never stops. In real engineering systems, that is rarely true.&lt;/p&gt;

&lt;p&gt;Sometimes the best decision an agent can make is to stop and ask for a human.&lt;/p&gt;

&lt;p&gt;In our PR Auto-Review demo, that means the agent can block auto-merge when the risk is unclear, when tests are missing, or when the change touches sensitive areas. The handoff is not a failure of the agent. It is part of the control design.&lt;/p&gt;
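
&lt;p&gt;The three conditions above can be sketched as a simple gate. This is an illustration under assumptions rather than our implementation: the ChangeSummary fields, the function names, and the list of sensitive path prefixes are all hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical examples of sensitive areas; a real list is team-specific.
SENSITIVE_PREFIXES = ("auth/", "payments/", "infra/")


@dataclass(frozen=True)
class ChangeSummary:
    """Hypothetical summary of a pull request as seen by the agent."""

    files_changed: tuple
    risk_is_clear: bool
    has_tests: bool


def handoff_reasons(change):
    """Return every reason the agent should stop and ask for a human."""
    reasons = []
    if not change.risk_is_clear:
        reasons.append("risk is unclear")
    if not change.has_tests:
        reasons.append("tests are missing")
    sensitive = [f for f in change.files_changed if f.startswith(SENSITIVE_PREFIXES)]
    if sensitive:
        reasons.append("touches sensitive areas: " + ", ".join(sensitive))
    return reasons


def may_auto_merge(change):
    """Auto-merge is allowed only when there is no reason to hand off."""
    return not handoff_reasons(change)


safe = ChangeSummary(files_changed=("docs/readme.md",), risk_is_clear=True, has_tests=True)
risky = ChangeSummary(files_changed=("auth/login.py",), risk_is_clear=True, has_tests=False)
print(may_auto_merge(safe), handoff_reasons(safe))
print(may_auto_merge(risky), handoff_reasons(risky))
```

&lt;p&gt;Returning the reasons, not just a yes or no, means the handoff itself can be written into the audit record.&lt;/p&gt;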

&lt;p&gt;A good enterprise AI agent should not only know how to act. It should know when not to act.&lt;/p&gt;

&lt;p&gt;What We Learned&lt;/p&gt;

&lt;p&gt;At first, we assumed that detailed audit logging would slow the workflow down.&lt;/p&gt;

&lt;p&gt;That felt intuitive. More records, more structure, more metadata. Surely that creates overhead.&lt;/p&gt;

&lt;p&gt;But after building the first version, the more interesting effect was different. The record reduced repeated discussion.&lt;/p&gt;

&lt;p&gt;When a pull request was flagged, the team could inspect the reason instead of reconstructing the context from memory. When a decision needed to be reviewed later, the verdict and evidence were already there.&lt;/p&gt;

&lt;p&gt;The audit trail did not make the agent faster in the narrow sense. It made the next human decision easier.&lt;/p&gt;

&lt;p&gt;That distinction matters.&lt;/p&gt;

&lt;p&gt;Why This Matters for AI Agents&lt;/p&gt;

&lt;p&gt;AI agents are moving from chat interfaces into operational systems: code review, support triage, sales operations, finance workflows, and internal approvals.&lt;/p&gt;

&lt;p&gt;Once agents touch those workflows, the product question changes.&lt;/p&gt;

&lt;p&gt;It is no longer enough to ask:&lt;/p&gt;

&lt;p&gt;Can the model produce a useful answer?&lt;/p&gt;

&lt;p&gt;We also have to ask:&lt;/p&gt;

&lt;p&gt;Can the organization explain what happened after the agent took action?&lt;/p&gt;

&lt;p&gt;That is the problem space we call TAEA: Transparent, Auditable, Explainable AI.&lt;/p&gt;

&lt;p&gt;For us, TAEA is not a slogan. It is a design constraint:&lt;/p&gt;

&lt;p&gt;The agent must leave a decision record.&lt;br&gt;
The system must preserve the evidence behind that decision.&lt;br&gt;
The workflow must support human handoff.&lt;br&gt;
The team must be able to inspect the process later.&lt;/p&gt;

&lt;p&gt;The PR Auto-Review demo is a small starting point, but it gives us a concrete surface to test these ideas.&lt;/p&gt;

&lt;p&gt;What We Are Looking For&lt;/p&gt;

&lt;p&gt;We are exploring how this framing resonates with teams that are already experimenting with AI agents in engineering workflows.&lt;/p&gt;

&lt;p&gt;If your team has tried AI code review, CI automation, or internal agents, I would be interested in one question:&lt;/p&gt;

&lt;p&gt;Which agent decisions do you actually need logged?&lt;/p&gt;

&lt;p&gt;Not every decision needs a heavy audit trail. Some actions are low risk. Some records are noise. Some workflows should stay lightweight.&lt;/p&gt;

&lt;p&gt;What we want to find is where the useful line falls.&lt;/p&gt;

&lt;p&gt;If you have seen this problem in practice, I would appreciate your perspective.&lt;/p&gt;

&lt;p&gt;Links&lt;/p&gt;

&lt;p&gt;Numbers Protocol: &lt;a href="https://www.numbersprotocol.io/" rel="noopener noreferrer"&gt;https://www.numbersprotocol.io/&lt;/a&gt;&lt;br&gt;
Omni / TAEA context: &lt;a href="https://www.numbersprotocol.io/solutions/auditable-ai" rel="noopener noreferrer"&gt;https://www.numbersprotocol.io/solutions/auditable-ai&lt;/a&gt;&lt;br&gt;
European Commission, AI Act overview: &lt;a href="https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai" rel="noopener noreferrer"&gt;https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai&lt;/a&gt;&lt;br&gt;
AI Act Service Desk, Article 12 record-keeping: &lt;a href="https://ai-act-service-desk.ec.europa.eu/en/ai-act/article-12" rel="noopener noreferrer"&gt;https://ai-act-service-desk.ec.europa.eu/en/ai-act/article-12&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devtools</category>
      <category>ai</category>
      <category>governance</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
