What I Built
ReviewFlow is an automated code review pipeline built on a custom multi-agent architecture written in plain TypeScript. Instead of relying on a single monolithic prompt, it routes each incoming Pull Request to specialized agents (Security, Logic, and Style) that review it in parallel. The workflow is orchestrated through GitHub Actions and posts comments via the Octokit API.
The Blind Spot: When Brilliant AI Meets Silent Failures
Getting the pipeline to trigger GitHub Actions felt like a huge win. But then, I noticed a fatal issue: the comments were disappearing.
The logs showed the LLM was generating brilliant security and logic feedback, but on the actual Pull Request page, nothing was posted. After digging into the Octokit error logs, I found a subtle reliability bug: Coordinate Hallucination.
Even when the model correctly identified a vulnerability, it couldn't reliably anchor that feedback to a valid line in the Git diff. Any comment pinned to a hallucinated, out-of-bounds line made Octokit throw a hard 422 Unprocessable Entity error: Validation Failed: {"resource":"PullRequestReviewComment","code":"invalid","field":"line"}
My system had a strict VerifierNode designed to block invalid API requests, and it handled these failures by dropping the affected comments without reporting anything on the Pull Request.
A single hallucinated number was enough to erase an otherwise valid suggestion. During initial testing across 5 dummy Pull Requests, the LLM generated 14 valuable security and architectural suggestions. Because of coordinate hallucination, 9 of them were dropped by the verifier node for invalid line ranges. That's 9 out of 14, a 64% silent failure rate: good engineering feedback lost simply because the AI couldn't read the Git diff line index accurately.
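One way to keep this class of failure from staying silent is to recognize it explicitly when it is caught. The sketch below is illustrative only and is not taken from ReviewFlow: `isInvalidLineError` and both interfaces are hypothetical names, and it assumes the caught error exposes a numeric `status` plus the parsed response body under `response.data.errors`, matching the payload quoted above.

```typescript
// One entry of a GitHub validation error, shaped like the payload quoted above.
interface ValidationEntry {
  resource?: string;
  code?: string;
  field?: string;
}

// Assumption: the HTTP client's error carries `status` and the parsed body
// under `response.data.errors`. Adjust the path if your client differs.
interface HttpLikeError {
  status?: number;
  response?: { data?: { errors?: ValidationEntry[] } };
}

// Returns true only for a 422 whose details blame the `line` field of a
// pull request review comment, so the caller can log it instead of hiding it.
function isInvalidLineError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as HttpLikeError;
  if (e.status !== 422) return false;
  const entries = e.response?.data?.errors ?? [];
  return entries.some(
    (entry) =>
      entry.resource === "PullRequestReviewComment" && entry.field === "line",
  );
}
```

A caller could use this check inside a `catch` block to count and log rejected comments, so a drop shows up in the workflow summary rather than vanishing.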

The Fix: Deterministic Guardrails
Before any agent output reaches the GitHub API, I enforce a hard check that strips out-of-bounds coordinates and malformed responses. It starts by parsing each file's patch into the set of line numbers that can legally carry a comment:
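// The input must be the patch of a single file, to avoid line-number collisions across files.
function parseDiffToValidLines(filePatch: string): Set<number> {
  const validLines = new Set<number>();
  // 1. Unified diff compatibility: when a range covers exactly 1 line, ',1' is omitted,
  //    so the ',count' part on both sides is an optional group (?:,...).
  //    The header is anchored to the start of a line (^ with the m flag) so that
  //    '@@' sequences inside changed content are not mistaken for hunk headers.
  const hunkHeader = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/gm;
  let match: RegExpExecArray | null;
  while ((match = hunkHeader.exec(filePatch)) !== null) {
    const start = parseInt(match[1], 10);
    // If the second group (count) is absent, the range covers a single line, so default to 1.
    const count = match[2] ? parseInt(match[2], 10) : 1;
    // 2. Design choice: why only read the numbers after '+'?
    //    The GitHub PR comment API must anchor to a valid line of the new file (right side).
    //    This loop collects every context line and added line in the current hunk.
    for (let i = start; i < start + count; i++) {
      validLines.add(i);
    }
  }
  return validLines;
}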
(If the diff is empty or malformed, the regex simply finds no hunk headers and the function returns an empty set, so every comment for that file is blocked rather than sent to the API.)
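The check that consumes these sets can be sketched as follows. This is a minimal illustration under stated assumptions, not ReviewFlow's actual verifier: `AgentComment`, `FilterResult`, and `filterByValidLines` are hypothetical names, and the map is assumed to hold one `Set<number>` per changed file, such as the output of `parseDiffToValidLines`.

```typescript
// Hypothetical comment shape; the field names are assumptions for this sketch.
interface AgentComment {
  path: string;
  line: number;
  body: string;
}

interface FilterResult {
  kept: AgentComment[];
  dropped: AgentComment[];
}

// validLinesByPath maps each changed file to its commentable right-side lines.
function filterByValidLines(
  comments: AgentComment[],
  validLinesByPath: Map<string, Set<number>>,
): FilterResult {
  const result: FilterResult = { kept: [], dropped: [] };
  for (const comment of comments) {
    const validLines = validLinesByPath.get(comment.path);
    // A comment survives only if its file is in the diff and its line is an
    // integer that appears in that file's pre-verified set.
    const anchored =
      Number.isInteger(comment.line) &&
      validLines !== undefined &&
      validLines.has(comment.line);
    (anchored ? result.kept : result.dropped).push(comment);
  }
  return result;
}
```

Returning the dropped comments alongside the kept ones is a deliberate choice in this sketch: the caller can log or count them, so a rejected suggestion is visible instead of disappearing.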
My Experience with GitHub Copilot
While the core filtering mechanism is straightforward, parsing raw unified diffs into a flawless line-by-line index map is highly intricate. I outlined the complex parsing logic and edge cases in architectural comments, and GitHub Copilot rapidly generated the core parsing function and comprehensive unit tests.
The Result: The agents now have a hard constraint—choose a line number strictly from this pre-verified list, or drop the comment. By offloading the structural verification to Copilot, I was able to inject deterministic reliability into a non-deterministic LLM pipeline. The silent failures stopped, and the agents finally pinned their feedback to the exact lines of code.
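For the agents to pick from the pre-verified list, that list has to reach them in some form. The post does not say how ReviewFlow passes it along, so the following is only one possible approach: a hypothetical helper, `formatLineRanges`, that compresses a set of valid lines into compact ranges short enough to include in a prompt.

```typescript
// Hypothetical helper: turns a set such as {10, 11, 12, 20} into "10-12, 20".
function formatLineRanges(validLines: Set<number>): string {
  const sorted = Array.from(validLines).sort((a, b) => a - b);
  const ranges: string[] = [];
  let i = 0;
  while (i < sorted.length) {
    const start = sorted[i];
    let end = start;
    // Extend the current range while the next number is consecutive.
    while (i + 1 < sorted.length && sorted[i + 1] === end + 1) {
      end = sorted[++i];
    }
    ranges.push(start === end ? `${start}` : `${start}-${end}`);
    i++;
  }
  return ranges.join(", ");
}

// Example: lines from two hunks plus one isolated line.
const allowed = formatLineRanges(new Set([12, 10, 11, 20, 30, 31]));
console.log(allowed); // prints "10-12, 20, 30-31"
```

Whatever the prompt says, the deterministic filter remains the real guarantee: the model may still ignore the list, and the verifier is what enforces it.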
The Takeaway: In an LLM pipeline, non-deterministic components need deterministic guardrails.
The Impact (Before vs. After)
- Before (across 5 test PRs): 14 suggestions generated ➔ 5 posted (64% lost due to hallucination)
- After: 14 suggestions generated ➔ 14 posted (0% lost, 100% reliable)
Have you hit silent failures or coordinate hallucinations in your own LLM pipelines? Drop a comment below—I'd love to hear how you handled it in your own projects!
Demo

