DEV Community

Deny Herianto

Building "Niteni": Gemini Code Review Bot for GitLab with Zero Dependencies

Niteni, Javanese for "to observe carefully." That's what this tool does: it watches your merge requests and tells you what you missed.

I built Niteni to solve a simple problem: I wanted automated, inline code review comments on GitLab merge requests, powered by Google's Gemini, without pulling in half of npm. Here's how it went, including the surprising number of things that bit me along the way.

The Idea

GitLab's CI/CD pipelines are powerful, but the Free tier has no built-in AI review feature comparable to GitHub Copilot's code review. I wanted something that would:

  1. Run inside a standard GitLab CI job
  2. Post findings as inline diff comments (not a wall-of-text MR note)
  3. Provide one-click "Apply suggestion" buttons
  4. Work without any runtime dependencies beyond Node.js itself

That last point was a deliberate constraint. CI environments are ephemeral. Every npm install in a pipeline is wasted time and a potential point of failure. So Niteni uses only Node.js built-ins: https, child_process, fs, path, os, and url.

Architecture: Direct API Call

Early prototypes tried a cascading strategy — Gemini CLI /code-review extension first, then CLI direct prompt, then REST API. But the CLI approaches proved unreliable in CI: Gemini CLI restricts its tool set in non-interactive mode, so the /code-review extension can't even run git diff. No amount of Docker image changes fixes this — it's a fundamental limitation of non-TTY environments.

The solution was simpler: just call the API directly.

async review(diffContent: string): Promise<string> {
  console.log('Reviewing code changes via Gemini REST API...');
  const apiResult = await this.reviewWithAPI(diffContent);
  if (apiResult && this.isStructuredReview(apiResult)) {
    console.log('Gemini REST API review completed successfully.');
    return apiResult;
  }

  throw new Error('Review failed: API response was empty or not in the expected structured format.');
}

A direct HTTP call to generativelanguage.googleapis.com gives us full control over the prompt, the model parameters (temperature: 0.2 for consistent output), and the response format. No CLI installation, no extension dependencies, no sandbox issues — just an API key and a network connection.
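To make the shape of that call concrete, here is a minimal sketch of how the request could be assembled before handing it to https.request. The helper name buildGeminiRequest, its return type, and the v1beta path are assumptions for illustration, not Niteni's actual code.

```typescript
// Hypothetical helper: builds (but does not send) a generateContent request.
interface GeminiRequest {
  hostname: string;
  path: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildGeminiRequest(model: string, apiKey: string, prompt: string): GeminiRequest {
  const body = JSON.stringify({
    // A single user turn containing the review prompt plus the diff.
    contents: [{ parts: [{ text: prompt }] }],
    // Low temperature for more consistent output, as described above.
    generationConfig: { temperature: 0.2 },
  });
  return {
    hostname: 'generativelanguage.googleapis.com',
    // Assumption: the v1beta endpoint; adjust to the API version you use.
    path: `/v1beta/models/${encodeURIComponent(model)}:generateContent`,
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // The key travels in a header, so it never appears in the URL.
      'x-goog-api-key': apiKey,
    },
    body,
  };
}
```

The first four fields can be passed to https.request as options, with body written to the request stream.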

The isStructuredReview() method validates the response with a regex check for ### Summary, ### Findings, or severity markers like **CRITICAL**. This prevents malformed output from being posted as a review comment.
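A check along those lines might look like the sketch below. The exact pattern Niteni uses isn't shown here, so treat this regex as an assumption that covers the markers named above.

```typescript
// Assumed pattern: a "### Summary" / "### Findings" heading at the start of a line,
// or a bold severity marker with optional brackets.
const STRUCTURE_MARKERS =
  /^###\s+(?:Summary|Findings)\b|\*\*\[?(?:CRITICAL|HIGH|MEDIUM|LOW)\]?\*\*/m;

// Returns true when the response looks like a structured review.
function isStructuredReview(text: string): boolean {
  return STRUCTURE_MARKERS.test(text);
}
```

A refusal or an apology from the model contains none of these markers, so it is rejected before anything is posted to the merge request.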

Gotcha #1: GitLab CI Variable Circular References

This one cost me hours of debugging. In .gitlab-ci.yml, if you do this:

niteni-code-review:
  variables:
    GEMINI_API_KEY: $GEMINI_API_KEY
    GITLAB_TOKEN: $GITLAB_TOKEN
  script:
    - niteni --mode mr

It looks reasonable: you're just "passing through" the project-level CI/CD variables. But GitLab interprets this as a circular reference. The variable GEMINI_API_KEY expands to the literal string $GEMINI_API_KEY instead of the actual secret value.

The fix: Don't re-declare project-level CI/CD variables in the variables: section. They're already available in every job automatically. Only declare variables in the job if they're new values (like GEMINI_MODEL: gemini-3-flash-preview).

Gotcha #2: Token Authentication is a Maze

GitLab supports three authentication methods, and picking the wrong header silently fails:

| Token type | Header |
| --- | --- |
| Personal/Project access token | PRIVATE-TOKEN: glpat-xxx |
| CI job token | JOB-TOKEN: $CI_JOB_TOKEN |
| OAuth token | Authorization: Bearer xxx |

My first implementation always used PRIVATE-TOKEN. It worked locally but failed in CI because $CI_JOB_TOKEN requires the JOB-TOKEN header. The config module now auto-detects the token type:

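const env = process.env;

// Prefer an explicit GITLAB_TOKEN; otherwise fall back to the CI job token.
function resolveToken(): { token: string; tokenType: 'private' | 'job' } {
  // A value that still starts with '$' is an unexpanded variable reference, not a token.
  const gitlabToken = env.GITLAB_TOKEN && !env.GITLAB_TOKEN.startsWith('$')
    ? env.GITLAB_TOKEN : null;
  if (gitlabToken) return { token: gitlabToken, tokenType: 'private' };
  if (env.CI_JOB_TOKEN) return { token: env.CI_JOB_TOKEN, tokenType: 'job' };
  return { token: '', tokenType: 'private' };
}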

Notice the !env.GITLAB_TOKEN.startsWith('$') guard that catches the circular reference gotcha from above. If the variable expanded to a literal $GITLAB_TOKEN string, we fall through to CI_JOB_TOKEN.
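Once the token type is known, choosing the header is a small lookup. The sketch below follows the table above; the function name authHeader and the extra 'oauth' case are assumptions, since resolveToken() only ever returns 'private' or 'job'.

```typescript
type TokenType = 'private' | 'job' | 'oauth';

// Maps a resolved token to the HTTP header GitLab expects for that token type.
function authHeader(token: string, tokenType: TokenType): Record<string, string> {
  switch (tokenType) {
    case 'job':
      return { 'JOB-TOKEN': token };
    case 'oauth':
      return { Authorization: `Bearer ${token}` };
    default:
      // Personal and project access tokens share the same header.
      return { 'PRIVATE-TOKEN': token };
  }
}
```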

Gotcha #3: Inline Diff Comments Need diff_refs

GitLab's MR discussion API accepts a position parameter for inline comments. But the position requires three SHA values: base_sha, start_sha, and head_sha. These come from the MR's diff_refs field.

If diff_refs is null (which happens with certain merge strategies or force-pushes), the inline comment fails with a 400 error. The fallback? Post as a general discussion comment instead.

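if (diffRefs) {
  try {
    // Preferred: pin the comment to the exact diff line
    await gitlab.postMergeRequestDiscussion(mrIid, body, position);
  } catch {
    // Fallback: post as general discussion without position
    await gitlab.postMergeRequestDiscussion(mrIid, body);
  }
} else {
  // No diff_refs, so no position can be built: post a general discussion
  await gitlab.postMergeRequestDiscussion(mrIid, body);
}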

This two-tier posting strategy means the review always gets posted, even if it can't be pinned to the exact line.
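For context, the position object for a text diff comment can be built roughly like this. It is a sketch with a hypothetical buildPosition helper, and it assumes the file was not renamed (old_path equals new_path) and that the comment targets an added line.

```typescript
interface DiffRefs {
  base_sha: string;
  start_sha: string;
  head_sha: string;
}

interface TextPosition extends DiffRefs {
  position_type: 'text';
  old_path: string;
  new_path: string;
  new_line: number;
}

// Returns null when diff_refs is missing, so the caller knows to post a general comment.
function buildPosition(
  diffRefs: DiffRefs | null,
  filePath: string,
  newLine: number,
): TextPosition | null {
  if (!diffRefs) return null;
  return {
    position_type: 'text',
    base_sha: diffRefs.base_sha,
    start_sha: diffRefs.start_sha,
    head_sha: diffRefs.head_sha,
    old_path: filePath, // assumption: no rename
    new_path: filePath,
    new_line: newLine,  // line number in the new version of the file
  };
}
```

Comments on unchanged context lines also need old_line, which is one more reason to keep the general-discussion fallback in place.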

Gotcha #4: Parsing LLM Output is Fragile

Gemini's output is structured markdown, but LLMs don't always follow instructions perfectly. The finding regex needs to handle variations:

// Both formats appear in practice:
// **[CRITICAL]** `file.ts:42`    (with brackets)
// **CRITICAL** `file.ts:42`      (without brackets)
const findingRegex = /\*\*\[?(CRITICAL|HIGH|MEDIUM|LOW)\]?\*\*\s*`([^`]+)`/g;

The \[? and \]? make brackets optional. Without this, half the findings were silently dropped.

Another subtlety: the regex-based parser uses exec() in a loop, which maintains lastIndex state. When we peek ahead for the next match to determine where a finding block ends, we need to reset lastIndex afterward. Miss this, and findings get merged or skipped.
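The peek-and-rewind pattern is easier to see in code. This is a simplified sketch rather than Niteni's actual parser; the Finding shape and the parseFindings name are assumptions.

```typescript
interface Finding {
  severity: string;
  location: string;
  body: string;
}

function parseFindings(markdown: string): Finding[] {
  const findingRegex = /\*\*\[?(CRITICAL|HIGH|MEDIUM|LOW)\]?\*\*\s*`([^`]+)`/g;
  const findings: Finding[] = [];
  let match: RegExpExecArray | null;

  while ((match = findingRegex.exec(markdown)) !== null) {
    const bodyStart = findingRegex.lastIndex;  // just past the current header
    const next = findingRegex.exec(markdown);  // peek ahead; this moves lastIndex
    const bodyEnd = next ? next.index : markdown.length;
    findingRegex.lastIndex = bodyStart;        // rewind so the loop sees `next` again

    findings.push({
      severity: match[1],
      location: match[2],
      body: markdown.slice(bodyStart, bodyEnd).trim(),
    });
  }
  return findings;
}
```

Without the rewind, the peeked match is consumed and every second finding disappears from the result.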

Gotcha #5: Shell Injection in CI Environments

The original code used execSync() to run git commands:

// DANGEROUS in CI where branch names come from user input
execSync(`git diff origin/${targetBranch}...HEAD`);

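If someone creates a branch whose name contains shell metacharacters, this becomes a shell injection. Git rejects spaces in branch names, but it accepts characters such as ; and $, so a name like main;id is enough to run a second command. In CI, branch names are attacker-controlled input.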

The fix: Switch to execFileSync() with argument arrays. This calls the binary directly without shell interpretation:

execFileSync('git', ['diff', '-U5', '--merge-base', `origin/${targetBranch}`]);

On a related note, the Gemini API key was originally passed as a URL query parameter. Moving it to the x-goog-api-key header keeps it out of the URL, and URLs routinely end up in access logs and proxy caches.

Gotcha #6: Large Diffs and CLI Limits

An earlier version of Niteni tried passing diffs as CLI arguments to gemini -p "...". This hits the OS argument length limit (ARG_MAX) with large diffs. The workaround was writing to temp files with @filename syntax — but this added complexity with file cleanup, PID-based naming for parallel jobs, and error handling.

This was one of several reasons we moved to the REST API: an HTTP request body isn't subject to ARG_MAX, and the diff is just a JSON string in the request payload. The model's own input limits still apply, but the API approach eliminated an entire class of problems.

Gotcha #7: ReDoS in Glob Pattern Matching

The diff filter converts glob patterns like *.min.js into regex. The naive approach:

// Original - vulnerable to ReDoS
pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&').replace(/\\\*/g, '.*');

This escapes every metacharacter, including *, and then tries to un-escape \* back into .*. That round trip is easy to get wrong, and in the snippet above a glob ? has already been escaped to a literal \? by then, so it never becomes a wildcard. The order of operations matters: escape everything except the glob characters, then convert the glob characters:

const escaped = pattern
  .replace(/[.+^${}()|[\]\\]/g, '\\$&')  // escape regex chars (NOT *)
  .replace(/\*/g, '.*')                    // convert glob * to regex .*
  .replace(/\?/g, '.');                    // convert glob ? to regex .

The difference is subtle but critical. The original version would double-escape patterns and produce incorrect matches.
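Wrapped into a function, the fixed conversion can be checked against a few filenames. The globToRegExp name and the ^...$ anchoring are assumptions for this sketch; the three replace steps are the ones shown above.

```typescript
// Converts a simple glob (* and ? only) into an anchored regular expression.
function globToRegExp(pattern: string): RegExp {
  const source = pattern
    .replace(/[.+^${}()|[\]\\]/g, '\\$&') // escape regex chars (NOT * or ?)
    .replace(/\*/g, '.*')                  // glob * matches any run of characters
    .replace(/\?/g, '.');                  // glob ? matches exactly one character
  return new RegExp(`^${source}$`);
}

const minified = globToRegExp('*.min.js');
console.log(minified.test('dist/app.min.js')); // true
console.log(minified.test('appXminXjs'));      // false: the dots are literal
```

Because * becomes .*, it also matches across directory separators here. A filter that wants * to stop at / would need [^/]* instead.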

Gotcha #8: URL Encoding Everything Twice (or Not at All)

GitLab project IDs can contain slashes when using namespaced paths like my-group/my-project. These need to be URL-encoded in API paths. But if you encode a numeric project ID like 12345, it stays the same. The code must handle both cases:

const encodedProjectId = encodeURIComponent(this.projectId);
const url = new URL(`${this.apiUrl}/projects/${encodedProjectId}${path}`);

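Every path parameter (MR IID, note ID, discussion ID, file paths, branch refs) gets encodeURIComponent(). It's tedious but necessary. Without encoding, a branch named feature/auth is read as two path segments and the request resolves to the wrong endpoint, and a name containing ../ becomes a path traversal.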
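One way to avoid forgetting a parameter is to build paths from segments and encode each one. The gitlabApiPath helper below is a hypothetical sketch of that idea, not a function from Niteni.

```typescript
// Joins path segments under /projects/:id, encoding every segment individually.
function gitlabApiPath(projectId: string | number, ...segments: Array<string | number>): string {
  const encode = (value: string | number): string => encodeURIComponent(String(value));
  return '/' + ['projects', projectId, ...segments].map(encode).join('/');
}

console.log(gitlabApiPath('my-group/my-project', 'merge_requests', 42, 'discussions'));
// /projects/my-group%2Fmy-project/merge_requests/42/discussions
console.log(gitlabApiPath(12345, 'repository', 'branches', 'feature/auth'));
// /projects/12345/repository/branches/feature%2Fauth
```

Fixed segments such as merge_requests pass through encoding unchanged, so encoding all of them costs nothing.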

Testing Without External Dependencies

The test suite uses Node's built-in node:test and node:assert modules. No Jest, no Mocha, no Vitest. This keeps the dependency tree at exactly two entries, both needed only at build time: typescript and @types/node.

import { describe, it } from 'node:test';
import * as assert from 'node:assert';

The simulation mode (niteni --mode simulate) uses hardcoded mock data that exercises the full parsing pipeline without making any API calls. It's both a demo tool and a manual integration test: you can see exactly what Niteni would post to GitLab.

What I'd Do Differently

  1. Structured output from Gemini. Instead of parsing markdown with regex, I'd use Gemini's JSON mode or function calling to get structured findings. The regex parser works but is inherently fragile.

  2. Rate limiting for large MRs. Posting 20+ inline comments in rapid succession can hit GitLab's API rate limits. A simple delay between requests would help.

  3. Caching reviewed files. If a file hasn't changed between pipeline runs, there's no need to re-review it. A SHA-based cache would cut token usage significantly.

  4. Better diff context. The current approach sends raw diffs. Sending surrounding context (the full file, or at least more lines around changes) would give Gemini better understanding of the code.
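As a rough idea of what the first item could look like, the sketch below asks Gemini for JSON via generationConfig and validates the result instead of running regexes over markdown. The schema fields (severity, file, line, message) and the names used here are assumptions for illustration; Niteni does not currently work this way.

```typescript
interface StructuredFinding {
  severity: 'CRITICAL' | 'HIGH' | 'MEDIUM' | 'LOW';
  file: string;
  line: number;
  message: string;
}

const SEVERITIES = ['CRITICAL', 'HIGH', 'MEDIUM', 'LOW'];

// Would replace the plain { temperature: 0.2 } config in the request body.
const structuredGenerationConfig = {
  temperature: 0.2,
  responseMimeType: 'application/json',
  responseSchema: {
    type: 'ARRAY',
    items: {
      type: 'OBJECT',
      properties: {
        severity: { type: 'STRING', enum: SEVERITIES },
        file: { type: 'STRING' },
        line: { type: 'INTEGER' },
        message: { type: 'STRING' },
      },
      required: ['severity', 'file', 'line', 'message'],
    },
  },
};

// Parses the model's JSON text and keeps only well-formed findings.
function parseStructuredFindings(jsonText: string): StructuredFinding[] {
  const data: unknown = JSON.parse(jsonText);
  if (!Array.isArray(data)) throw new Error('expected a JSON array of findings');
  return data.filter((f): f is StructuredFinding =>
    typeof f === 'object' && f !== null &&
    SEVERITIES.includes(f.severity) &&
    typeof f.file === 'string' &&
    Number.isInteger(f.line) &&
    typeof f.message === 'string');
}
```

Even with a schema in place, validating the parsed output is cheap insurance against a malformed response.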

The Result

Niteni runs in about 30 seconds in CI, reviews diffs up to 100K characters, and posts findings with one-click suggestion buttons. It catches real bugs: SQL injections, missing auth middleware, hardcoded secrets, loose equality comparisons.

The zero-dependency approach paid off. Install is git clone && npm ci && npm run build. No native modules, no platform-specific binaries, no post-install scripts. It works on node:20-alpine with just git and bash added.

If you're interested in the code: github.com/denyherianto/niteni
