Tenure — Building an AI Code Reviewer That Earns Trust Over Time

#ai #api #llm #programming

The Problem
Every team has unwritten rules.
"We don't use inline comments." "Early returns only." "No console.log in production."
These rules live in senior developers' heads. When they leave, the rules leave with them.
Existing AI reviewers don't help. They are stateless. Every PR starts from zero. They flag things your team already decided to ignore. They never learn.
The result? Noisy reviews that developers learn to dismiss. An AI that cries wolf gets ignored entirely.
The Idea: Tenure
What if a code reviewer could watch how your team responds to its suggestions — and update its own standards accordingly?
That is Tenure. An AI code reviewer that earns trust over time, powered by Hindsight (persistent memory) and NVIDIA Nemotron 3 Super (the review engine).
It starts by suggesting. As your team accepts or rejects comments, it builds evidence. With enough evidence in one direction, a convention graduates:
Suggest → Warn → Block
Reject something 3 times? Tenure goes quiet on that category. No config file. No rule editor. Just observed behavior.
Other agents remember. Tenure earns trust.
How the Memory Works
Tenure uses one Hindsight bank with two types of memory:
World Facts — your team's explicit preferences. Example: "we use early returns", "no magic numbers".
Experience Facts — Tenure's own learned outcomes. Example: "my console.log suggestions keep getting rejected", "my naming suggestions get accepted consistently".
These facts are distilled into observations — weighted conventions that track proof count and freshness. Each convention lives in one of three trust tiers:
Suggest (proof count 1–2): Shows as a soft hint
Warn (proof count 3–4): Highlighted, harder to ignore
Block (proof count 5+ and stable): Gates the merge
Tiers promote on strengthening trends and demote when contradicted or weakening.
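The tier thresholds above can be sketched as a small pure function. This is an illustration, not Tenure's actual code: the `Observation` shape, the `stable` flag, and the choice to hold an unstable 5+ convention at Warn are all assumptions made for the example.

```typescript
type Tier = "suggest" | "warn" | "block";

// Hypothetical shape of a consolidated observation.
interface Observation {
  convention: string;
  proofCount: number; // pieces of evidence supporting the convention
  stable: boolean;    // assumed flag: no recent contradictions
}

// Map an observation to a trust tier using the thresholds listed above.
function tierFor(obs: Observation): Tier | null {
  if (obs.proofCount < 1) return null; // no evidence yet, nothing to show
  if (obs.proofCount >= 5 && obs.stable) return "block";
  if (obs.proofCount >= 3) return "warn"; // includes unstable 5+ (assumption)
  return "suggest";
}
```

Keeping the mapping pure means promotion and demotion both fall out of re-evaluating the same function whenever the proof count or stability changes.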
The Three Hero Moments
(A) Self-suppression
After 3 rejections of a suggestion category, Tenure stops flagging it entirely. It learned the team does not want it. No config change required.
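A minimal sketch of the suppression rule, assuming rejections are counted per category. In Tenure this knowledge lives in Hindsight as Experience Facts; the in-memory `SuppressionTracker` class below is a hypothetical stand-in that only illustrates the threshold logic.

```typescript
const REJECTION_LIMIT = 3; // from the rule above: 3 rejections silence a category

class SuppressionTracker {
  private rejections = new Map<string, number>();

  // Record that the team rejected a comment in this category.
  recordRejection(category: string): void {
    this.rejections.set(category, (this.rejections.get(category) ?? 0) + 1);
  }

  isSuppressed(category: string): boolean {
    return (this.rejections.get(category) ?? 0) >= REJECTION_LIMIT;
  }

  // Drop comments whose category the team has rejected often enough.
  filter<T extends { category: string }>(comments: T[]): T[] {
    return comments.filter((c) => !this.isSuppressed(c.category));
  }
}
```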
(B) Contradiction handling
When team behavior contradicts a prior observation, the memory revises. It does not average with the old belief. It updates.
(C) Graduation to BLOCK
A convention that started as a soft suggestion earns merge-blocking status purely through proof count. The team's consistency is the authorization.
Architecture and Design
We built two modes — this was our entire cost and reliability strategy.
REPLAY mode (default, public): Plays a pre-recorded real session from static JSON fixtures. Zero API calls. Zero cost. Fully deterministic. Safe for judges and public traffic.
LIVE mode (passphrase-gated): Real NVIDIA Nemotron 3 Super and Hindsight calls, rate-limited via Upstash Redis. Proves the system is genuine when needed.
Total hackathon spend: under $3. Public traffic cost: $0.
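One way to structure the two modes is to give both the same function signature and pick one at request time. The sketch below assumes that design; `ReviewComment`, `ReviewSource`, `replaySource`, and `selectSource` are illustrative names, not identifiers from the Tenure codebase.

```typescript
interface ReviewComment {
  line: number;
  category: string;
  message: string;
}

// Both modes expose the same shape, so callers never branch on mode.
type ReviewSource = (diff: string) => Promise<ReviewComment[]>;

// REPLAY: ignore the diff and return pre-recorded comments from a fixture.
function replaySource(fixture: ReviewComment[]): ReviewSource {
  return async () => fixture;
}

type Mode = "replay" | "live";

// LIVE is used only when it was requested AND the passphrase check passed;
// every other combination falls back to the zero-cost replay source.
function selectSource(
  requested: Mode,
  passphraseOk: boolean,
  replay: ReviewSource,
  live: ReviewSource,
): ReviewSource {
  return requested === "live" && passphraseOk ? live : replay;
}
```

Defaulting to replay on any failed check keeps public traffic from ever reaching a paid API by accident.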
The UI is a two-panel layout. Left panel shows the diff with inline comments — each has a severity level, a reference to the convention it enforces, and a color-coded trust-tier badge. Right panel shows Team Memory — observations with proof counts, trust tiers, and a self-writing style guide that reflects what the team has taught the system.
When a convention crosses a tier threshold, an animation fires. An unmissable "Promoted to BLOCK — now gates merges" callout appears in the Trust Tiers panel. This is the visual centerpiece of the demo.
Technologies Used
Next.js (App Router, TypeScript) — frontend and backend, Vercel-deployable
NVIDIA Nemotron 3 Super via OpenRouter — the review engine, OpenAI-compatible API, structured JSON output
Hindsight Cloud — persistent memory with World Facts, Experience Facts, and observation consolidation
Upstash Redis — serverless-safe rate limiting via REST API
Vercel — deployment
Tailwind CSS — styling
Challenges Encountered
Hindsight consolidation is async. After calling retain(), observations do not update instantly; consolidation takes 10 to 30 seconds between steps. Our seeding script had to pause between steps with bounded waitForProcessing() calls. When the script raced ahead of consolidation, the three hero moments did not land cleanly in the fixtures, and we needed multiple re-runs to get clean demo data.
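A bounded wait of this kind can be sketched as a generic polling helper. This is not the waitForProcessing() from our seeding script and makes no claims about the Hindsight client; `waitUntil` and its `check` callback are hypothetical, and the caller would supply whatever readiness check the memory service offers.

```typescript
// Poll `check` until it reports true or the time budget runs out.
// Resolves to true on success and false on timeout, so callers can decide
// whether to retry, skip the step, or abort the run.
async function waitUntil(
  check: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    // Stop if another sleep would overshoot the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Returning a boolean instead of throwing lets a seeding script log the timeout and re-run only the step that failed.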
Serverless rate limiting. Vercel functions do not share in-memory state across instances, and a cold start wipes it. We initially tried in-memory counters, and in practice they kept resetting to zero. Switching to Upstash Redis solved it but required rethinking how we stored and retrieved rate limit state across invocations.
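The general pattern is a fixed-window counter kept in a shared store. The sketch below assumes that approach and is not our production limiter: `CounterStore` is a hypothetical interface modelled on the Redis INCR and EXPIRE commands, and `MemoryStore` exists only so the example runs locally without a Redis instance.

```typescript
// Minimal store contract, modelled on the Redis INCR and EXPIRE commands.
interface CounterStore {
  incr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<unknown>;
}

// Fixed-window limiter: one counter per client per time window.
async function allowRequest(
  store: CounterStore,
  clientId: string,
  limit: number,
  windowSeconds: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const windowId = Math.floor(nowMs / (windowSeconds * 1000));
  const key = `ratelimit:${clientId}:${windowId}`;
  const count = await store.incr(key);
  // First hit in this window: set a TTL so old counters clean themselves up.
  if (count === 1) await store.expire(key, windowSeconds);
  return count <= limit;
}

// Local stand-in for testing only. In a serverless deployment this would be
// replaced by a client for a shared store such as Redis.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  async expire(_key: string, _seconds: number): Promise<number> {
    return 1; // no-op here; Redis would delete the key after the TTL
  }
}
```

Because the counter lives in the store rather than in the function instance, it survives cold starts and is shared by every concurrent invocation.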
NVIDIA Nemotron 3 Super output parsing. It sometimes returned JSON wrapped in markdown fences or with extra preamble text. We built safe parsing that strips fences and falls back gracefully instead of crashing the review pipeline.
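A minimal sketch of that kind of tolerant parsing, assuming the model is expected to return a JSON object or array. `parseModelJson` is an illustrative name, and returning `null` on failure is one possible fallback; the real pipeline may handle failures differently.

```typescript
// Built with repeat() so the fence marker never appears literally in the source.
const FENCE = "`".repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");

// Extract and parse a JSON object or array from raw model output.
// Returns null instead of throwing when nothing parseable is found.
function parseModelJson(raw: string): unknown {
  // 1. If the output contains a fenced block, look only inside it.
  const fenced = raw.match(FENCED_BLOCK);
  const candidate = fenced?.[1] ?? raw;

  // 2. Trim preamble and trailing chatter around the outermost brackets.
  const start = candidate.search(/[{[]/);
  const end = Math.max(candidate.lastIndexOf("}"), candidate.lastIndexOf("]"));
  if (start === -1 || end <= start) return null;

  // 3. Parse defensively so one bad response cannot crash the caller.
  try {
    return JSON.parse(candidate.slice(start, end + 1));
  } catch {
    return null;
  }
}
```

A caller can treat `null` as "no comments for this diff" and carry on, which keeps one malformed response from failing the whole review.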
Keeping REPLAY and LIVE in sync. The UI needed to work identically in both modes from the same components, with only the data source changing. Keeping that abstraction clean as we added features took discipline.
How It Differs from Claude Code
Claude Code is a stateless coding agent — brilliant at generating and editing code in the moment, but it does not remember your team's decisions across sessions.
Tenure is the persistent team-memory layer that sits above any reviewer. It knows what your team has accepted and rejected, and it enforces that knowledge over time. The two are complementary, not competing.
Future Scope
GitHub App integration — comment directly on pull requests inside the existing review workflow
Per-repo convention banks — separate Hindsight banks per repository so frontend and backend teams build independent standards
Team onboarding mode — seed Tenure's memory from existing merged PR history so it starts informed instead of blank
Convention conflict resolution — when two teammates consistently disagree on a convention, surface the conflict explicitly instead of letting proof counts cancel out
Analytics dashboard — show which conventions are trending toward BLOCK, which are suppressed, and which are contested
Try It
Live demo: https://tenure-green.vercel.app
GitHub: https://github.com/InsaneCoder-69/tenure.git
Built at HackBaroda by Vatsal Trivedi, Vraj Patel and Rutul Patel. Drop a comment — we read everything.
