DEV Community

Annavarapu Vinod Kumar

How Hindsight Changed Refyn From Tool to Workspace

The Problem

Building Refyn meant holding four things in tension at once.

I wanted reviews to come back fast. I wanted them to be cheap — a five-line utility function shouldn't cost as much to analyze as a security-sensitive auth flow. I wanted the system to stay up even when a provider breaks, rate-limits, or quietly swaps out a model. And I wanted it to actually remember you, so the reviewer improves across sessions instead of starting fresh every time.

Those four requirements shaped almost every decision in the architecture. Speed and depth pull against each other. Reliability means fallbacks, which add complexity. Memory means threading user context through the backend without making the whole thing fragile.

The final shape of the app came from working through those tensions honestly, not pretending they weren't there.


What Refyn Is

Refyn is a browser-based code review workspace. You write or paste code into a Monaco editor, send it to the backend, and get back a structured review: a score, a list of issues, optimized code, runtime stats, and patterns the system has learned about how you write.

Behind that simple flow, the backend runs a multi-step pipeline. Each review request goes through a router that loads your memory, builds prompt context from past sessions, scores the code's complexity, picks a model, runs the analysis, and enriches the response with cost, latency, and savings data before sending it back.

Code execution is a separate path with its own fallback chain — three different runners, tried in order.

The distinguishing feature is that Refyn isn't just "frontend calls a model." It's a layered system: UI, routing, memory, execution, and model orchestration each doing their own job.


The Frontend

The frontend is React, Vite, Monaco Editor, Tailwind, and Framer Motion. But the important part is how state flows from the backend into what you see.

When you click Analyze, the app sends your code, the language, routing settings, and a userId stored in local storage. That userId is what lets the backend attach memory to future reviews.

The response gets split into three visible surfaces. A stats bar shows which model was used, the complexity score, cost, savings, and latency — routing decisions made visible rather than hidden. A memory panel shows patterns the system has noticed across your past sessions. And the top navigation separately fetches your remembered pattern count on load, so it survives a page refresh.

The frontend is mostly a projection of backend state. That's intentional. The interesting logic lives deeper.


The Routing Layer

Before any model call happens, one question gets answered: how much model does this code actually need?

A scoring function produces a complexity value from 0 to 100 using a small set of readable heuristics. Length is the first signal — longer files score higher. Then it scans for security-sensitive patterns: eval, exec, subprocess, SQL, passwords, tokens, JWT, authentication-related strings. Each match adds points, so short code can still escalate if it touches risky areas. Smaller increments get added for structural complexity: async functions, classes, recursion hints, import density.
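Here is a minimal sketch of a scorer along those lines. The article names the signals but not the numbers, so the point values, the regexes, and the name score_complexity are all assumptions made for illustration. Recursion hints are left out to keep it short.

```python
import re

# Hypothetical weights: the signals come from the description above,
# the numbers are invented for this sketch.
RISK_POINTS = 15
RISKY_PATTERNS = [
    r"\beval\s*\(",
    r"\bexec\s*\(",
    r"\bsubprocess\b",
    r"\bselect\b.+\bfrom\b",
    r"password",
    r"token",
    r"\bjwt\b",
    r"auth",
]

STRUCTURE_POINTS = 5
STRUCTURE_PATTERNS = [
    r"\basync\b",
    r"\bclass\s+\w+",
]


def score_complexity(code: str) -> int:
    """Return a 0-100 complexity score from simple, readable heuristics."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    # Length signal: one point per five non-blank lines, capped at 40.
    score = min(len(lines) // 5, 40)
    # Risk signal: each distinct risky pattern that appears adds points.
    score += sum(
        RISK_POINTS for p in RISKY_PATTERNS if re.search(p, code, re.IGNORECASE)
    )
    # Structure signal: smaller increments for async code and classes.
    score += sum(STRUCTURE_POINTS for p in STRUCTURE_PATTERNS if re.search(p, code))
    # Import density: one point per import line, capped at 5.
    imports = len(re.findall(r"^\s*(?:import|from)\s+\w+", code, re.MULTILINE))
    score += min(imports, 5)
    return max(0, min(score, 100))
```

With these made-up weights, a two-line helper scores 0, while a two-line script that shells out through subprocess already picks up risk points despite its length.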

That score maps to tiers. High complexity gets a heavier model for deeper analysis. Mid-range gets a faster, cheaper model. Simple code takes the cheapest available path.
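The mapping itself can be a few comparisons. In this sketch the thresholds (70 and 30), the tier names, and the model labels are placeholders, since the article does not give the real cut-offs or model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    tier: str
    model: str   # placeholder label, not a real model ID
    reason: str  # human-readable routing reason for the stats bar


def pick_route(score: int) -> Route:
    """Map a 0-100 complexity score to a model tier (assumed thresholds)."""
    if score >= 70:
        return Route("deep", "heavy-model", f"score {score}: needs deeper analysis")
    if score >= 30:
        return Route("balanced", "fast-model", f"score {score}: mid-range code")
    return Route("light", "cheapest-model", f"score {score}: simple code")
```

Returning the reason alongside the model keeps the routing decision explainable when it is shown in the UI.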

The same layer calculates cost and savings by comparing the actual route against a premium baseline. That gives Refyn something most AI tools don't surface: a concrete accounting of what each review cost and what it saved.
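The accounting is simple arithmetic once each model has a price. A rough sketch, assuming invented per-token prices and a crude four-characters-per-token estimate (neither comes from Refyn itself):

```python
# Hypothetical USD prices per 1,000 tokens; real provider pricing will differ.
PRICE_PER_1K_TOKENS = {
    "premium-baseline": 0.01,
    "heavy-model": 0.0008,
    "fast-model": 0.0002,
    "cheapest-model": 0.0,
}


def estimate_tokens(code: str) -> int:
    """Rough heuristic: about four characters per token."""
    return max(1, len(code) // 4)


def cost_and_savings(model: str, tokens: int, baseline: str = "premium-baseline") -> dict:
    """Compare the cost of the chosen route against a premium baseline."""
    actual = PRICE_PER_1K_TOKENS[model] * tokens / 1000
    premium = PRICE_PER_1K_TOKENS[baseline] * tokens / 1000
    saved = premium - actual
    return {
        "cost_usd": round(actual, 6),
        "baseline_usd": round(premium, 6),
        "savings_usd": round(saved, 6),
        "savings_pct": round(100 * saved / premium, 1) if premium else 0.0,
    }
```

Under these assumed prices, a 2,000-token review on the fast tier costs $0.0004 against a $0.02 baseline, a saving of 98%.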


The Memory Layer

The memory system has a clean two-phase design.

Before analysis, it recalls past patterns for your userId using the Hindsight SDK. Those memories get turned into a prompt block that tells the model about your recurring habits before it reads a single line of your code.
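A sketch of the prompt-building half of that step. It assumes the recall call has already returned a list of plain-text memories; the Hindsight call itself is not shown, and the function name and wording of the block are illustrative.

```python
def build_memory_block(memories: list[str], limit: int = 5) -> str:
    """Format recalled memories as a prompt preamble; empty string if none."""
    cleaned = [m.strip() for m in memories if m and m.strip()]
    if not cleaned:
        # First-time users get no preamble rather than an empty heading.
        return ""
    bullets = "\n".join(f"- {m}" for m in cleaned[:limit])
    return (
        "Known habits of this user from earlier reviews:\n"
        f"{bullets}\n"
        "Point out when the code below repeats one of these habits."
    )
```

Capping the list keeps the preamble from crowding out the code under review as memories accumulate.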

After analysis, it extracts patterns from the issues and score, converts them into plain-text summaries, and saves them back. The extraction is intentionally simple — it groups issues by category and records things like recurring security issues in a given language, plus a score snapshot for that session.
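The extraction can be sketched in a few lines. The issue shape (a dict with a "category" key) and the summary wording are assumptions; the resulting strings are what would be handed to the memory store.

```python
from collections import Counter


def extract_patterns(issues: list[dict], score: int, language: str) -> list[str]:
    """Turn one review's issues and score into plain-text memory summaries."""
    # Group issues by category; issues without one fall under "general".
    counts = Counter(issue.get("category", "general") for issue in issues)
    patterns = [
        f"{category} issues in {language}: {n} found this session"
        for category, n in counts.most_common()
    ]
    # Always record a score snapshot so progress can be tracked over time.
    patterns.append(f"Score snapshot for {language}: {score}/100")
    return patterns
```

Plain-text summaries are easy to inspect and feed straight back into the prompt on the next review.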

In the UI, the memory panel shows those insights. The nav shows how many patterns the system has accumulated.

Architecturally, Hindsight isn't a separate feature bolted on. It sits directly in the request path: recall before analysis, retain after, display on next render.


The Execution Layer

Code execution runs through a three-step fallback chain.

The first hop is Piston, a free hosted runner that supports many languages and requires no API key. If that fails, it falls back to Judge0. If that fails, it tries to run locally — Python and JavaScript only, but enough to keep things functional when hosted runners are down.
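The chain can be sketched as an ordered list of runner functions. The hosted runners are left as pluggable callables here rather than real Piston or Judge0 calls, and the local runner handles only Python to keep the sketch short. All names are illustrative.

```python
import subprocess
import sys
from typing import Callable

# A runner takes (code, language) and returns stdout, or raises on failure.
Runner = Callable[[str, str], str]


class RunnerError(Exception):
    """Raised when a runner cannot execute the code."""


def run_local(code: str, language: str) -> str:
    """Last-resort local runner; this sketch supports Python only."""
    if language != "python":
        raise RunnerError(f"local runner does not support {language}")
    # NOTE: no sandboxing here; a real deployment would need isolation.
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=5
    )
    if proc.returncode != 0:
        raise RunnerError(proc.stderr.strip() or "non-zero exit")
    return proc.stdout


def run_with_fallback(code: str, language: str, runners: list[tuple[str, Runner]]) -> dict:
    """Try each runner in order and return the first successful result."""
    errors: list[str] = []
    for name, runner in runners:
        try:
            output = runner(code, language)
        except Exception as exc:  # any failure moves on to the next runner
            errors.append(f"{name}: {exc}")
            continue
        return {"runner": name, "output": output, "errors": errors}
    raise RunnerError("all runners failed: " + "; ".join(errors))
```

A caller would pass something like `[("piston", run_piston), ("judge0", run_judge0), ("local", run_local)]`, where the first two are hypothetical HTTP wrappers. Keeping the errors from earlier hops in the result makes it visible when a review ran on a fallback.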

This reflects the same design philosophy as the rest of Refyn: use the simplest cheap path first, but keep progressively more resilient layers underneath.


The Model Layer

The model orchestration layer is where routing, memory, and provider fallbacks come together.

In auto mode, it loads your memory, builds prompt context, scores the code's complexity, and picks a model. If that model fails, it walks a fallback order through every available provider. The concrete model services are split out cleanly — Groq handles two models, OpenRouter tries a chain of free-tier options, and a local Ollama path provides an offline fallback using a coding-focused model.
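A sketch of the fallback walk, assuming each provider is wrapped as a function that takes code and returns a review dict. The model labels and the order itself are placeholders, not Refyn's actual configuration.

```python
from typing import Callable

# A provider takes code and returns a review dict, or raises on failure.
Provider = Callable[[str], dict]

# Hypothetical labels standing in for the Groq, OpenRouter, and Ollama paths.
FALLBACK_ORDER = ["groq-large", "groq-small", "openrouter-free", "ollama-local"]


def attempt_order(preferred: str, available: set[str]) -> list[str]:
    """Preferred model first, then the rest of the fallback order."""
    ordered = [preferred] + [m for m in FALLBACK_ORDER if m != preferred]
    return [m for m in ordered if m in available]


def analyze_with_fallback(code: str, preferred: str, providers: dict[str, Provider]) -> dict:
    """Run the review on the routed model, walking fallbacks on failure."""
    failures: list[str] = []
    for model in attempt_order(preferred, set(providers)):
        try:
            review = providers[model](code)
        except Exception as exc:  # rate limits, outages, removed models
            failures.append(f"{model}: {exc}")
            continue
        return {
            **review,
            "model_used": model,
            "fell_back": model != preferred,
            "failures": failures,
        }
    raise RuntimeError("no provider could complete the review: " + "; ".join(failures))
```

Recording which model actually answered matters here, because cost and savings should be computed for the model that ran rather than the one the router first picked.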

After the analysis comes back, a final enrichment step adds everything the frontend needs: which model was used, latency, cost, savings, complexity score, the routing reason, and the memory data.
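As a rough illustration, enrichment amounts to merging routing metadata into the raw review. The field names below are guesses at a reasonable response shape, not Refyn's actual schema.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any) -> tuple[Any, float]:
    """Call fn and return (result, latency in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, round((time.perf_counter() - start) * 1000, 1)


def enrich(
    review: dict,
    *,
    model: str,
    latency_ms: float,
    cost_usd: float,
    savings_usd: float,
    complexity: int,
    routing_reason: str,
    memory_patterns: list[str],
) -> dict:
    """Attach everything the stats bar and memory panel need (assumed shape)."""
    return {
        **review,
        "stats": {
            "model": model,
            "latency_ms": latency_ms,
            "cost_usd": cost_usd,
            "savings_usd": savings_usd,
            "complexity": complexity,
            "routing_reason": routing_reason,
        },
        "memory": {"patterns": memory_patterns, "count": len(memory_patterns)},
    }
```

Building the response in one place means the frontend never has to reconstruct routing decisions from scattered fields.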

That enrichment step is the real center of the system. Everything else either feeds into it or displays what it returns.


What I Took Away

Resilient AI architecture on free or low-cost infrastructure is mostly about composition, not picking the best single model.

A strong model is useful. A layered system is what actually holds up under real usage. The routing layer keeps costs sane. The memory layer gives the reviewer continuity. The execution layer keeps code running even when hosted runners fail. The model layer stitches it all together and keeps failures from becoming outages.

The main lesson: if you want an AI system to feel dependable, you have to design for provider churn, partial failures, and stateful user context from the beginning — not as an afterthought.
