Nova Elvaris

Posted on Mar 29

Stop Feeding Your Entire Codebase to AI: A File Selection Strategy

#ai #productivity #programming #beginners

"Just paste the whole repo."

I see this advice everywhere. In my experience, the result is usually garbage: hallucinated imports, wrong function signatures, confused architecture. That's not because the model is bad, but because it was drowned in context it doesn't need.

Here's how I select files for AI context, and why less really is more.

Why "Everything" Doesn't Work

Modern models advertise 128K or even 1M token context windows. That doesn't mean they use all of it equally. Research on long-context behavior (the "lost in the middle" effect) has found that LLMs tend to use information at the beginning and end of the context most reliably, while recall degrades for material buried in the middle.

When you paste 50 files, the model:

Loses track of which function belongs to which file
Confuses similar-looking patterns across different modules
Hallucinates imports from the wrong file
Generates code that's technically valid but architecturally wrong

The 3-Ring Strategy

I think of file selection like concentric rings around my current task:

Ring 1: The Target (always include)

The file you're modifying. The function you're debugging. The component you're extending. This is non-negotiable context.

Ring 2: Direct Dependencies (include selectively)

Files that the target imports or that import the target. But only the relevant parts. If your target calls getUserById from user-service.ts, include that function signature — not the entire 500-line service.

## Context: user-service.ts (relevant excerpt)
async function getUserById(id: string): Promise<User | null>
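
If you want to pull excerpts like this automatically, here is a minimal sketch. `extractSignature` is a hypothetical helper, not part of any library, and it assumes plain `function` declarations without nested parentheses in the parameter list or object-literal return types. For anything more complex, a real parser would be the safer choice.

```typescript
// Hypothetical helper: return only the signature of a named function,
// so the prompt carries the API shape instead of the whole file.
function extractSignature(source: string, functionName: string): string | null {
  // Escape regex metacharacters in case the name contains `$`.
  const escaped = functionName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match an optional `export` / `async`, the parameter list, and
  // everything up to (but not including) the opening brace of the body.
  const pattern = new RegExp(
    `^\\s*(?:export\\s+)?(?:async\\s+)?function\\s+${escaped}\\s*\\([^)]*\\)[^{]*`,
    "m"
  );
  const match = source.match(pattern);
  return match ? match[0].trim() : null;
}

// Example input standing in for user-service.ts (contents are assumed).
const userServiceSource = `import { db } from "./db";

export async function getUserById(id: string): Promise<User | null> {
  return db.users.find(id);
}
`;

console.log(extractSignature(userServiceSource, "getUserById"));
```

The point is to send the one line the model needs to call the function correctly, rather than the implementation behind it.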

Ring 3: Structural Context (include minimally)

Project-level files that establish patterns: tsconfig.json, a sample test file, maybe your ORM schema. These help the model match your project's conventions without overwhelming it.

Never include: node_modules contents, lock files, build output, unrelated features, or entire directories "just in case."
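
The "never include" list is easy to enforce mechanically before anything reaches the prompt. The sketch below is one way to do it. The directory names `dist`, `build`, and `out` and the specific lock file names are my assumptions about a typical Node project, so adjust them to yours.

```typescript
// Assumed exclusion patterns; tune these to your own project layout.
const EXCLUDED_PATTERNS: RegExp[] = [
  /(^|\/)node_modules\//, // dependency contents
  /(^|\/)(dist|build|out)\//, // build output (directory names are assumptions)
  /(^|\/)(package-lock\.json|yarn\.lock|pnpm-lock\.yaml)$/, // lock files
];

// True if a path should never be pasted into context.
function isExcluded(path: string): boolean {
  return EXCLUDED_PATTERNS.some((pattern) => pattern.test(path));
}

// Keep only the paths that survive the exclusion rules.
function filterCandidates(paths: string[]): string[] {
  return paths.filter((path) => !isExcluded(path));
}

const candidates = filterCandidates([
  "src/auth-middleware.ts",
  "node_modules/express/index.js",
  "package-lock.json",
  "dist/main.js",
]);
console.log(candidates);
```

A filter like this cannot catch "unrelated features"; that judgment still has to come from you.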

Practical File Selection Checklist

Before pasting any file into context, ask:

Does the model need this to complete the task? If no, don't include it.
Can I include an excerpt instead of the full file? If yes, extract just the relevant function/type/interface.
Is this file establishing a pattern? If so, include one representative example rather than five; the model generalizes from examples fast.
Would a summary work instead of the raw file? For complex modules, a 5-line description of the API surface beats 200 lines of implementation.

Before and After

The "Everything" Approach

Here's my project. Fix the authentication bug.
[pastes 15 files, 4000 lines]

Result: The model "fixes" auth by rewriting the middleware to match patterns from an unrelated payment module it found in the context.

The Focused Approach

## Task
Fix: login returns 401 even with valid credentials.

## Target: auth-middleware.ts
[full file, 45 lines]

## Dependency: token-service.ts (relevant function)
function verifyToken(token: string): Promise<TokenPayload>

## Pattern reference: error-handler.ts (lines 10-25)
[standard error response format]

## Test that should pass: auth.test.ts (lines 30-45)
[the failing test]

Result: Targeted fix that matches the project's error handling pattern, uses the correct token service API, and passes the test.
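
If you assemble focused prompts often, the layout above is simple to generate. This is a minimal sketch; the `ContextSection` shape and the `buildPrompt` name are hypothetical, chosen only to mirror the headings in the example.

```typescript
// Hypothetical shape for one labeled piece of context.
interface ContextSection {
  label: string; // e.g. "Target", "Dependency", "Pattern reference"
  file: string;
  note?: string; // e.g. "relevant function", "lines 10-25"
  content: string;
}

// Assemble the task plus labeled sections into one prompt string.
function buildPrompt(task: string, sections: ContextSection[]): string {
  const parts = [`## Task\n${task}`];
  for (const section of sections) {
    const note = section.note ? ` (${section.note})` : "";
    parts.push(`## ${section.label}: ${section.file}${note}\n${section.content}`);
  }
  return parts.join("\n\n");
}

const prompt = buildPrompt("Fix: login returns 401 even with valid credentials.", [
  { label: "Target", file: "auth-middleware.ts", content: "[full file, 45 lines]" },
  {
    label: "Dependency",
    file: "token-service.ts",
    note: "relevant function",
    content: "function verifyToken(token: string): Promise<TokenPayload>",
  },
]);
console.log(prompt);
```

Labeling each section with its file name and role is what lets the model keep track of which code belongs where.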

The Numbers

From my own tracking over the past few months:

Metric	Full context	Selective context
Avg tokens sent	~12,000	~2,500
First-try success rate	~40%	~75%
Time editing AI output	~15 min	~3 min
API cost per task	~$0.05	~$0.01

Less context. Better results. Lower cost. It's not even close.
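
To put the table's approximate figures in relative terms, here is the arithmetic as a small sketch. `percentReduction` is just an illustrative helper, and the inputs are the rounded values from the table, so treat the outputs as ballpark numbers.

```typescript
// Relative reduction from `before` to `after`, rounded to a whole percent.
function percentReduction(before: number, after: number): number {
  return Math.round(((before - after) / before) * 100);
}

const tokenReduction = percentReduction(12000, 2500); // avg tokens sent
const editTimeReduction = percentReduction(15, 3); // minutes editing AI output
const costReduction = percentReduction(0.05, 0.01); // API cost per task
const successGainPoints = 75 - 40; // first-try success, in percentage points

console.log({ tokenReduction, editTimeReduction, costReduction, successGainPoints });
```

That works out to roughly 79% fewer tokens, about 80% less editing time and cost, and a 35-point gain in first-try success.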

Quick Reference: What to Include by Task Type

Bug fix: Target file + failing test + error output + relevant dependency signatures

New feature: Target file + similar existing feature (as pattern) + relevant types/interfaces

Refactor: Target file + files that import it (signatures only) + target test file

Code review: The diff + relevant context files for changed functions
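
The same quick reference can live in code as a lookup table, for example inside a script that prepares prompts. The `TaskType` names and the `contextChecklist` helper are hypothetical; the contents are copied from the list above.

```typescript
type TaskType = "bug-fix" | "new-feature" | "refactor" | "code-review";

// What to include for each task type, taken from the quick reference.
const CONTEXT_BY_TASK: Record<TaskType, string[]> = {
  "bug-fix": ["target file", "failing test", "error output", "relevant dependency signatures"],
  "new-feature": ["target file", "similar existing feature (as pattern)", "relevant types/interfaces"],
  "refactor": ["target file", "files that import it (signatures only)", "target test file"],
  "code-review": ["the diff", "relevant context files for changed functions"],
};

// Render the items for one task type as a plain-text checklist.
function contextChecklist(task: TaskType): string {
  return CONTEXT_BY_TASK[task].map((item) => `- ${item}`).join("\n");
}

console.log(contextChecklist("refactor"));
```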

One More Thing

If you find yourself needing more than 5 files in context, that's a signal to break the task into smaller steps. Ask the model to do one thing well, verify it, then move to the next.

Sequential small prompts with tight context beat one massive prompt with everything.
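
The five-file rule of thumb can also be a guard in whatever script assembles your prompts. This is a minimal sketch: `MAX_CONTEXT_FILES` and `shouldSplitTask` are hypothetical names, and the threshold is the heuristic from this post, not a hard limit.

```typescript
// Heuristic threshold from this post; adjust to taste.
const MAX_CONTEXT_FILES = 5;

// True when the selected context suggests the task is too broad for one prompt.
function shouldSplitTask(selectedFiles: string[]): boolean {
  return selectedFiles.length > MAX_CONTEXT_FILES;
}

const selection = [
  "auth-middleware.ts",
  "token-service.ts",
  "error-handler.ts",
  "auth.test.ts",
];

if (shouldSplitTask(selection)) {
  console.log("Too many files: break the task into smaller steps.");
} else {
  console.log("Context is tight enough for a single prompt.");
}
```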

What's your file selection strategy? Or are you still in the "paste everything" camp? No judgment — I was there six months ago.

DEV Community