DEV Community

yuhao li
Stop Stuffing Entire Files into LLMs — I Built a Surgical Context Extractor for Python

We’ve all done this.

You’re refactoring a moderately complex function with an LLM.

You paste the function in. The model produces a confident answer.

It’s wrong.

Because it doesn’t know about:

  • a helper method in the same class
  • a type definition declared above
  • an enum imported from another module
  • a factory function wrapping everything

So you start manually expanding context:

  1. Copy the function
  2. Copy the helper
  3. Copy the imports
  4. Paste half the file
  5. Hit token limits
  6. Watch reasoning degrade

At some point it becomes clear:

The problem is not just model capability.

It’s context density.


The Core Issue: Signal vs Noise

When working on real Python codebases (Django services, FastAPI backends, layered systems), I repeatedly ran into two structural issues.

1. The Blind Spot

If you only send the active file, the model misses “one-hop” dependencies:

  • private helpers
  • internal utilities
  • type aliases
  • nearby definitions that shape logic

It sees syntax but lacks structural understanding.

2. The Noise Floor

If you send everything, reasoning quality drops:

  • irrelevant code dilutes attention
  • token budgets are wasted
  • important logic gets lost in the middle

LLMs don’t simply need more context.

They need structured and relevant context.


What I Built

To explore this, I built a VS Code extension called Python Deep-Context.

The idea is straightforward:

Extract a precise “code neighborhood” around the symbol you are working on.

Not a full file dump.

Not full-project indexing.

A constrained, structural slice.


Technical Approach

The extension runs a local Python sidecar engine that builds a context report using multiple layers.

Structural Analysis (AST + CST)

  • ast is used for fast structural parsing
  • libcst is used when structure-preserving traversal is required

This determines:

  • scope boundaries
  • symbol ownership
  • internal references
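As a minimal sketch of this layer, here is how the standard-library `ast` module can recover symbol ownership and internal references for a target method (the class and method names are illustrative, and the real engine also uses `libcst` for structure-preserving traversal):

```python
import ast

source = """
class OrderService:
    def process_order(self, order):
        validated = self._validate(order)
        return self._charge(validated)

    def _validate(self, order):
        return order
"""

tree = ast.parse(source)

def find_method(tree, name):
    """Locate a method node and record its enclosing class (symbol ownership)."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for child in node.body:
                if isinstance(child, ast.FunctionDef) and child.name == name:
                    return node.name, child
    return None, None

owner, fn = find_method(tree, "process_order")

# Attribute accesses on `self` inside the target are candidate internal references.
refs = {
    n.attr
    for n in ast.walk(fn)
    if isinstance(n, ast.Attribute)
    and isinstance(n.value, ast.Name)
    and n.value.id == "self"
}

print(owner)         # OrderService
print(sorted(refs))  # ['_charge', '_validate']
```

Scope boundaries fall out of the same pass: `fn.lineno` and `fn.end_lineno` delimit the exact source slice to extract.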

One-Hop Connectivity Mapping

Instead of recursively pulling everything, the engine:

  • detects direct symbol references
  • includes only immediate internal dependencies
  • avoids recursive explosion

This keeps the slice shallow but precise.
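The one-hop rule is easiest to see on a toy dependency map (the symbol names here are hypothetical):

```python
# Hypothetical symbol table: each symbol maps to the symbols it references directly.
graph = {
    "process_order": {"_validate", "_charge"},
    "_validate": {"check_schema"},       # second hop -- deliberately excluded
    "_charge": {"payment_gateway"},      # second hop -- deliberately excluded
}

def one_hop_slice(graph, target):
    """Include the target plus its direct references only -- no recursion."""
    return {target} | graph.get(target, set())

print(sorted(one_hop_slice(graph, "process_order")))
# ['_charge', '_validate', 'process_order']
```

A recursive walk would pull in `check_schema` and `payment_gateway` too, and their dependencies after that; stopping at one hop is what keeps the slice bounded.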

LSP Integration

Static parsing alone is insufficient in Python: imports, re-exports, and dynamically resolved attributes often cannot be traced from a single file.

The engine queries the VS Code language server to resolve:

  • external symbol definitions
  • import targets
  • type ownership

Combining AST and LSP improves accuracy without building a full indexer.
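Conceptually, the AST pass finds which names are *unresolved*, and the language server answers *where they live*. Here is a small sketch of that handoff; the `resolve_with_lsp` stub stands in for a real `textDocument/definition` request, and the file paths are made up:

```python
import ast
import builtins

source = "def total(order: Order) -> float:\n    return order.amount * TAX_RATE\n"
tree = ast.parse(source)

# Names defined locally in this snippet (the function itself and its parameters).
local_defs = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
local_defs |= {
    a.arg
    for n in ast.walk(tree)
    if isinstance(n, ast.FunctionDef)
    for a in n.args.args
}

# Every name read anywhere in the snippet, including annotations like `Order`.
used = {
    n.id
    for n in ast.walk(tree)
    if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
}
unresolved = used - local_defs - set(dir(builtins))

def resolve_with_lsp(name):
    """Stand-in for a textDocument/definition query to the language server."""
    fake_index = {"Order": "models/order.py", "TAX_RATE": "config.py"}
    return fake_index.get(name)

locations = {name: resolve_with_lsp(name) for name in sorted(unresolved)}
print(locations)
# {'Order': 'models/order.py', 'TAX_RATE': 'config.py'}
```

Only the unresolved names go out to the LSP, which is why this stays far cheaper than building a whole-project index.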

Token Budget Heuristics

This is the most experimental part.

The engine attempts to fit the extracted neighborhood within a configurable token budget by:

  • prioritizing the target symbol
  • including direct dependencies first
  • preserving signatures and type hints
  • trimming overview sections before logic
  • truncating lower-impact utilities

The goal is not perfect completeness.

The goal is maximizing reasoning density per token.
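A simplified version of this trimming is a greedy fill in priority order. The section names and token counts below are stand-ins (a real engine would measure costs with an actual tokenizer):

```python
def fit_budget(sections, budget):
    """Greedily include sections in priority order until the budget runs out.

    `sections` is a list of (priority, name, token_cost); a lower priority
    number means more important.
    """
    kept, used = [], 0
    for priority, name, cost in sorted(sections):
        if used + cost <= budget:
            kept.append(name)
            used += cost
    return kept

sections = [
    (0, "target_symbol", 400),
    (1, "direct_dependencies", 300),
    (2, "signatures_and_types", 150),
    (3, "module_overview", 250),
    (4, "low_impact_utilities", 500),
]

print(fit_budget(sections, budget=1000))
# ['target_symbol', 'direct_dependencies', 'signatures_and_types']
```

The ordering encodes the heuristics above: the target symbol always survives, signatures outlive bodies, and overview text and low-impact utilities are the first to go.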


Example Output

The result is a single Markdown report:

# Target: process_order()

## Upstream Callers
- api/routes.py: submit_order()

## Surgical Source
class OrderService:
    def process_order(self, order: Order):
        validated = self._validate(order)
        return self._charge(validated)

    def _validate(self, order: Order) -> Order:
        ...

## External Types
class Order(BaseModel):
    id: str
    amount: float

Instead of pasting 800 lines of unrelated code, the model sees:

  • the target function
  • its direct logical neighbors
  • minimal external types

Nothing more.


Why Not Just Use RAG?

Embedding-based retrieval is useful, but it comes with trade-offs:

  • similarity does not guarantee structural adjacency
  • chunking can break coherence
  • token truncation often becomes arbitrary

This project explores structured static slicing as a complementary approach rather than a replacement.


Limitations

Static slicing in Python is inherently imperfect.

It can:

  • miss dynamic dispatch
  • include unnecessary utilities
  • misjudge importance

The heuristics are opinionated and still evolving.

The aim is not perfect reconstruction, but improved reasoning conditions for LLM workflows.


Feedback Welcome

This is still an early experiment.

I’m particularly interested in hearing from developers working with LLMs in real codebases:

  • Does structured context improve answer quality?
  • Is token-based trimming too aggressive?
  • How are you handling context management today?
  • Is static extraction even the right direction?

If you’d like to try it, search for Python Deep-Context in the VS Code Marketplace.

You can also open issues or share thoughts here:

https://github.com/hgliyuhao/python-deep-context/issues
