DEV Community

yuhao li
Stop Stuffing Entire Files into LLMs — I Built a Surgical Context Extractor for Python

We’ve all done this.

You’re refactoring a moderately complex function with an LLM.

You paste the function in. The model produces a confident answer.

It’s wrong.

Because it doesn’t know about:

  • a helper method in the same class
  • a type definition declared above
  • an enum imported from another module
  • a factory function wrapping everything

So you start manually expanding context:

  1. Copy the function
  2. Copy the helper
  3. Copy the imports
  4. Paste half the file
  5. Hit token limits
  6. Watch reasoning degrade

At some point it becomes clear:

The problem is not just model capability.

It’s context density.


The Core Issue: Signal vs Noise

When working on real Python codebases (Django services, FastAPI backends, layered systems), I repeatedly ran into two structural issues.

1. The Blind Spot

If you only send the active file, the model misses “one-hop” dependencies:

  • private helpers
  • internal utilities
  • type aliases
  • nearby definitions that shape logic

It sees syntax but lacks structural understanding.

2. The Noise Floor

If you send everything, reasoning quality drops:

  • irrelevant code dilutes attention
  • token budgets are wasted
  • important logic gets lost in the middle

LLMs don’t simply need more context.

They need structured and relevant context.


What I Built

To explore this, I built a VS Code extension called Python Deep-Context.

The idea is straightforward:

Extract a precise “code neighborhood” around the symbol you are working on.

Not a full file dump.

Not full-project indexing.

A constrained, structural slice.


Technical Approach

The extension runs a local Python sidecar engine that builds a context report using multiple layers.

Structural Analysis (AST + CST)

  • ast is used for fast structural parsing
  • libcst is used when structure-preserving traversal is required

This determines:

  • scope boundaries
  • symbol ownership
  • internal references
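As a minimal sketch of this layer, here is how the standard-library `ast` module can recover symbol ownership and internal references for a target method (the class and method names are illustrative, and the real engine also uses `libcst` for structure-preserving traversal):

```python
import ast

source = """
class OrderService:
    def process_order(self, order):
        validated = self._validate(order)
        return self._charge(validated)

    def _validate(self, order):
        return order
"""

tree = ast.parse(source)

def find_method(tree, name):
    """Locate a method node and record its enclosing class (symbol ownership)."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for child in node.body:
                if isinstance(child, ast.FunctionDef) and child.name == name:
                    return node.name, child
    return None, None

owner, fn = find_method(tree, "process_order")

# Attribute accesses on `self` inside the target are candidate internal references.
refs = {
    n.attr
    for n in ast.walk(fn)
    if isinstance(n, ast.Attribute)
    and isinstance(n.value, ast.Name)
    and n.value.id == "self"
}

print(owner)         # OrderService
print(sorted(refs))  # ['_charge', '_validate']
```

Scope boundaries fall out of the same pass: `fn.lineno` and `fn.end_lineno` delimit the exact source slice to extract.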

One-Hop Connectivity Mapping

Instead of recursively pulling everything, the engine:

  • detects direct symbol references
  • includes only immediate internal dependencies
  • avoids recursive explosion

This keeps the slice shallow but precise.
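The one-hop rule is easiest to see on a toy dependency map (the symbol names here are hypothetical):

```python
# Hypothetical symbol table: each symbol maps to the symbols it references directly.
graph = {
    "process_order": {"_validate", "_charge"},
    "_validate": {"check_schema"},       # second hop -- deliberately excluded
    "_charge": {"payment_gateway"},      # second hop -- deliberately excluded
}

def one_hop_slice(graph, target):
    """Include the target plus its direct references only -- no recursion."""
    return {target} | graph.get(target, set())

print(sorted(one_hop_slice(graph, "process_order")))
# ['_charge', '_validate', 'process_order']
```

A recursive walk would pull in `check_schema` and `payment_gateway` too, and their dependencies after that; stopping at one hop is what keeps the slice bounded.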

LSP Integration

Static parsing alone is insufficient in Python: imports, re-exports, and dynamically resolved attributes often cannot be traced from a single file.

The engine queries the VS Code language server to resolve:

  • external symbol definitions
  • import targets
  • type ownership

Combining AST and LSP improves accuracy without building a full indexer.
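Conceptually, the AST pass finds which names are *unresolved*, and the language server answers *where they live*. Here is a small sketch of that handoff; the `resolve_with_lsp` stub stands in for a real `textDocument/definition` request, and the file paths are made up:

```python
import ast
import builtins

source = "def total(order: Order) -> float:\n    return order.amount * TAX_RATE\n"
tree = ast.parse(source)

# Names defined locally in this snippet (the function itself and its parameters).
local_defs = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
local_defs |= {
    a.arg
    for n in ast.walk(tree)
    if isinstance(n, ast.FunctionDef)
    for a in n.args.args
}

# Every name read anywhere in the snippet, including annotations like `Order`.
used = {
    n.id
    for n in ast.walk(tree)
    if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
}
unresolved = used - local_defs - set(dir(builtins))

def resolve_with_lsp(name):
    """Stand-in for a textDocument/definition query to the language server."""
    fake_index = {"Order": "models/order.py", "TAX_RATE": "config.py"}
    return fake_index.get(name)

locations = {name: resolve_with_lsp(name) for name in sorted(unresolved)}
print(locations)
# {'Order': 'models/order.py', 'TAX_RATE': 'config.py'}
```

Only the unresolved names go out to the LSP, which is why this stays far cheaper than building a whole-project index.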

Token Budget Heuristics

This is the most experimental part.

The engine attempts to fit the extracted neighborhood within a configurable token budget by:

  • prioritizing the target symbol
  • including direct dependencies first
  • preserving signatures and type hints
  • trimming overview sections before logic
  • truncating lower-impact utilities

The goal is not perfect completeness.

The goal is maximizing reasoning density per token.
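A simplified version of this trimming is a greedy fill in priority order. The section names and token counts below are stand-ins (a real engine would measure costs with an actual tokenizer):

```python
def fit_budget(sections, budget):
    """Greedily include sections in priority order until the budget runs out.

    `sections` is a list of (priority, name, token_cost); a lower priority
    number means more important.
    """
    kept, used = [], 0
    for priority, name, cost in sorted(sections):
        if used + cost <= budget:
            kept.append(name)
            used += cost
    return kept

sections = [
    (0, "target_symbol", 400),
    (1, "direct_dependencies", 300),
    (2, "signatures_and_types", 150),
    (3, "module_overview", 250),
    (4, "low_impact_utilities", 500),
]

print(fit_budget(sections, budget=1000))
# ['target_symbol', 'direct_dependencies', 'signatures_and_types']
```

The ordering encodes the heuristics above: the target symbol always survives, signatures outlive bodies, and overview text and low-impact utilities are the first to go.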


Example Output

The result is a single Markdown report:

# Target: process_order()

## Upstream Callers
- api/routes.py: submit_order()

## Surgical Source
class OrderService:
    def process_order(self, order: Order):
        validated = self._validate(order)
        return self._charge(validated)

    def _validate(self, order: Order) -> Order:
        ...

## External Types
class Order(BaseModel):
    id: str
    amount: float

Instead of pasting 800 lines of unrelated code, the model sees:

  • the target function
  • its direct logical neighbors
  • minimal external types

Nothing more.


Why Not Just Use RAG?

Embedding-based retrieval is useful, but it comes with trade-offs:

  • similarity does not guarantee structural adjacency
  • chunking can break coherence
  • token truncation often becomes arbitrary

This project explores structured static slicing as a complementary approach rather than a replacement.


Limitations

Static slicing in Python is inherently imperfect.

It can:

  • miss dynamic dispatch
  • include unnecessary utilities
  • misjudge importance

The heuristics are opinionated and still evolving.

The aim is not perfect reconstruction, but improved reasoning conditions for LLM workflows.


Feedback Welcome

This is still an early experiment.

I’m particularly interested in hearing from developers working with LLMs in real codebases:

  • Does structured context improve answer quality?
  • Is token-based trimming too aggressive?
  • How are you handling context management today?
  • Is static extraction even the right direction?

If you’d like to try it, search for Python Deep-Context in the VS Code Marketplace.

You can also open issues or share thoughts here:

https://github.com/hgliyuhao/python-deep-context/issues
