<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: yuhao li</title>
    <description>The latest articles on DEV Community by yuhao li (@yuhao_li_hg).</description>
    <link>https://dev.to/yuhao_li_hg</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3784506%2F109a53f1-3b65-4f62-b430-dbdf24cabbd4.png</url>
      <title>DEV Community: yuhao li</title>
      <link>https://dev.to/yuhao_li_hg</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yuhao_li_hg"/>
    <language>en</language>
    <item>
      <title>Stop Stuffing Entire Files into LLMs — I Built a Surgical Context Extractor for Python</title>
      <dc:creator>yuhao li</dc:creator>
      <pubDate>Sun, 22 Feb 2026 02:24:02 +0000</pubDate>
      <link>https://dev.to/yuhao_li_hg/stop-stuffing-entire-files-into-llms-i-built-a-surgical-context-extractor-for-python-26e6</link>
      <guid>https://dev.to/yuhao_li_hg/stop-stuffing-entire-files-into-llms-i-built-a-surgical-context-extractor-for-python-26e6</guid>
      <description>&lt;p&gt;We’ve all done this.&lt;/p&gt;

&lt;p&gt;You’re refactoring a moderately complex function with an LLM.&lt;br&gt;&lt;br&gt;
You paste the function in. The model produces a confident answer.&lt;/p&gt;

&lt;p&gt;It’s wrong.&lt;/p&gt;

&lt;p&gt;Because it doesn’t know about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a helper method in the same class&lt;/li&gt;
&lt;li&gt;a type definition declared above&lt;/li&gt;
&lt;li&gt;an enum imported from another module&lt;/li&gt;
&lt;li&gt;a factory function wrapping everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So you start manually expanding context:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Copy the function&lt;/li&gt;
&lt;li&gt;Copy the helper&lt;/li&gt;
&lt;li&gt;Copy the imports&lt;/li&gt;
&lt;li&gt;Paste half the file&lt;/li&gt;
&lt;li&gt;Hit token limits&lt;/li&gt;
&lt;li&gt;Watch reasoning degrade&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At some point it becomes clear:&lt;/p&gt;

&lt;p&gt;The problem is not just model capability.&lt;br&gt;&lt;br&gt;
It’s context density.&lt;/p&gt;


&lt;h2&gt;The Core Issue: Signal vs Noise&lt;/h2&gt;

&lt;p&gt;When working on real Python codebases (Django services, FastAPI backends, layered systems), I repeatedly ran into two structural issues.&lt;/p&gt;
&lt;h3&gt;1. The Blind Spot&lt;/h3&gt;

&lt;p&gt;If you only send the active file, the model misses “one-hop” dependencies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;private helpers&lt;/li&gt;
&lt;li&gt;internal utilities&lt;/li&gt;
&lt;li&gt;type aliases&lt;/li&gt;
&lt;li&gt;nearby definitions that shape logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It sees syntax but lacks structural understanding.&lt;/p&gt;
&lt;h3&gt;2. The Noise Floor&lt;/h3&gt;

&lt;p&gt;If you send everything, reasoning quality drops:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;irrelevant code dilutes attention&lt;/li&gt;
&lt;li&gt;token budgets are wasted&lt;/li&gt;
&lt;li&gt;important logic gets lost in the middle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;LLMs don’t simply need more context.&lt;br&gt;&lt;br&gt;
They need structured and relevant context.&lt;/p&gt;


&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;To explore this, I built a VS Code extension called &lt;strong&gt;Python Deep-Context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The idea is straightforward:&lt;/p&gt;

&lt;p&gt;Extract a precise “code neighborhood” around the symbol you are working on.&lt;/p&gt;

&lt;p&gt;Not a full file dump.&lt;br&gt;&lt;br&gt;
Not full-project indexing.&lt;br&gt;&lt;br&gt;
A constrained, structural slice.&lt;/p&gt;


&lt;h2&gt;Technical Approach&lt;/h2&gt;

&lt;p&gt;The extension runs a local Python sidecar engine that builds a context report using multiple layers.&lt;/p&gt;
&lt;h3&gt;Structural Analysis (AST + CST)&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ast&lt;/code&gt; is used for fast structural parsing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;libcst&lt;/code&gt; is used when structure-preserving traversal is required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This determines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scope boundaries&lt;/li&gt;
&lt;li&gt;symbol ownership&lt;/li&gt;
&lt;li&gt;internal references&lt;/li&gt;
&lt;/ul&gt;
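&lt;p&gt;As a rough sketch of the AST side (hypothetical helper names, not the extension’s actual internals), the standard &lt;code&gt;ast&lt;/code&gt; module can locate a function and collect the names its body touches:&lt;/p&gt;

```python
import ast

SOURCE = """
class OrderService:
    def process_order(self, order):
        validated = self._validate(order)
        return self._charge(validated)

    def _validate(self, order):
        return order
"""

def find_function(tree, name):
    """Walk the module and return the FunctionDef with the given name."""
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return node
    return None

def referenced_names(func):
    """Collect every bare name and attribute the function body touches."""
    names = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Name):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    return names

tree = ast.parse(SOURCE)
target = find_function(tree, "process_order")
print(sorted(referenced_names(target)))
# ['_charge', '_validate', 'order', 'self', 'validated']
```

&lt;p&gt;A reference set like this is the raw material for deciding which sibling definitions belong in the slice.&lt;/p&gt;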
&lt;h3&gt;One-Hop Connectivity Mapping&lt;/h3&gt;

&lt;p&gt;Instead of recursively pulling everything, the engine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;detects direct symbol references&lt;/li&gt;
&lt;li&gt;includes only immediate internal dependencies&lt;/li&gt;
&lt;li&gt;avoids recursive explosion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps the slice shallow but precise.&lt;/p&gt;
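&lt;p&gt;A minimal sketch of the one-hop rule (again with made-up names): gather the names the target uses, keep only those defined in the same module, and deliberately stop there:&lt;/p&gt;

```python
import ast

def one_hop_slice(tree, target_name):
    """Return the target definition plus only its directly referenced
    sibling definitions -- one hop out, no recursion into their deps."""
    defs = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs[node.name] = node

    target = defs[target_name]
    used = set()
    for node in ast.walk(target):
        if isinstance(node, ast.Name):
            used.add(node.id)

    # Immediate internal dependencies only: unresolved names are ignored,
    # and we never walk into the neighbors themselves.
    neighbors = [defs[n] for n in sorted(used) if n in defs and n != target_name]
    return [target] + neighbors

source = """
def helper(x):
    return deeper(x)

def deeper(x):
    return x

def main(x):
    return helper(x)
"""
included = one_hop_slice(ast.parse(source), "main")
print([node.name for node in included])
# ['main', 'helper'] -- 'deeper' is two hops away, so it stays out
```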
&lt;h3&gt;LSP Integration&lt;/h3&gt;

&lt;p&gt;Static parsing alone is insufficient in Python.&lt;/p&gt;

&lt;p&gt;The engine queries the VS Code language server to resolve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;external symbol definitions&lt;/li&gt;
&lt;li&gt;import targets&lt;/li&gt;
&lt;li&gt;type ownership&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Combining AST and LSP improves accuracy without building a full indexer.&lt;/p&gt;
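&lt;p&gt;For the curious, a definition lookup over LSP is just a small JSON-RPC message. This sketch only builds the wire payload (the extension goes through the VS Code API rather than framing messages by hand):&lt;/p&gt;

```python
import json

def definition_request(uri, line, character, request_id=1):
    """Frame an LSP textDocument/definition request (JSON-RPC 2.0).
    Positions are zero-based; the server answers with the location(s)
    where the symbol under the cursor is defined."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/definition",
        "params": {
            "textDocument": {"uri": uri},
            "position": {"line": line, "character": character},
        },
    })
    # Every LSP message is prefixed with a Content-Length header.
    return f"Content-Length: {len(body)}\r\n\r\n{body}"

msg = definition_request("file:///app/services/orders.py", 41, 15)
```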
&lt;h3&gt;Token Budget Heuristics&lt;/h3&gt;

&lt;p&gt;This is the most experimental part.&lt;/p&gt;

&lt;p&gt;The engine attempts to fit the extracted neighborhood within a configurable token budget by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;prioritizing the target symbol&lt;/li&gt;
&lt;li&gt;including direct dependencies first&lt;/li&gt;
&lt;li&gt;preserving signatures and type hints&lt;/li&gt;
&lt;li&gt;trimming overview sections before logic&lt;/li&gt;
&lt;li&gt;truncating lower-impact utilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not perfect completeness.&lt;/p&gt;

&lt;p&gt;The goal is maximizing reasoning density per token.&lt;/p&gt;
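&lt;p&gt;The trimming strategy can be approximated by a greedy packer. This is a deliberately naive sketch: real token counting would use the model’s tokenizer, and the priorities here are invented:&lt;/p&gt;

```python
def pack_to_budget(chunks, budget):
    """Greedily include chunks in priority order (lower = more important)
    until the token budget runs out; the first chunk that does not fit
    is reduced to its signature line."""
    def tokens(text):
        # Whitespace split is a crude stand-in for a real tokenizer.
        return len(text.split())

    included, remaining = [], budget
    for chunk in sorted(chunks, key=lambda c: c["priority"]):
        cost = tokens(chunk["text"])
        if remaining >= cost:
            included.append(chunk["text"])
            remaining -= cost
        elif remaining > 0:
            # Preserve the signature and type hints, drop the body.
            included.append(chunk["text"].splitlines()[0] + "\n    ...")
            remaining = 0
    return "\n\n".join(included)

report = pack_to_budget(
    [
        {"priority": 2, "text": "def _log_helper(msg):\n    print(msg)"},
        {"priority": 1, "text": "def process_order(order):\n    return charge(order)"},
    ],
    budget=5,
)
# The target survives intact; the helper is truncated to its signature.
```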


&lt;h2&gt;Example Output&lt;/h2&gt;

&lt;p&gt;The result is a single Markdown report:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Target: process_order()&lt;/span&gt;

&lt;span class="gu"&gt;## Upstream Callers&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; api/routes.py: submit_order()

&lt;span class="gu"&gt;## Surgical Source&lt;/span&gt;
class OrderService:
    def process_order(self, order: Order):
        validated = self._validate(order)
        return self._charge(validated)&lt;span class="sb"&gt;

    def _validate(self, order: Order) -&amp;gt; Order:
        ...

&lt;/span&gt;&lt;span class="gu"&gt;## External Types&lt;/span&gt;
class Order(BaseModel):
    id: str
    amount: float
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of pasting 800 lines of unrelated code, the model sees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the target function
&lt;/li&gt;
&lt;li&gt;its direct logical neighbors
&lt;/li&gt;
&lt;li&gt;minimal external types
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing more.&lt;/p&gt;




&lt;h2&gt;Why Not Just Use RAG?&lt;/h2&gt;

&lt;p&gt;Embedding-based retrieval is useful, but it comes with trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;similarity does not guarantee structural adjacency
&lt;/li&gt;
&lt;li&gt;chunking can break coherence
&lt;/li&gt;
&lt;li&gt;token truncation often becomes arbitrary
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project explores structured static slicing as a complementary approach rather than a replacement.&lt;/p&gt;




&lt;h2&gt;Limitations&lt;/h2&gt;

&lt;p&gt;Static slicing in Python is inherently imperfect.&lt;/p&gt;

&lt;p&gt;It can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;miss dynamic dispatch
&lt;/li&gt;
&lt;li&gt;include unnecessary utilities
&lt;/li&gt;
&lt;li&gt;misjudge importance
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The heuristics are opinionated and still evolving.&lt;/p&gt;

&lt;p&gt;The aim is not perfect reconstruction, but improved reasoning conditions for LLM workflows.&lt;/p&gt;




&lt;h2&gt;Feedback Welcome&lt;/h2&gt;

&lt;p&gt;This is still an early experiment.&lt;/p&gt;

&lt;p&gt;I’m particularly interested in hearing from developers working with LLMs in real codebases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does structured context improve answer quality?&lt;/li&gt;
&lt;li&gt;Is token-based trimming too aggressive?&lt;/li&gt;
&lt;li&gt;How are you handling context management today?&lt;/li&gt;
&lt;li&gt;Is static extraction even the right direction?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’d like to try it, search for &lt;strong&gt;Python Deep-Context&lt;/strong&gt; in the VS Code Marketplace.&lt;/p&gt;

&lt;p&gt;You can also open issues or share thoughts here:&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/hgliyuhao/python-deep-context/issues" rel="noopener noreferrer"&gt;https://github.com/hgliyuhao/python-deep-context/issues&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>python</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
