<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: swupel</title>
    <description>The latest articles on DEV Community by swupel (@swupel).</description>
    <link>https://dev.to/swupel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3761777%2Fa5c44e6e-58ab-4a27-b340-b5ae2ddcd269.png</url>
      <title>DEV Community: swupel</title>
      <link>https://dev.to/swupel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/swupel"/>
    <language>en</language>
    <item>
      <title>Understanding Large Codebases: Why AST Analysis Beats Asking an LLM</title>
      <dc:creator>swupel</dc:creator>
      <pubDate>Mon, 09 Feb 2026 11:22:57 +0000</pubDate>
      <link>https://dev.to/swupel/understanding-large-codebases-why-ast-analysis-beats-asking-an-llm-5ke</link>
      <guid>https://dev.to/swupel/understanding-large-codebases-why-ast-analysis-beats-asking-an-llm-5ke</guid>
      <description>&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;p&gt;LLMs are probabilistic: they predict text; they don't parse logic. When navigating massive legacy codebases, "guessing" isn't enough. By using &lt;strong&gt;Abstract Syntax Trees (ASTs)&lt;/strong&gt; and &lt;strong&gt;Cyclomatic Complexity&lt;/strong&gt;, you can map out technical debt deterministically instead of relying on an AI's "vibe check."&lt;/p&gt;




&lt;h2&gt;
  
  
  The 2:00 AM Inheritance
&lt;/h2&gt;

&lt;p&gt;Last week, I inherited a 50,000-line Python monolith. &lt;/p&gt;

&lt;p&gt;You know the type: a Django app from 2017 with six different authentication schemes, import statements that circle back on themselves, and a &lt;code&gt;utils.py&lt;/code&gt; that was somehow 3,000 lines long. &lt;/p&gt;

&lt;p&gt;My first instinct? Ask an LLM to explain it.&lt;/p&gt;

&lt;p&gt;The response was… fine. It gave me a plausible architectural overview and identified some patterns. But when I drilled into specific files, the LLM’s confidence didn’t match reality. It would describe a function’s behavior perfectly but miss that it had a cyclomatic complexity of 23 and was nested seven layers deep in exception handlers.&lt;/p&gt;

&lt;p&gt;That’s when it clicked: LLMs are great at natural language, but code isn’t natural language. It’s a formal grammar with a deterministic structure. If you want to understand structure, you need structural tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottleneck is Mapping, Not Reading
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqq5xn1io73nmy8yp5h9a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqq5xn1io73nmy8yp5h9a.png" alt="UI screenshot" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When we talk about "technical debt," we’re really talking about mental load. How long does it take a human to load this codebase into their brain? &lt;/p&gt;

&lt;p&gt;Reading code top to bottom is like trying to understand a skyscraper by walking through it blindfolded. You’ll eventually get there, but it’s slow and error-prone. You need a mental model of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which functions call which other functions?&lt;/li&gt;
&lt;li&gt;How do files relate to each other?&lt;/li&gt;
&lt;li&gt;Where does the complexity actually live?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An LLM can summarize what it sees, but it’s fundamentally probabilistic. It can't reliably tell you that Function A has 18 execution paths while Function B has 3. That distinction matters when you’re deciding what to refactor first.&lt;/p&gt;
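
&lt;p&gt;The first question in that list ("which functions call which?") can be answered mechanically. Here is a minimal sketch using Python's built-in &lt;code&gt;ast&lt;/code&gt; module. The sample source and the &lt;code&gt;build_call_graph&lt;/code&gt; helper are made up for illustration, and it only tracks direct calls by name, not method calls or dynamic dispatch.&lt;/p&gt;

```python
import ast

# Hypothetical sample module; in practice you would read a real file's source.
SOURCE = '''
def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()
'''


def build_call_graph(source):
    """Map each top-level function name to the plain names it calls."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            callees = graph.setdefault(node.name, set())
            for inner in ast.walk(node):
                # Only direct calls by name (foo()), not method calls (obj.foo()).
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
    return {name: sorted(callees) for name, callees in graph.items()}


call_graph = build_call_graph(SOURCE)
for name, callees in call_graph.items():
    print(name, "calls", callees)
```

&lt;p&gt;The result is the same every time you run it on the same source, which is the whole point.&lt;/p&gt;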

&lt;h2&gt;
  
  
  The Ground Truth: Abstract Syntax Trees (AST)
&lt;/h2&gt;

&lt;p&gt;In Python, every piece of code you write is parsed into an Abstract Syntax Tree (AST) before it is compiled and run. That tree is the "ground truth" of your program.&lt;/p&gt;
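
&lt;p&gt;You can inspect such a tree yourself with the standard library's &lt;code&gt;ast&lt;/code&gt; module. A small sketch follows; the one-line snippet being parsed is an arbitrary example.&lt;/p&gt;

```python
import ast

# Parse a one-line snippet (an arbitrary example) into its syntax tree.
tree = ast.parse("total = price * qty if qty else 0")

# Pretty-print the tree; the indent argument needs Python 3.9 or newer.
print(ast.dump(tree, indent=2))

# Every construct shows up as a typed node you can count or filter.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

&lt;p&gt;The conditional expression appears as an &lt;code&gt;IfExp&lt;/code&gt; node and the multiplication as a &lt;code&gt;BinOp&lt;/code&gt; nested inside it.&lt;/p&gt;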

&lt;p&gt;When you visualize an AST, you can literally see complexity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5g7gxak1nzts0hdw7o87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5g7gxak1nzts0hdw7o87.png" alt="Ast of a function" width="800" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deep branches:&lt;/strong&gt; Heavily nested loops and conditionals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wide branches:&lt;/strong&gt; Functions with too many decision paths.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tangled roots:&lt;/strong&gt; Circular dependencies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of asking an AI "is this code complex?", AST-based tools allow you to measure it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Math of Bugs: Cyclomatic Complexity
&lt;/h2&gt;

&lt;p&gt;One of the most useful signals you can extract from an AST is Cyclomatic Complexity. Introduced by Thomas McCabe in 1976, it measures the number of linearly independent paths through a function’s code.&lt;/p&gt;

&lt;p&gt;The formal representation is:&lt;br&gt;
M = E - N + 2P&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Where E = edges, N = nodes, and P = connected components of the control-flow graph. For a single function, P = 1.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In simpler terms: Start with a base of 1, and add 1 for every decision point (&lt;code&gt;if&lt;/code&gt;, &lt;code&gt;elif&lt;/code&gt;, &lt;code&gt;for&lt;/code&gt;, &lt;code&gt;while&lt;/code&gt;, each &lt;code&gt;except&lt;/code&gt; clause, &lt;code&gt;and&lt;/code&gt;, &lt;code&gt;or&lt;/code&gt;).&lt;/p&gt;
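
&lt;p&gt;That counting rule fits in a few lines of Python. This is a rough sketch, not a replacement for a dedicated tool: the &lt;code&gt;cyclomatic_complexity&lt;/code&gt; name and the sample function are mine, it scores a whole source string rather than each function separately, and it ignores constructs such as comprehensions and &lt;code&gt;match&lt;/code&gt; statements.&lt;/p&gt;

```python
import ast

# Node types treated as one decision point each (an assumption of this sketch).
DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate McCabe complexity: 1 plus the number of decision points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            # An elif is stored as an If nested in orelse, so it is counted too.
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" is one BoolOp with three values: two extra paths.
            score += len(node.values) - 1
    return score


SAMPLE = '''
def classify(n, strict):
    if n is None or strict and n == 0:
        return "invalid"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "prime"
'''

print(cyclomatic_complexity(SAMPLE))  # 1 base + 2 ifs + 1 for + 1 or + 1 and = 6
```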

&lt;h3&gt;
  
  
  The Risk Scale
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;th&gt;Risk Level&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1–10&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Easy to maintain. High testability.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;11–20&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Needs monitoring; getting "wordy."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;21+&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Refactor candidate.&lt;/strong&gt; Statistically high bug density.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
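
&lt;p&gt;If you want to apply the scale in a script, the table translates directly into a lookup. The &lt;code&gt;risk_level&lt;/code&gt; helper and the example scores below are hypothetical; only the thresholds come from the table.&lt;/p&gt;

```python
def risk_level(complexity):
    """Map a cyclomatic complexity score to the risk bands in the table."""
    if complexity >= 21:
        return "High"
    if complexity >= 11:
        return "Moderate"
    return "Low"


# Hypothetical scores for three functions, listed worst first.
scores = {"parse_request": 23, "render_page": 12, "slugify": 3}
report = [
    (name, score, risk_level(score))
    for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True)
]
for row in report:
    print(row)
```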

&lt;p&gt;McCabe's original paper suggested 10 as a reasonable upper limit for a single function, and scores above that are widely treated as a warning sign. Not because the code is "bad," but because every extra execution path is one more thing a reader has to hold in working memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "Cyborg" Workflow: How to Cut the Mess
&lt;/h2&gt;

&lt;p&gt;I’ve stopped approaching legacy code blindly. Here is my 3-step structural workflow using &lt;a href="https://ast-visualizer.com?utm_source=dev.to"&gt;ast-visualizer.com&lt;/a&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Satellite View (Dependency Graphs)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8g54onti5puu837eka5j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8g54onti5puu837eka5j.png" alt="Fast API file graph" width="800" height="701"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, I visualize the project’s import structure as a network graph. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Goal:&lt;/strong&gt; Identify bottleneck files—the "God Objects" everything depends on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it beats LLMs:&lt;/strong&gt; An AI might say "This looks like a core module." A graph shows you that 94% of your codebase imports it. If you touch that file, you touch everything.&lt;/li&gt;
&lt;/ul&gt;
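
&lt;p&gt;The number behind a graph like that is just fan-in: how many modules import a given module. Below is a sketch of how it could be computed with &lt;code&gt;ast&lt;/code&gt;. The tiny &lt;code&gt;PROJECT&lt;/code&gt; mapping and both helper names are invented for illustration; this is not how ast-visualizer.com works internally, and relative imports are ignored.&lt;/p&gt;

```python
import ast
from collections import Counter

# Hypothetical project: module name mapped to its source text.
PROJECT = {
    "app.views": "from app import utils\nimport app.models\n",
    "app.models": "from app import utils\n",
    "app.tasks": "import app.utils\nimport json\n",
    "app.utils": "import os\n",
}


def imported_modules(source):
    """Return the set of absolute module names a source string imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # "from app import utils" may name a submodule, so record both forms.
            found.add(node.module)
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def fan_in(project):
    """Count how many project modules import each other project module."""
    counts = Counter()
    for name, source in project.items():
        # Keep only imports that point at modules inside the project.
        for target in imported_modules(source).intersection(project):
            if target != name:
                counts[target] += 1
    return counts


counts = fan_in(PROJECT)
for module, importers in counts.most_common():
    print(module, "is imported by", importers, "module(s)")
```

&lt;p&gt;In this toy project every other module imports &lt;code&gt;app.utils&lt;/code&gt;, which is exactly the kind of file you want to know about before you touch it.&lt;/p&gt;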

&lt;h3&gt;
  
  
  2. The Terrain Map (Complexity Heatmaps)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpujmw6lfooynwokuic27.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpujmw6lfooynwokuic27.png" alt="Depth to line of code chart" width="800" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I generate a "Depth-over-Sequence" chart. Imagine a line graph where the X-axis is the line number in the file and the Y-axis is the nesting depth at that line.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Green:&lt;/strong&gt; Depth 0–6 (Safe)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Red:&lt;/strong&gt; Depth 10+ (The "Danger Zone")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I ran this on my 3,000-line &lt;code&gt;utils.py&lt;/code&gt;, 80% of it was red. I didn’t need to read a single line to know where the bugs were most likely hiding.&lt;/p&gt;
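
&lt;p&gt;The data behind a chart like this can be approximated with a short tree walk. This is a sketch under my own assumptions: the &lt;code&gt;depth_by_line&lt;/code&gt; helper and the choice of which nodes count as "nesting" are illustrative, and &lt;code&gt;elif&lt;/code&gt; chains register as deeper because the AST stores them as nested &lt;code&gt;If&lt;/code&gt; nodes.&lt;/p&gt;

```python
import ast

# Block statements that add one level of nesting (an assumption of this sketch).
NESTING_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With)


def depth_by_line(source):
    """Map line number to the nesting depth of the statement starting there."""
    depths = {}

    def visit(node, depth):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                depths[child.lineno] = max(depths.get(child.lineno, 0), depth)
            # Anything inside a nesting block sits one level deeper.
            child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
            visit(child, child_depth)

    visit(ast.parse(source), 0)
    return depths


SAMPLE = '''
def handle(items):
    for item in items:
        try:
            if item:
                print(item)
        except ValueError:
            pass
'''

depths = depth_by_line(SAMPLE)
for line in sorted(depths):
    print(line, "#" * depths[line])
```

&lt;p&gt;Plot line number against depth and the spikes tell you where to look first.&lt;/p&gt;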

&lt;h3&gt;
  
  
  3. The Microscope (Individual Function AST)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8u4cvpjwh1b63uz4618z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8u4cvpjwh1b63uz4618z.png" alt="Individual function AST " width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the gnarliest functions, I visualize the tree itself. I look for asymmetric branches. If one side of an &lt;code&gt;if/else&lt;/code&gt; is 50 lines long and the other is 2, I've probably found an edge case that’s poorly understood and untested.&lt;/p&gt;
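
&lt;p&gt;Lopsided branches can also be flagged without drawing anything. The sketch below compares how many source lines each side of an &lt;code&gt;if/else&lt;/code&gt; spans. The &lt;code&gt;lopsided_ifs&lt;/code&gt; helper and its default ratio of 5 are arbitrary choices of mine, and &lt;code&gt;end_lineno&lt;/code&gt; requires Python 3.8 or newer.&lt;/p&gt;

```python
import ast


def branch_sizes(if_node):
    """Return (body_lines, else_lines) for one ast.If node."""
    def span(stmts):
        if not stmts:
            return 0
        # Lines covered from the first statement to the end of the last one.
        return stmts[-1].end_lineno - stmts[0].lineno + 1

    return span(if_node.body), span(if_node.orelse)


def lopsided_ifs(source, ratio=5):
    """Yield (line, body_lines, else_lines) where one branch dwarfs the other."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If) and node.orelse:
            body, orelse = branch_sizes(node)
            if max(body, orelse) >= ratio * min(body, orelse):
                yield node.lineno, body, orelse


SAMPLE = '''
if user.is_admin:
    audit(user)
    grant(user)
    notify(user)
    log(user)
    refresh(user)
else:
    deny(user)
'''

findings = list(lopsided_ifs(SAMPLE))
print(findings)
```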

&lt;h2&gt;
  
  
  Conclusion: Use the Right Tool for the Job
&lt;/h2&gt;

&lt;p&gt;I’m not saying "don't use LLMs." I use them daily. But I’ve learned to use them for what they’re good at: semantics.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use LLMs for:&lt;/strong&gt; Generating boilerplate, explaining what a function intends to do, and suggesting refactors after you’ve found the target.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use AST Analysis for:&lt;/strong&gt; Structure. Measuring complexity objectively and mapping the territory of a massive project.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're staring at a "helper" file that gives you anxiety, stop reading and start mapping. &lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>programming</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
