Volodymyr Yepishev

The Hidden AI Tax on Tech Debt

TL;DR: the larger your files are, the more you are likely to pay for tokens when using AI

Sometimes files bloat: someone got lazy and did not separate concerns, someone was rushed to release a feature, someone silently quit or couldn't care less. It is impossible to catalogue all the possible reasons. There are also different views on file sizes in a codebase, especially in frontend development: some developers prefer fine-grained files with a single export each, as one of Angular's older style guides recommended, while others follow looser guidance like React once gave, along the lines of "see what works for you, you can start by putting everything in one file". Both are valid approaches, and developers usually go by their preferences.

Yet it is the end of 2025, everyone and their mother are prompt engineers shipping features daily, and the velocity of development is often paid for with the quality of the end product and the state of the codebase itself. No wonder this produces bloated files, but if we are using AI to develop, is file bloat actually a problem? It certainly used to be one in the old days for the clean code purists, but what about now?

This led me to wonder whether AI struggles with files as they grow larger. Intuitively it has to. I even asked some AIs whether larger files would be more costly to maintain with AI coding agents, and the answer was that I was absolutely right (as always when asking AI anything). I needed a way to test it. I found my old repo with a practice project in React, where I had followed the one-export-per-file rule and kept everything separated to an extreme: 89 files in total, 1450 lines, with the largest file counting as few as 136 lines. By copying the repo and merging the files into bulkier ones, module by module, I produced a project with the same code and functionality but higher code density: down to 17 files, with the largest sitting at 448 lines. Another copy took it to the extreme: all the functional code in a single file of 1204 lines. I wish I had something larger to test on, but these would have to suffice.
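The merging itself is nothing clever: it boils down to concatenating every file of a module into one and then cleaning up the now-redundant imports. A rough sketch of the idea in Python (the paths and extensions here are placeholders, not the exact steps I took):

```python
from pathlib import Path

def merge_module(module_dir: str, out_file: str) -> None:
    """Concatenate every source file of a module into a single file."""
    parts = []
    for path in sorted(Path(module_dir).rglob("*.ts*")):  # picks up .ts and .tsx
        parts.append(f"// --- {path.name} ---\n{path.read_text()}")
    # Imports pointing at the merged siblings still need to be removed afterwards.
    out = Path(out_file)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n\n".join(parts))

merge_module("src/todos", "merged/todos.tsx")
```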

For testing I further generated 80 prompts in 4 different categories: Code Reading & Display, Code Analysis, Information Queries and Code Generation (very primitive, e.g. add a comment or add a field to an interface). The methodology for the research was rather simple:

  • open the folder with the small/large/humongous file(s) in Cursor/Grok;
  • submit the prompts one by one, each in a new chat window, at a regular interval;
  • ~~waste all money on tokens~~ compare token usage/price;
  • repeat 5 times.

Submitting that many prompts at regular intervals by hand was an impossible endeavor for me, so a Python script was crafted to take care of it and log the results with timestamps. A prompt every 2 minutes, 3 repos, 5 runs per repo to collect the data: it would take a while. Also, every folder had to be copied from a pristine source into a new path, to prevent the agent from reusing previous knowledge of it.
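The script was not doing anything fancy; a minimal sketch of the idea, with `submit_prompt` left as a placeholder for whatever mechanism actually drives the agent:

```python
import json
import shutil
import time
from datetime import datetime
from pathlib import Path

PROMPTS = Path("prompts.txt").read_text().splitlines()  # 80 prompts, one per line
INTERVAL_SECONDS = 120  # one prompt every 2 minutes

def submit_prompt(workdir: Path, prompt: str) -> None:
    """Placeholder: drive the agent in a fresh chat (UI automation, CLI, whatever you have)."""
    raise NotImplementedError

def run_batch(pristine_repo: str, run_id: int) -> None:
    # Copy the repo from a pristine source so the agent cannot reuse prior knowledge of it.
    workdir = Path("runs") / f"{Path(pristine_repo).name}-{run_id}"
    shutil.copytree(pristine_repo, workdir)

    with open(workdir / "prompt-log.jsonl", "a") as log:
        for number, prompt in enumerate(PROMPTS, start=1):
            submit_prompt(workdir, prompt)
            log.write(json.dumps({"prompt": number, "time": datetime.now().isoformat()}) + "\n")
            time.sleep(INTERVAL_SECONDS)
```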

The initial run displayed some interesting results:

| Repo | Input (w/o Cache Write) | Cache Read | Output Tokens | Total Tokens | Cost |
| --- | --- | --- | --- | --- | --- |
| small | 363,431.00 | 5,651,776.00 | 48,670.00 | 6,063,877.00 | 0.308 |
| large | 641,014.00 | 4,732,480.00 | 42,424.00 | 5,415,918.00 | 0.338 |
| humongous | 817,980.00 | 4,771,392.00 | 44,903.00 | 5,634,275.00 | 0.424 |

The repos with smaller files had better cache read numbers and were cheaper to use as a result. The next run produced similar, though not identical, results (probably because of the non-deterministic nature of AI, temperature and so on):

| Repo | Input (w/o Cache Write) | Cache Read | Output Tokens | Total Tokens | Cost |
| --- | --- | --- | --- | --- | --- |
| small | 478,906.00 | 6,445,127.00 | 49,960.00 | 6,973,993.00 | 0.364 |
| large | 700,265.00 | 4,968,008.00 | 44,018.00 | 5,712,291.00 | 0.372 |
| humongous | 1,000,671.00 | 4,846,604.00 | 45,552.00 | 5,892,827.00 | 0.482 |
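The cost gap makes sense once you remember how caching is billed: cached input tokens are billed at a steep discount compared to fresh input, so a repo can burn more tokens in total and still come out cheaper if more of them are cache reads. A toy calculation with made-up prices (not the ones behind the tables above) shows the shape of it:

```python
# Made-up per-million-token prices, purely to illustrate the shape of the bill;
# the real prices behind the tables above are different.
PRICE_INPUT = 3.00       # fresh input tokens
PRICE_CACHE_READ = 0.30  # cached input tokens, several times cheaper
PRICE_OUTPUT = 15.00     # output tokens

def cost(input_tokens: int, cache_read_tokens: int, output_tokens: int) -> float:
    return (input_tokens * PRICE_INPUT
            + cache_read_tokens * PRICE_CACHE_READ
            + output_tokens * PRICE_OUTPUT) / 1_000_000

# More total tokens, but mostly cache reads -> still cheaper:
print(cost(363_431, 5_651_776, 48_670))  # small repo, first run
print(cost(817_980, 4_771_392, 44_903))  # humongous repo, first run
```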

The humongous file was getting beaten by its smaller counterparts; it was obviously more expensive to maintain, even though it offered exactly the same functionality and the same code. It was then that I decided the margin could be even more dramatic if the file size grew further. I did not have any code to bloat it with, so I merely polluted it with useless comments, up to a ridiculous 3615 lines, almost 3 times its original volume.
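Reproducing that kind of bloat takes a few lines; a sketch (the filler text itself is irrelevant, it only has to add lines):

```python
from pathlib import Path

def bloat_with_comments(file_path: str, target_lines: int) -> None:
    """Pad a source file with filler comment lines until it reaches target_lines."""
    path = Path(file_path)
    lines = path.read_text().splitlines()
    filler = ["// a useless comment that only adds volume"] * max(0, target_lines - len(lines))
    path.write_text("\n".join(lines + filler) + "\n")

bloat_with_comments("src/everything.tsx", 3615)
```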

The results of the next run were absolutely baffling to me: although the small and large files were still better than the single humongous one, the bloated version, with almost 3 times the volume of the original, did outperform it!

| Repo | Input (w/o Cache Write) | Cache Read | Output Tokens | Total Tokens | Cost |
| --- | --- | --- | --- | --- | --- |
| small | 337,567.00 | 5,612,288.00 | 48,050.00 | 5,997,905.00 | 0.312 |
| large | 606,161.00 | 4,459,264.00 | 40,769.00 | 5,106,194.00 | 0.318 |
| humongous | 1,024,393.00 | 5,148,457.00 | 45,304.00 | 6,218,154.00 | 0.448 |
| humongous-bloated | 336,670.00 | 6,513,728.00 | 53,460.00 | 6,903,858.00 | 0.337 |

I had no reasonable explanation for why this happened. My expectation had been that the bloated file would tank, yet in cache reads it outperformed every other repo. Without any idea of how LLMs treat files in codebases under the hood, I started thinking that past a certain size a file is somehow split and treated as multiple ones.

ChatGPT, being an LLM, had a different explanation:

The heavily bloated file performed unexpectedly well because most of its added content consisted of repetitive comments, which produced extremely stable and highly cacheable token prefixes—allowing the model to reuse far more of the long-context cache than any of the smaller, more diverse codebases.

I had no expertise to confirm whether that is true, so I continued testing. The next run actually revealed the bloated file to be the best:

| Repo | Input (w/o Cache Write) | Cache Read | Output Tokens | Total Tokens | Cost |
| --- | --- | --- | --- | --- | --- |
| small | 415,485.00 | 6,425,600.00 | 49,693.00 | 6,890,778.00 | 0.328 |
| large | 626,846.00 | 4,460,160.00 | 39,830.00 | 5,126,836.00 | 0.331 |
| humongous | 891,457.00 | 5,117,184.00 | 43,634.00 | 6,052,275.00 | 0.427 |
| humongous-bloated | 348,328.00 | 6,393,536.00 | 52,378.00 | 6,794,242.00 | 0.320 |

Looking back at the prompts, code generation was only one of the four categories, 20 prompts out of 80; the other three were about reading the code in one form or another. It was time to calculate per-category averages, based on 5 runs for small, large and humongous and 3 runs for humongous-bloated.
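The aggregation is straightforward; a sketch, assuming the usage numbers end up as one CSV row per prompt per run (the column names are illustrative, not Cursor's export format):

```python
import csv
from collections import defaultdict
from statistics import mean

def category(prompt_no: int) -> str:
    # Prompts 1-20: reading, 21-40: analysis, 41-60: queries, 61-80: generation.
    return ["reading", "analysis", "queries", "generation"][(prompt_no - 1) // 20]

def per_category_averages(log_path: str) -> dict[str, float]:
    """Average total tokens per category across runs."""
    totals: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    with open(log_path) as f:
        for row in csv.DictReader(f):
            totals[category(int(row["prompt"]))][row["run"]] += float(row["total_tokens"])
    return {cat: mean(runs.values()) for cat, runs in totals.items()}
```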

Here are the results:

CODE READING & DISPLAY (PROMPTS 1-20)

The bloated file basically costs the same as the repo with a bunch of small files.

| Repo | Input (w/o Cache Write) | Cache Read | Output Tokens | Total Tokens | Cost |
| --- | --- | --- | --- | --- | --- |
| small | 71,709.40 | 646,280.20 | 4,261.00 | 722,250.60 | 0.034 |
| large | 135,992.80 | 796,562.80 | 4,647.00 | 937,202.60 | 0.051 |
| humongous | 221,114.60 | 783,213.80 | 4,396.20 | 1,008,724.60 | 0.070 |
| humongous-bloated | 55,467.33 | 769,216.00 | 4,856.67 | 829,540.00 | 0.033 |

CODE ANALYSIS (PROMPTS 21-40)

The same surprisingly good numbers for the bloated monster, on par with the large files on average.

| Repo | Input (w/o Cache Write) | Cache Read | Output Tokens | Total Tokens | Cost |
| --- | --- | --- | --- | --- | --- |
| small | 120,173.40 | 2,113,496.40 | 16,864.40 | 2,250,534.20 | 0.118 |
| large | 188,936.40 | 1,685,606.40 | 13,997.20 | 1,888,540.00 | 0.116 |
| humongous | 223,842.20 | 1,580,842.40 | 14,566.80 | 1,819,251.40 | 0.129 |
| humongous-bloated | 121,723.00 | 2,063,594.67 | 17,915.67 | 2,203,233.33 | 0.116 |

INFORMATION QUERIES (PROMPTS 41-60)

Here the bloated file takes a dive and small ones prevail.

| Repo | Input (w/o Cache Write) | Cache Read | Output Tokens | Total Tokens | Cost |
| --- | --- | --- | --- | --- | --- |
| small | 115,328.40 | 1,907,962.40 | 13,827.80 | 2,037,118.60 | 0.097 |
| large | 169,173.80 | 1,357,708.80 | 11,282.60 | 1,538,165.20 | 0.099 |
| humongous | 273,745.80 | 1,562,889.80 | 11,829.80 | 1,848,465.40 | 0.131 |
| humongous-bloated | 103,394.00 | 2,133,376.00 | 14,525.00 | 2,251,295.00 | 0.107 |

CODE GENERATION (PROMPTS 61-80)

In code generation the bigger files lose, though the large files outperform the super small ones.

| Repo | Input (w/o Cache Write) | Cache Read | Output Tokens | Total Tokens | Cost |
| --- | --- | --- | --- | --- | --- |
| small | 88,252.20 | 1,365,952.00 | 14,204.80 | 1,468,409.00 | 0.078 |
| large | 151,602.60 | 795,801.60 | 11,740.40 | 959,144.60 | 0.071 |
| humongous | 218,112.60 | 1,050,280.60 | 14,053.40 | 1,282,446.60 | 0.109 |
| humongous-bloated | 87,987.67 | 1,589,269.33 | 16,841.33 | 1,694,098.33 | 0.087 |

Yet the overall totals still have the repo with the smallest files as the winner:

| Repo | Input (w/o Cache Write) | Cache Read | Output Tokens | Total Tokens | Cost |
| --- | --- | --- | --- | --- | --- |
| small | 395,463.40 | 6,033,691.00 | 49,158.00 | 6,478,312.40 | 0.327 |
| large | 648,632.20 | 4,643,444.80 | 41,759.40 | 5,333,836.40 | 0.340 |
| humongous | 936,815.20 | 4,977,226.60 | 44,846.20 | 5,958,888.00 | 0.440 |
| humongous-bloated | 368,572.00 | 6,555,456.00 | 54,138.67 | 6,978,166.67 | 0.343 |

So what is the conclusion here? ~~Bloat your files with comments to make them faster for AI.~~ On average, smaller files are more cost effective: they allow more cache reads and invalidate less cache when using AI agents. The larger you go, the more expensive every operation becomes.

As usual, the repo with the code.

Top comments (2)

Isaac Hagoel

Interesting but don't coding agents like Cursor only read parts of the file already?

Volodymyr Yepishev

I wish I knew :/