DEV Community

Manoj Pisini


Our IDEs Are Quietly Failing Us — And We Normalized It

This is a long one. We're going from 1983 to 2026, making all stops along the way.


The Tool That Shapes the Thinker

There's a thought experiment worth sitting with before we get into history and benchmarks.

Open your Task Manager right now. Sort by memory. Look at what's near the top. Odds are it's VS Code, Cursor, or one of their Electron siblings — consuming RAM in the range of 800 MB to 2 GB just to let you edit text files. Now consider: that machine is also running your Docker containers, your local database, your test runner, your TypeScript compiler. Your IDE is not a passive tool. It is an active competitor for the same resources your actual work needs.

We accepted this. Gradually, quietly, and almost without noticing.

But this is just one symptom of a deeper question: "what has happened to the IDE, and what should it actually be?" To answer that, we need to go back to where it started.


Part 1: How We Got Here and Who to Blame

The Before Times: Compile-and-Pray (No, Literally)

Before IDEs, writing software meant stitching together three separate tools — a text editor, a compiler, and a debugger — and manually orchestrating them. You wrote code in one program, switched to a terminal to compile, read cryptic errors, went back to the editor, made a change, compiled again. The feedback loop was hostile. Errors were output to paper in some early environments. Learning to code required significant tolerance for friction.

For most of the 1970s and early 80s, this was just how programming worked. Not because it was good, but because nobody had thought hard enough about the alternative.

1983–1990: Turbo Pascal and the Integrated Revelation

The IDE as we know it was arguably born in 1983 with Borland's Turbo Pascal. The concept was radical in its simplicity: what if the editor, compiler, and error output all lived in the same program? What if compilation was a single keypress and the cursor jumped directly to the offending line?

The result was transformative. A full Pascal development environment ran in under 40 KB of RAM and started instantly. The feedback loop collapsed from minutes to seconds. Turbo Pascal became legendary not just as a product but as proof of concept — that developer experience was a design problem worth solving.

By 1990, Borland extended this philosophy to C++ with Turbo C++: full-screen text UI, syntax highlighting, integrated debugger, compile-time error navigation. All of it running natively, tightly, and fast. Developers who used these tools remember them with a specific kind of affection — the affection you have for a tool that seems to understand what you're trying to do.

The key thing these early IDEs had in common: "they were built for the machine they ran on." No abstraction layers. No runtimes sitting between user input and screen output. The editor was a native program in the truest sense.

1991–1999: The GUI Era and the Rise of Visual Studio

The 1990s brought graphical interfaces, color monitors, and a mouse to the developer toolchain. Microsoft's Visual Basic (1991) introduced something genuinely new: a drag-and-drop form designer where you could build Windows UIs without writing layout code by hand. Many developers describe the experience as revelatory — suddenly the gap between idea and running application narrowed dramatically.

Visual Basic's success demonstrated something important: "the IDE could drive adoption of a language." The tool and the language became inseparable. VB sold because its IDE was extraordinary. This was the first time, though not the last, that tooling choices began to have profound effects on the ecosystem around them.

By 1997, Microsoft unified its language tooling into Visual Studio — C++, VB, and eventually C# and web development under one roof. Visual Studio became the gold standard for what an integrated environment could do: IntelliSense (context-aware code completion), integrated debugging with step-through and breakpoints, project management, built-in build systems. Heavy, yes — but architecturally coherent and purpose-built. Microsoft's strategy was, as always, "embrace, extend, and make it really hard to leave" — and for a long time, it worked brilliantly.

Meanwhile, in the Unix world, the dominant tools were Emacs and Vim. Modal, keyboard-driven, infinitely extensible, and essentially weightless. Vim ran on everything from a Sun workstation to a 4 MB RAM machine over SSH. It had no GUI, no project tree, no debugger integration — and yet developers swore by it with an intensity that remains undiminished to this day. The Vim community was the first to articulate a philosophy that would resurface constantly: "a tool's greatest strength is often what it refuses to include."

2000–2010: JetBrains, Eclipse, and the Age of Deep Language Intelligence

The 2000s brought Java to the mainstream, and Java brought Eclipse (2001) — an open-source, plugin-based IDE built on the JVM. Eclipse was architecturally interesting: everything was a plugin, including the core editor. This made it extraordinarily extensible and accelerated the spread of IDE culture across languages and platforms. The Eclipse model of "editor as plugin platform" would echo all the way forward to VS Code. It also introduced a generation of developers to the spiritual experience of watching a progress bar that says "Building workspace…" for forty-five seconds after every git pull, which some people apparently enjoyed enough to keep doing for a decade.

But the decade's most significant development was JetBrains' IntelliJ IDEA (2001). Where Eclipse was broad, IntelliJ was deep. It understood Java not as text but as semantics — it tracked types, inferred intent, caught dead code, spotted misused APIs. The refactoring tools were genuinely remarkable: rename a class and every reference across the entire project updated automatically, with confidence. Extract method, inline variable, change method signature — all performed correctly, preserving behavior.

IntelliJ established a principle that would go underappreciated for two decades: "a truly useful IDE must understand the Abstract Syntax Tree (AST) of your code, not just its text." The difference between text-aware and semantics-aware tooling is the difference between Find and Replace and actual refactoring.

2008–2015: The Lightweight Counter-Revolution

As IDEs grew heavier (Eclipse's cold start in 2010 was routinely over 20 seconds on typical hardware), a counter-movement emerged. TextMate (2004) showed that a fast, extensible text editor with good syntax highlighting and snippets was often enough. Sublime Text (2008) refined this into something close to perfect — instant startup, the revolutionary Command Palette, multi-cursor editing, a plugin ecosystem that filled in the gaps. Sublime became the editor of the web development world for several years.

These tools didn't have debuggers, didn't understand ASTs, didn't have refactoring. But they were fast. They started instantly. They stayed out of your way. They reminded the community that responsiveness was a feature — that the feeling of the tool under your hands mattered.

GitHub's Atom (2014) tried to marry the extensibility of these lightweight editors with a modern architecture. The architecture they chose was Electron. In hindsight, this is a bit like solving a bicycle puncture by buying a car — technically it gets you there, but you're now paying for petrol, insurance, and parking just to pop to the shop.

2015–Present: The Electron Monoculture (It's Browsers All the Way Down)

Visual Studio Code (2015) was built by Microsoft on the same Electron foundation as Atom, but executed with far more discipline and resources. It was free, cross-platform, lightweight by the standards of full IDEs, and — crucially — it introduced the Language Server Protocol (LSP) (2016). To Microsoft's credit, they took Electron and made something genuinely useful with it. To Electron's credit, it had absolutely nothing to do with that.

LSP was genuinely brilliant. By defining a standard communication protocol between editors and language servers, it decoupled language intelligence from the editor. A Rust language analyzer could be written once and work in any LSP-compatible editor. Go, Python, TypeScript, C++ — all got first-class tooling overnight, in any editor. The democratizing effect was enormous.

VS Code became the dominant editor in software development with stunning speed. By 2024, 73.6% of professional developers used VS Code as their primary editor [Stack Overflow Developer Survey 2024]. The extension marketplace swelled to over 60,000 plugins. Language support became virtually universal.

But underneath all of this sits Electron — and Electron means running a Chromium browser instance to display a text editor.


Part 2: The Hidden Tax We've Been Paying Without Noticing

What Electron Actually Means

Electron bundles a full Chromium browser engine with a Node.js runtime and ships them as a desktop application. Every Electron app is, architecturally, a website running in a private browser. It is, to be precise about it, a solution to the problem of writing cross-platform desktop apps that also creates the problem of your desktop app being a browser. The implications:

  • RAM: A basic Electron app consumes around 100 MB just in runtime overhead. VS Code with a few extensions routinely hits 700 MB–1.5 GB. A 2025 comparative study found VS Code using approximately 5× the RAM of Zed at idle [Markaicode benchmark, 2025].

  • Input latency: Keystrokes in a Chromium-based editor travel through a JavaScript event loop, get applied to the DOM, laid out by a CSS engine, and composited by Chromium's rendering pipeline — before a pixel changes. Measured input latency in VS Code averages around 12 ms; in Zed (native GPU rendering), it's around 2 ms. The difference isn't perceptible in isolation, but at 120 keystrokes per minute across an 8-hour session, the accumulated micro-friction is real.

  • Startup time: VS Code opening a large monorepo takes 3–5 seconds on modern hardware. Zed takes under 300 ms.

  • CPU in the background: If you've wondered why your laptop fan spins when you have VS Code open but aren't typing, you're watching Chromium's background processes do work you didn't ask for. Electron isn't just running your editor. It's running a small city of background tasks, garbage collection cycles, and security sandboxes — all to display some syntax-highlighted text.

| Editor   | Architecture                  | Startup (large project) | Idle RAM | Input latency |
|----------|-------------------------------|-------------------------|----------|---------------|
| VS Code  | Electron (Chromium + Node.js) | ~3.8 s                  | ~730 MB  | ~12 ms        |
| Cursor   | Electron (VS Code fork)       | ~3.5 s                  | ~800 MB+ | ~12 ms        |
| Windsurf | Electron (VS Code fork)       | ~3.5 s                  | ~750 MB+ | ~12 ms        |
| Zed      | Native Rust + GPUI            | ~0.25 s                 | ~142 MB  | ~2 ms         |
| Neovim   | Native C                      | <0.1 s                  | ~30 MB   | <1 ms         |

Sources: Markaicode benchmark (2025), devtoolreviews.com, multiple community benchmarks. Note: Cursor and Windsurf are both VS Code forks — they inherit all of Electron's overhead with AI features layered on top. You are, in these cases, paying a RAM tax twice: once for the browser pretending to be an editor, and once for the AI pretending to be a programmer.

"It's Fine On My Machine" — The Five Stages of Electron Denial

The counter-argument is always "hardware is cheap." And it's true that on a modern MacBook Pro with 32 GB of RAM, VS Code's overhead is negligible. But:

  1. Not everyone has 32 GB of RAM. A significant portion of working developers globally are on machines where 700 MB dedicated to a text editor meaningfully competes with their compiler, database, and containers.

  2. The overhead compounds. If you run VS Code + Docker Desktop + a local Kubernetes cluster + a Postgres instance + your app server, you're looking at a machine under constant memory pressure. The editor is one of the few components in that list where the overhead is architectural, not functional.

  3. Battery life. Electron apps notoriously drain batteries faster than native equivalents because the Chromium engine doesn't yield CPU efficiently when idle. For developers on laptops — which is most developers — this is a real quality-of-life issue [XDA Developers, 2025].

  4. It normalizes bad architecture. When the most widely used developer tool in the world is built on an architecture that sacrifices performance for cross-platform convenience, it sends a signal. That signal is: "performance doesn't matter for tools." That is exactly the wrong signal to send to the community that builds the software other people rely on. We are, in effect, a generation of engineers who optimise database queries to the microsecond and then go home and type into a web browser to do it.

What We Actually Lost

There's a less quantifiable cost that the benchmark tables don't capture: "the texture of the tool changes what you do with it."

A slow startup makes you reluctant to close and reopen the editor. You keep files open you don't need. You accumulate tabs. Cognitive overhead grows. A tool with 12 ms input latency makes you slightly less willing to make small, exploratory edits — the subconscious cost of each keystroke is higher. None of this is dramatic. It's all just a little friction, everywhere, all the time.

The Turbo Pascal developers from 1985 had a tool that was faster than modern VS Code at its core function: editing text and seeing errors. We have more features, certainly. But we have less of the thing that makes a tool feel like an extension of your hands.


Part 3: The Good, the Bad, and the Honest Scorecard

Let's be honest about both sides. The current IDE landscape is not a simple failure story. (If it were, this post would be half as long. You're welcome.)

What Modern IDEs Got Right

Universal language support via LSP. Before LSP, if you switched from Java to Go, you either used IntelliJ's Go plugin (good but proprietary) or accepted a significantly worse experience. Now, language servers are open-source, community-maintained, and work in any compatible editor. Rust Analyzer, for instance, provides extraordinary IDE intelligence — type inference, lifetime annotations, macro expansion — and it's free, open, and works in VS Code, Zed, Neovim, Helix, and more.

The extension ecosystem. VS Code's extension marketplace has solved problems that no single team could solve. Docker integration, Kubernetes management, database GUIs, live collaboration, remote development over SSH — the ecosystem extends the editor into a full platform. This is genuinely valuable and would be hard to replicate in a native, monolithic architecture.

Remote development. VS Code's Remote-SSH and Dev Containers features changed how many teams work. Editing code that runs on a cloud VM, in a Docker container, or on a Raspberry Pi — with full IntelliSense, debugging, and extension support — is a capability that heavyweight native IDEs struggle to match.

Accessibility and onboarding. A new developer can go from zero to productive in VS Code in under an hour. The defaults are good. The error messages are readable. The Git integration works. For education, bootcamps, and onboarding, this matters enormously.

Free and open. VS Code's core (the Code – OSS project) is MIT licensed, even if Microsoft's branded builds add proprietary telemetry and marketplace terms on top. The fact that the dominant development tool in the world is freely accessible to everyone, in every country, on every operating system, is not nothing. It is actually remarkable.

What Modern IDEs Got Wrong

Performance as an afterthought. The Electron choice was made for developer convenience (web technologies are familiar, cross-platform is free) at the cost of user experience. The performance tax is paid by every developer, every day, forever. It was a reasonable choice in 2015; it becomes harder to justify as native alternatives prove the gap is real and closeable.

Extension quality is a lottery. The same marketplace that gives you extraordinary tools also gives you extensions that conflict, slow startup, cause memory leaks, and break silently between VS Code updates. The extension model's power and its reliability problems are the same thing.

JVM-era IDEs tax Java developers. IntelliJ IDEA, the gold standard for Java/Kotlin development, is a JVM application. Its cold start on a large project can exceed 30 seconds. Its background indexing after opening a large repository consumes significant CPU for minutes. The intelligence it provides is extraordinary — but the warmup cost is steep. Java developers have learned to make coffee when they open a new project. This is not a productivity feature. This is just coping.

Language servers are good but not seamless. LSP abstracts language intelligence into a protocol, but the communication overhead is real. For very large codebases, the language server can take minutes to fully index. Type-checking a change in a large TypeScript monorepo can take several seconds. These are protocol-level bottlenecks that can't be fully solved without tighter integration.

Deep debugging is still primitive. The state of debugging in most modern IDEs — set a breakpoint, step through execution, print to stdout — is fundamentally unchanged from the 1990s. The tooling to go further exists: Mozilla's rr gives you full record-and-replay; LLDB has reversible stepping. They're just sitting there, mostly unintegrated, largely ignored by the IDE vendors, while everyone argues about whether the AI suggestion panel should live on the left side or the right.


Still with me? Good. We’re about to get to the part where even the AI-skeptics might actually find themselves nodding along for once.

Part 4: The Agentic IDE — Magic Trick or Loaded Gun?

No discussion of IDEs in 2026 is complete without confronting what happened over the last three years: the industry's aggressive pivot toward AI-powered code generation.

The Pitch and the Reality

The value proposition of tools like Cursor, Copilot Workspace, and Windsurf is seductive: describe what you want in English, and receive working code. Scaffold a REST API in 30 seconds. Generate unit tests for a module you didn't want to write tests for. Autocomplete not just lines but entire functions.

For certain use cases, this works well. Generating boilerplate. Writing a test harness for a known pattern. Converting data between formats. Explaining an unfamiliar codebase. Getting a first draft of something you'd eventually rewrite anyway. These are genuinely useful applications of AI in developer tooling.

But the reality of AI-assisted coding at the frontier of complex software development is considerably more complicated than the marketing suggests.

The METR Study: Numbers Don't Lie, Even When We Want Them To

In July 2025, the non-profit research group METR published a randomized controlled trial with a finding that sent shockwaves through the developer community: developers using AI tools (primarily Cursor Pro with Claude 3.5/3.7 Sonnet) completed tasks 19% slower than developers working without AI [METR, arXiv:2507.09089].

The study involved 16 experienced open-source developers working on their own familiar repositories — projects with an average of 22,000+ GitHub stars and over a million lines of code. Each developer had an average of 5 years of experience on their specific codebase. The 246 tasks were real GitHub issues, not synthetic benchmarks.

The most striking finding wasn't just the slowdown. It was the "perception gap": developers predicted a 24% speedup before the study, and after completing it, still believed they had been sped up by 20% — despite objective measurement showing the opposite.

They felt faster. They were measurably slower.

What caused the slowdown? The METR researchers identified several factors: extra cognitive load from switching between coding mode and prompting mode, time spent reviewing and correcting AI outputs, and AI's low reliability on complex, context-heavy tasks in mature codebases. Ars Technica's analysis of screen recordings from the study found developers spending roughly 9% of total task time specifically reviewing and modifying AI-generated code — work that didn't exist before the AI was introduced [Ars Technica, 2025].

"When AI is allowed, developers spend less time actively coding and searching for/reading information, and instead spend time prompting AI, waiting on and reviewing AI outputs, and idle."
METR study, July 2025

It's worth noting that the situation is nuanced and evolving. METR's follow-up in early 2026 acknowledged significant challenges with their newer study design — many developers refused to participate because they didn't want to work without AI, and there were selection effects in which tasks got submitted [METR, February 2026]. The technology is also evolving rapidly. But the July 2025 finding stands as the most methodologically rigorous data point we have on the question, and it should give pause to anyone treating AI coding tools as an unqualified productivity multiplier for experienced engineers.

The Black Box Problem

Here's a scenario most developers will recognize.

A team uses an agentic tool to scaffold a new service — 400 lines of code, tests included, generated in under ten minutes. It looks reasonable. The tests pass. It ships.

Six weeks later, under sustained production load, the service develops a slow memory leak. Heap usage climbs until the pod crashes. The on-call engineer opens the code and realizes they're staring at something nobody on the team wrote by hand. The token bucket, the middleware chain, the request context threading — all generated, all unfamiliar. What would normally take an hour to debug takes three days, because the mental model was never built.

The root cause, when found: the AI-generated middleware captured a closure through the logger through the request context, preventing garbage collection. A subtle pattern the generator used consistently, invisible until production pressure revealed it.

"Code is read ten times more often than it is written." When a tool generates 400 lines in ten minutes, you don't save ten minutes — you create 400 lines of legacy code that must be maintained, debugged, and understood by people who didn't build it. The generation speed becomes a maintenance debt.

The Human Brain Is Not a Token Predictor (And 12 Lines of C Prove It)

This is the argument that tends to get lost in the productivity debate, and it might be the deepest one.

Consider the Fast Inverse Square Root — a piece of code with famously murky authorship (investigations have traced it back through Gary Tarolli to Greg Walsh at Ardent Computer, building on ideas from Cleve Moler), popularized by the Quake III Arena source code (1999). The algorithm computes 1/√x extraordinarily fast, without a single division or square root operation, by exploiting the bit-level representation of IEEE 754 floating-point numbers. It treats the bits of a float as if they were an integer, performs a bit shift, subtracts from a magic constant (0x5F3759DF), reinterprets the result as a float again, and runs one iteration of Newton's method to refine the approximation [Lomont, 2003 — "Fast Inverse Square Root"].

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;           // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );   // what the f*ck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );  // 1st iteration of Newton's method

    return y;
}

The comment in the original source code literally reads: "what the f*ck?"

No language model generates this from a prompt against a blank slate. Not because it lacks the tokens to reconstruct it — it can reproduce it, having seen it in training data. The point is that no language model could have invented it in 1999 from a standing start. The algorithm required a human mind to simultaneously hold and marry together concepts from completely separate domains:

  • Computer architecture — the specific memory layout of IEEE 754 floats in a 32-bit register
  • Mathematical intuition — recognizing that the bits of a positive float x, read as an integer, approximate 2²³ · (log₂ x + 127), so shifting and subtracting those bits acts like arithmetic on the logarithm — a relationship spanning number theory and hardware representation
  • Physics and rendering knowledge — the exact bottleneck was 1/√x for surface normal normalization in real-time 3D lighting, a domain-specific pressure that forced the search for a faster path
  • Approximation theory and numerical analysis — the insight that "good enough is better than exact", and knowing precisely how much error one iteration of Newton's method would correct
  • The willingness to break language conventions — deliberately aliasing a float* to long*, violating strict aliasing rules in C, in a way that would make most compilers and code reviewers flinch today

This is not a programming problem. It is a physics problem, a mathematics problem, a hardware problem, and an engineering trade-off decision — all collapsed into 12 lines of C. The human brain performed an act of cross-domain synthesis that took ideas from completely separate fields and married them into something that looks, superficially, like "just code."

This kind of reasoning has a formal name in cognitive science: "analogical transfer" — the ability to recognize that the structure of a problem in one domain maps onto a solution technique from a completely different domain [Gentner, 1983 — "Structure-Mapping: A Theoretical Framework for Analogy"]. It is arguably the central mechanism of human mathematical and scientific creativity.

  • Newton didn't just solve orbital mechanics — he recognized that falling apples and orbiting moons were the same problem.
  • Fourier didn't just analyze heat — he recognized that any periodic function could be expressed as a sum of sines and cosines.
  • Dijkstra didn't just write a graph algorithm — he looked at road networks and recognized underneath them a mathematical structure that could be solved optimally.

The Fast Inverse Square Root is a small but perfect example of the same move: "a rendering bottleneck is secretly a numerical analysis problem wearing a computer architecture costume." The engineer who wrote it didn't search a known solution space. They reframed the problem entirely — and the reframe required fluency in multiple disciplines simultaneously.

This capacity shows up constantly in decisions that don't make headlines:

Choosing an approximation algorithm over an exact solution. This requires simultaneously knowing the mathematics of the error bound, the statistical distribution of inputs, and the business's actual tolerance for inaccuracy. No prompt captures all of this context, and no model can supply the judgment call of when "close enough" is correct. A/B testing frameworks, approximate nearest-neighbor search in recommendation engines, probabilistic data structures like Bloom filters — all of these are engineering decisions where the right answer was deliberately wrong, and a human had to decide how wrong was acceptable.

Recognizing that a problem in one domain is a known problem in another. The entire field of information theory began when Claude Shannon recognized that the reliability of communication channels was mathematically equivalent to problems in thermodynamics. MapReduce became the dominant distributed computing paradigm when its designers recognized that a functional programming pattern from the 1950s could describe arbitrary distributed computation [Dean & Ghemawat, 2004 — "MapReduce: Simplified Data Processing on Large Clusters"]. These insights don't come from predicting the next token in a training corpus. They come from holding two apparently unrelated domains in mind simultaneously and seeing the structural echo between them.

Deciding that the right solution is to delete code, not write it. Some of the most valuable engineering work ever done involved recognizing that a complex system could be replaced by a simpler one. This is not a generative act at all. It requires deeply understanding what the existing system does, what the actual requirements are (not the stated ones), and having the confidence to make a judgment call that no benchmark rewards and no autocomplete can suggest.

"The gap between what current AI can generate and what the human brain can invent is not primarily a gap in coding ability. It is a gap in cross-domain reasoning, in physical and mathematical intuition, and in the capacity to decide that an approximate answer is better than an exact one — and to know, precisely, why."

This is not a counsel of complacency. AI capabilities are improving. The gap will narrow. But it will not close on the timeline that the productivity dashboards suggest — because the gap is not about syntax, or even about logic. It is about the kind of creative insight that emerges from deeply internalizing multiple fields of knowledge over years and then holding them in tension at the moment a problem demands it. Current language models are trained to be statistically consistent with their training distribution. Human experts, at their best, break with the distribution — and that is precisely when the most important code gets written.

Outsourcing the generative work to an AI before building those internal models is not a productivity win. It is a deferral of an educational debt that compounds with interest.

We Came Here to Build Things, Not to Babysit a Diff Tool

There's a dimension to this that doesn't show up in METR's data, and it might be the most important one.

Ask a developer why they got into software engineering. Almost nobody answers "because I wanted to review pull requests generated by a statistical model." The answer is almost always some version of: the joy of making something work. The satisfaction of wrestling with a hard problem and winning. The specific pleasure of getting the Rust borrow checker to stop complaining. The dopamine hit when a failing test finally turns green after an hour of debugging.

When an AI generates the solution before you've worked through the problem, you are demoted from builder to reviewer. The outcome may be the same. The experience is completely different.

This is not nostalgia. It is a concern about the mechanism by which expertise is built. A developer who hand-writes a lock-free concurrent queue learns something irreplaceable about cache line invalidation and memory ordering. A developer who prompts an AI to write one learns how to write better prompts. Both are real skills. Only one of them transfers when production systems fail at 2 a.m. and the AI tool is not available or not helpful.

There is a reason 69% of developers in the METR study continued using AI tools after the experiment ended, despite being objectively slower with them [METR, 2025]. The tools feel good to use. They reduce the friction of uncertainty. They make the act of coding less lonely. These are real benefits — but they are psychological benefits that exist somewhat in tension with the actual goal of building robust software.

So What "Should" AI Actually Do Here?

The critique is not "no AI in IDEs." It's "AI in the wrong place." The industry has directed enormous effort at using AI to generate code — the most intellectually engaging part of engineering — while barely touching the parts that are genuinely tedious and error-prone.

Consider what AI could do that it largely doesn't:

Runtime-integrated debugging. An AI that watches execution state, catches a panic or segfault, and synthesizes a root cause from the execution trace — rather than just showing you a stack trace and wishing you luck. The record-and-replay primitives (rr, LLDB reversible stepping) are already there, sitting idle in the garage like a sports car nobody drives. The missing piece isn't the debugger. It's an AI layer on top that can reason about what it sees — correlate the panic with state changes three frames back, identify the lock that was held too long, and tell you in plain language what actually went wrong. That integrated tool does not yet exist.

Blast-radius-aware refactoring. When you change a core interface — add a parameter, modify a trait, restructure a data type — the AI should understand the AST holistically and perform the surgical correction across all call sites, checking for semantic correctness, not just syntactic. This is not generative. It's mechanical, precise, and enormously valuable. JetBrains' refactoring tools gesture at this, but without AI-scale reasoning over large codebases.

Continuous complexity profiling. A background process that surfaces O(n²) traversals on hot paths, identifies potential lock contentions, flags patterns associated with memory leaks in your specific language and framework — not as blocking warnings, but as ambient information visible at the right moment.

Semantic code review. Not style checking (linters already do this) but genuine pattern recognition: "This is the third service you've written this month that stores secrets in environment variables passed through the request context — here are the two previous incidents this caused."

The common thread: "AI handling the mechanics, not the invention." The creative decisions — the system design, the abstraction choices, the tradeoffs — stay with the engineer. The AI handles the surface area that scales poorly with human attention.
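To make the "ambient complexity profiling" idea concrete, here is a deliberately crude sketch in Python. It flags nested loops as candidate O(n²) hot spots using the standard `ast` module — nothing here is a real product; the function name and the nesting heuristic are my own toy stand-ins for the kind of non-blocking background pass the article is describing.

```python
import ast

def flag_nested_loops(source: str) -> list[tuple[int, int]]:
    """Return (line, depth) for loops nested two or more deep.
    A crude stand-in for real complexity profiling: nesting depth
    is only a hint, not proof of O(n^2) behavior."""
    findings = []

    def visit(node, depth):
        for child in ast.iter_child_nodes(node):
            is_loop = isinstance(child, (ast.For, ast.While))
            new_depth = depth + 1 if is_loop else depth
            if is_loop and new_depth >= 2:
                findings.append((child.lineno, new_depth))
            visit(child, new_depth)

    visit(ast.parse(source), 0)
    return findings

snippet = """
def handler(items):
    for a in items:          # outer loop
        for b in items:      # inner loop over the same data
            compare(a, b)
"""
print(flag_nested_loops(snippet))  # → [(4, 2)]
```

A real implementation would need data-flow analysis to know the loops iterate over the same growing dataset, and runtime profiles to know the path is hot — but the point stands that the raw material is just an AST walk, surfaced at the right moment instead of as a blocking warning.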


Almost there. This is the part where I stop complaining and say something constructive.

Part 5: What the IDE of the Future Actually Needs to Look Like

Bringing this all together. Not a product roadmap. A set of architectural principles that the industry should be building toward.

1. Native and GPU-Accelerated as a Hard Requirement

The era of accepting Electron-level performance for developer tooling needs to end. Zed has proved this is achievable: written in Rust, rendering directly to the GPU via the GPUI framework at 120 fps, no DOM, no JavaScript runtime, 2 ms input latency, 250 ms startup. These aren't benchmarks from a specialized research project — they're shipping, in production, used by tens of thousands of developers daily [Zed Industries, GPUI documentation].

The future IDE is a native binary. This is not about ideology. It's about having the performance headroom to do everything else on this list.

2. Deep LSP + AST-Aware Intelligence, Not Just Autocomplete

The Language Server Protocol democratized language intelligence. The next step is deeper integration: an IDE that holds the AST of your entire codebase in memory, understands not just types but behavioral contracts, and can reason about correctness and semantics — not just syntax.

This is what makes real refactoring possible. This is what makes "extract this function and update all callers with the correct types" reliable rather than probabilistic.
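The first half of "update all callers" — finding them — is already cheap with an AST in memory. A minimal sketch, assuming Python and direct (unqualified) calls only; the function name here is hypothetical, and a production tool would also resolve imports, methods, and aliases:

```python
import ast

def call_sites(source: str, func_name: str) -> list[int]:
    """Line numbers of every direct call to `func_name`.
    A toy version of the caller discovery an IDE needs before
    it can rewrite call sites safely."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == func_name):
            lines.append(node.lineno)
    return sorted(lines)

code = """
total = parse(a)
other = parse(b) + parse(c)
ignored = reparse(d)
"""
print(call_sites(code, "parse"))  # → [2, 3, 3]
```

Note that `reparse` is correctly ignored — the AST distinguishes identifiers where a text search would not. That distinction is exactly what separates "reliable" from "probabilistic."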

3. AI Embedded in the Debugger, Not the Editor

The most underserved use case in developer tooling is debugging. An AI that can watch an execution trace, correlate a panic with recent state changes, identify the thread that caused a deadlock and the lock ordering that made it possible — this would be transformative in a way that autocomplete is not.

The infrastructure for this already exists. What doesn't exist is someone bold enough to actually wire it together into a product a normal team would use on a Monday morning.


4. Semantic Refactoring as a First-Class Operation

Change an interface. Add a parameter. Rename a type. The IDE should be able to understand the "blast radius" of that change — every implementer, every call site, every test that will break — and execute the correction surgically, preserving behavior, flagging ambiguous cases for human review.

This is AI doing mechanical work well. Not generative. Not probabilistic. Precise.
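The "mechanical, not generative" claim can be shown in a few lines. Here is a toy rename in Python using `ast.NodeTransformer` — it renames a function and every direct call to it, deterministically. I'm not claiming this is how any shipping tool works; it does no scope or type analysis, which is precisely the checking a real refactoring engine adds on top:

```python
import ast

class RenameFunction(ast.NodeTransformer):
    """Rename a function definition and every direct reference to it.
    Purely mechanical: no scope or type analysis, so shadowed names
    would be renamed too -- real tools must check for that."""
    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_FunctionDef(self, node):
        if node.name == self.old:
            node.name = self.new
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

src = "def fetch(x):\n    return x\n\nresult = fetch(41) + 1\n"
tree = RenameFunction("fetch", "fetch_user").visit(ast.parse(src))
print(ast.unparse(tree))
```

Every transformation is a tree edit with a predictable outcome. That is the register "semantic refactoring as a first-class operation" should live in: precise where precision is possible, deferring to a human only in the genuinely ambiguous cases.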

5. Ambient Analysis That Doesn't Interrupt Flow

The best IDE feature is one that gives you information at exactly the moment it's useful, without breaking your train of thought. A non-blocking annotation: "This traversal is O(n²) on a growing dataset called on every request." Not a rewrite suggestion. Not a popup. Just: information, available when you glance at it, ignorable when you don't.


Conclusion: We Can Do Better — The Tools to Prove It Already Exist

We have, collectively, made a series of compromises. We accepted a browser-as-editor in exchange for cross-platform convenience. We accepted extension quality variance in exchange for ecosystem breadth. We accepted AI as a code writer in exchange for the feeling of going faster.

Each of these tradeoffs had genuine arguments in its favor. But the accumulation of them has produced tooling that is slow where it should be fast, shallow where it should be deep, and generative where it should be precise.

The IDE is not a neutral tool. It shapes how you think about code, how much time you spend in flow, how deeply you understand the systems you're building, and whether the friction you experience is the productive friction of hard thinking or the unproductive friction of waiting for a Chromium process to catch up. One of these frictions makes you a better engineer. The other one is just Electron doing its thing.

The native editor is not retro. The AST-aware refactoring engine is not a luxury. The AI debugger is not science fiction. Zed ships the first. JetBrains has proven the second for decades. The third is waiting for someone to build it.

"We built the most important industry in modern civilization largely on a text editor that is, at its core, a web browser." We can do better than that. The tools exist to prove it. And no, Microsoft, the answer is not to make the web browser bigger.


Disagree with something here? I'd genuinely like to hear it — especially from people using Neovim or Helix at scale, or who've had different experiences with AI tools than what the METR study suggests. The comments are open.
