DEV Community

Aditya Agarwal

Bun shipped a million lines of AI-generated unsafe code. That's not bold, it's reckless.

Thirteen thousand unsafe blocks. Generated by AI. Shipped in a JavaScript runtime that millions of developers use. Take a moment to think about that.

Bun's massive Rust rewrite and its experimental multithreading PR have both landed recently, and the discourse around them has been… something. The headline achievements are real: Bun rewrote its core in Rust and is experimenting with true multithreading. But the way it got there should make every engineer uncomfortable.

What Actually Happened

The majority of the Rust rewrite was written by Claude AI. This is not gossip – it has been openly stated by the Bun team. The rewrite added more than 13,000 unsafe blocks to the codebase. And it was released without a concurrent garbage collector.

For the non-systems folks: in Rust, "unsafe" marks code where the compiler stops enforcing some of its memory safety guarantees and trusts the programmer to uphold them instead. One unsafe block may not be a problem. Thirteen thousand of them in AI-generated code is a different conversation entirely.
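To make that concrete, here is a minimal sketch of what a single unsafe block looks like. The function name `sum_with_raw_pointers` is made up for illustration and is not taken from Bun's code; it only shows the kind of operation (a raw pointer read) that Rust allows inside `unsafe` and nowhere else.

```rust
/// Sums a slice by reading through a raw pointer instead of using
/// checked indexing. Hypothetical example, not code from Bun.
fn sum_with_raw_pointers(values: &[u64]) -> u64 {
    let ptr = values.as_ptr();
    let mut total = 0u64;
    for i in 0..values.len() {
        // SAFETY: `i < values.len()`, so `ptr.add(i)` stays inside the
        // slice and points at an initialized `u64`.
        total += unsafe { *ptr.add(i) };
    }
    total
}
```

Change the loop bound to `values.len() + 1` and this still compiles. The compiler no longer checks that read, so the only thing standing between the code and an out-of-bounds access is the reasoning in the comment.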

The Velocity Trap 🏎️

I get the appeal. I really do. AI can generate enormous amounts of code fast. When you're a small team competing against Node.js and Deno, speed feels like survival.

Moving quickly is fine, but being careless is not. Each unsafe code block is like a contract signed by the developer stating: "I promise that this memory access is valid." The question is, if the developer is an LLM, who is signing that contract?
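In Rust, that contract is usually written down in two places: a "Safety" section on the unsafe function, and a SAFETY comment at each call site. A rough sketch, with hypothetical names (`byte_at_unchecked`, `byte_at`) that I picked for illustration:

```rust
/// Reads one byte without a bounds check.
///
/// # Safety
/// `index` must be less than `bytes.len()`. This is the contract the
/// caller signs by writing `unsafe`.
unsafe fn byte_at_unchecked(bytes: &[u8], index: usize) -> u8 {
    // SAFETY: the caller promised `index < bytes.len()`.
    unsafe { *bytes.get_unchecked(index) }
}

/// Safe wrapper: it checks the invariant itself, so its callers
/// never have to make any promise.
fn byte_at(bytes: &[u8], index: usize) -> Option<u8> {
    if index < bytes.len() {
        // SAFETY: the bounds check above guarantees `index < bytes.len()`.
        Some(unsafe { byte_at_unchecked(bytes, index) })
    } else {
        None
    }
}
```

Every SAFETY comment is a claim somebody has to verify. Writing the comment is cheap; checking that it is true is the actual work.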

→ AI-generated code isn't inherently bad.
→ AI-generated unsafe code at scale, in a runtime, without a concurrent GC, is a different beast.
→ The risk isn't theoretical: it's memory corruption, data races, and crashes in production apps that chose Bun for speed.

The absence of a concurrent garbage collector means that in real multithreaded workloads, memory management is a minefield. That is not a missing feature. It is a missing foundation.
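For contrast, this is roughly what shared state across threads looks like when you stay inside safe Rust. It is a sketch with a made-up function name (`count_in_parallel`) and says nothing about how Bun structures its threads. The point is that the compiler only accepts it because `Arc` and `AtomicUsize` are safe to share; unsafe code opts out of exactly that check.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns `threads` workers that each bump a shared counter
/// `increments_per_thread` times, then returns the final count.
fn count_in_parallel(threads: usize, increments_per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments_per_thread {
                    // Atomic read-modify-write: no update can be lost.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        // Joining makes every worker's increments visible to this thread.
        handle.join().expect("worker thread panicked");
    }
    counter.load(Ordering::Relaxed)
}
```

Swap the atomic for a plain integer shared through a raw pointer inside `unsafe` and the program still compiles, but the count can silently come out wrong. That is the class of bug nobody wants at the runtime layer.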

Trust Is the Product

I believe that many people do not fully understand what runtimes are. A runtime is not like a library that you can replace easily. It is the foundation on which your entire application is built. Therefore, when you choose Node, Deno, or Bun, you are essentially making a decision based on trust.

Some developers have started moving back to Node.js over stability concerns. That's not FUD; that's the natural consequence of shipping infrastructure that feels experimental at the layer where experiments hurt the most.

I've seen a less spectacular version of this. A team ships quickly, gets the recognition, and then spends the next couple of years making up for the lost trust. It wasn't that the code was wrong. But the gap in confidence between "it works on my machine" and "it won't destroy your production data" is staggering. 😅

The Real Question Isn't About AI

I'm not anti-AI in codebases; I use AI tools every day. But I treat AI-generated code the same way I treat code from a junior engineer: it needs review proportional to the blast radius.

The blast radius of multithreaded code inside a JavaScript runtime is about as large as it gets. Thirteen thousand unsafe blocks require thirteen thousand good reasons to exist. Not thirteen thousand rubber stamps.

→ High-velocity AI generation demands higher-velocity review, not lower.
→ The number of unsafe blocks isn't the scandal. The apparent lack of proportional scrutiny is.
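Part of that scrutiny can be automated. Below is a deliberately naive sketch that counts unsafe blocks in a source string and flags the ones with no SAFETY comment directly above them. The names `UnsafeAudit` and `audit_unsafe_blocks` are mine, and the line-based matching is an assumption that will miss plenty of real-world formatting; for actual projects, Clippy's `undocumented_unsafe_blocks` lint does this job properly.

```rust
/// Result of scanning one source file (hypothetical helper type).
struct UnsafeAudit {
    total: usize,
    undocumented: usize,
}

/// Counts lines that open an `unsafe { ... }` block and how many of them
/// lack a `// SAFETY:` comment in the comment lines directly above.
/// Line-based heuristic only; it does not parse Rust.
fn audit_unsafe_blocks(source: &str) -> UnsafeAudit {
    let lines: Vec<&str> = source.lines().collect();
    let mut audit = UnsafeAudit { total: 0, undocumented: 0 };
    for (i, line) in lines.iter().enumerate() {
        let trimmed = line.trim_start();
        // Ignore comment lines and lines without an unsafe block.
        if trimmed.starts_with("//") || !trimmed.contains("unsafe {") {
            continue;
        }
        audit.total += 1;
        // Walk upward through the contiguous comment lines above.
        let documented = lines[..i]
            .iter()
            .rev()
            .map(|l| l.trim_start())
            .take_while(|l| l.starts_with("//"))
            .any(|l| l.starts_with("// SAFETY:"));
        if !documented {
            audit.undocumented += 1;
        }
    }
    audit
}
```

A check like this only proves that a justification was written, not that it is correct. It gives reviewers a list to work through, which is where the human effort still has to go.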

This is not a knock on Bun's team or their ambitions. Jarred and the team have done a truly amazing job with the technology. But ambition without matching care in systems code is not bold. It's a liability bomb with a delayed fuse. 💣

Where This Leaves Us

Bun still has the potential to become a great runtime. However, shipping this much AI-generated unsafe code without a concurrent GC sets a dangerous precedent, both for people building on top of Bun and for anyone else writing systems code with AI.

The conclusion to draw isn't "let's not use AI for that." It is that the level of review must be proportional to the blast radius, particularly when the author of the code is incapable of reasoning about the cost of being wrong.

Would you run 13,000 AI-generated unsafe blocks in production under your app? What's your threshold for trusting AI-written infrastructure code?
