DEV Community

Juraj Kirchheim
Using AI to write a transpiler

Or: How I used AI to 1000x my speed and also 0.1x my speed - all on the same project.

Recently, I decided I wanted to test out the limits of AI's current capabilities for coding, to get a clear picture, independent of all the hype - or the equally exaggerated attempts to downplay the significance of AI.

It seems to me that the recent success stories largely leverage AI's unparalleled speed when it comes to gluing together a bunch of existing libraries, frameworks and services with a UI to create a tool that scratches an itch. The results - while certainly impressive and undoubtedly useful - don't reveal much about how far AI's capabilities go and which limits they hit. In an attempt to find out for myself, I tried to tackle a relatively ambitious project that I had always considered potentially useful, but too much effort to justify the result: build "wasmix" - a library that transpiles Haxe code to WASM.

Let's consider this contrived example of how one might pan a stereo track:

class Track {
  public static function pan(left:Float32Array, right:Float32Array, pos:Float) {
    var gainL:Float32 = pos <= 0 ? 1 : 1 - pos;
    var gainR:Float32 = pos >= 0 ? 1 : 1 + pos;
    for (i in 0...left.length) {
      left[i] *= gainL;
      right[i] *= gainR;
    }
  }
}

It is reasonably fast in JavaScript (once hot), but it runs 2x faster on V8 using wasmix (optimal WASM with SIMD would be significantly faster still, but getting this speedup "for free" is a decent starting point).

For some context: Haxe is the coolest compile-to-js language that nobody has ever heard of, but it is relatively immaterial to this article. So for the purpose of this story, let's think of it as TypeScript's geeky older brother with one peculiarity: a powerful macro system (think Lisp/Scala/Rust/Swift) that allows you to fully introspect the code at compile time and essentially do anything you want, like generating a blob of WASM.

The option to transpile your compile-to-js language to WASM directly is interesting, because you can always just compile the code to JS for debugging and use WASM to squeeze out performance for production.

While I know Haxe's macro system like the back of my hand, my knowledge of WASM was less than superficial at the start of this project. So using AI to fill that knowledge gap and deal with a lot of the menial work involved seemed like a good fit. I could learn a lot, both about WASM and about effective AI usage. And produce something useful in the process.

Someone more experienced with AI would have probably taken a straighter line, but still, I arrived at what I consider a rather satisfying result, all the while having fun like a kid on the playground. I mostly used AI as a "research assistant" and as a pair programmer. At first, I would ask for help understanding different WASM capabilities and compare different approaches I was considering. Once I decided on a path to pursue, I tried to let it generate the code and review it myself. If that didn't meet the standards I was aiming for, I would flip things, write the code and let AI do the review.

I have yet to get into true vibe coding, but that's for another time and another project. Having a Haxe → WASM transpiler that nobody understands seems pointless, at least until AI breaches the threshold of being able to truly drive and maintain a project of this sort (whether that's 3 months or 3 decades away I find hard to predict, because this is simply not where most efforts are currently directed).

One thing that was also very interesting for me personally was to see how much AI would struggle with some fringe tech like Haxe. More on that later.

The Journey

Let there be data!

Data is usually the best place to start:

Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

-- Linus Torvalds

Firstly, I told AI to generate definitions of the WASM data structures, like the instruction set, the module structure and so forth. It banged it out in under a minute. It would have easily taken me two days to dig through the spec, make sense of it and write down the definitions myself. That's 1000 minutes vs. just one. Quite the speedup. Although frankly, I would have probably abandoned the project even before finishing this basic part, so the effective speedup is closer to infinity.

I instructed the AI to model the WASM definitions using Haxe's enums, which - like in Rust and Swift - carry different data for each case, effectively acting as ADTs: algebraic data types, a bit like tagged unions in more "classical" languages. ADTs lend themselves well to representing all sorts of complex structures, like Haxe's syntax tree (just like, for example, TypeScript's syntax tree is a huge tagged union of all sorts of nodes) or the different WASM instruction sets.
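
To make that concrete, here is a JS sketch of a few instructions as a tagged union, plus a tiny interpreter that dispatches on the tag. The constructor and tag names are illustrative, not wasmix's actual definitions - and in Haxe, the enum version additionally gets you exhaustiveness checking for free:

```javascript
// Instructions as tagged objects - the JS analogue of an ADT / Haxe enum.
// Names are illustrative, not wasmix's actual ones.
const I32Const = (value) => ({ tag: "I32Const", value });
const LocalGet = (index) => ({ tag: "LocalGet", index });
const I32Add   = { tag: "I32Add" };

// A tiny stack-machine interpreter that "pattern-matches" on the tag.
function run(instrs, locals) {
  const stack = [];
  for (const i of instrs) {
    switch (i.tag) {
      case "I32Const": stack.push(i.value); break;
      case "LocalGet": stack.push(locals[i.index]); break;
      case "I32Add": {
        const b = stack.pop(), a = stack.pop();
        stack.push((a + b) | 0); // wrap to 32 bits, like i32.add does
        break;
      }
    }
  }
  return stack.pop();
}

run([LocalGet(0), I32Const(2), I32Add], [40]); // -> 42
```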

So at this point, there was a data structure representing the Haxe code (already provided by the Haxe standard library) and one representing the WASM module. The core problem becomes to map one type of data onto the other. While this requires a lot of effort, there's not all that much ambiguity here, which makes it a great candidate for AI.

Make it runnable

To be able to check anything, we needed to be able to run it. So the next step was to create a writer that could take in the data structure defining a WASM module and turn it into actual WASM for running - and a small loader that actually runs it.

The task of creating such a writer by hand is substantial, for a human. Had I even made it to this point, I certainly would have given up here. But again, for an AI model, it's done in minutes. I should perhaps add that a much smaller subset of WASM would have been sufficient for my use case, so doing it by hand, I would have shaved off some time here and there, but still, for that I would have had to ingest the spec, think about what I need, support that and so on. But none of this was my problem. I could focus my energy on the core problem, namely transforming the given Haxe data structure into the AI-provided WASM data structure.

So, with a proper representation of the WASM module and a writer and a loader, I just let the model define the data of a few WASM modules (one exported function with a bunch of instructions inside) and then write it out and run it. And it worked. A good basis to start iterating.
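
For a feel of what the writer's output and the loader amount to, here is a hand-encoded minimal module (standard WASM binary format, not wasmix's actual writer output) exporting an i32 `add` function, loaded with the synchronous WebAssembly API:

```javascript
// A minimal, hand-encoded WASM module:
// (func (export "add") (param i32 i32) (result i32) local.get 0 local.get 1 i32.add)
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm" + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: one func of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0; local.get 1; i32.add; end
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
instance.exports.add(2, 3); // -> 5
```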

Haxe → WASM

This is where the magic happens and where I expected AI to be largely useless in any decision making. Which it was. I didn't mind, as this was the interesting part anyway. I did lean on AI very strongly to help me understand multiple things:

  1. the different opcodes and their semantics, especially around jumps
  2. modules and how imports and exports work, allowing JS functions to be called from WASM and vice versa
  3. memory management and sharing

Where AI really blew me away was in debugging. I made a single command that would build everything, print the hex-encoded WASM, load the code and run it, and let the AI iterate with that. There were many issues to work out. Like me getting the index space wrong for imports and exports. Or errors in how I computed offsets for memory access. It just disassembled the code in its context and figured out what was wrong. It was quite impressive and certainly very useful. I am reasonably sure I would not have been able to summon the energy to do this by hand.

Memory

The subject of memory is critical to this project, because there's not much of a point in using WASM if you don't shovel sizeable amounts of data back and forth. To figure out how to do this, I had AI brief me on the subject.

What I learned is that WASM itself has one block of linear memory and an instruction set to manipulate that. The way you share memory between JS and WASM is through an ArrayBuffer that represents this linear memory. What you usually do is to create typed arrays (Int16Array, Float32Array, etc.) backed by the buffer, fill those with data and then pass the start and end position to WASM so that it can deal with that.
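
In JS terms, that sharing pattern looks roughly like this (the export name `pan` and its signature are hypothetical, just to show where the offsets would go):

```javascript
// WASM linear memory is exposed to JS as an ArrayBuffer; typed arrays are views into it.
const memory = new WebAssembly.Memory({ initial: 1 });     // 1 page = 64 KiB
const left  = new Float32Array(memory.buffer, 0, 1024);    // bytes 0..4095
const right = new Float32Array(memory.buffer, 4096, 1024); // bytes 4096..8191
left.fill(0.5);

// You then pass positions, not data: e.g. instance.exports.pan(0, 4096, 1024, -0.25)
// (hypothetical export). The WASM side reads and writes the very same bytes through
// its own load/store instructions - no copying involved.
```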

We discussed multiple approaches, but my attempts to steer the model towards a solution that I considered most ergonomic simply failed. We discussed the status of the memory64 proposal, which is not supported on Safari at all and seems to suffer from performance issues too, but somehow that didn't get it on the right track either. My idea was to represent a typed array as an i64 in WASM, with the low 32 bits being the offset and the high 32 bits being the length. This allows a single value to hold the range information (together with the statically known element width). Once I proposed this, I got confirmation that this was the way to go. But somehow I would have expected the model to at least put the option on the table.
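
The packing scheme is trivial to express with JS BigInts (on the WASM side it is just i64 shifts and masks):

```javascript
// One i64 describes a typed array: low 32 bits = byte offset, high 32 bits = length.
function pack(offset, length) {
  return (BigInt(length) << 32n) | BigInt(offset >>> 0);
}

function unpack(handle) {
  return {
    offset: Number(handle & 0xffffffffn),
    length: Number(handle >> 32n),
  };
}

const handle = pack(4096, 1024);
unpack(handle); // -> { offset: 4096, length: 1024 }
```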

I implemented most of this by hand, because the AI-generated code was rather verbose and noisy. The version I produced was far terser and more intentional, with just one problem: the stupid mistakes that monkey brains naturally make (wrong order, inverted logic, etc.). Nothing I couldn't have fixed myself, but AI just did it while I made myself a coffee. I can definitely see myself working like that.

Allocator

This was a task that required quite a bit of reasoning and a clear mental model. I tried multiple AI models and they all struggled.

Essentially, the problem is that creating the typed arrays on top of the WASM memory is a bit cumbersome, because you need to make sure they don't overlap, you need to grow the memory as needed and you want them to be properly aligned, otherwise access will be slow. What's more, you may wish to create new typed arrays from within the WASM code. And possibly free them again, which means you'll need some way of dealing with fragmentation. That's where the allocator comes in.
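
A minimal bump-allocator sketch (illustrative only - wasmix's actual allocator also has to handle freeing and fragmentation) shows the two core concerns, alignment and growth:

```javascript
const PAGE = 65536; // WASM memory grows in 64 KiB pages

// A bump allocator: rounds each block up to its alignment and grows the memory on demand.
// Note: memory.grow() detaches the old ArrayBuffer, so any typed-array views
// must be recreated afterwards.
function makeAllocator(memory) {
  let top = 0;
  return function alloc(byteLength, align = 8) {
    const offset = (top + align - 1) & ~(align - 1); // round up to a multiple of `align`
    top = offset + byteLength;
    const missing = top - memory.buffer.byteLength;
    if (missing > 0) memory.grow(Math.ceil(missing / PAGE));
    return offset;
  };
}

const memory = new WebAssembly.Memory({ initial: 1 });
const alloc = makeAllocator(memory);
const a = alloc(10); // -> 0
const b = alloc(4);  // -> 16: bumped past byte 10, then rounded up to the next multiple of 8
```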

This is where AI was most underwhelming. It failed to provide useful ideas or to understand the context. For example, it suggested that every allocated block should have an extra header to store its size. But since all these blocks back typed arrays, which know their own length, this adds overhead for no good reason. It also struggled to understand how I was dealing with alignment - for example, it flagged that I don't deal with alignments bigger than 8 bytes, which I simply don't have to for typed arrays (64 bit floats and ints are the largest element sizes). Or that the way I freed certain blocks actually ensured that they are contiguous and don't overlap. I had to walk it through.

It was able to spot a lot of bugs, but there were just as many false positives. That was still a net benefit, just not quite what I was hoping for. Wasting time on convincing AI that its "reasoning" was flawed was not a big letdown though. However, I did feel rather abandoned in trying to figure out how to approach this. I imagined that after 50 years, memory allocation is a well understood problem and the models could rely on a giant corpus of knowledge that was undoubtedly part of the training data. Instead, most decisions were left to me - and I'm certainly no expert in this field. Thankfully, the allocator is opt-in. And who knows, maybe the next version can be fully AI-generated.

Performance optimization

This is where results were anywhere between super cool and a total waste of time. This will mostly be an episode about how AI very effectively misled me.

Once I had the above sample code running (the one that pans a stereo track), I started benchmarking it. I found various issues, many of which AI could fix rapidly. The most important one was that after a high number of runs the performance of the WASM code totally collapsed (got 20x slower per iteration), while in JS it scaled linearly (at least after a certain threshold).

[Chart: throughput sharply dropping between 100 and 1000 iterations]

The resolution hinged on a very technical detail, but in the end it fell to me to put my finger on it. It was quite a journey, one that shows how AI can go totally off track and take you with it:

  1. The AI first adjusted the benchmark sizes for me, with some hand-wavy arguments about realistic sizes and whatnot. But I didn't just want nice numbers in a benchmark. I've been toying with audio stuff recently, where a 30s snippet has 1.3M samples per channel and you potentially have hundreds of those. Raw audio is heavy, and if this library couldn't be made to perform with real-world workloads, it's not worth the hassle.
  2. Then it started doing some really meaningful optimizations in the generated code to reduce the number of instructions etc. Things did speed up measurably, but the performance cliff was still there. So on one side, it was quite amazing, but on the other side, it was totally beside the point.
  3. To refocus it on the task at hand, I asked it to zero in on the cliff and it tried to narrow it down to potential reasons. It came up with the idea of measuring the number of runs that cause the performance degradation. With some binary search, it determined that 310 iterations were still fine, but at 311 the performance collapsed. That was weirdly specific, yet stable across runs.
  4. When I asked different models to hypothesize, there was a significant amount of back and forth, but interestingly they all apparently conspired to essentially blame the JIT for deoptimizing the WASM (V8's JIT operates in two passes, where the first focuses on producing executable code as quickly as possible, while the second uses more expensive and sophisticated optimizations that can sometimes backfire - implausible, but not impossible), while optimizing the JS to use SIMD (which wouldn't totally surprise me, but doesn't explain the cliff).
  5. This just wasn't convincing, and when I asked it to sort it out, the AI wanted to optimize the code generation again. Cool, but not really: it doesn't solve the actual problem and it also increases complexity, making it harder to weed out any conceptual problems before focusing on optimization. So to shut that down, I asked it to hand-roll some WASM that looks like the optimized code it was aiming for and to benchmark with that. Good news: it was 30% faster. Bad news: the cliff was still there. At the same point. And it still blamed the JIT. At this point I was reasonably sure I was being bullshitted, but the best way to deal with that was hard numbers.
  6. I showed that the point at which it measured the cliff moves depending on the benchmark's warmup length. This is when it tried to pin it down again and determined that the cliff was somewhere around 800 iterations, no matter how you split them. This did not square with the rather flaky argument it had used to support its JIT hypothesis, and when I poked holes in that, it just waffled around to argue the point somewhat differently.
  7. Enough of that. I ran the code in Bun, which uses JavaScriptCore rather than V8, and the cliff was at the same position. The same position. Up until that point I was willing to accommodate the JIT hypothesis, but here it was 100% out the window. Even the AI conceded that point, but had no other ideas to offer.
  8. Facepalm time: The answer was in front of me all along, and it was so easy to deduce once I couldn't blame it on a specific JS/WebAssembly runtime. It had to be something that is the same across JS runtimes, the same across WASM runtimes, and consistently different between the two: floating point arithmetic. The JS version uses double precision arithmetic, the WASM version single precision! When you pan hundreds of times, at some point one of the channels gets really close to zero, leading to a floating point underflow (the result is too small to be represented as a normal float), causing it to be represented using subnormals, which are slow in hardware. With single precision arithmetic, this point is reached much sooner, which is why the JS version reaches the cliff so much later that it didn't occur during benchmarking (even though the numbers in the Float32Array do become subnormal for single precision, they are still well within the normal range for double precision arithmetic in JS).

That was all. A simple underflow. It wasn't even an issue in wasmix, or V8 or anything, but just how things work. I lost hours here, for something that would've normally taken me minutes to spot, had I not built the habit of delegating all debugging to my assiduous assistant. The other optimizations that came out of it no doubt more than compensated for my loss of time, but still: had I used my brain at the right moment, I could have saved a lot of time and tokens.
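
The effect is easy to reproduce in plain JS, using Math.fround to emulate single precision: repeatedly halving a value (which is roughly what repeated panning does to one channel's gain-scaled samples) leaves the normal range about eight times sooner in single precision than in double:

```javascript
// Count how many halvings it takes before a value drops below the smallest
// *normal* float - past that point the hardware falls back to slow subnormals.
function halvingsUntilSubnormal(minNormal, round = (x) => x) {
  let v = 1.0, n = 0;
  while (v >= minNormal) {
    v = round(v * 0.5);
    n++;
  }
  return n;
}

const f32 = halvingsUntilSubnormal(2 ** -126, Math.fround); // single precision: 127
const f64 = halvingsUntilSubnormal(2 ** -1022);             // double precision: 1023
```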

Haxe/WASM → Haxe/JS

One thing I was particularly interested in was to figure out if it could be made simple to bridge from generated WebAssembly back into the generated JavaScript. While there are proposals for bringing garbage collection, objects and arrays into WebAssembly, this is still ongoing. I wondered if I could have that today.

As an example, let's take the following simplified waveform drawing function:

class WaveForm {
  static public function draw(ctx:CanvasRenderingContext2D, channel:Float32Array) {
    var width  = ctx.canvas.width,
        center = ctx.canvas.height / 2,
        total  = channel.length;

    for (col in 0...width) {
      var start = Math.floor(col * total / width),
          end   = Math.floor((col + 1) * total / width);

      var lo:Float32 = .0,
          hi:Float32 = .0;

      for (x in start...end) {
        var v = channel[x];
        if (v < lo) lo = v;
        else if (v > hi) hi = v;
      }

      ctx.fillRect(col, (lo + 1) * center, 1, (hi - lo) * center);
    }
  }
}

Calling out from WebAssembly into JavaScript adds two types of cost:

  1. Convenience: You need quite a bit of glue code. For example, to actually call ctx.fillRect from WASM, you will have to create something like function (ctx, x, y, width, height) { ctx.fillRect(x, y, width, height); }, pass it to the WebAssembly module's imports when you instantiate it, and then call that.
  2. Performance: The runtime overhead is also significant. Compared to most WebAssembly instructions, such a call is quite expensive. What's more: calling a JavaScript object's method like this from WebAssembly is even more expensive than just doing it in JS, since it has to pass through the glue.

Given the performance impact, one might ask: what's even the point? Luckily, the above code is a great example of when this makes sense, because here it is the inner loop that does most of the heavy lifting. Say you have 200s of audio (3:20, so roughly the average length of a radio track) and want to draw it onto a 100px wide canvas. At 44.1 kHz, that means 88200 samples per pixel, meaning that for every ctx.fillRect the inner loop's body runs almost 90K times, rendering the glue overhead negligible.

So the main challenge was to deal with the issue of convenience, which is what wasmix is really about. What's required here is to:

  • scan the Haxe AST for JavaScript object access
  • generate imports for glue functions in the WASM module
  • make use of those imports in the WASM code
  • generate the glue and pass it to the WebAssembly module at runtime
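
The last two bullets can be illustrated with a hand-encoded module that imports a JS function and calls it (the module/field names glue and f are made up for this illustration). Note how the imported function occupies index 0 of the function index space, pushing the module's own function to index 1 - exactly the kind of detail I got wrong earlier:

```javascript
// (import "glue" "f" (func (param i32) (result i32)))  ;; function index 0
// (func (export "run") (param i32) (result i32)        ;; function index 1
//   local.get 0
//   call 0)                                            ;; calls back into JS
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,                         // magic + version
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,                         // type: (i32) -> i32
  0x02, 0x0a, 0x01, 0x04, 0x67, 0x6c, 0x75, 0x65, 0x01, 0x66, 0x00, 0x00, // import "glue" "f"
  0x03, 0x02, 0x01, 0x00,                                                 // one own func, type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01,                   // export "run" = func 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x20, 0x00, 0x10, 0x00, 0x0b,             // local.get 0; call 0
]);

const calls = [];
const glue = { f: (x) => { calls.push(x); return x * 2; } }; // the JS side of the glue
const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), { glue });
instance.exports.run(21); // WASM calls back into glue.f(21) -> 42
```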

This was a rather involved task. And the more I got into it, the more I felt the need to refactor my code rather deeply. I did lean on AI quite heavily to provide me with a relatively robust testing harness (even if some of the tests arguably border on nonsense, I got them in the blink of an eye) and it was able to help me tackle various bugs quite promptly.

Learnings

Some episodes of my journey truly surprised me - typically in a pleasant manner. Occasionally, I was stumped by the degree to which AI fell short of my expectations. Probably they were overinflated by the high praise I've recently seen given by developers whose opinion I hold in high esteem. Whether through disappointment or through delight, I definitely feel like I learned a few things that I'd like to share.

Popularity becomes irrelevant

As I said initially, I wanted to see how well AI could deal with Haxe. After all, there is not so much training data out there.

Throughout last year I was using AI with TypeScript - somewhat less ambitiously, but still to create custom transformers and plug them into an esbuild-based pipeline. Comparing Haxe+AI to TS+AI, I would say the following: the models struggled more with reading and writing the Haxe code (because of syntactic oddities), but overall they performed better:

  1. Fewer hallucinations. When dealing with TS, I've seen the model just hallucinate APIs that simply weren't there, e.g. in the typescript package or in the esbuild package. More training data seems to make the model far more inclined to reach for its best guess. With Haxe, I could see the model doing far more research.
  2. A stricter language is a better guardrail. TypeScript very effectively shoehorns a type system onto JS, a runtime that is inherently untyped and mushy. Haxe on the other hand is designed so that it will transpile to decent C++ or JVM bytecode. While its type system is in many ways richer, it is less "smart" and creates clearer constraints. Most of the time I've seen the model quickly act on type errors or at least prompt me for help, while with TypeScript it would try all sorts of things and sometimes get caught in a loop.
  3. Terseness helps. Idiomatic Haxe tends to be terser than idiomatic TypeScript. Especially in this domain. With TypeScript I often saw behavior that I can only describe as the model working on some part of the code while completely losing track of what happens in another, which to the best of my understanding indicates that it frequently ran over the context window boundaries.

This leads me to believe that as AI writes more and more code, languages that primarily aim at low entry barriers for humans (such as TypeScript) may become less suitable for coding. AI doesn't struggle with problems the same way humans do. It's probably already a better Rust/OCaml/whatever programmer than the vast majority of human developers can hope to become.

In the software industry, the popularity of a tool was always seen as a decisive criterion on whether to use it in a project or not. And whether to learn it for oneself or not. I think AI can make this irrelevant - and in some cases reverse it fully. Some tools are popular due to their approachability. But that says nothing about how they scale with project size or duration. To push it to an extreme: if coding wasn't such a massive bottleneck, one would probably never pick Java/TypeScript/Python/PHP for any project. This choice had always been based on pragmatism, to deal with market realities: finding programmers was hard enough, the steeper the language's learning curve, the harder. All the while, comparatively speaking hardware is practically free. But we're entering an era where also writing code is almost for free. Other factors become more important:

  1. How good is the signal/noise ratio of a given language?
  2. How strong are the guarantees it can provide through its semantics?
  3. How much performance does it provide? In the problem domain?

So who knows? Perhaps we'll be seeing a renaissance of COBOL, Fortran or Ada? Or some of the more left-field languages may start dominating certain problem domains, like OCaml being quite strongly positioned in high frequency trading, or Erlang in telecommunications. Perhaps some lesser known language lends itself superbly to push work to the GPU, so that the same hardware that runs the largely statistical AI algorithms can complement these with deterministic GPU-optimized algorithms. At the time of writing this, a recent study suggests that Elixir is the best language for AI models, by a significant margin.

People on the AI hype train love to claim that 100% of all code will be AI generated within years. I'm not sure about the time frame there, but once that hypothetical moment is reached, we do have to ask what relevance the learning curve and popularity of a language still have.

Wheel reinvention becomes cheaper

There is a drive in many if not most programmers to build everything from scratch. Part of it is no doubt ego, but the true advantage is that you fully own and understand all the code you're relying on (to the degree it's humanly possible). More often than not, it is best to resist the urge and forgo said advantage, because of all the effort this entails, which would typically drive you to cut corners as deadlines approach. The resulting quality is often below that of battle-tested 3rd party tooling.

With the advent of AI, the sweet spot between reuse and reinvention shifts drastically. Writing tests - an arduous task many developers dread - comes almost for free, making it easier to push the quality of bespoke code. Writing the code itself also becomes much, much cheaper.

Now this is not to say that we should stop using libraries. Not at all. But the automatism of reaching for them should be reevaluated. Many libraries require some amount of glue or bending over backwards to produce results that are fully aligned with your requirements. The cost of instead generating exactly the functionality you need and relying on that may be lower than maintaining said glue (which more often than not relies on the kind of behavior that will break between major updates).

Take wasmix. Had I been truly determined to do it, but without AI, I would have definitely not aimed for outputting WASM directly, but rather WAT - the WebAssembly Text Format - and relied on other tools to compile that to the final WASM. Comparatively, this adds dependencies and makes the build process more complex and slower. But it would still have been a massive time saver, because WAT is way more approachable (human readable, symbolic names instead of indices tracked by hand, etc. etc.).

With AI, I was able to produce a solution that goes from Haxe to WASM directly. The only dependency that this project has is another small library that contains just the signature of JavaScript's BigInt, which unfortunately is not part of the Haxe standard library yet. So it's virtually dependency free. It does not have to invoke other processes. The above sample function takes 600 *micro*seconds to compile. Invoking wasm-tools through the OS (while going through npm) adds hundreds of milliseconds of overhead, before any work is even done.

Code is not an asset, it is a liability

I've felt this way for a long time, even before AI came into play. Code is a necessary evil. And I say that as someone who deeply loves coding and obsessing about code from every imaginable angle.

Many programmers treat code with undue reverence, be it their own or 3rd party. Because they can appreciate all the effort that went into it. And yet: all code is a means to an end. It is technical debt. The best code is no code at all.

This may seem counterintuitive, because until recently code was so expensive to produce that it's hard to attribute negative value to it. But the true value has always been the understanding that emerges as you codify your solution in a formal language. As the saying goes: "Programming is the debugging of the specification". Perhaps this was less apparent when producing even passable code was hard.

Now that writing code is for free, we need to face the fact that having code is not. Projects of the past ran the risk of failing due to an inability to produce the required code. Projects of the future will run the risk of failing due to the inability to maintain the vast quantities of the "free" code they have accrued in no time. You can go from idea to legacy code base in just 48h! What a time to be alive ...

AI is uninspired

Most of the solutions AI proposes are bland and noisy (which probably just replicates the training data). Sometimes you can push it towards elegance or at least creativity, but sometimes even that is unattainable. For application development (my bread and butter work), that's perhaps not that much of a problem, because in the end the quality of the code is not the most defining factor for the UX that emerges from it. That said, clunky verbose code will inevitably create problems in the long run:

  1. longer build times
  2. harder to fit into context (for both humans and LLMs)
  3. more places for bugs to hide
  4. suboptimal performance (loads longer, eats more memory and/or burns more CPU cycles)

So there is some risk of dying by a thousand cuts. Because, again, all code is a liability.

In library code, I definitely feel that AI in its current state needs strong supervision to meet a meaningful standard.

The biggest irony is that while AI enables us to pursue approaches that were previously impractical, it seems to reach for the most conventional option. Those conventions, however, largely emerged as coping strategies for human-generated problems. They may be a thing of the past. Much like the need for something like wasmix, or static typing, or whatnot. Perhaps I could just let AI produce plain JS, along with a proper test harness as well as a profiling process that helps pinpoint bottlenecks, for which it can generate the WASM all by itself. Bottom line: AI puts everything into question, but asks none of those questions.

AI has huge blind spots and no mitigation strategy

What AI provides is essentially the statistically most probable answer. When it comes to mushy subjects, that is quite probably the best answer by any meaningful standard. But in hard subjects, like the sciences and engineering, answers can only be correct or not. And often the correct answer is "We don't know", which is the starting point of all scientific discovery.

The claim has been made that AI's tendency to hallucinate or otherwise go off track is because it's been trained and constrained to provide the statistically most probable answer. For most people and most subjects "I don't know" is not all that helpful. Fair enough. But this becomes an issue when trying to solve a technical problem. You should never forget about that, the way I did when I tried to pin down the performance issue I saw in my benchmark.

A colleague of mine recently commented on my tendency to begin answering questions with "I don't know", before embarking on an exploration of the problem domain. AI takes the opposite route: it tries to jump into the solution domain, pick the best fit and then argue the point. Most of the time that gets you a good-enough solution superbly fast. But in the remaining cases, it will just push you in a wrong direction that is not viable. You should be prepared for that, and when you're not making progress, you should have the habit of asking yourself if that's what's currently happening.

Productivity Gains

For wasmix, the fully AI generated code is somewhere around 60% (and around 95% for the tests, which aren't super sophisticated, but still helped uncover a lot of bugs). And every line of it does something. Things that would have bored me to death to write myself, but things that need doing. Granted, I've seen some repetition that could be reduced, but on the whole, virtually every line of code has crucial information. To be clear: it is the opposite of boilerplate.

I wish to stress the last point, because I've seen many people praise AI for its ability to generate boilerplate. If your need for boilerplate is so pronounced that having AI generate it is a significant time saver, you're using the wrong language or you're using your language wrong - or even both.

However there were some parts of the code, where I really had to take charge, because while what the AI generated wasn't clearly bad, it was evidently not good. Which is not to dump on AI. The auto-completion still sped me up significantly. As did the ability to continue working while I fired off an inquiry in the background that would have taken me quite some googling and probably led me to lose track of what I was doing.

The difference between using AI vs. not using it is that without it, it's unlikely that I would have even started this project, let alone reached any releasable state. To do all the research to write just the required code and actually debug everything until it works, would have easily taken me 10x the time without AI and I would have had to compromise a lot. Instead I was able to direct my attention and energy towards making crucial decisions and focus on the code where it really matters.

So at the bottom line, I am no longer shocked when people say AI has boosted their productivity by an order of magnitude or two. I would say for this project, it was definitely the case. That said, it wasn't exactly reflective of the typical problems I run into as a developer:

  1. Fundamental lack of familiarity with the target environment (in this case WASM).
  2. Very straightforward requirements.
  3. Very clear constraints.
  4. Very few integrations.
  5. Lots of coding to be done, comparatively few decisions to be made.

That's the diametrical opposite of almost every single task I've tackled in recent years.

So yes, AI can be an extreme force multiplier, but also it's still perfectly useless in most and perhaps actively harmful in some software development tasks. This will improve over time, but barring extreme qualitative leaps, it cannot replace the most important resource an engaged developer brings to the table: ownership. AI will never own any design or any implementation, let alone the impact on the user. After two decades of seeing some projects succeed and many fail, I have come to believe that lack of ownership is what the latter all had in common.

In an attempt to illustrate what today's AI does bring to the table, I would say it is perhaps best compared to a robotic exoskeleton that makes its human wearer comically more powerful, at an observable cost to finesse. It does not render the human irrelevant - if anything, the importance of the wearer's non-mechanizable qualities becomes far more pronounced. Yes, it makes you much stronger and faster, but if you don't know where you're going, it's about as useful a vehicle as a chair.

Much like before this "experiment", I still remain an AI skeptic, or at least an AI-hype skeptic. But I will admit that I am now mildly enthused. I believe we all should be. For some, the emphasis should be on "mildly", for others on "enthused". You know who you are :P

Top comments (1)

Elliott Stoneham

An excellent, thoughtful, piece on the practical application of AI ... with much more subtle conclusions than most articles like this provide.

The only things lacking are a link from here to the project's code, and, more importantly, a link from the wasmix readme back here.