Back when we were learning to code, we started with assembly. Then C. Then C++, Java, Python, whatever syntax came next. Each new language was a new tax to pay before you got to write anything interesting. We complained. We paid. We knew the deal.
Now we're learning AI. And the funny thing is, the syntax part doesn't matter that much anymore. AI writes it faster than you can type. (Faster than you can type correctly, which, if you've seen my code, is a low bar.)
So I thought this whole AI thing had given me wings.
Turns out it gave me a magnifying glass.
It doesn't hide what I don't know. It focuses on it. Every fuzzy thought I have, every shortcut I've been getting away with, every "I'll figure that out later" gets concentrated, scaled up, and shipped to production with my name on it.
Here's what I mean.
Design (or: the part where the model reads my mind, badly)
Before any code lands, the design has to exist somewhere in a concrete form. You can't just toss out "make me a thing that does X". You have to spell out the inputs, the outputs, and where it'll fall over when something goes wrong. If you can't write that down yourself, the model is just guessing. It will guess confidently. It will guess in well-formatted code. The guess will still be a guess.
Here's the story. CI broke on me one morning. I did what any reasonable person does when CI breaks before coffee: I patched the symptom. Regenerated a lockfile, updated a few test assertions, pushed, green. Felt productive. Closed the laptop. Got coffee.
Two days later, the same kind of failure showed up. Different package, same shape. (You can see where this is going. I, at the time, could not.)
That's when I realized the fix had only worked because I never asked the harder question, which was: why does this keep happening, and what would actually stop it from happening again?
The design in my head was "fix CI". The design I needed was "what is the lifecycle of a dependency update in this repo, and where exactly can a broken lockfile sneak through". The AI was perfectly happy helping me with the first one. It would have helped me with the second one too. I just never asked.
That's the part nobody tells you about coding with AI. The model is incredibly good at executing the question you bring it. It will not bring you the question. The question is your job. (This was supposed to be the easier part of programming, remember.)
Code (or: the empty trades list problem)
Then the code has to match the design you wrote down. And this is where AI gets creative. It's more than happy to produce a pile of code that looks fine line by line but has wandered off into a completely different forest by the end.
Let me show you a real one. I was generating a strategy template for a backtest engine. The strategy needed to support entering long, entering short, and closing positions. Three states. Not subtle.
What came back looked correct on a scan. Functions in the right places, types matching, no obvious smells. I shipped it into the framework and ran a backtest. Hit Enter. Watched the progress bar.
The backtest finished. No errors. No warnings. Trades list: empty.
I'll let that sit for a second. The backtest completed. It just hadn't completed a single trade.
I went back and read the generated code more slowly. Here is what I found:
def check_open_conditions(self) -> tuple[bool, bool]:
    long_signal = self.cumsum[0] > 0
    short_signal = False  # <-- this
    return long_signal, short_signal
The model had taken "support long and short" and apparently decided, in the privacy of its own neural network, that this strategy was probably long-only. So it hardcoded short_signal = False and moved on with its day. Looked fine. Ran fine. Passed type checks. Quietly ate the entire reverse-signal exit path: with the short signal pinned to False, an open long never saw the opposite signal it was waiting for, so it never closed.
(I want to be clear: the AI did not do this maliciously. It did this with the same cheerful confidence it does everything. That's what makes it worse.)
There was a second one too. The framework had a check_close_conditions() method for stop-loss and take-profit. The generated subclass implemented it. The base class never called it. The AI had wired up the front half of a feature and quietly dropped the back half. Like getting a chair delivered, fully assembled, but only the seat. No legs. No back. Just a flat circle of wood on your living room floor. Functionally a chair? Sure, technically.
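For contrast, here's roughly what the whole chair looks like. This is a minimal sketch, not the real framework: Strategy, Trade, SignStrategy, the series list, and the sign-flip exit rule are all names and assumptions I made up for illustration. The two things it shows are a short signal that is derived instead of hardcoded, and a run loop that actually calls check_close_conditions().

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trade:
    side: str                        # "long" or "short"
    entry_bar: int
    exit_bar: Optional[int] = None   # stays None until the trade is closed


class Strategy:
    """Hypothetical base class; the real framework's API will differ."""

    def __init__(self, series: list[float]) -> None:
        self.series = series
        self.bar = 0
        self.position: Optional[Trade] = None
        self.trades: list[Trade] = []   # closed trades only

    def check_open_conditions(self) -> tuple[bool, bool]:
        raise NotImplementedError

    def check_close_conditions(self) -> bool:
        raise NotImplementedError

    def run(self) -> list[Trade]:
        for bar in range(len(self.series)):
            self.bar = bar
            if self.position is not None:
                # The call the base class in the story never made.
                if self.check_close_conditions():
                    self.position.exit_bar = bar
                    self.trades.append(self.position)
                    self.position = None
                continue
            long_signal, short_signal = self.check_open_conditions()
            if long_signal:
                self.position = Trade("long", bar)
            elif short_signal:
                self.position = Trade("short", bar)
        return self.trades


class SignStrategy(Strategy):
    """Illustrative subclass: trade on the sign of the current value."""

    def check_open_conditions(self) -> tuple[bool, bool]:
        value = self.series[self.bar]
        return value > 0, value < 0     # short is derived, not hardcoded

    def check_close_conditions(self) -> bool:
        # Assumed exit rule: close when the sign flips against the position.
        assert self.position is not None
        value = self.series[self.bar]
        if self.position.side == "long":
            return value < 0
        return value > 0


trades = SignStrategy([1.0, 2.0, -1.0, -2.0, 3.0]).run()
for trade in trades:
    print(trade)
```

On that toy series the loop opens a long, closes it when the value turns negative, then opens and closes a short. The point isn't the exit rule, which I invented. The point is that both hooks have a caller.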
This is the magnifying glass working. Writing by hand, I would have hit the empty trades list inside ten minutes and known exactly which file to open. The AI version got me all the way to "ran successfully, zero trades" before anything pinged.
The only thing that helps, and I really do mean the only thing, is making the model walk you through what it wrote, piece by piece, and confirming as you go. It is slow. It is boring. It is the part I keep wanting to skip and the part I keep getting punished for skipping.
Test (or: the function ran, therefore it works)
Tests have to verify the same intent the design had, not just that the function runs without throwing.
In the empty trades story above, this is where the wheels came off. The backtest ran. It did not throw. If I had written a test that asserted "no exceptions during backtest", that test would have passed and I would have shipped a strategy that holds every position until the heat death of the universe.
The intent was "this strategy enters, exits, and produces a list of closed trades". The test you actually need is something closer to "the trades list is not empty, and each trade has both an entry timestamp and an exit timestamp". A completely different test. Not harder. Just different. And I have to write it, because the AI cannot guess the difference between "the function works" and "the function does the thing I meant".
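Written down, that test might look something like the sketch below. ClosedTrade and assert_backtest_traded are hypothetical names, and the field names are my guess at what a trade record carries. The optional required_sides check is my own addition on top of the two conditions above, since a strategy that claims to support long and short should probably produce both on data that calls for both.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ClosedTrade:
    side: str                          # "long" or "short"
    entry_time: Optional[datetime]
    exit_time: Optional[datetime]


def assert_backtest_traded(
    trades: list[ClosedTrade],
    required_sides: tuple[str, ...] = (),
) -> None:
    """Fail unless the backtest produced complete, closed trades."""
    # "Ran without throwing" is not enough: an empty list is a failure.
    assert trades, "backtest ran but produced no trades"
    for trade in trades:
        assert trade.entry_time is not None, f"missing entry time: {trade}"
        assert trade.exit_time is not None, f"missing exit time: {trade}"
    # Optional extra (my assumption): every listed side must appear.
    seen = {trade.side for trade in trades}
    for side in required_sides:
        assert side in seen, f"no {side} trades were produced"
```

Against the empty trades list from the story, the first assertion fails immediately. Against a long-only result, passing required_sides=("long", "short") fails too, which is exactly the complaint the hardcoded short_signal = False deserved.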
If you let the AI write the tests from the code it just produced, you are not testing anything. You are notarizing it. The hardcoded short_signal = False becomes the documented behavior. The missing check_close_conditions() call becomes the spec. Congratulations, the bug now has a test guarding it.
These stages, by the way, are not actually sequential. They pretend to be. They are not.
You go back to the design because a test forced a decision you skipped. Then the code changes. Then the test changes again. The CI failure I patched in a hurry that one morning? Going back to it properly meant rewriting the design for how dependencies flow through the repo, which changed how the lockfile gets validated, which changed what the pre-commit hook checks for. It bounces around until the design stops moving and the code and the tests finally agree on what they're supposed to be doing.
What changes when AI is in the loop isn't the structure of any of this. It's the cost of being sloppy at any of the three.
Writing code by hand was forgiving. You could be vague in your head, and the act of typing would sort it out, or the compiler would, or you'd notice the bug within a few lines. AI removes all those forgiveness mechanisms. It runs with whatever vague thing you gave it, produces something that looks right, and only a real test tells you that you didn't actually know what you wanted.
I still slip back into just letting it run sometimes. Still working on that part.
The ladder and the magnifying glass
After enough of these, the failure mode stops looking like a bug. It starts looking like something else.
Step back for a second.
Look at the ladder we've been climbing. Assembly. C. C++. Java. Python. JavaScript. Whatever framework you used last week. And now AI. Every rung made the world a little more vivid. More things became possible. Software got more colorful, more strange, more alive. The future of software, looking up from where we stand, looks more colorful still.
But here's the thing nobody is saying out loud. All of that color came out of human heads. The richness on every rung came from someone sitting somewhere thinking hard about something they wanted to exist. C did not invent C++. C++ did not invent the things people built with it. The languages were never the source.
AI is on the same ladder. Same rung, even.
People keep talking about AI as if it changed the equation. It didn't. It's another language. A weirder one, maybe, but a language. If something in your head is fuzzy, AI is under no obligation to sharpen it for you. It wasn't trained to. The magnifying glass effect (the way it scales up whatever you give it) is what the training selected for. The fact that it cannot show you yourself, the way a real mirror would, is also what the training selected for. C didn't show you yourself either. Neither did Java. We just didn't notice because the languages were slow enough that we filled in the gaps ourselves.
Or that's how I've come to see it, anyway. I could be reading too much into a tool. But this is where I keep landing.
So when people ask whether human programmers still have a place in this future, I think the question is upside down.
Can a magnifying glass focus without light?
Sit with that one for a second.
Can a mirror show a face when no one is standing in front of it?