DEV Community

<devtips/>
I used AI to write 100% of my code for a month. My pull request got rejected.

Cursor, Copilot, zero manual typing and the code review that made me want to delete my GitHub account.

You ever submit a pull request feeling like an absolute genius, only to watch it get dissected like a first-year biology frog?

Yeah. That was me. February. Twelve commits deep into what I genuinely thought was my best work in years. My reviewer, a senior engineer who communicates exclusively in terse Slack messages and code comments that feel like parking tickets, came back with fourteen inline comments. Fourteen. On code I didn’t fully write. On code I couldn’t fully explain.

“What does this function actually do? Walk me through it.”

I stared at that comment for a solid three minutes. I had accepted the suggestion. I had seen it pass the linter. I had shipped it. But walk you through it? That was going to be a problem.

Here’s the context: for the entire month of January, I ran an experiment. Every line of code I wrote, or rather, every line of code that went into my repos, came from an AI tool. Cursor for the heavy lifting. Copilot for the inline fills. I wrote prompts. I reviewed suggestions. I pressed Tab and Enter more than I’ve ever pressed them in my life. But I did not manually write code. Not a function, not a variable name, not a comment. Full AI, full month, no exceptions.

The first two weeks felt like cheating. The last two weeks felt like a slow-motion disaster I didn’t have the vocabulary to describe yet. And then came the PR.

This is what I learned.

The bet nobody asked me to make

It started, as most bad ideas do, with a Slack message.

A teammate dropped a link to a blog post, something about AI agents writing entire microservices overnight. The comments were the usual split: half the thread was "this changes everything," the other half was "yeah but it won't work in production." I read the whole thing, closed my laptop, and thought: I should just find out.

The rules I set for myself were simple. For the full month of January, I would not manually type a single line of production code. Every function, every class, every config tweak would come from Cursor or Copilot. I could write prompts. I could review output. I could refactor suggestions. But the keyboard was for prompting only. No manual coding. Not even a quick fix I "already knew."

My stack wasn't anything exotic: a TypeScript backend, some React on the frontend, and a handful of Node scripts for internal tooling. Bread and butter stuff. The kind of code I've written so many times I could do it half-asleep. Which, honestly, made the experiment feel almost unfair to the AI. This should be easy for it.

"I'm basically a senior prompt engineer now," I told my team on day three. Nobody laughed. I thought I was being self-aware. I was being an idiot.

Week one was genuinely impressive. I was closing tickets faster than I had in months. Features that would normally take a day were done in a couple of hours. I stopped hitting walls. The blank-file anxiety, that specific dread of staring at an empty editor not knowing where to start, was just gone. Cursor would scaffold the structure, Copilot would fill the gaps, and I'd stitch it together like a very confident editor who definitely understood every word.

Week two felt even better. I was in a rhythm. Prompt, review, accept, commit. My commit history looked like I'd been drinking coffee intravenously. Tickets closed. Standup updates were breezy. I genuinely started thinking about what I'd do with all this reclaimed time.

What I wasn't doing, and this is the part I didn't notice until it was too late, was understanding what I was shipping. I was reading the code. I was skimming it, really, checking that it looked right the way you check that an email sounds okay before sending it. Syntax made sense. Logic seemed fine. Tests passed. Ship it.

I didn't realize I had stopped thinking. I had outsourced the hard part and kept only the approval step. And approval without comprehension is just a rubber stamp with a GitHub avatar on it.

Week three is where the cracks started showing. But I didn't see them yet. I was too busy closing tickets.

Fast is not the same as good

Here’s the thing about speed. It feels like proof. You’re moving fast, therefore you’re doing well. Commits are green, the board is clearing, your manager hasn’t pinged you in three days. What more evidence do you need?

By week three I had shipped more code in a single month than I typically do in six weeks. That number felt great right up until I started actually looking at what I’d shipped.

The first red flag was a helper function I found in the codebase while working on something unrelated. It looked familiar. It also looked almost identical to another helper function three files over. Same logic, slightly different variable names, both AI-generated, both accepted by me on different days. I had duplicated my own code without realizing it because I hadn’t written either version I’d just approved them two weeks apart and forgotten about both.

That wasn’t a one-off. I started finding it everywhere. Little pockets of near-identical logic scattered across the project like the AI had been quietly plagiarizing itself and I’d been too busy closing tickets to notice. Utility functions reinvented from scratch. Error handlers that did the same thing three different ways. Code that worked, technically, but read like it was written by someone who had never seen the rest of the codebase. Because it was.

The AI doesn’t know your project. It knows patterns. And if you don’t own the output, you end up with a codebase full of correct-looking patterns that don’t actually fit together.
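I can't paste the real helpers, but the pattern looked like this. Everything below is a hypothetical reconstruction, not code from the actual repo:

```typescript
// Accepted in week one, in something like src/utils/strings.ts:
function normalizeEmail(email: string): string {
  return email.trim().toLowerCase();
}

// Accepted two weeks later, in something like src/transform/helpers.ts.
// Same logic, different names, neither aware of the other.
function cleanEmailAddress(rawEmail: string): string {
  const trimmed = rawEmail.trim();
  return trimmed.toLowerCase();
}

// Both produce "ada@example.com" — two names for one behavior.
console.log(normalizeEmail("  Ada@Example.COM "));
console.log(cleanEmailAddress("  Ada@Example.COM "));
```

Either function alone is fine. Having both, unlinked and untested against each other, is the maintenance debt the reviewer was pointing at.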

Turns out this isn’t a me problem. A GitClear analysis of 153 million lines of changed code found that code duplication increased fourfold when AI tools were in the mix. Fourfold. CodeRabbit found that pull requests containing AI-generated code had 1.7 times more issues than those written by hand. The DORA 2025 report which surveyed nearly 5,000 technology professionals put it bluntly: AI adoption correlates with faster delivery and higher instability simultaneously. More change failures. More rework. Longer time to recover when things break.

Their question for 2026 is one I’d already answered the hard way: we may be shipping faster, but are we any better?

Then there’s the number that should make every “AI is the future of coding” LinkedIn post pause for breath: 84% of developers now use AI tools. Only 29% trust what comes out. We’ve mass-adopted a tool we’re collectively skeptical of. We’re Tab-completing our way through codebases while quietly knowing something feels off.

I was in that 84%. I was not in that 29%. And I was about to find out exactly why that gap exists.

The pull request that ended the experiment

The PR looked fine. That’s the thing I keep coming back to. It looked completely fine.

It was a mid-sized feature: a data transformation layer that sat between two internal services, cleaning and reshaping payloads before passing them downstream. Probably 400 lines across four files. Cursor wrote the scaffolding in about twenty minutes. Copilot handled the edge case logic. I reviewed, accepted, tweaked a few names, ran the tests, watched them pass, and opened the PR feeling quietly excellent about myself.
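For a sense of scale, the shape of the layer was roughly this. Every name here is hypothetical, a sketch of the kind of code the PR contained rather than the code itself:

```typescript
// Hypothetical sketch: clean and reshape a payload between two services.
interface RawPayload {
  ID?: string;                       // upstream uses inconsistent casing
  CreatedAt?: string;                // sometimes missing
  data?: Record<string, unknown>;    // sometimes missing
}

interface CleanPayload {
  id: string;
  createdAt: string;                 // always ISO 8601
  data: Record<string, unknown>;     // always present
}

function transformPayload(raw: RawPayload): CleanPayload {
  if (!raw.ID) throw new Error("payload missing ID");
  return {
    id: raw.ID.trim(),
    createdAt: new Date(raw.CreatedAt ?? Date.now()).toISOString(),
    data: raw.data ?? {},
  };
}
```

Nothing exotic, which is exactly why it was so easy to approve without really reading it.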

My reviewer came back the next morning with fourteen comments.

Not fourteen nitpicks. Fourteen actual problems. Things like:

“Why does this function re-implement something we already have in the utils folder?”

“Why are we catching this error and then doing absolutely nothing with it?”

“This pipeline chain works, but who is going to maintain it in three months?”

“The variable names here are inconsistent with every other file in this project.”

And then the first one. The one that actually stung.

“What does this function actually do? Walk me through it.”

I went back to the function. I read it. I read it again. I understood what it was doing in the way you understand a sentence in a language you half-speak: you can get the gist, but you’d struggle to paraphrase it out loud. I had reviewed this code. I had accepted this code. I had written the PR description for this code. And I could not confidently explain what it was doing or why it was doing it that way.

The AI had written code that passed my review the same way a student passes an exam: by sounding confident. Technically correct. Contextually clueless. Nobody home.

The duplicate utility was the one that really buried me. I had accepted essentially the same normalization logic twice, in two different files, two weeks apart, because I had no mental model of what was already in the codebase. A developer who had written the original would have remembered it, or at least gone looking. I hadn’t written anything. I had no memory of any of it. My own project had become a codebase I was a stranger in.

The PR got rejected. Not closed, technically; my reviewer is measured about these things. But “needs significant rework” lands the same way. I spent the next three days actually reading the code, actually understanding what it did, rewriting the parts that were wrong, removing the duplicate, and fixing the null case nobody had tested. It took longer to fix than it would have taken to write correctly the first time.

The experiment was over. Not because I decided to stop. The codebase decided for me.

It’s not just you. The industry has the receipts.

After the PR incident I did what any self-respecting developer does when they want to feel less alone about something embarrassing: I went looking for data.

What I found wasn’t comforting exactly, but it was clarifying. I hadn’t discovered some personal character flaw. I had stumbled into an industry-wide pattern that researchers were already documenting in real time.

Start with the trust gap.

That 55-point gap between usage and trust isn’t a small thing. It means the majority of developers shipping AI-generated code are doing it with an internal asterisk: “this is probably fine.” That asterisk is doing a lot of invisible work. It’s the same asterisk I had on every commit in January, and it’s the asterisk my reviewer found in fourteen places.

Then there’s the delivery paradox. The DORA 2025 report, the closest thing the industry has to an objective measure of software delivery health, studied nearly 5,000 technology professionals and found that AI adoption correlates with both faster delivery and higher instability at the same time. Teams ship more. They also break more. More change failures, more rework cycles, longer recovery times when things fall over in production. The gains and the damage arrive together.

We may be faster, but are we any better? That’s not a rhetorical question; it’s the central finding of the most rigorous study of software delivery in 2025.

Gartner projects that 60% of new code will be AI-generated by the end of 2026. That number might feel high, but the direction is right. We are moving toward a world where the majority of production code was not typed by a human. The question nobody is seriously asking yet is: who understands it?

Because here’s what the data doesn’t capture: the invisible cost of approved code nobody owns. A bug in code you wrote is a bug you can debug. You know the intent. You know the edge cases you considered and the ones you didn’t. A bug in code you rubber-stamped is a mystery. You’re reading it cold, same as your reviewer, same as the on-call engineer at 2am when it falls over.

My PR was a small, recoverable version of that problem. In a larger system, with higher stakes, it’s an incident report and a post-mortem and a very awkward all-hands.

The tools aren’t the problem. The abdication is.

What I actually changed

I’m still using AI tools. I want to be clear about that, because this is the part where these articles usually turn into an “I went back to Vim and hand-wrote my Makefiles” manifesto, and that’s not where this is going.

Cursor is still open. Copilot is still on. I still let them generate scaffolding, handle boilerplate, and suggest implementations for things I’ve written fifty times before. The tools are good. The tools were never the problem.

What changed is the contract I have with the output.

AI generates. I own. There’s no middle ground where “I reviewed it” counts as ownership. You either understand what the code does and why, or you don’t ship it.

In practice that means a few concrete things that sound obvious when you say them out loud but apparently needed a fourteen-comment PR to actually stick:

If I can’t explain it, it doesn’t ship.

Not “I could probably figure it out if I spent ten minutes.” Actually explain it, right now, to a rubber duck or a junior dev or a code comment. If I can’t, I go back and understand it before it gets near a PR.

I search the codebase before accepting anything non-trivial.

The AI has no idea what’s already in your project. You do. Before accepting a utility function or a helper or anything more than a one-liner, I spend thirty seconds checking if it already exists. Duplication is a you problem, not a Copilot problem.

I write the tests myself.

AI-generated tests are often testing the AI’s assumptions, not your requirements. Writing tests manually forces me to actually think about edge cases and intent rather than just watching a green checkmark appear.
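Here’s a sketch of what that looks like in practice. normalizeName is a hypothetical stand-in for real project code; the hand-picked edge cases are the point:

```typescript
// Hypothetical helper standing in for real project code.
function normalizeName(input: string | null | undefined): string {
  if (input == null) return ""; // the null case nobody had tested
  return input.trim().replace(/\s+/g, " ");
}

// Hand-written cases: each one is a requirement I actually thought about,
// not an assumption the AI generated for me.
const cases: Array<[string | null | undefined, string]> = [
  ["  Ada  Lovelace ", "Ada Lovelace"], // collapse internal whitespace
  ["", ""],                             // empty string stays empty
  [null, ""],                           // explicit null handling
  [undefined, ""],                      // missing field handling
];

for (const [input, expected] of cases) {
  const got = normalizeName(input);
  if (got !== expected) {
    throw new Error(`normalizeName(${JSON.stringify(input)}) gave ${got}`);
  }
}
```

Writing the null and undefined cases by hand is what forces the "wait, what should happen here?" conversation that accepting a generated test suite lets you skip.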

I stay in the loop on the hard parts.

Boilerplate, scaffolding, repetitive transforms: AI can have all of that. The core logic, the tricky edge cases, the things that will matter at 2am during an incident: I write those myself, or at minimum I rewrite the AI’s version until it sounds like code I wrote.

None of this is revolutionary. Most of it is just what good code review was supposed to enforce before we all got very fast and very sloppy at the same time.

The experiment wasn’t a failure. It was the most useful thing I’ve done for my own coding practice in years, mostly because it showed me exactly where the floor is. The floor is: you can use all the AI you want, but the moment you stop understanding your own codebase, you’re not a developer anymore. You’re a very expensive approval button.

My next PR had two comments. Both were minor. My reviewer said nothing and merged it.

That felt better than closing any ticket in January.


Conclusion

AI didn’t kill my coding brain. I lent it out voluntarily and forgot to ask for it back.

The tools are genuinely good. They’re getting better. In another year they’ll be better still. None of that changes the fundamental thing: code you can’t explain is a liability with a green checkmark on it. The speed is real. The debt is real too; it just doesn’t show up until someone asks you to walk them through it.

Use the tools. Own the output. And maybe, occasionally, close the autocomplete and write a for-loop yourself. Just to check that you still can.

If this resonated, or if your PRs have a few too many “what does this do” comments lately, drop a response below. You’re probably not alone.

Helpful Resources:

The data behind this article

Tools used in the experiment

Further reading
