Gamya

Posted on Jun 21 • Edited on Jun 25

When Judgment Becomes the Bottleneck

#discuss #ai #watercooler #productivity

Treating judgment as inspectable state

A few days ago I published a lighthearted post about building a coding mascot generator with Google AI Studio. The app itself — MascotCraft Studio, complete with a mascot named Octo-Byte — wasn't the point of the post. It was a fun side project. But the comments turned into something I've been thinking about ever since.

The Comment That Stuck With Me

Someone left a comment that's been rattling around in my head:

"We're moving from an era where implementation was the bottleneck to one where judgment becomes the bottleneck. When anyone can generate code, interfaces, and integrations in minutes, the differentiator becomes identifying worthwhile problems, defining clear requirements, and recognizing whether the result is actually good."

I read that, nodded, moved on with my day — and then kept coming back to it.

What "Implementation Was the Bottleneck" Used to Mean

Think about what it took to build something like MascotCraft Studio even three or four years ago. You'd need:

Someone who knows frontend (to build the UI)
Someone who knows how to call an image generation API
Someone who knows how to call a language model API
Someone who knows how to wire those together into a coherent app
Someone who knows how to deploy it

That's a team. Or at minimum, a single person wearing a lot of different hats, each requiring real expertise.

This time, I described what I wanted in a single paragraph, and the implementation step (all of the above) happened in minutes.

So... What's Left?

If the hard part used to be "can we build this," and that part is now fast, what's the hard part now?

Based on that comment thread, it's things like:

Identifying worthwhile problems. Anyone can generate an app. Generating an app that solves a problem someone actually has is different.
Defining clear requirements. My prompt for Octo-Byte was reasonably specific, but Gemini still made a bunch of decisions I didn't ask for — color palettes, visual styles, a gallery feature with local storage. Some of those were great. One of them (the gallery using localStorage) was pointed out by another commenter as something that wouldn't actually hold up if this were a real product — saved mascots vanish if you switch browsers or clear your cache.
Recognizing whether the result is good. This is the one I think about most. I looked at Octo-Byte's bio and thought "this is charming and well-written." But charming and well-written isn't the same as correct or appropriate for the use case. Evaluating output quality is its own skill, separate from being able to produce output at all.

The Part That's a Little Uncomfortable

Here's the thing I keep circling back to: judgment isn't something you can prompt your way into.

You can ask an AI to "review this code for bugs" or "tell me if this design is good," and it'll give you an opinion. But knowing whether that opinion is trustworthy — knowing enough to push back, to say "actually, for my use case, that tradeoff doesn't make sense" — that still requires you to understand the problem space yourself.

In other words: the easier it gets to generate things, the more it seems to matter that you actually understand what you're generating and why. It's less "know how to build everything yourself" and more "be able to tell good implementation from bad, quickly, across a much wider range of things than you could personally build by hand."

What This Means in Practice

I don't have this fully figured out, but it's shifted how I think about a few things:

When I look at a piece of code or a generated feature now, I also try to ask "what would a wrong but plausible-looking version of this look like?", because that's the version judgment needs to catch.
When AI tools generate something for me (like Gemini did with MascotCraft Studio), I try to actually read through what was added rather than just checking "does it work." The localStorage gallery point only came up because someone else looked closely enough to notice it.
I'm less worried about "will AI make skills obsolete" and more curious about "which skills are becoming more valuable because of this shift" — and judgment, evaluation, and knowing what questions to ask seem to be high on that list.

An Open Question

I don't have a tidy conclusion here, because I don't think there is one yet — this feels like something the whole industry is figuring out in real time. But I'm curious: if "judgment becomes the bottleneck," how do you actually practice and sharpen that judgment deliberately, rather than just hoping it accumulates as a side effect of experience?

If you've got thoughts on this, I'd genuinely like to hear them. 🌸

I wrote this article based on concepts I work with regularly — AI assisted with grammar, structure, and readability.

Top comments (36)

Sloan the DEV Moderator • Jun 24

Hey, this article appears to have been generated with the assistance of ChatGPT or possibly some other AI tool.

We allow our community members to use AI assistance when writing articles as long as they abide by our guidelines. Please review the guidelines and edit your post to add a disclaimer.

Failure to follow these guidelines could result in DEV admin lowering the score of your post, making it less visible to the rest of the community. Or, if upon review we find this post to be particularly harmful, we may decide to unpublish it completely.

We hope you understand and take care to follow our guidelines going forward!

csm • Jun 21

When personal computers were first introduced, some people chose to use them for office tasks, some for gaming, and some for making that software.
At the end of the day, programming is all about giving instructions to the computer.
Whether they pass through an assembler, Python's interpreter, or ChatGPT's prompt, we get results.
Bottlenecks are present in every era:
In the C world it's pointers and memory; in the Python world it's objects and references; in the Rust world it's ownership and the borrow checker; and in the AI world it's context, memory, and tokens.

It's true that judgment is key, but I think it's just about looking at the output!
If the output matches our needs and expectations, then it's fine!

But there's one thing we need to differentiate here:
Good judgment is about making AI work and getting quality results.
That's different from identifying worthwhile problems!

Because AI, like a programming language or anything else, is just a tool.
We can't decide what it should be used for!
For an office accountant, only Excel and Power BI are worthwhile, because they solve the accountant's problems.
For a marketer or a designer, Figma is worthwhile.
For a pro gamer, video games are the worthwhile thing.

Now think about video games, at least: many people used to think they were a waste of time and money, yet they exist!

A kid draws a sketch with a pencil; that's the same tool used to make a professional drawing.
Can we say the kid has no right to use a pencil for rough sketches?

All I want to say is that a professional, during office hours, will handle the really important things using AI as a craft,
while someone in their free time will enjoy the outputs of AI as art!

Gamya • Jun 22

Really enjoyed this perspective! 😊 The bottleneck-per-era framing is a great way to think about it — pointers in C, ownership in Rust, context and tokens in AI. Each era has its own version of "the thing you have to actually understand to get quality results."
And the pencil analogy is spot on — the tool doesn't define the worthwhile problem, the person holding it does. I think where I'd add a nuance is that "looking at the output" as judgment works well when you already have enough domain knowledge to know what good looks like. The tricky part is when someone is new enough to a space that the output looks right without them being able to tell it isn't—which is where the deeper judgment piece comes in. But that's probably a whole separate post! 🌸

csm • Jun 22

"The tricky part is when someone is new enough to a space that the output looks right without them being able to tell it isn't—which is where the deeper judgment piece comes in."
True, I agree!

Gamya • Jun 23

Glad that landed! 😊 It's probably the trickiest part of the whole shift—the less you know, the harder it is to spot what's wrong, which is exactly when you're most likely to trust the output without question.

Michael Salinas • Jul 10

Thank you for sharing such an excellent post. I really enjoyed reading it.

I’m a Python Full-Stack Engineer with over 10 years of experience designing and building scalable software solutions for clients across a variety of industries. Along the way, I’ve learned that successful projects depend not only on strong technical execution but also on creating real business value.

With my recent contract completed, I’m exploring new opportunities to collaborate with professionals who value innovation, practical problem-solving, and long-term partnerships. I enjoy discussing ideas that combine technical excellence with sound business strategy, creating outcomes that benefit everyone involved.

I believe every connection has the potential to become something meaningful. If you're interested in exchanging ideas, exploring opportunities, or simply connecting with someone who enjoys building impactful technology, I'd be happy to hear from you.

Wishing you success in your future endeavors, and I look forward to connecting.

Gamya • Jul 11

Thank you! Wishing you all the best in your future endeavors as well! 🌸

Michael Salinas • Jul 11

Thanks. Don't you have any intention of working together to achieve something bigger?

Gamya • Jul 12

I appreciate the thought! I'm pretty heads down on my own projects right now, but best of luck with yours.

Michael Salinas • Jul 12

It will not interfere with your work.
You can gain sufficient additional benefits while building your experience.
You will likely understand this if you have a detailed conversation with me.
Best

CapeStart • Jun 23

We've spent years learning how to build things. Now we are learning how to decide which things are worth building.

Gamya • Jun 24

That one sentence captures the whole shift perfectly. 😊 The tools for building have never been more accessible — which makes the "worth building" question the one that actually separates good outcomes from fast ones.

Mike Czerwinski • Jun 21

Your closing question — "deliberately practice judgment rather than accumulate it accidentally" — is the exact problem I've been trying to solve operationally.

What worked for me: treat judgments as inspectable state, not internal feelings. Every architectural decision goes into a separate store with status (proposed/accepted/locked) and a reason field. A few months in, the store is already a readable trace of how my judgment actually evolved — which calls I got right, which I reversed, why. Practicing judgment turns into reading your own record.

Wrote up the framework angle separately: dev.to/jugeni/vibe-coding-is-not-a-level-its-an-axis-12gb — yours is the why this matters, mine is one possible how.
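
For readers who want to picture the decision store described above, here is a minimal sketch in Python. It is not Mike's actual implementation: the class names, the `history` list, and the rule that a locked decision cannot change are all assumptions made for illustration. Only the three statuses and the reason field come from the comment.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Dict, List, Tuple


class Status(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    LOCKED = "locked"


@dataclass
class Decision:
    title: str
    reason: str  # why the call was made, written down at decision time
    status: Status = Status.PROPOSED
    recorded_on: date = field(default_factory=date.today)
    # Each entry is (old_status, new_status, why), so the trace stays readable.
    history: List[Tuple[Status, Status, str]] = field(default_factory=list)

    def move_to(self, new_status: Status, why: str) -> None:
        # Assumption: "locked" means the decision is frozen.
        if self.status is Status.LOCKED:
            raise ValueError(f"'{self.title}' is locked and cannot change")
        self.history.append((self.status, new_status, why))
        self.status = new_status


class DecisionStore:
    """In-memory store; a real one would probably live in a file or database."""

    def __init__(self) -> None:
        self._decisions: Dict[str, Decision] = {}

    def record(self, title: str, reason: str) -> Decision:
        decision = Decision(title=title, reason=reason)
        self._decisions[title] = decision
        return decision

    def by_status(self, status: Status) -> List[Decision]:
        return [d for d in self._decisions.values() if d.status is status]

    def revisited(self) -> List[Decision]:
        # Decisions whose status changed more than once are worth rereading.
        return [d for d in self._decisions.values() if len(d.history) > 1]


# Hypothetical entry, loosely based on the gallery example from the post.
store = DecisionStore()
gallery = store.record("Gallery persistence", "localStorage is enough for a demo")
gallery.move_to(Status.ACCEPTED, "Shipped the demo with it")
```

Reviewing the store then amounts to calling `by_status` or `revisited` on a schedule and rereading the recorded reasons.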

Gamya • Jun 22

This is a really practical approach—"inspectable state, not internal feelings" is such a useful reframe. The idea of a decision store with a reason field is something I hadn't considered, but it makes a lot of sense: you can't really review a judgment you never recorded, and most of us just carry it around implicitly until something goes wrong and forces a retrospective.
Reading your own record as the practice is elegant too — it turns judgment from something abstract into something you can actually audit. Going to check out your piece on the framework angle now! 🌸

Mike Czerwinski • Jun 22

"Something you can actually audit" is the frame that matters — once it's a record you can review, the practice mostly runs itself. Most retros only fire when something breaks; a decision store flips that to retros the file schedules, not ones pain forces. Hope the framework piece lands.

Gamya • Jun 23

"Retros the file schedules, not ones pain forces" is a really clean distinction. Reactive retros only catch what already went wrong visibly; a scheduled review catches the drift before it compounds. Read the framework piece and left a comment there too! 🌸

Mike Czerwinski • Jun 23

Yeah — "catches the drift before it compounds" lands. Reactive retros also self-select on visible failure, so the drift that hasn't broken anything yet just stays drift until it does. Scheduled review moves the catch upstream of the breakage. Will look for your comment — thanks for the second read.

Gamya • Jun 24

"Self-selects on visible failure" is exactly the gap — anything that's drifting but hasn't broken yet is invisible to a reactive process by definition. Scheduled review is the only way to catch what hasn't announced itself yet. Really enjoyed this whole thread!

Mike Czerwinski • Jun 24

The reactive-vs-scheduled split is the one I will be reaching for again, and the "invisible by definition" framing you put on it is the part I want to keep. Enjoyed the thread too.

Gamya • Jun 25

Really glad it landed that way — and same, this has been one of the more genuinely useful threads I've had on DEV. Thanks for bringing the operational depth to it! 🌸

Theo Valmis • Jun 24

The quote you kept coming back to is the whole shift in one sentence. The part people underweight is 'recognizing whether the result is actually good.' That judgment doesn't scale by hiring more reviewers, because at generation speed the volume outruns them. It scales when the requirements are encoded as something the output gets checked against, so 'is this good' stops being a per-PR gut call and becomes a property you can enforce. Defining clear requirements is becoming the real engineering.

Gamya • Jun 25

"Defining clear requirements is becoming the real engineering" — that reframe is really sharp. The bottleneck isn't just human judgment at the review stage, it's whether the standard for "good" is encoded anywhere that can actually keep pace with generation speed. Moving "is this good" from a per-PR gut call to an enforceable property is a completely different kind of problem than the one most teams are set up to solve. Really useful extension of the idea — thank you! 🌸

Hiren Kava • Jun 21

Hi,

Thanks for sharing your article. I really liked your perspective that AI is shifting the bottleneck from implementation to judgment—it highlights a practical understanding that building software is no longer just about writing code, but making sound technical decisions and evaluating trade-offs.

I have a couple of technical questions related to our current project:

In a production blockchain application where AI helps generate parts of the codebase, how would you validate the correctness and security of smart contract interactions and backend transaction flows before deployment?
If you were designing a high-availability backend for an on-chain payment or bridge system, what monitoring, alerting, and failure-recovery strategies would you implement to ensure reliability during RPC outages or network congestion?

Gamya • Jun 22

Thank you for the kind words! 😊 Those are really interesting questions, though I have to be honest — smart contract validation and blockchain backend architecture are quite a bit outside my current area of focus (I'm primarily in iOS/Swift land!). I wouldn't want to give you half-baked answers on something as critical as on-chain payment systems.
For those specific challenges, you'd likely get much better responses from developers with direct production blockchain experience — might be worth posting them as a standalone discussion thread on DEV where that community can weigh in properly!

Hiren Kava • Jun 22

Thanks for your honest and thoughtful response—I really appreciate the transparency.

Gamya • Jun 23

Of course! 😊 Always better to be upfront than to guess on something that critical. Good luck with the project!

Hiren Kava • Jun 23

😎

Nazar Boyko • Jun 21

The "what would a wrong but plausible-looking version look like" habit is the one I'd steal from this. Most code review trains you to check "does it work", which AI output sails right past because it usually does work, just not the way you needed. On your open question, the thing that's grown my own judgment fastest is writing down what I expect before I run something, then seeing where I was off. You don't get that feedback loop if you only ever look at the result after it's already right. Have you tried keeping a record of the calls Gemini made that you'd have made differently?
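
One way the "write down what I expect before I run something" habit could look in code is sketched below. Everything here is hypothetical: `PredictionLog`, `check`, and the `slugify` function under test are illustrative names, not anything from the thread. The only idea taken from the comment is that the expectation is recorded before the result is known.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Prediction:
    label: str
    expected: Any
    actual: Any = None
    resolved: bool = False

    @property
    def surprised(self) -> bool:
        # A surprise is a resolved prediction that did not match reality.
        return self.resolved and self.expected != self.actual


class PredictionLog:
    def __init__(self) -> None:
        self.entries: List[Prediction] = []

    def check(self, label: str, expected: Any, run: Callable[[], Any]) -> Prediction:
        entry = Prediction(label=label, expected=expected)
        self.entries.append(entry)  # recorded before the code runs
        entry.actual = run()
        entry.resolved = True
        return entry

    def surprises(self) -> List[Prediction]:
        return [e for e in self.entries if e.surprised]


def slugify(text: str) -> str:
    # Hypothetical generated helper: looks fine, but keeps punctuation.
    return text.lower().replace(" ", "-")


log = PredictionLog()
log.check("slugify drops punctuation", "hello-world", lambda: slugify("Hello, World!"))
log.check("slugify lowercases", "abc", lambda: slugify("ABC"))

for entry in log.surprises():
    print(f"{entry.label}: expected {entry.expected!r}, got {entry.actual!r}")
```

The list returned by `surprises()` is the part worth rereading later, since it holds exactly the cases where the mental model was off.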

Gamya • Jun 22

"Writing down what I expect before I run something" — that's such a concrete way to build the feedback loop, and I hadn't thought about it quite that way before. The gap between prediction and result is where the actual learning happens, and you're right that skipping straight to the output short-circuits that entirely.
I haven't kept a formal record of Gemini's calls I'd have made differently, but after reading this I'm genuinely considering it — especially the localStorage gallery decision, which is the exact kind of thing that would've shown up in that kind of log. Thanks for the practical suggestion! 🌸

NOVAInetwork • Jun 21

Your line "what would a wrong but plausible-looking version of this look like" is the whole skill in one sentence, and it is also the answer to your closing question. You sharpen judgment by forcing yourself to produce the plausible-wrong version on purpose, before you trust the real one.

Concretely, the practice that has moved my judgment fastest: before I accept any non-trivial output, I write the failing test first, the one that should fail if the thing is wrong, and I make it actually fail for the reason I expect before I let the fix make it pass. The discipline is not the test, it is that I have to articulate what wrong looks like in advance. If I cannot describe the failure, I do not understand the thing well enough to judge the output yet. That gap is the signal.

The other half is refusing to accept "it works" as evidence. It works is the happy path. Judgment lives in the failure modes, so I make myself enumerate them: what does this do on malformed input, on a dropped node, on the edge state. Most plausible-wrong output passes the happy path and dies in exactly the case I did not think to name. The 90 percent right version is more dangerous than the obviously broken one, because nothing trips an alarm.

So to your question directly: judgment does not accumulate as a side effect of experience, it accumulates from the habit of predicting failure before you look for it. Experience only sharpens it if every time you are surprised, you ask why your model of "wrong" missed that case. The surprises are the training signal, not the successes.
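
The failing-test-first discipline described above can be sketched roughly as follows. The age parser, its 0 to 150 range, and the list of malformed inputs are all assumptions invented for the example. The point is only that the check is written to describe what wrong looks like, and is confirmed to fail against a plausible-wrong version before a fix is trusted.

```python
from typing import Callable


def parse_age_plausible(text: str) -> int:
    """Plausible-but-wrong: fine on the happy path, accepts nonsense ages."""
    return int(text)


def parse_age_fixed(text: str) -> int:
    """Rejects values outside an assumed 0-150 range."""
    value = int(text.strip())
    if not 0 <= value <= 150:
        raise ValueError(f"age out of range: {value}")
    return value


def rejects_bad_input(parse: Callable[[str], int]) -> bool:
    """Return True only if every malformed input raises ValueError."""
    for bad in ["-5", "abc", "", "999"]:
        try:
            parse(bad)
        except ValueError:
            continue
        return False  # a bad value slipped through silently
    return True


# Both versions pass the happy path, so "it works" tells us nothing.
assert parse_age_plausible("42") == 42
assert parse_age_fixed("42") == 42

# Step 1: confirm the check fails against the plausible-wrong version.
failed_first = not rejects_bad_input(parse_age_plausible)

# Step 2: only then confirm the fix makes it pass.
passes_after_fix = rejects_bad_input(parse_age_fixed)

print(failed_first, passes_after_fix)
```

If the check had passed against the plausible-wrong version, that would mean the failure modes had not yet been named well enough to judge the output.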

Gamya • Jun 22

"If I cannot describe the failure, I do not understand the thing well enough to judge the output yet" — that line alone is worth saving. It reframes the whole question from "does this look right" to "can I articulate what wrong would look like," which is a much harder and more honest bar.
The point about the 90% right version being more dangerous than the obviously broken one really landed too. The thing that trips no alarms is exactly the thing that costs the most later, because nothing flags it for review and it gets built on top of.
And yes — "judgment accumulates from the habit of predicting failure before you look for it" is the clearest answer to my closing question I've seen in any of the comments. The surprises being the training signal, not the successes, is going to change how I think about this going forward. Thank you for this! 🌸

NOVAInetwork • Jun 22

Glad it landed. The one caution I would add to "surprises are the training signal": only if you actually log the surprise the moment it happens. My instinct when something surprises me is to fix it and move on, and the lesson evaporates. The habit that makes the signal real is writing down what my model predicted versus what happened, right then, before the fix makes it obvious in hindsight. The surprise is only training data if you capture it while it still feels like a surprise.

Gamya • Jun 23

That caveat is really important — "before the fix makes it obvious in hindsight" is the key phrase. Once you've solved it, the gap between prediction and reality collapses and the surprise stops feeling like a surprise, which is exactly when the lesson quietly disappears. Capturing it while it still feels wrong is the whole thing. This is basically the same point Nazar made about writing down expectations before running something — the discipline has to happen before you know the answer, not after.

محمد خالد باجرش • Jun 24

Thank you for this article.
