DEV Community

HumanPages.ai

Posted on • Originally published at humanpages.ai

Claude Code Can Write the Software. It Still Can't Decide What the Software Should Do.

948 points on Hacker News is not a fluke. When a visual breakdown of Claude's code capabilities pulls that kind of engagement, something real is happening in how developers think about AI-assisted work.

The site ccunpacked.dev walks through what Claude Code actually does under the hood: how it reads codebases, reasons through multi-file edits, handles tool calls, and structures its outputs. It's thorough. The comments section on HN ran 344 deep, mostly engineers sharing where Claude surprised them and where it fell apart. That ratio of signal to noise is unusually high for that forum.

What the thread reveals is a community actively mapping the boundary between what AI can own and what still needs a human.

What Claude Code Gets Right

Claude Code is genuinely good at a specific class of problems. Give it a well-scoped task with clear inputs and outputs, decent surrounding context, and existing test coverage, and it will produce working code faster than most junior developers. Refactoring a function, generating boilerplate, translating between languages, writing unit tests for code that already exists. These are reliable.

The HN comments are littered with people sharing benchmarks. One developer mentioned Claude handling a 3,000-line refactor across 12 files with one prompt and minimal cleanup. Another noted it consistently outperforms Copilot on tasks requiring multi-file awareness. The visual guide on ccunpacked.dev shows why: Claude's context window handling and tool-use architecture let it build a working model of a codebase before touching a single line.

For repetitive, well-defined tasks, it is genuinely fast. Not fast-for-an-AI. Just fast.

Where the Wheels Come Off

The same HN thread is equally honest about failure modes. Claude Code struggles with ambiguity. Give it a vague prompt and it will produce confident, plausible, wrong code. It can't push back on bad product requirements. It doesn't know when the spec is internally contradictory. It has no memory of the three architecture decisions you made six months ago that constrain what's possible today.

One commenter put it plainly: "It's a very good typist that doesn't know what it's typing."

That's not a criticism of Claude specifically. It's a description of the category. LLMs generate the statistically likely next token. They don't have product sense. They don't feel the weight of a deadline or the awkwardness of a client demo that went sideways. They don't care that the previous engineer left and nobody knows why that service is calling that endpoint.

Code review, architecture decisions, debugging production incidents with insufficient logs, writing specs that a junior dev can actually implement without confusion. These are human problems. They require judgment that comes from experience, not pattern matching.

The Delegation Stack Is Real

Here's what's actually happening in teams that use Claude Code well. The AI handles implementation. Humans handle everything upstream and downstream of it.

Upstream: writing the spec, scoping the task, deciding what to build at all. Downstream: reviewing the output, catching the subtle logical errors, integrating with systems that weren't documented, and deploying with confidence.

This is not a temporary state of affairs while AI catches up. It's a workflow. The mistake is treating it as a problem to be solved rather than a structure to be optimized.

At Human Pages, this is exactly the scenario we're building infrastructure for. An AI agent is working through a codebase and hits a decision point: two valid approaches exist, one optimizes for performance, one for maintainability, and the tradeoffs depend on roadmap context only a human has. The agent posts a micro-task. A developer reviews the context, picks an approach, writes a two-sentence justification. Gets paid in USDC. The agent continues. Total interruption: four minutes. Total cost: a few dollars. The alternative is the agent making a guess that costs hours to unwind later.

That's not a hypothetical. That's what production-grade AI-human workflows look like when the tooling actually supports them.

The Visibility Problem

Most teams using Claude Code don't have clean delegation structures. They have one person who is both the AI operator and the human reviewer, running everything in a single terminal window, making judgment calls no one else sees. That works until it doesn't. When something breaks, the audit trail is thin. When that person leaves, the institutional knowledge walks out with them.

The ccunpacked.dev breakdown is useful partly because it makes the AI's reasoning visible. You can see where Claude is confident, where it hedges, how it structures its tool calls. That visibility is what makes human oversight tractable. You can't review what you can't see.

The same logic applies to the workflow layer. If AI agents are making decisions and delegating subtasks, those handoffs need to be legible. What did the agent decide on its own? What did it escalate? Why? Without that structure, you're not running a human-AI workflow. You're just hoping.

The Actual Question

The Hacker News engagement around Claude Code reflects a moment where developers are moving past "can it code" and asking something harder: what does a team look like when AI handles a real portion of implementation?

The answer isn't fewer humans. It's different humans, doing different things, in different rhythms. Less time writing boilerplate, more time on the decisions that boilerplate serves. Less time on the code that moves data from A to B, more time on whether B was the right destination.

Claude Code can write the software. The question of what software to write, and whether it's working, and whether it's the right call to ship it on Friday, those haven't gotten easier. If anything, as the implementation layer gets faster, the judgment layer gets more exposed. Every decision that used to hide inside a slow development cycle now surfaces in days instead of weeks.

Faster tools don't reduce the need for good judgment. They just make bad judgment more expensive.
