Yurii Cherkasov

Posted on Feb 23

We Broke the Ladder. Let's Ship a Patch?

#ai #career #software #beginners

The article was designed as an answer to a warning posted by @the_nortern_dev in The Junior Developer is Extinct (And we are creating a disaster)

My initial reaction wasn't just agreement - it was discomfort. It comes too closely to the reality I observe in the industry. Because the thesis is not just largely correct, but it quietly being ignored, in a hope that the problem will "resolve itself" as the industry "adjusts"

So what do we have:

Yes, the "boring but formative" tasks are increasingly delegated to AI
Yes, short-term velocity is winning over long-term capability building
Yes, if we remove the apprenticeship layer entirely, the industry will feel that loss later.

Those concerns aren't alarmist. They're merely statements of fact.

And here's the uncomfortable truth: if we remove the grunt work, we remove the growth path. A Senior Developer isn't produced by reading documentation or prompting an LLM. A Senior Developer is forged by touching fragile systems, breaking things, debugging production incidents, and slowly building a mental model of how software behaves under stress.

But I, as an engineer, also have that stubborn assurance that if something breaks, we can design a fix. This isn't the first structural shift our industry has faced, and it won't be the last, and I'd rather engineer a fix than accept the loss.

First, instead of arguing about whether AI replaces Juniors or prematurely declaring the role obsolete, we should try redesigning what "Junior" role actually means in an AI-first environment.

Second, we should shape the ladder the way when it isn't disappearing, but changing shape.

And here are several practical approaches that demonstrably work in real teams - and crucially, don't rely on pretending AI is going away. They are intentionally sorted by impact and risk exposure. The earlier scenarios assume AI-assisted development within controlled engineering workflows. The later ones assume something deeper: AI embedded into operational pipelines, powering internal workflows, or even shipping as a customer-facing AI-first product. As AI moves from "assistant" to "actor", the blast radius increases - and so must the level of responsibility, resilience, and training.

1. Make Juniors AI Documentation Auditors

AI-generated documentation is impressive - and dangerous.

Today, a significant portion of internal documentation is written or heavily assisted by AI. In many teams, the best-case scenario is: AI drafts, a human skims and edits. In the worst case, AI drafts and it gets merged almost unchanged.

But documentation is no longer consumed only by humans.

It is consumed by:

Engineers onboarding to the system
External partners
Compliance reviewers
And increasingly - by other AI systems

Each category has its own risk profile.

A human reading incorrect documentation has a chance to pause. Something "feels off." The API name looks suspicious. The claim doesn't match intuition. Humans are imperfect, their attention span depends on their workload, fatigue, and personal biases - but they are skeptical by default.

AI systems are not.

If incorrect documentation says:

"This endpoint is idempotent."

"This function does not modify a persistent state."

an LLM consuming that document as context will take it as truth unless explicitly contradicted elsewhere. It will incorporate that statement into planning, code generation, and reasoning.

That's where the blast radius expands. Wrong documentation no longer misleads one engineer - it misleads every downstream AI interaction built on top of it.

The error propagates:

Wider - across multiple systems.
Deeper - into generated code, test logic, architectural decisions.

Documentation becomes a silent source of model poisoning.

The Junior Role as Auditor

This is where Juniors can step in with high value and manageable risk.

Their responsibility would include:

Validating documentation claims against actual source code
Identifying hallucinated APIs or non-existent configuration flags
Detecting missing edge cases or undocumented constraints
Verifying performance or safety claims
Adding doc-tests to CI that fail if documentation contradicts implementation
Tracking documentation drift over time
Flagging ambiguous language that may mislead AI consumption

Why This Is Apprenticeship

This work forces Junior Engineers to:

Read code deeply, perhaps deeper than the usual review
Compare intention vs implementation
Think about contracts, not just syntax
Understand implicit assumptions made by AI (AI assumes a config always exists because it appeared in an example file)
Detect ambiguity in language (phrases like "typically"/"should"/"generally" LLM may treat as hard guarantees)
Train "intuitional thinking" and skepticism

They learn that documentation is not marketing. It is an executable contract.

They also learn something new and uniquely modern:

Documentation is part of the AI control surface.

If documentation is wrong, your AI system eventually becomes wrong - confidently and at scale.

2. Keep Some Systems Human-Owned

Not everything should be generated.

AI can assist, accelerate, suggest, scaffold - but ownership must remain human.

One of the quiet risks in an AI-first workflow is over-delegation. When generation becomes cheap, teams are tempted to route even trivial changes through an agent. Refactor a small module? Prompt it. Rename a variable across the codebase? Prompt it. Change formatting? Prompt it.

But there is a very real cost. Sometimes the cost of reading and verifying AI output exceeds the cost of making the change manually - especially for contained, well-understood internal systems.

There are still tasks where burning 100,500 tokens on a trivial refactoring plus audit is more expensive than a thoughtful human edit.

And beyond cost, there is something more important: ownership. If you give Junior Engineer end-to-end responsibility for a small internal service:

Real internal users (e.g. a dashboard used by Support - meaning bugs are noticed immediately, not hypothetically)
Code review discipline (mandatory PRs with at least one Senior and one Junior reviewer)
Defined deployment pipeline (staging, smoke tests, production rollout, rollback strategy documented)
Lightweight on-call rotation (owning alerts during business hours; responding to Slack incident pings)
Observable metrics and logs (tracking latency, error rate, owning dashboards in Grafana/Datadog)
Investigation of production incidents (tracking root cause and identifying patterns)

Let AI assist in implementation - but do not let it replace accountability.

Ownership teaches risk estimation before deployment, more careful approach to design, understanding blast radius.

Most importantly, it teaches the consequence.

And consequence builds judgment.

In an AI-augmented environment, judgment becomes more valuable, not less. Because while AI can generate code, it does not carry responsibility for its long-term behavior - human does.

3. Make AI Output a First-Class CI Artifact

Right now, prompts often live in Slack threads, ephemeral IDE sessions, or someone's local history. That's fragile. It's the equivalent of deploying compiled binaries without source control.

If AI is part of the production pipeline, its output must be treated as a build artifact - versioned, reproducible within controlled bounds, reviewable, and auditable.

Instead of "just prompting", the system should include:

Versioned prompt templates stored alongside the codebase
Explicit model configuration tracking (model, temperature, context window)
Snapshot storage of generated outputs tied to commit hashes
Drift detection across model upgrades
Automated regression evaluation when prompts or models change
Red-team test suites that intentionally probe unsafe or undefined behavior

This is not administrative overhead. It is software hygiene in a probabilistic environment.

A Junior engineer owning this layer would be responsible for:

Step 1. Prompt lifecycle management.

Treat prompts like code. Refactor them. Add comments. Remove ambiguity. Track breaking changes. Review them through PRs.

Step 2. Output reproducibility checks.

Ensure that, under fixed configuration (model, seed if supported, temperature, inputs), outputs remain within acceptable bounds. When they don't, surface that change explicitly instead of letting it slip silently into production.

Step 3. Model drift analysis

When upgrading from Model X to Model Y, measure behavioral differences.
Did performance improve or regress?
Did hallucination frequency change?
Did formatting or structural guarantees degrade?

This forces Juniors to think statistically and architecturally.

Step 4. CI Gating and Traceability.

Every generated artifact should answer:

Which model produced this?
With what prompt?
Under what constraints?
On which commit?
Was it modified manually afterward?

On the way, the engineer will learn

Output stability under non-deterministic systems
The difference between deterministic builds vs probabilistic generation
Statistical thinking in engineering decisions
Treat AI output as an artifact, that can be stored, analyzed, summarized - and use in taking future decisions

In traditional systems, you learn discipline by writing code that compiles.
In AI-augmented systems, you learn discipline by constraining code that generates itself. That would be solid compliance-grade engineering.

4. Turn Juniors into LLM Verification Engineers

Anyone who has run large-scale AI-generated code in production has seen this. It works brilliantly... until it doesn't. AI autonomy without hard guardrails doesn't degrade gracefully - it fails catastrophically.

We've already seen real-world examples of this pattern. One public case involved a Replit agent that, when attempting to "fix" an issue, deleted a production database. When queried immediately afterward, it initially claimed the data was still present, attempting to fake it still present, and continued as if nothing catastrophic had occurred. Only after further probing it admitted that the database had been wiped, describing its behavior as having "panicked under pressure" and acknowledging the event as a "catastrophic error"

Nothing surprising. Just an agent acting beyond safe boundaries, trying to maintain forward progress.

Someone must own failure observation, and specifically humans must define guardrails like "this must never happen"

Guardrails are your new tests

I can clearly imagine a Junior role responsible for:

Explicitly gated destructive actions
Monitor and alert general failures, that can be ignored even if they are explicitly described in agentic guidelines - like silencing exceptions or sneaking in a circular dependency
Maintaining an evaluation dataset of known failure cases
Writing invariant checks for specific cases: "This function must not allocate memory" or "This API must preserve idempotency" or "This response must not access external network"
Building fuzzed prompt suites that intentionally try to break the agent
Tracking hallucination patterns and codifying them into CI checks

This is not trivial work. It forces them to observe the agent work and note the exact "wrong angles" where the agent tries to lean, on the way understand the architecture deeply enough to encode what "correct" means.

In classical engineering, we wrote unit tests.
In AI-assisted engineering, we write behavioral contracts around generation

That's a modern apprenticeship.

5. Create a "Break Stuff Safely" Track

One of the strongest arguments in the original thesis is that Seniors are forged through failure.

So let's simulate it - deliberately, systematically, and the most important, safely.

Instead of waiting for production to collapse at 3 a.m. after an agent tries to hide the consequences of another "Ooops, I broke something!" build an institutionalized failure lab.

Phase 1: Deliberate Guardrail Violations

Start by breaking your own protections.

Disable a validation rule and observe how far the agent proceeds.
Remove a CI gating constraint and measure what slips through.
Inject a prompt that subtly encourages unsafe behavior.
Introduce ambiguous instructions and monitor interpretation drift.
Feed hallucination-prone inputs (nonexistent APIs, fake configs, impossible schemas)

The goal is not to see whether the AI fails - it will, happily.

The goal is to see whether your system detects the failure early enough

Phase 2: Chaos Engineering for AI-Assisted Workflows

Move beyond generation and into system behavior.

Inject artificial latency into dependencies (e.g. delay database responses by 2–5 seconds)
Simulate partial API failures (return HTTP 500 for 20% of requests; drop every third response)
Introduce stale caches (serve outdated configuration values; delay cache invalidation)
Corrupt intermediate outputs (truncate generated JSON; remove required fields)
Force retry storms (configure aggressive retry policy with no backoff)
Randomize response ordering to surface race conditions (reorder asynchronous task completions, delay threads randomly)

Ask:

Does the system retry responsibly?
Does it escalate?
Does it silently compensate?
Does it fabricate data to continue?

Document the behavior.

Convert observations into new guardrails.

Phase 3: Incident Reconstruction

Give Juniors a history from a real agentic outage (sanitized if necessary)

Their task:

Reconstruct timeline.
Identify triggering event.
Map cascading effects.
Propose a containment strategy.
Propose permanent mitigation.
Encode regression checks to prevent recurrence.

Phase 5: Disaster Recovery Drills

Now go further.

Perform a drill simulating catastrophic scenarios:

Accidental production data deletion (simulate a DROP TABLE or bulk-delete; validate that destructive operations require multi-layer approval)
Model upgrade that changes the output format (upgrade to a new model version that slightly alters JSON structure - and observe downstream breakage cascade in parsers and validators)
Agent silently bypassing a critical check (modify agent prompt or orchestration logic so a required validation step is skipped)
External dependency returning inconsistent schemas (upstream API suddenly changes numeric type to string, test whether your system fails safely or propagates corruption)
Security boundary violation (simulate prompt injection attempting to access secrets; attempt cross-tenant data access in a multi-tenant system; verify if RBAC, sandboxing, and secret management prevent escalation)

Then require the drill report:

Recovery plan within a defined RTO (Recovery Time Objective)
Data restoration strategy
Rollback or hotfix plan
Communication plan (who to notify, how to explain customers)
Postmortem draft
Permanent prevention patch (tests + guardrails)

This is not theoretical incidents, they may mirror real operational reality (yes, dear Replit agent!)

And the Junior who runs these processes learns:

Where the guardrails are thin
Which invariants are poorly encoded
How error signals propagate (or don't)
Whether failures are loud, silent, or falsely confident
Understands blast radius
Designs safeguards proactively
Learns accountability in general

That builds architectural awareness rather than surface-level debugging skills. And training to think in systems, not lines of code.

Here is a refined version of your conclusion section with an optimistic bridge before the deeper question:

The Deeper Question

As you can see, there are plenty of practical, concrete ways to preserve the engineering ladder and ensure transfer of experience - even in an AI-first world. Apprenticeship does not have to disappear. It can evolve. With deliberate design, we can turn AI from a replacement force into a pressure test that strengthens engineering discipline rather than erodes it.

But there is a harder thought experiment.

Today we ask: what if Juniors are optimized away?

But what happens if, eventually, Seniors are optimized away next?

Optimization pressure rarely stops at the bottom rung. So perhaps the real task isn't only about Juniors.

If systems begin designing their own guardrails, self-evaluating architectures, and self-correcting at scale - where does human engineering sit?

Right now, governance, accountability, and intent are human responsibilities. That alone preserves the role of the Engineer.

It's ensuring that engineering, at every level, remains about judgment, ownership, and responsibility - not just output generation. After all, AI is never responsible; it will just add "Oops!", like Claude AI said to that risk-is-my-middle-name guy, who decided to organize his wife's Macbook - I hope he survived 😁

We're selling our future supply of Engineers for this quarter's velocity.

And the companies that win in a decade won't be the ones who prompted fastest.

They'll be the ones who deliberately rebuilt the engineering for the new reality of AI-first industry.

Top comments (1)

John Samuel • Feb 23

Making juniors responsible for auditing the machine-readability of documentation, not just its human legibility, is a concrete and teachable skill. The engineering track for AI-assisted workflows is also underutilized — we rarely test how agents behave under partial failures before deploying them.