Jennifer Davis for Google Cloud

The lumberjack paradox: From theory to practice

Previously, I shared my thoughts on Neal Sample’s "lumberjack paradox" and the urgent need to build the systems thinkers of tomorrow. I argued that leaders must move beyond simple efficiency and focus on re-engineering the experience (Dr. Gary Klein) and creating context to ensure we don't lose the path to deep expertise.

But what does "leadership as context creator" look like in practice?

For us in Cloud DevRel Engineering, it isn't abstract. It comes down to how we manage the most fundamental unit of our developer experience: the code sample.

As Neal notes, AI will lead to the "industrialization of creativity"—an infinite supply of ideas and code. In this world, the premium shifts to discernment: the ability to distinguish quality from mediocrity.

But this isn't a choice between the axe (manual craft) and the chainsaw (AI). The modern expert needs both.

  • If you only have the axe, you are restricted to the problems that fit within manual reach. It is the perfect tool for the campsite, but it cannot clear the forest.

  • But if you only have the chainsaw, without the judgment to guide it, you are dangerous. You lack the control to distinguish a clean cut from a destructive one.

You need the deep expertise of the axe to get precise, consistent outcomes from the chainsaw.

From theory to practice: The catalog as ground truth

In my previous post, I mentioned Dr. Richard Cook's work on "building common ground" and Donella Meadows’ warnings about suboptimization.

In Cloud DevRel Engineering, we realized that our code samples are the primary tool for building this common ground. In Dr. Cook’s terms, they form the "Line of Representation"—the tangible surface that connects the human "above the line" to the complex system "below the line."

When a developer (the human) learns a new platform, the sample is their manual for the "axe." When an AI assistant generates a solution, the sample is the training data that guides the "chainsaw."

When we looked at our systems, we saw suboptimization. By treating samples as low-priority content maintained by individual contributors, we created a fractured reality.

We broke the Line of Representation.

We saw this failure hit on two fronts:

  1. We break the human judgment loop: If samples are inconsistent, developers cannot learn "good" from "bad." We fail to re-engineer the experience (Dr. Klein) necessary to build expertise.
  2. We poison the AI well: AI models ingest our official repositories. Whatever patterns those samples contain, good or bad, the AI learns, scales, and feeds back to the user.

We are currently witnessing exactly how this hand-crafted approach fails at scale.

The high cost of "geological strata" in code

Without central standardization, our repositories accumulated "geological strata"—layers of outdated practices—because manual maintenance cannot keep up with language evolution. This makes it hard to know what is correct today.

  • Node.js' paradigm tax: Our Node.js repositories contain a mix of callbacks, raw promises, and async/await. A user learning Pub/Sub sees one era, while a user learning Cloud Storage sees another. The AI sees all of it and treats it all as valid, stripping away the context of "outdated" versus "modern."
  • Python: The contributor long tail: With over 650 contributors, our Python samples suffer from extreme fragmentation. The total cost of ownership (TCO) of manually bringing thousands of older snippets up to modern Python 3.10+ standards is astronomically high, so it simply doesn't happen. This leaves a massive surface area of "technical debt" that the AI happily recycles. (A before-and-after sketch of that kind of modernization follows this list.)
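
To make those strata concrete, here is a minimal, hypothetical sketch (none of it is taken from our repositories): the same small helper written first in the older style many snippets still carry, then in the modern Python 3.10+ style we would want a developer, or an AI, to learn from.

```python
"""Illustrative only: two "strata" of the same hypothetical helper.

Nothing here comes from a real Google Cloud repository; read_config is a
made-up function used to show how older and newer Python styles diverge.
"""

import os
from pathlib import Path
from typing import Dict, Optional


# --- Older stratum: patterns still common in long-lived samples ---------
def read_config_old(path, default=None):
    # type: (str, Optional[str]) -> Dict[str, str]
    """Reads key=value pairs using type comments and %-formatting."""
    config = {}
    if not os.path.exists(path):
        print("missing config: %s" % path)
        return config
    with open(path) as handle:
        for line in handle:
            key, _, value = line.strip().partition("=")
            config[key] = value or (default or "")
    return config


# --- Modern stratum: the same logic written for Python 3.10+ ------------
def read_config(path: Path, default: str | None = None) -> dict[str, str]:
    """Reads key=value pairs using builtin generics, `X | None`, f-strings."""
    config: dict[str, str] = {}
    if not path.exists():
        print(f"missing config: {path}")
        return config
    for line in path.read_text().splitlines():
        key, _, value = line.strip().partition("=")
        config[key] = value or (default or "")
    return config
```

The two versions behave identically; the cost is that a reader, or a model, has no signal about which one represents "correct today."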

Inconsistent quality creates "false best practices"

When samples are hand-written by federated teams, personal "developer flair" masquerades as industry best practice. Users copy-paste these patterns, inadvertently adopting technical debt.

  • Java's framework creep: Instead of teaching the core platform, contributors often introduce heavy frameworks for simple tasks. This increases the "time-to-hello-world" and teaches the AI that simple tasks require complex dependencies.
  • Python vs. Go: Most Go samples handle errors correctly because the language forces it. Many Python samples show only the "happy path," skipping critical distributed systems patterns like exponential backoff or retry logic. The AI then generates code that looks clean but fails in production. (A side-by-side sketch follows this list.)
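
The gap is easiest to see side by side. The sketch below is hypothetical (publish_event is a stand-in for any flaky network call, not a real client method), but it contrasts the happy-path pattern with the bounded, exponential-backoff pattern we want both developers and the AI to absorb.

```python
"""Illustrative only: the "happy path" many samples stop at, next to the
retry logic production code actually needs. publish_event() is a made-up
stand-in for any network call that can fail transiently."""

import random
import time


def publish_event(payload: dict) -> None:
    """Hypothetical network call that sometimes fails transiently."""
    if random.random() < 0.3:
        raise ConnectionError("transient failure")
    print(f"published {payload}")


# Happy path: fine in a demo, falls over on the first transient error.
def publish_happy_path(payload: dict) -> None:
    publish_event(payload)


# Production-minded path: bounded retries with exponential backoff and jitter.
def publish_with_backoff(payload: dict, max_attempts: int = 5) -> None:
    for attempt in range(1, max_attempts + 1):
        try:
            publish_event(payload)
            return
        except ConnectionError:
            if attempt == max_attempts:
                raise  # surface the error once retries are exhausted
            delay = min(2 ** attempt, 30) + random.uniform(0, 1)
            time.sleep(delay)


if __name__ == "__main__":
    publish_with_backoff({"id": 42})
```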

The hidden cost of incoherence

This is the "suboptimization" Donella Meadows warned about. It is not enough for individual samples to be correct in isolation; they must function as a cohesive unit.

For a human developer, shifting between products that use different coding styles creates friction. They have to spend mental energy decoding the "dialect" of a specific product team rather than focusing on the logic.

For an AI, this lack of cohesion is even more dangerous.

  • The Context Gap: When our samples for Cloud Storage look structurally different from our samples for BigQuery, the AI treats them as unrelated entities. It fails to learn the underlying "grammar" of our platform.
  • The Integration Failure: When a user asks for a solution that combines these products, the AI struggles to bridge the gap. Lacking a consistent pattern to follow, it often hallucinates a messy, "glue code" solution that is brittle and insecure. (The sketch after this list shows what a shared pattern could look like.)
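
For contrast, here is a hedged sketch of what a shared grammar could look like: two simplified samples (the names are made up, and auth setup plus error handling are trimmed for brevity) that follow the same three-step skeleton, so a reader of one can predict the shape of the other.

```python
"""Illustrative only: two product samples that share one structural
"grammar": create a client, perform the operation, handle the result.
Bucket, object, and query values are made up; credentials and project
configuration are assumed to already exist in the environment."""

from google.cloud import bigquery, storage


def upload_report(bucket_name: str, object_name: str, data: str) -> None:
    # 1. Create the client.
    client = storage.Client()
    # 2. Perform the operation.
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_string(data)
    # 3. Handle the result.
    print(f"uploaded gs://{bucket_name}/{object_name}")


def run_report_query(sql: str) -> None:
    # 1. Create the client.
    client = bigquery.Client()
    # 2. Perform the operation.
    job = client.query(sql)
    # 3. Handle the result.
    for row in job.result():
        print(row)
```

When samples rhyme like this, a combined "upload, then query" answer becomes a composition of familiar shapes rather than improvised glue code.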

By allowing fragmentation, we aren't just impacting the docs; we are training the AI to misunderstand how our platform is supposed to fit together.

Get started

We cannot view code samples as static documentation. They are the active constraints of our system—the "environment" we design for our users. If we fail to maintain them, we dull the tools that build developer judgment, and we degrade the quality of the AI they trust.

Recommended Reading

If you want to dig deeper into the systems thinking concepts behind this post, I recommend starting here:

  • On the "Line of Representation": Above the Line, Below the Line by Dr. Richard Cook — The essential framework for understanding why we must care about the representations (like code samples) that sit between us and complex systems.
  • On System Failure: How Complex Systems Fail by Dr. Richard Cook — His classic treatise on why failure is never about a single "root cause" but the result of multiple latent factors.
  • On Suboptimization: Leverage Points: Places to Intervene in a System by Donella Meadows — The definitive essay on why optimizing parts often destroys the whole.
  • On Re-engineering Experience: Deliberate Performance by Dr. Gary Klein — Research on how to build expertise when you can't stop the work to train.

Coming up next

Next in this series, I will share our structural solution: the "Golden Path." This approach moves us away from isolated automation and towards a human-led, AI-scaled system that improves consistency.

I’ll be focusing more on the strategy in this series, but the execution is its own journey. Using AI to write code is well known; relying on it to produce production-ready educational content is another matter. Two engineers from my team will soon share the technical reality of that shift in a post on their 7 takeaways from generating samples at scale with Gemini.

Until then, ask yourself:

  • Are you trying to automate away your documentation debt without first defining a standard of quality?
  • Are your samples strong enough to serve as the "ground truth" for the AI models your developers rely on?

Special thanks to Katie McLaughlin, Adam Ross, and Nim Jayawardena for reviewing early drafts of this post.
