Satoshi Nishimura

LLMs Diverge, Humans Converge — LLMs Can't Come Up With Ideas

LLMs can't come up with ideas.

The output of an LLM (Large Language Model) tends to be divergent. It moves in the direction of deriving combinations from its training data. Good ideas, on the other hand, are convergent. They solve multiple problems at once with a single mechanism.

When working with LLMs, I think it's important to keep this difference in mind.

In this article, I want to describe why it's difficult for LLMs to design databases, how the same effect accumulates in small ways even in ordinary programming, and what the fundamental reason behind it is.


The gravity of training data overwrites instructions in CLAUDE.md

When you have Claude Code generate SQL, short aliases are often used. Things like staff_dept as sd, orders o, or order_items oi — table names abbreviated to one or two characters.
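To make the contrast concrete, here is a small sketch. The table names are the ones above; the column names are hypothetical, for illustration only.

```sql
-- The style that tends to come out: one- or two-character aliases
-- (columns are hypothetical, for illustration only)
SELECT o.id, oi.quantity, sd.name
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN staff_dept sd ON sd.id = o.staff_dept_id;

-- The spelled-out style: every reference says which table it means
SELECT orders.id, order_items.quantity, staff_dept.name
FROM orders
JOIN order_items ON order_items.order_id = orders.id
JOIN staff_dept ON staff_dept.id = orders.staff_dept_id;
```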

The alias style itself aligns with widespread convention. Look at Stack Overflow or technical posting sites, or open a SQL textbook, and you'll find short aliases everywhere. Claude Code faithfully reproduces this.

In intermediate-level SQL books, though, short aliases are viewed critically, and in the programming world it has long been common sense that meaningless short names like u or t1 are bad practice.

So you explicitly state in the project's CLAUDE.md, "Do not use table aliases; write table names out in full," but Claude Code occasionally ignores this. Not frequently, but it happens repeatedly. Claude itself knows that it's better not to use meaningless single-character aliases, and would say so if asked to reason about it. It's also written in CLAUDE.md. And yet, t1 still ends up in the SQL.

The aliases make the text shorter, but at the design level, they lose information about "what this table means."

What this seems to suggest is that LLMs are strongly pulled toward the statistical majority of their training data. Instructions work locally, but especially when the model is tackling a hard problem, they tend to get overridden by the training data.

What an LLM outputs is the "plausible continuation token" under its training data distribution. Essentially it's probabilistic interpolation, and the statistical tendencies of the training data carry over directly into its output tendencies.

In the case of SQL aliases, the overwhelming majority of SQL statements in the training data use aliases. Even if you write "don't use them" in CLAUDE.md, that's just one instruction in one document, competing head-on with patterns seen across enormous amounts of SQL during training. When directly instructed, it works, but when attention isn't directed there, the training data side tends to win.

Database design is difficult with LLMs

I believe that "designing databases with LLMs is difficult."

Code written by expert programmers is published in vast quantities on GitHub, but production schemas designed by database experts barely appear on the public internet; many are non-public internal assets of companies. On the other hand, there are tons of beginner-level "getting started" blog posts. LLMs get pulled toward beginner designs.

LLMs do learn not only from schema files but also from expert discussions on dba.stackexchange.com, technical books on normalization and index design, and migration files from production OSS like GitLab, Discourse, and Redmine. These are higher quality than the "I tried it" articles, but they also tend to sacrifice relational design for the sake of generality.

In addition to the bias in training data, there's also the issue of missing context.

The system's access patterns, data volume and growth rate, organization-specific business rules, and future feature plans — these are invisible from the outside. They're not written in the codebase or schema, and even if there are documents written somewhere, they're meaningless unless learned in strong association with the DB schema.

DB design has to proceed while weighing multiple query patterns, multiple business constraints, and multiple future requirements at once. And even if all of that information were provided, I suspect it would still be difficult for an LLM.

In the past, I asked Claude Code to improve a slow SQL query. In an environment where I could connect to the DB, investigate execution plans, and iterate on SQL through trial and error, I requested, "Please make this faster."

Claude Code's suggestions made the query somewhat faster but didn't lead to a fundamental solution. The solution I eventually came up with after giving up on its suggestions was to swap the starting point of the FROM clause, reverse the direction of the JOINs, and flip the WHERE conditions to match. Then I added functional indexes and partial indexes so that index scans could be used as much as possible.
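As a rough sketch of the shape of that rewrite, with a hypothetical schema (the real query was more involved):

```sql
-- Hypothetical example: the table and column names here are invented.

-- Before: built around orders, with the customer condition applied late
SELECT orders.id, orders.total
FROM orders
JOIN customers ON customers.id = orders.customer_id
WHERE lower(customers.email) = 'alice@example.com'
  AND orders.cancelled_at IS NULL;

-- After: build around customers instead, and add indexes so that
-- each step of the plan can be an index scan
CREATE INDEX customers_lower_email_idx
    ON customers (lower(email));                          -- functional index
CREATE INDEX orders_active_by_customer_idx
    ON orders (customer_id) WHERE cancelled_at IS NULL;   -- partial index

SELECT orders.id, orders.total
FROM customers
JOIN orders ON orders.customer_id = customers.id
WHERE lower(customers.email) = 'alice@example.com'
  AND orders.cancelled_at IS NULL;
```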

When I asked Claude Code afterward, it knew each individual optimization, understood that the approach would work, and could explain it in even more detail. But it couldn't come up with it.

What does this mean?

What LLMs seem to be good at is "solving a given problem by improving it little by little." Adding an index, adding a hint, revising JOIN conditions. They can do each individual thing.

Swapping the starting point of the FROM is an operation that rewrites the very premise of "what should this query be built around." It's not tweaking the inside of the current query — you need to rethink the entire thing all at once.

The direction of fine-tuning what's given and the direction of rebuilding from the premises are different. The LLM's gravity is strong in the former direction and weak in the latter.

Coding and refactoring

When you ask Claude Code to refactor, it reliably consolidates copy-pasted code.

If it finds three similar functions, it extracts the common parts into one function. If magic numbers are scattered around, it extracts them as constants. If there are several similar loops, it pulls them into helper functions. It applies these surface-level refactoring techniques almost without fail.

However, it doesn't venture into another kind of refactoring.

It doesn't point out that "this class and that class are actually different representations of the same concept." Nor does it suggest that "this field should be moved to a different class." It will do these things if you instruct it step by step, but it rarely discovers on its own the situations where the cost of change is high but the payoff is also large.

For superficial refactoring, the fact that input/output doesn't change can be guaranteed by syntax alone. Consolidating copy-paste doesn't change the meaning of the code. LLMs can step into this through pattern matching that searches for "duplication exposed on the surface of the code."
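In SQL terms, a sketch of this kind of consolidation (the payments table is hypothetical): the same aggregation copy-pasted into two queries, extracted into one view. The rewrite is behavior-preserving by construction.

```sql
-- Before: the same aggregation duplicated in two queries
SELECT customer_id, total
FROM (SELECT customer_id, SUM(amount) AS total
      FROM payments GROUP BY customer_id) AS totals
WHERE total >= 1000;

SELECT customer_id, total
FROM (SELECT customer_id, SUM(amount) AS total
      FROM payments GROUP BY customer_id) AS totals
WHERE total < 1000;

-- After: the duplication extracted into a view; the results don't change
CREATE VIEW customer_totals AS
SELECT customer_id, SUM(amount) AS total
FROM payments
GROUP BY customer_id;

SELECT customer_id, total FROM customer_totals WHERE total >= 1000;
SELECT customer_id, total FROM customer_totals WHERE total < 1000;
```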

For structural refactoring, equivalence of input/output can't be judged by looking at code alone. The judgment that "this class and that class are the same concept" requires a lot of information, such as business meaning, future extensions, and consistency with other parts.

Also, in ordinary coding, the code that LLMs write tends to be long and verbose. This is a natural consequence of solving problems one at a time.

A programmer above a certain level habitually keeps "how do I keep this simple?" in the back of their mind while writing, and feels a pull toward refactoring when the code gets messy.

LLMs are the opposite: given no instructions, they move toward exhaustively adding individual cases. Even when strongly instructed to keep things simple, they seem to forget it while working through a complex specification or codebase.

LLMs can't come up with ideas

A way of thinking I like comes from an episode told by Nintendo's Shigeru Miyamoto. What is an idea? An idea is something that solves multiple problems at once.

I think LLMs can't come up with ideas in this sense.

LLM output is interpolation within the training distribution. It outputs "what's likely to come next." This is strongly oriented toward combining "textbook solutions" to a given problem.

Ideas face the opposite direction. They survey multiple problems and search for "is there a single mechanism that solves all of these at once?" This is not "what's likely to come next" in the training distribution but the discovery of a problem structure that hasn't been noticed yet. It's the work of finding a common solution space hidden between problems, and by definition, it doesn't come out of statistical interpolation of training data.

Git's internal data structure has barely changed in the 20 years since its first commit in 2005.

Four objects: blob, tree, commit, and tag. Content addressing by SHA-1. DAG structure. Refs. With just these, it supports distributed development around the world. The interface layer has changed, but the core data structure has not been touched.

Git's data structure solves multiple seemingly independent problems simultaneously with the single mechanism of "use the hash of content as the ID."

  • Tamper detection: If the hash doesn't match, it's been tampered with. This holds automatically as a byproduct of the design.
  • Deduplication: Files with the same content have the same hash, so they naturally become one. No dedicated mechanism for deduplication is needed.
  • Identity judgment in the DAG: whether two commits are identical is decided by hash comparison, in O(1).
  • Consistency between distributed repositories: if two repositories hold objects with the same hash, they are the same object. Synchronization over the network reduces to "send only the objects whose IDs the other side doesn't have."
  • History consistency: A commit's hash includes the parent commit's hash. So if you rewrite even one piece of past history, all subsequent hashes change. Tampering with history is structurally detected.

These are independent challenges. They could probably also have been solved with separate mechanisms one by one. Digital signatures for tamper detection, hash tables for deduplication, timestamp comparison for synchronization, and so on.

What Linus did was bundle these problems into a single mechanism. "Use the hash of content as the ID." With just that, tamper detection, deduplication, identity judgment, distributed synchronization, and history consistency all hold as byproducts.
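The same principle can be sketched in a few lines of SQL. This is my analogy, not how Git actually stores objects (Git hashes a typed header plus the content into its own object files):

```sql
-- A content-addressed object store sketched in PostgreSQL (analogy only).
CREATE TABLE objects (
    id      text PRIMARY KEY,  -- hex hash of the content: the ID *is* the content
    content bytea NOT NULL
);

-- Deduplication falls out of the primary key: inserting the same
-- content twice naturally becomes one row, with no dedicated mechanism.
INSERT INTO objects (id, content)
VALUES (encode(sha256('hello world'), 'hex'), 'hello world')
ON CONFLICT (id) DO NOTHING;

INSERT INTO objects (id, content)
VALUES (encode(sha256('hello world'), 'hex'), 'hello world')
ON CONFLICT (id) DO NOTHING;  -- the second insert is a no-op

-- Tamper detection is a hash comparison: if a stored row no longer
-- hashes to its own id, it has been modified.
SELECT id, encode(sha256(content), 'hex') = id AS intact
FROM objects;
```

Synchronization between two such stores likewise reduces to comparing sets of ids.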

Multiple independent problems are solved simultaneously with a single mechanism.

It doesn't become obsolete over time. Twenty years after its release, the core data structure has not needed to change. Nothing needs to be added; it's complete.

This way of solving multiple problems together, converging multiple concepts into one and solving them all at once — that is what an idea is.

LLMs diverge, humans converge

So far I've used the words "add," "bundle," and "converge." That LLMs move in the direction of "adding things." That ideas take the form of "bundling multiple problems into a single mechanism." The difficulty of "converging" multiple constraints into a single design. These came up in different contexts, but structurally they point to the same thing.

To put it simply, this is divergence and convergence.

  • LLM output is divergent: It moves in the direction of deriving combinations within the training distribution and increasing volume. Adding aliases to SQL. Mass-producing helper functions in refactoring. Wrapping code in try/except, adding null checks, adding comments. Presenting individual solutions to individual problems. All are "adding" operations, and these can be called divergence.
  • Completed ideas are convergent: They solve multiple seemingly independent problems simultaneously with a single mechanism. The Mario mushroom, Git's content addressing, Unix's file abstraction — all converge multiple concepts into one, in a form that doesn't need to be decomposed further. These can be called convergence.

From this difference in direction, we can lay out why LLMs can't produce this kind of idea.

1. The training data has no "world before it"

LLM training data consists of documents written in a world where Git is already common. Reconstructing the problem set from before Git, out of a corpus in which Git is baked in as a known premise, and rethinking it as "solve all of this with a single mechanism," is structurally difficult. A completed idea has seeped into the entire world that came after it, and the state with the idea removed cannot be reproduced from training data.

2. It's not "plausible continuation" but "rewriting the problem definition"

LLM output is probabilistic interpolation against the given context. Because it's a device that outputs "the token likely to come next," it's optimized for returning textbook solutions to given problems. An idea is not the solving of a given problem but the work of rearranging the problem's premises. This is the same structure as the SQL rewrite described earlier: "adjusting the inside while preserving the shape of the current query" is the former, and "swapping the starting point of the query" is the latter. The LLM's gravity is strong in the former direction and weak in the latter.

3. Ideas work in the direction of reducing the complexity of problems

Rather than adding new features, they converge multiple concepts into one. This is the essential form of an idea. LLM output points the opposite way, diverging into combinations of the training data. It's good at "adding" and bad at "bundling."

Small idea-like judgments are always being made

The ideas in the games of the world-famous Shigeru Miyamoto, or in Linus's Git, are groundbreaking. But day-to-day work is full of smaller ideas.

In the work that intermediate-or-above programmers do on a daily basis, countless small convergence judgments are made in succession.

"Whether to consolidate completely different business operations that have similar structures"
"Whether to make a flag into an enum, and the storage format in the DB"
"Whether to add a deleted_at column"

Compared to the ideas of Nintendo or Git, all of these are tiny. But the structure is the same.

A programmer's job is not just coding; it's whether you can keep executing these small convergence judgments. In a day's work, dozens or hundreds of these micro convergence judgments are made, and their accumulation forms the quality of the entire codebase.

The same is true for DB design. "Should I split this table or merge it?" "Where should this field belong?" "How far should I generalize this naming?" Intermediate-level engineers make these judgments one by one while surveying multiple constraints.

The same is true for finalizing detailed specifications. "Should this flow branch here, or be consolidated?" "Should this edge case be an error, or a default value?" "Should this API be synchronous or asynchronous?" Each judgment is converged within countless constraints.

Returning to the topic of LLMs, the contrast becomes clear.

LLMs can execute these micro convergence judgments if instructed. If you explicitly ask, "Please consolidate this group of flags into an enum," or "Please rewrite these three cases using polymorphism," it will rewrite them. The capability is there.

The problem is that without instruction, by default it doesn't move toward convergence. Rather, when left without instruction, it drifts toward divergence. Adding flags, increasing case statements, adding arguments, adding defensive checks, mass-producing helper functions. All in the direction of "adding."

This can be expressed not as "lack of capability" but as "the direction of gravity is different." LLMs operate while constantly receiving pressure to diverge. To move them toward convergence, a human from outside needs to insert instructions and pull the direction back.

LLMs can "be made to converge if instructed" in each individual judgment, but by default they drift toward divergence. In situations where countless micro convergence judgments are needed, it's not realistic to keep instructing each one. As a result, the entire codebase gradually becomes divergent.

In other words, the limit of LLMs is not just that they can't come up with groundbreaking ideas. Already at the level of the small convergence judgments that intermediate-or-above programmers make unconsciously every day, the direction of the LLM's gravity is different.

References to divergence and convergence

Related information about LLMs

The contrast that "LLMs diverge and humans converge" is something I wrote down as it occurred to me, but I investigated whether others have mentioned it.

  • AI-generated code increases code smells, which accumulate over time.
  • Along with the spread of AI coding, code duplication has increased and the refactoring rate has decreased.
  • Throwing conceptual questions at AI is more productive than leaving code generation to AI.
  • AI can produce ideas that look plausible, but cannot produce ideas that hold up when executed.

Salesforce and non-programmers

The question "can non-programmers build business systems?" didn't start with the LLM era. It's a question that Salesforce, Airtable, Notion, and Microsoft Power Platform have been tackling for decades.

Their solution is not to "make it so non-programmers can write code" but to restrict design freedom. This can be thought of as letting the platform take on the convergence judgments. Data model design requires convergence judgments, and it's difficult for non-programmers to make them on their own. So the platform narrows the options in advance, and the system works within them.

"Can AI let non-programmers build business systems?" is equivalent to "Can AI take on the role that Salesforce's constraints play, even without those platform-level constraints?"

Programmers and non-programmers have different expectations of LLMs.

  • For programmers, an LLM is a "divergence device": something that quickly diverges from a design they've converged.
  • For non-programmers, an LLM is a "translation device": something that converts what they want to do into working code.

Evaluations diverge based on these different expectations. For non-programmers, the very concept of convergence activities like refactoring is absent, and systems developed with LLMs may continue to diverge.

The divergence that followed Bitcoin

It's difficult to "produce" a groundbreaking idea, but it's also difficult to "preserve" one. A completed, converged idea is often diverged again, even by those who come after.

The Bitcoin paper can be called a "completed convergence idea" alongside Git. It realizes the problem of "implementing electronic cash that prevents double-spending without a trusted third party" by simultaneously solving multiple seemingly independent challenges (double-spend prevention, absence of central authority, communication with untrusted parties, consensus, incentive alignment). Multiple problems converge into the single mechanism of the blockchain.

After Bitcoin, there was a trend of applying "blockchain technology" to other domains: supply chains, voting, certificate issuance, healthcare. But these proposals lack the perspective of what problem the blockchain, as an idea, was converged to solve.

In other words, not only LLMs but even humans, when handling a converged idea carelessly, end up diverging it.

Summary

LLMs can't come up with ideas. An idea is something that solves multiple problems at once. LLMs diverge, humans converge.

Even completed convergence ideas are diverged by the hands of those who follow. Divergence happens naturally if left alone. Convergence collapses unless consciously maintained.

When using LLMs, just being conscious of whether the task in front of you is "divergence" or "convergence" makes it clear what to delegate and what to do yourself.

There are many tips and theory articles about LLMs in the world, and many of them may be traces of trial and error in figuring out how to suppress divergence and steer toward convergence.

Afterword

The text of this article itself is something I converged into writing from things that came to mind while using Claude and from divergent exchanges in chat.
