Ken W Alger

Posted on • Originally published at kenwalger.com

The New Information Borders

Recently I came across a discussion about AI crawlers and robots.txt files. The conversation centered on a simple question:

Should website owners allow AI systems to access their content?

One proposed configuration looked something like this:

User-agent: ClaudeBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: PerplexityBot
Disallow: /

At first glance this is a reasonable policy decision.

Perhaps a company has a commercial relationship with one AI vendor and not another. Perhaps it trusts one organization more than another. Perhaps it simply dislikes a particular company and would rather that company not benefit from its content.

These are all rational decisions. And worth remembering: robots.txt is a request, not a wall. It governs the crawlers that choose to honor it. The borders we are about to talk about form through compliance norms and licensing agreements, not through technical enforcement.
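
To make the "request, not a wall" point concrete, here is a minimal sketch of how a crawler that chooses to comply would read the policy above, using Python's standard `urllib.robotparser`. The URL is a placeholder I made up for illustration, and nothing here stops a crawler that simply never runs this check.

```python
from urllib.robotparser import RobotFileParser

# The policy quoted above, as a compliant crawler would receive it.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: PerplexityBot
Disallow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user agent, whether the policy permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# example.com and the path are placeholders, not taken from any real site.
decisions = allowed_agents(
    ROBOTS_TXT,
    ["ClaudeBot", "GPTBot", "ChatGPT-User", "PerplexityBot"],
    "https://example.com/articles/some-post",
)
for agent, ok in decisions.items():
    print(f"{agent}: {'may fetch' if ok else 'asked not to fetch'}")
```

The check lives entirely on the crawler's side. An agent that is not named in the file, and that has no `*` group to fall back on, is allowed by default, which is part of why these policies tend to grow one named vendor at a time.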

The interesting part is what happens when thousands of organizations make similar decisions at once.

The Web We Assumed

For most of the modern Internet era, there was an implicit assumption that people were operating from a broadly shared information environment.

Search engines differed in quality. Ranking algorithms differed. Some sources were easier to discover than others. But in general, if two people searched for information on a topic, there was a good chance they were drawing from many of the same underlying sources.

The web functioned as a largely shared corpus of knowledge.

That assumption may not hold forever.

Fragmentation Without Malice

When people discuss information fragmentation, they often jump straight to government censorship, national firewalls, or deliberate propaganda systems.

Those are real examples. But fragmentation does not require any malicious intent.

Imagine the following:

  • Company A blocks OpenAI but allows Anthropic.
  • Company B licenses content exclusively to OpenAI.
  • Company C blocks all AI crawlers.
  • Company D optimizes specifically for one AI platform.
  • Company E maintains a private agreement with a commercial search provider.

None of these organizations is trying to create information silos. They are simply trying to protect their intellectual property or negotiate a survival-level licensing deal in an ecosystem that no longer sends them traffic. Each is making what looks like a reasonable local decision.

Collectively, those decisions begin to produce different information environments. The divergence does not emerge from AI reasoning. It emerges from AI access.
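
A toy model shows how quickly this adds up. The sketch below encodes the five companies listed above as hypothetical access policies and computes which sources each platform could read. The site names, the "SearchCo" provider, and the choices I made where the list is silent (for example, treating Company D as open to everyone) are all assumptions for illustration, not real data.

```python
# Hypothetical policies: each site maps to the set of platforms allowed to read it.
# "SearchCo" is an invented stand-in for the commercial search provider.
POLICIES: dict[str, set[str]] = {
    "company-a": {"Anthropic"},                        # blocks OpenAI, allows Anthropic
    "company-b": {"OpenAI"},                           # exclusive license to OpenAI
    "company-c": set(),                                # blocks all AI crawlers
    "company-d": {"OpenAI", "Anthropic", "SearchCo"},  # assumed open; optimizing is not blocking
    "company-e": {"SearchCo"},                         # private agreement with a search provider
}


def visible_sources(policies: dict[str, set[str]], platform: str) -> set[str]:
    """Sites a given platform is permitted to read under these policies."""
    return {site for site, allowed in policies.items() if platform in allowed}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two source sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


openai_view = visible_sources(POLICIES, "OpenAI")
anthropic_view = visible_sources(POLICIES, "Anthropic")

print("OpenAI can read:   ", sorted(openai_view))
print("Anthropic can read:", sorted(anthropic_view))
print(f"Shared fraction:    {jaccard(openai_view, anthropic_view):.2f}")
```

In this made-up example the two platforms share one source out of the three that either can reach. No single policy caused that; the gap only appears when the policies are viewed together.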

Two Kinds of Access

It helps to separate two things that fragment differently.

The first is what a model was trained on. The second is what a model can reach at the moment you ask it a question.

Today these overlap heavily. Most large models are built from many of the same underlying sources: the same crawled archives, the same bulk licensing deals, the same public web that has been scraped for years. At the training layer, the corpus is still mostly shared.

Retrieval is where the divergence is already happening.

When a model answers using live access to the web, the robots.txt rules, the licensing agreements, and the private deals all decide what it is permitted to pull in right then. One system can cite a source. Another is told it may not look. Same question, different evidence, and the difference has nothing to do with how either model reasons.

So the honest version of the claim is not that Claude and ChatGPT already see two different webs. It is narrower and more defensible:

Retrieval access is fragmenting now. Training access could follow.

That second part is the one worth watching. If exclusive licensing becomes the norm rather than the exception, the divergence stops being a retrieval-time quirk and starts being baked into what each model knows at all. The shared corpus we have taken for granted would quietly stop being shared.

The Difference Between Thinking and Seeing

When two AI systems produce different answers, we tend to assume the difference lies in how the models reason.

Sometimes that is true. Increasingly, though, the more important question may be a different one: what information was the model allowed to see?

An answer generated from complete evidence and an answer generated from partial evidence can both arrive with equal confidence. Only one of them may reflect the full record.

The distinction matters.

A model cannot mourn the data it was never allowed to read. It simply synthesizes a polished, highly confident answer out of the fragment it has, leaving the user entirely unaware of the missing horizon.

Museums Learned This Long Ago

One reason I spend so much time thinking about provenance is that museums, archives, and historians have wrestled with these questions for decades.

Researchers care not only about what artifacts exist. They care about what artifacts are missing. Absence affects interpretation. A collection missing half of its records tells a different story than a complete one, and a careful researcher never mistakes the surviving fragment for the whole.

AI systems face the same challenge. A model can only reason from the evidence available to it. If the evidence becomes fragmented, the resulting interpretations may diverge even when the underlying reasoning processes remain sound.

The Sovereign Systems Perspective

The Sovereign Systems Specification is built around a simple observation:

Information without provenance is just gossip.

Most discussions of provenance focus on where information came from. The harder and more neglected question is what was left out.

Not only:

Where did this information originate?

But also:

What information was unavailable?

What information was excluded?

What information was never allowed into the system at all?

Absence is itself a provenance category. A record of what a system could not see is as much a part of its lineage as a record of what it could. Those questions become more important, not less, as AI systems become primary interfaces to knowledge.
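
One way to picture absence as a provenance category is a record that sits next to each answer and lists what was missing alongside what was used. The sketch below is my own illustration, not the actual schema of the Sovereign Systems Specification: the class names, fields, and example sources are all hypothetical, and the three absence kinds simply mirror the three questions above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Absence(Enum):
    """Why a known source did not contribute to an answer (hypothetical categories)."""
    UNAVAILABLE = "unavailable"        # could not be reached when the question was asked
    EXCLUDED = "excluded"              # reachable, but ruled out by policy or license
    NEVER_ADMITTED = "never_admitted"  # never allowed into the system at all


@dataclass
class ProvenanceRecord:
    """What an answer drew on, and what it is known to have been missing."""
    answer_id: str
    consulted: list[str] = field(default_factory=list)
    absent: dict[str, Absence] = field(default_factory=dict)

    def coverage(self) -> float:
        """Fraction of *known* sources that were actually consulted."""
        total = len(self.consulted) + len(self.absent)
        return len(self.consulted) / total if total else 0.0

    def summary(self) -> str:
        lines = [f"Answer {self.answer_id}: {self.coverage():.0%} of known sources consulted"]
        lines += [f"  missing ({why.value}): {src}" for src, why in self.absent.items()]
        return "\n".join(lines)


# All identifiers below are invented placeholders.
record = ProvenanceRecord(
    answer_id="q-001",
    consulted=["company-a.example/report"],
    absent={
        "company-b.example/archive": Absence.EXCLUDED,
        "company-c.example/dataset": Absence.NEVER_ADMITTED,
        "company-e.example/feed": Absence.UNAVAILABLE,
    },
)
print(record.summary())
```

The obvious limit is that a system can only record the absences it knows about. Even so, a partial list of what was out of reach tells the reader something that a confident answer alone never will.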

While commercial cloud models hide their data deficits behind a smooth conversational curtain, a Sovereign system must explicitly map its own borders—declaring exactly what lies within its registry, and where the boundary of its knowledge ends.

The New Information Borders

I do not believe AI is creating separate realities. We are.

Not through any coordinated effort. We are simply making thousands of local decisions about access, licensing, trust, governance, and control.

The cumulative effect may be the emergence of informational borders that are far less visible than national borders but no less consequential.

So here is the thing to watch for. The next time two AI systems hand you different answers, do not stop at asking which one reasoned better. Ask what each one was allowed to see. The gap between them may have nothing to do with intelligence and everything to do with access.

The web once assumed a largely shared corpus of knowledge. The next generation of knowledge systems may not.

When two AI systems disagree, are we observing different reasoning? Or are we observing different worlds?
