Most data tools claim they “discover” relationships by reading metadata:
schemas, column names, declared keys.
That works, but only in ideal systems.
Most real data environments are not ideal.
The Hidden Assumption Behind Metadata
Metadata-driven tools assume:
Naming conventions are consistent
Foreign keys are declared and maintained
Schemas accurately describe meaning
If two columns share a name, a relationship is inferred.
If they don’t, discovery stops.
This logic breaks the moment systems evolve independently.
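To make the failure mode concrete, here is a minimal sketch of name-based inference. The `infer_by_name` function and both catalogs are hypothetical and written only for illustration; they are not taken from any particular tool.

```python
def infer_by_name(catalog):
    """Link every pair of tables that share an identically named column.

    `catalog` is a hypothetical mapping: table name -> list of column names.
    """
    links = []
    tables = sorted(catalog)
    for i, left in enumerate(tables):
        for right in tables[i + 1:]:
            shared = sorted(set(catalog[left]) & set(catalog[right]))
            for column in shared:
                links.append((left, right, column))
    return links


# Ideal system: both tables call the key "order_id".
consistent = {
    "orders": ["order_id", "customer_id"],
    "shipments": ["shipment_id", "order_id"],
}

# Independently evolved systems: same concept, different names.
drifted = {
    "orders": ["order_no", "customer_id"],
    "shipments": ["shipment_id", "ref_key"],
}

name_links_consistent = infer_by_name(consistent)
name_links_drifted = infer_by_name(drifted)

print(name_links_consistent)  # one link, on order_id
print(name_links_drifted)     # empty: discovery stops
```

Nothing about the underlying data changed between the two catalogs. Only the labels did, and that was enough to make the relationship invisible.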
What Real Data Looks Like
In practice:
The same concept appears under different names
Keys are implicit, not declared
Legacy systems outlive documentation
You might see order_no, source_id, and ref_key all representing the same thing—
with no metadata linking them together.
From a metadata view, these tables are unrelated.
From a business view, they are tightly coupled.
Names Are Optional. Data Is Not.
Column names are conventions.
Data values are facts.
Names change.
Schemas drift.
Values persist.
If two fields share overlapping value distributions or inclusion patterns
(for example, nearly every value in one column also appears in the other),
they are very likely related, regardless of how they are labeled.
Metadata tools don’t look there.
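As a rough sketch of what "looking there" could mean, the two measures below compare columns purely by their values. The function names and sample data are assumptions made for illustration; real profiling would also have to deal with type coercion, formatting differences, and sampling.

```python
def distinct_values(values):
    """Distinct non-null values of a column."""
    return {v for v in values if v is not None}


def containment(child_values, parent_values):
    """Fraction of distinct child values that also occur in the parent.

    A score near 1.0 is the inclusion pattern a foreign key would produce.
    """
    child = distinct_values(child_values)
    parent = distinct_values(parent_values)
    if not child:
        return 0.0
    return len(child & parent) / len(child)


def jaccard(a_values, b_values):
    """Symmetric overlap: shared distinct values over all distinct values."""
    a = distinct_values(a_values)
    b = distinct_values(b_values)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical sample data: three columns with unrelated names.
order_no = ["A100", "A101", "A102", "A103"]
ref_key = ["A101", "A103", "A103", None]
customer_id = ["C1", "C2", "C3", "C4"]

ref_in_orders = containment(ref_key, order_no)
ref_in_customers = containment(ref_key, customer_id)
overlap = jaccard(ref_key, order_no)

print(ref_in_orders)     # 1.0: every ref_key value is an order_no
print(ref_in_customers)  # 0.0: no shared values
print(overlap)           # 0.5: 2 shared values out of 4 distinct
```

Containment is directional, which is why it suits key-like relationships: `ref_key` fits entirely inside `order_no`, but not the other way around.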
Discovery Must Be Content-Based
At small scale, humans bridge the gaps.
At large scale, that knowledge becomes tribal and fragile.
True relationship discovery must analyze data content, not just structure.
Otherwise, every integration starts from zero.
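One way to keep that knowledge from staying tribal is to scan column pairs across tables and record the candidates automatically. The sketch below is a simplified illustration, not a production design: `candidate_links`, its `threshold` and `min_distinct` parameters, and the sample tables are all hypothetical, and comparing every pair of columns would need sampling or sketching to work on large datasets.

```python
from itertools import permutations


def candidate_links(tables, threshold=0.9, min_distinct=2):
    """Rank cross-table column pairs by value containment.

    `tables` is a hypothetical mapping: table -> {column: list of values}.
    Returns (score, (child_table, child_col), (parent_table, parent_col))
    tuples, best score first.
    """
    # Reduce every column to its set of distinct non-null values.
    columns = {
        (table, column): {v for v in values if v is not None}
        for table, cols in tables.items()
        for column, values in cols.items()
    }
    found = []
    for (child, child_vals), (parent, parent_vals) in permutations(columns.items(), 2):
        if child[0] == parent[0]:
            continue  # only look for links between different tables
        if len(child_vals) < min_distinct:
            continue  # too few values to say anything meaningful
        score = len(child_vals & parent_vals) / len(child_vals)
        if score >= threshold:
            found.append((score, child, parent))
    return sorted(found, key=lambda link: (-link[0], link[1], link[2]))


tables = {
    "orders": {
        "order_no": ["A100", "A101", "A102", "A103"],
        "customer_id": ["C1", "C2", "C1", "C3"],
    },
    "shipments": {
        "ref_key": ["A101", "A103", "A103"],
        "carrier": ["ups", "dhl", "ups"],
    },
    "invoices": {
        "source_id": ["A100", "A102"],
        "amount": [10, 25],
    },
}

links = candidate_links(tables)
for score, child, parent in links:
    print(f"{child[0]}.{child[1]} -> {parent[0]}.{parent[1]} ({score:.2f})")
```

On this sample, both `shipments.ref_key` and `invoices.source_id` surface as candidates pointing at `orders.order_no`, with no shared names and no declared keys. The results are candidates rather than confirmed relationships: columns drawn from small shared domains, such as low integer IDs, can overlap by coincidence, so a review step is still needed.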
Closing Thought
Metadata explains what systems intend to mean.
Data reveals what they actually do.
If your discovery logic stops at metadata,
you’re not discovering relationships—you’re assuming them.
And assumptions don’t scale.
