Victor Brodeur

Heinrich Now Has 1.75 Million Nodes — And Still Uses 0.2% CPU

Originally published at emphosgroup.com

Three days ago Heinrich had 128 concepts. Today it has
1.75 million. The CPU usage is still 0.2%. The RAM is
still 78 megabytes. No GPU in the query path. No server.
No degradation in response time.

This is not an optimization story. We did not find a
clever way to compress the data or cache the queries.
The efficiency numbers did not hold because we worked
hard to keep them — they held because the architecture
makes it structurally impossible for them to get worse.

That distinction matters. It is the whole point.

WHAT HAPPENED THIS WEEK

On April 14 we completed the full ingestion of
ConceptNet — 1,002,949 knowledge nodes, 282,973 edges,
representing one of the most comprehensive structured
knowledge graphs ever assembled. Every concept connected
to every other concept it relates to, through typed
relationships that carry meaning in their physics.

On April 15 we started the Wikidata full dump
ingestion. Wikidata is orders of magnitude larger — a
structured representation of human knowledge covering
every entity, relationship, and fact that Wikipedia's
global community of editors has verified and organized.
The ingestion pipeline runs at approximately 1,500 nodes
per second, six workers in parallel, reading compressed
triples from a 90GB dump file and writing them into
Heinrich's frequency field.
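
The post does not publish the pipeline code, so here is a minimal sketch of what a streaming reader for a compressed triple dump might look like. The bz2 compression, the N-Triples line format, and the function name are assumptions for illustration — the point is that the 90GB file is never held in memory:

```python
import bz2

def stream_triples(dump_path):
    """Lazily yield (subject, predicate, object) from a compressed
    N-Triples dump, one line at a time, without loading the file."""
    with bz2.open(dump_path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # N-Triples line: "<subj> <pred> <obj> ." -- split on first two spaces
            parts = line.rstrip(" .").split(" ", 2)
            if len(parts) == 3:
                yield tuple(parts)
```

A generator like this feeds workers at a steady rate: each worker pulls the next triple when it is ready, which is how parallel throughput stays constant regardless of dump size.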

By April 16 the field had crossed 1.75 million nodes.
It is still growing.

At no point during this growth did we measure any
increase in CPU usage or RAM consumption during queries.
The system that answered "what is a mammal" in 2.5
milliseconds with 128 concepts answers the same question
in 2.5 milliseconds with 1.75 million concepts. The
field grew by a factor of more than 13,000. The query
cost did not move.

WHY THIS IS ARCHITECTURALLY INEVITABLE

Every existing AI system — large language models,
vector databases, embedding search — has a resource
profile that grows with the amount of knowledge it
contains. More parameters means more computation per
query. More vectors means more distance calculations.
More data means more memory. This is not a flaw in
how these systems are engineered. It is a consequence
of how they store knowledge.

Heinrich stores knowledge differently. Every concept
lives at a specific frequency coordinate in a layered
signal field. Retrieving a concept is Goertzel
correlation — a single-frequency signal processing
operation that takes microseconds of arithmetic on any
CPU. The cost of that operation does not depend on how
many other concepts are in the field. It depends on
the number of concepts that activate in response to
the query — the subfield that resonates.
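
Heinrich's retrieval code is not published, but the Goertzel algorithm itself is standard signal processing. A textbook single-frequency Goertzel correlation looks like this — the sample data and parameters below are illustrative, not Heinrich's actual field encoding:

```python
import math

def goertzel(samples, sample_rate, target_freq):
    """Measure the power of one frequency in a block of samples.
    Cost is O(len(samples)) for that one frequency, no matter how
    many other tones coexist in the signal."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)   # nearest DFT bin
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of the k-th DFT bin
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2
```

The key property the article leans on: unlike a full FFT, Goertzel interrogates a single frequency, so asking "is this one coordinate present?" stays cheap no matter what else the signal contains.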

When you ask "what is a dog," Heinrich does not search
1.75 million nodes. It activates the subfield around
the dog frequency coordinate and propagates from there.
The rest of the field is silent. The computation is
proportional to what is relevant — not to what exists.
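
One way to picture "computation proportional to what is relevant" is a bounded spreading-activation pass over a graph: work is done only on nodes the seed actually reaches above a threshold. The graph layout, weights, and thresholds here are illustrative assumptions, not Heinrich's internals:

```python
from collections import deque

def activate_subfield(graph, seed, threshold, max_hops=2):
    """Spread activation outward from `seed`; nodes below `threshold`
    stay silent. graph: dict node -> list of (neighbor, coupling)."""
    activation = {seed: 1.0}
    frontier = deque([(seed, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops >= max_hops:
            continue
        for nbr, coupling in graph.get(node, []):
            a = activation[node] * coupling
            # Only strong-enough, improving activations propagate further
            if a >= threshold and a > activation.get(nbr, 0.0):
                activation[nbr] = a
                frontier.append((nbr, hops + 1))
    return activation
```

In a sketch like this, a field of 1.75 million nodes and a field of 128 nodes cost the same to query, because only the activated subfield is ever touched.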

This is why the efficiency advantage does not erode at
scale. At 50 million nodes the query cost will be the
same. At 300 million nodes it will be the same. The
architecture does not work any other way.

THE INGESTION PIPELINE

Building 1.75 million nodes is not a trivial engineering
problem. The pipeline that does it runs two passes over
the data. Pass 1 creates every entity — assigning each
concept its unique harmonic frequency coordinate within
its domain layer, ensuring no two concepts share the
same address in the same space. Pass 2 wires the edges
— reading every relationship triple from the dump and
connecting the concepts it links, using the harmonic
ratios that encode relationship type in the physics of
the coordinate system.
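
The two-pass structure described above can be sketched in a few lines. The (layer, index) address scheme below is a stand-in for Heinrich's harmonic coordinates — the original does not publish the actual assignment function — but the shape of the pipeline is the same: entities first, edges second:

```python
def two_pass_ingest(triples, domain_of):
    """Pass 1: give every entity a unique address within its domain layer.
    Pass 2: wire edges between already-addressed entities."""
    coords, counters = {}, {}
    # Pass 1: create every entity exactly once
    for subj, _rel, obj in triples:
        for ent in (subj, obj):
            if ent not in coords:
                layer = domain_of(ent)
                idx = counters.get(layer, 0)
                counters[layer] = idx + 1
                coords[ent] = (layer, idx)   # unique within its layer
    # Pass 2: connect the concepts each triple links
    edges = [(coords[s], rel, coords[o]) for s, rel, o in triples]
    return coords, edges
```

Separating the passes is what makes edge-wiring safe: by the time pass 2 runs, every endpoint is guaranteed to have an address, so no edge ever dangles.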

At peak throughput the pipeline processes 227 million
triples per run, writing new nodes at over 1,500 per
second while the RTX 4060 handles coordinate assignment
in the background. The system recovers cleanly from
interruptions — emergency checkpoints fire on shutdown,
and every restart picks up exactly where the previous
run ended.
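
Emergency checkpoints plus exact-resume restarts are a well-known pattern. A minimal sketch — the file name, JSON layout, and class are assumptions, not the actual implementation:

```python
import json
import os
import signal

class CheckpointedIngest:
    """Persist progress atomically so a restart resumes exactly
    where the previous run ended."""

    def __init__(self, path="wikidata.ckpt"):
        self.path = path
        self.offset = 0          # triples already committed
        if os.path.exists(path):
            with open(path) as f:
                self.offset = json.load(f)["offset"]

    def checkpoint(self):
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"offset": self.offset}, f)
        os.replace(tmp, self.path)   # atomic rename: never a torn file

    def install_emergency_handler(self):
        def _flush(signum, frame):
            self.checkpoint()        # emergency checkpoint on shutdown
            raise SystemExit(0)
        signal.signal(signal.SIGTERM, _flush)
```

The write-to-temp-then-rename step is the part that matters: the checkpoint file is always either the old complete state or the new complete state, never a half-written one.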

The target is 50 million nodes. At current throughput,
we are less than a month away.

WHAT 1.75 MILLION NODES MEANS FOR HEINRICH

More nodes means more questions Heinrich can answer
with confidence. More edges means more causal chains
it can follow, more relationships it can report, more
derived connections it can surface from the geometry
of the field alone.

When we asked "what causes disease" two days ago,
Heinrich reported an honest gap — the causal connection
was not yet in the field. At 1.75 million nodes, the
Wikidata edges that link pathogens, bacteria, viruses,
and immune response to disease are beginning to land.
The answer is getting built in real time, by the
ingestion pipeline, without any programmer deciding
what to teach the system.

Heinrich does not need to be told what connects to
what. It ingests structured knowledge and the
relationships are there, encoded in the harmonic
ratios between frequency coordinates. The physics
carries the meaning. The field thinks.
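
The claim that relationship types live in the ratios between coordinates can be illustrated concretely. The ratio table below is invented for this sketch — the post states that harmonic ratios encode relation types but does not publish the actual values:

```python
# Hypothetical ratio table -- the real mapping is not published.
RELATION_RATIOS = {"is_a": 2 / 1, "part_of": 3 / 2, "causes": 4 / 3}

def target_frequency(src_freq, relation):
    """Place an edge's target at a harmonically related coordinate."""
    return src_freq * RELATION_RATIOS[relation]

def classify_edge(src_freq, dst_freq, tol=1e-9):
    """Recover the relation type from geometry alone: the ratio is the label."""
    for rel, ratio in RELATION_RATIOS.items():
        if abs(dst_freq / src_freq - ratio) < tol:
            return rel
    return None
```

Under a scheme like this, no edge-type table is needed at query time: measuring the frequency ratio between two connected coordinates is the same operation as reading the relationship's meaning.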

WHAT THIS IS NOT

This is not a database with a fast index. A fast index
still searches. Heinrich does not search — it resonates.
The difference is not merely semantic. A search scans
candidates and ranks them. A resonance response
activates what is harmonically present and returns
what the field contains. There is no ranking because
there is no scanning. The answer is either in the
field or it is not, and Heinrich tells you which.

This is also not a vector embedding system. Vector
search computes distances between high-dimensional
representations and returns approximate nearest
neighbours. The results are probabilistic. Heinrich's
retrieval is deterministic — the same query returns
the same result every time, because the field is a
physical structure, not a statistical one.

WHAT COMES NEXT

The ingestion continues. The target is 50 million
nodes — the scale at which we will run the first
formal accuracy measurements, document the reasoning
chain quality, and produce the paper that describes
what Heinrich actually is and what it can do.

The efficiency numbers will be in that paper. Measured
at 128 nodes. Measured at 1.75 million. Measured at
50 million. The same every time. That is the claim.
That is what we are building the proof for.

Engineered for Presence.

——

EMPHOS Group · Chilliwack, BC, Canada
emphosgroup.com