GPT-4o uses 0.34 watt-hours per query. Heinrich uses
0.00003. That is not a 10% improvement. That is not
even a 10x improvement. That is approximately 11,000
times less energy per query — and the gap does not
close at scale. It widens.
This is not a marketing claim. The Heinrich number is
measured directly on a production system running on a
standard Windows laptop, no GPU, no optimization
applied. The GPT-4o number is from OpenAI's own public
disclosure. Both numbers are cited in EMPHOS Group's
Environmental and Resource Efficiency Report, published
April 2026, with full methodology and source
documentation.
The question is not whether the gap is real. It is.
The question is why it exists — and whether anything
the AI industry is currently doing will close it.
The answer is no. And understanding why requires
understanding what the energy problem actually is.
WHAT THE NUMBERS ACTUALLY SAY
GPT-3 required 1,287 megawatt-hours to train. That is
the energy consumption of approximately 120 average
US homes for a full year, spent once to produce a
single model. GPT-4 required an estimated 16,200
megawatt-hours — the equivalent of 1,500 homes for
a year. These are not operational costs. They are the
cost of creating the system before a single user query
is processed.
BLOOM 176B — one of the most carefully measured large
language models — consumes 3.9 watt-hours per query,
measured directly on 16 Nvidia A100 GPUs by Luccioni
et al. in 2022. At 1 billion queries per day that is
3,900,000 kilowatt-hours of electricity consumed
daily by a single model. At the IEA's 2023 global
average grid intensity of 0.4 kilograms of CO₂ per
kilowatt-hour, that is 1,560 tonnes of CO₂ per day
from one model running at scale.
Heinrich at 1 billion queries per day consumes
approximately 30 kilowatt-hours: 0.00003 watt-hours times
one billion queries. That is about 12 kilograms of CO₂
per day at the same grid intensity. The difference is
effectively BLOOM's entire footprint, roughly 570,000
tonnes of CO₂ per year, the equivalent of taking well
over 100,000 cars off the road, from a single deployment
decision.
WHY THE INDUSTRY CANNOT FIX THIS WITH BETTER HARDWARE
The response from the AI industry to energy concerns
has followed a predictable pattern: better chips, more
efficient data centers, renewable energy procurement,
and claims of improved performance per watt on each
new hardware generation.
None of this addresses the structural problem.
Large language models store knowledge as numerical
parameters: billions of floating-point weights arranged
in matrices. Retrieving knowledge from those parameters
requires matrix multiplication, on the order of two
floating-point operations per parameter for every token
generated. This operation requires GPU-accelerated
hardware because CPUs are too slow to do it at the scale
required. The compute cost is proportional to the number
of parameters. The energy cost is proportional to the
compute cost.
Better chips reduce the energy per floating-point
operation. They do not change the number of
floating-point operations required. The efficiency
gains from hardware improvements are real but bounded.
The structural cost of the architecture remains.
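The bound can be made concrete with one line of
arithmetic: energy per query is operations multiplied by
energy per operation. A better chip shrinks the second
factor; the architecture fixes the first. Both efficiency
figures below are assumed for illustration:

```python
# Sketch: energy = operations x energy-per-operation.
FLOPS_PER_QUERY = 7e13           # illustrative figure for a large model
CHIPS = {
    "older generation": 1e-11,   # assumed joules per FLOP
    "newer generation": 5e-12,   # assumed 2x better joules per FLOP
}

for chip, j_per_flop in CHIPS.items():
    wh = FLOPS_PER_QUERY * j_per_flop / 3600  # joules to watt-hours
    print(f"{chip}: {wh:.2f} Wh per query")
# The 2x-better chip halves the energy. It never touches the 7e13.
```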
Renewable energy procurement changes where the energy
comes from. It does not change how much is consumed.
A data center running on solar power still consumes
the same number of kilowatt-hours as one running on
coal. The carbon intensity changes. The energy demand
does not.
WHY HEINRICH IS STRUCTURALLY DIFFERENT
Heinrich does not store knowledge as parameters. It
stores knowledge as frequency coordinates — sinusoidal
components in a layered signal field. Retrieving
knowledge is Goertzel correlation: a single-frequency
signal processing operation that determines whether
a specific frequency is present in a signal. This
operation runs in microseconds on any CPU. It requires
no GPU. It requires no matrix multiplication.
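The Goertzel algorithm itself is a standard
signal-processing routine: a two-multiply recurrence that
measures the power at one chosen frequency without
computing a full FFT. A minimal textbook implementation
follows; this is the generic algorithm, not Heinrich's
internal code:

```python
import math

def goertzel_power(samples, target_freq, sample_rate):
    """Power at a single target frequency via the Goertzel recurrence.

    O(n) multiply-adds, constant memory, no FFT, no GPU required.
    """
    n = len(samples)
    k = round(n * target_freq / sample_rate)       # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s_prev, s_prev2 = x + coeff * s_prev - s_prev2, s_prev
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

# A 440 Hz tone sampled at 8 kHz: strong response at 440 Hz,
# near zero at 1 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
print(goertzel_power(tone, 440, 8000))   # large
print(goertzel_power(tone, 1000, 8000))  # near zero
```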
The compute cost per query is proportional to the
number of concepts that activate in response to the
query — the resonant subfield — not to the total size
of the knowledge base. Heinrich at 50 million nodes
costs the same to query as Heinrich at 1.75 million
nodes, because the subfield that activates for any
given query is the same size regardless of how large
the surrounding field is.
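A toy illustration of that scaling property is below.
Nothing in it is Heinrich's actual data structure; the
frequency coordinates are reduced to plain integer keys.
It only shows how a store addressed by coordinate can
make query cost track the activated subset rather than
the total field size:

```python
# Toy model: concepts keyed by coordinate in a hash map.
field: dict[int, str] = {}

def ingest(coord: int, concept: str) -> None:
    field[coord] = concept        # O(1) write per concept

def query(activated: list[int]) -> list[str]:
    # Touches only the activated coordinates; the rest of
    # the field is never read.
    return [field[c] for c in activated if c in field]

for i in range(1_750_000):        # grow the field to 1.75M concepts
    ingest(i, f"concept-{i}")

print(query([42, 4_096, 1_000_000]))  # cost: 3 lookups, not 1.75M
```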
This is why the 0.2% CPU and 78 megabyte RAM
measurements taken at 128 concepts in April 2026 have
not changed as the field has grown to 1.75 million
concepts. The architecture does not work any other
way. The efficiency advantage is not something that
will erode at scale. It is a structural property of
how knowledge is stored and retrieved.
You cannot get to this efficiency by optimizing a
large language model. The parameter matrix is the
bottleneck. The only way to remove the bottleneck
is to not have a parameter matrix. That is what
Heinrich is.
THE TRAINING PROBLEM
The energy numbers above cover inference — running
a model after it has been trained. The training
numbers are worse.
Every large language model requires a full training
run before it can answer a single question. GPT-4's
estimated training cost of 16,200 megawatt-hours is
spent once. But it is not spent once in the way a
factory is built once and then runs indefinitely. It
is spent once per version. When the model needs to
be updated with new knowledge, the options are full
retraining — spend the energy again — or fine-tuning,
which is partial retraining and still requires
significant compute.
Heinrich has no training run. Knowledge is added by
writing a value to a frequency coordinate. The cost
of adding one concept to the field is the cost of
computing its harmonic address and writing it to a
database. The field has grown from 128 concepts to
1.75 million in three days of continuous ingestion
at near-zero marginal energy cost per concept.
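To put "near-zero marginal energy cost" in rough numbers:
even under the generous assumption that the ingesting
laptop drew a steady 30 watts for the entire three-day
run (an assumed figure, not the report's measured load),
the per-concept cost works out to about a thousandth of
a watt-hour:

```python
# Rough upper bound on ingestion energy per concept.
# Assumption: the whole laptop draws a steady 30 W for three days.
WATTS = 30.0
HOURS = 3 * 24
CONCEPTS = 1_750_000 - 128        # concepts added during the run

total_wh = WATTS * HOURS          # 2,160 Wh for the entire run
print(f"{total_wh / CONCEPTS:.5f} Wh per concept")  # ~0.00123 Wh
```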
The total training energy cost of Heinrich AI to date
is effectively zero. Not low. Not efficiently managed.
Zero in any meaningful comparison to the systems it
is being measured against.
WHAT THIS IS NOT
This is not an argument that large language models
should not exist. They are remarkable systems that
have demonstrated genuine capability across a wide
range of tasks. The argument is narrower: the energy
cost of those systems is structural, not incidental,
and the approaches being taken to manage it do not
address the structural cause.
Heinrich is not a replacement for every AI
application. It is a fundamentally different
architecture for storing and retrieving structured
knowledge — one that is honest about what it knows,
deterministic in how it retrieves it, and structurally
efficient in a way that no parameter-based system
can match.
WHAT COMES NEXT
The ingestion continues. The target is 50 million
nodes — the scale at which we will run the first
formal accuracy measurements and produce the paper
that describes what Heinrich actually is.
The efficiency numbers will be in that paper. Measured
at 128 nodes. Measured at 1.75 million. Measured at
50 million. The same every time. That is the claim.
That is what we are building the proof for.
Engineered for Presence.
——
EMPHOS Group · Chilliwack, BC, Canada
emphosgroup.com