Hillock: A brain-inspired, CPU-bound memory gate for local LLMs

#ai #machinelearning #opensource #python

Hi everyone,

I've been hacking on a personal, local project called Hillock. It's very much a work in progress and nothing fancy, but I wanted to see whether we could build a lightweight, offline memory layer for local LLMs without the overhead of running a heavy, embedding-based vector database or wasting precious VRAM.

It is named after the biological axon hillock, the region of a neuron that sums incoming electrical signals and decides whether to fire (open the gate) or remain silent (block).
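
To make the analogy concrete, here is a minimal sketch of a sum-and-threshold gate. The function name `hillock_gate`, the idea of passing in a list of evidence scores with per-signal weights, and the default threshold are all assumptions made for illustration; they are not taken from the Hillock codebase.

```python
def hillock_gate(inputs, weights, threshold=1.0):
    """Sum weighted evidence and 'fire' only if it reaches the threshold.

    inputs    -- hypothetical evidence scores (e.g. a fact hit, a context match)
    weights   -- hypothetical importance of each signal
    threshold -- hypothetical firing level; below it the query is blocked
    """
    potential = sum(w * x for w, x in zip(weights, inputs))
    return potential >= threshold, potential


# Strong evidence opens the gate, weak evidence keeps it shut.
fired, potential = hillock_gate([1.0, 0.5], [0.8, 0.6])
blocked, low_potential = hillock_gate([0.2, 0.1], [0.8, 0.6])
```

In a setup like this, returning the raw potential alongside the decision makes it easier to tune the threshold later.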

How the stack works:

Hard Facts (SQLite): Stores raw facts as simple database triples (Subject-Predicate-Object) so the system has a solid symbolic foundation.
Synapses (Hebbian Plasticity): Tracks which concepts co-occur during a conversation to dynamically build gradient-free associative weights.
Context (Hyperdimensional Computing): Maintains a 10,000-dimensional leaky context vector that rolls, binds, and accumulates history. This helps the system resolve pronouns (like "he/she") and decide when to block a query to prevent hallucinations.
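
The three layers above can be sketched in a few dozen lines of standard-library Python. This is an illustration under stated assumptions, not Hillock's actual code: the table schema, the saturating Hebbian rule, the learning rate, the decay factor, the `subject` role name, and the choice of bipolar (+1/-1) vectors with elementwise multiplication as "bind" and a cyclic shift as "roll" are all my own picks for the example.

```python
import random
import sqlite3
from itertools import combinations

DIM = 10_000  # dimensionality quoted in the post

# --- Hard facts: subject-predicate-object triples in SQLite ---
def make_store():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    return db

def add_fact(db, subject, predicate, obj):
    db.execute("INSERT INTO triples VALUES (?, ?, ?)", (subject, predicate, obj))

def lookup(db, subject, predicate):
    rows = db.execute(
        "SELECT object FROM triples WHERE subject = ? AND predicate = ?",
        (subject, predicate),
    )
    return [row[0] for row in rows]

# --- Synapses: Hebbian co-occurrence weights, no gradients involved ---
def hebbian_update(weights, concepts, rate=0.1):
    """Strengthen every pair of concepts seen together; saturates toward 1.0."""
    for pair in combinations(sorted(set(concepts)), 2):
        old = weights.get(pair, 0.0)
        weights[pair] = old + rate * (1.0 - old)

# --- Context: leaky bipolar hypervector ---
_rng = random.Random(0)  # fixed seed so runs are repeatable
_item_memory = {}

def hypervector(name):
    """Return a fixed random +1/-1 vector for a symbol, created on first use."""
    if name not in _item_memory:
        _item_memory[name] = [_rng.choice((-1, 1)) for _ in range(DIM)]
    return _item_memory[name]

def bind(a, b):
    """Elementwise multiply; self-inverse for +1/-1 vectors."""
    return [x * y for x, y in zip(a, b)]

def roll(v, shift=1):
    """Cyclic shift, used here to mark the passage of one time step."""
    shift %= len(v)
    return v[-shift:] + v[:-shift]

def update_context(context, role, filler, decay=0.8):
    """Age the old context (roll + decay), then add the new role-filler binding."""
    event = bind(hypervector(role), hypervector(filler))
    return [decay * c + e for c, e in zip(roll(context), event)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0

def resolve(context, role, candidates):
    """Unbind a role from the context and return the most similar candidate."""
    probe = bind(context, hypervector(role))
    scored = ((name, cosine(probe, hypervector(name))) for name in candidates)
    return max(scored, key=lambda pair: pair[1])
```

Starting from `context = [0.0] * DIM`, calling `update_context(context, "subject", "Marie_Curie")` and then `resolve(context, "subject", [...])` returns the candidate whose vector best matches the most recent subject, along with its cosine score. In this sketch only the latest binding is probed; recovering older mentions would need the roll to be undone first. A low best score is the kind of signal that could be used to block a query rather than guess.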

The "Smarter Model, Lower Score" Paradox

I wrote a tough, 30-sentence scientific benchmark with complex sentence structures and hard negatives to see where this breaks on local hardware.

When I ran Qwen 2 (1.5B), it got about 50% retrieval accuracy. But when I upgraded to the much smarter Qwen 3 (a 5.2GB model file), its score actually dropped to 15.0%!

Why? Because Qwen 3 is too expressive for my rigid evaluation script:

The test expected Marie_Curie born_in Poland. Qwen 3 extracted [Marie_Curie] -[spent_childhood_in]-> [Poland].
The test expected Albert_Einstein. Qwen 3 extracted [albert_einstein] (lowercase), which broke the exact-string checks.
The test expected compiler. Qwen 3 extracted [first_compiler].

So, while Qwen 3 populated the database with triples that read as accurate and natural, it got penalized by the rigid, exact-match evaluation harness. The drop says more about my scoring script than about the model.
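
One way a harness could separate these failure modes is to grade in tiers instead of pass/fail. The sketch below is an assumption about how that might look, not the benchmark's real scoring code: the helper names (`tokens`, `entity_matches`, `grade`) and the tier labels are hypothetical. It lowercases terms, strips the bracket notation, and accepts an extracted entity when the expected one appears inside it as a whole run of underscore-separated tokens.

```python
def tokens(term):
    """Normalize a term: drop brackets/whitespace, lowercase, split on '_'."""
    return term.strip("[] ").lower().split("_")

def entity_matches(expected, extracted):
    """True if the expected tokens appear as a contiguous run in the extracted term."""
    exp, ext = tokens(expected), tokens(extracted)
    n = len(exp)
    return any(ext[i:i + n] == exp for i in range(len(ext) - n + 1))

def grade(expected, extracted):
    """Grade an extracted (subject, predicate, object) triple in tiers."""
    if tuple(expected) == tuple(extracted):
        return "exact"
    (exp_s, exp_p, exp_o), (ext_s, ext_p, ext_o) = expected, extracted
    if entity_matches(exp_s, ext_s) and entity_matches(exp_o, ext_o):
        if tokens(exp_p) == tokens(ext_p):
            return "match_after_normalization"
        return "entities_match"  # predicate differs; worth a manual look
    return "miss"


result = grade(
    ("Marie_Curie", "born_in", "Poland"),
    ("[Marie_Curie]", "spent_childhood_in", "[Poland]"),
)
```

Token containment is deliberately loose, so it can produce false positives (a `first_compiler` is not always the same thing as a `compiler`). Reporting the tiers separately keeps that visible instead of silently counting such cases as correct.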

The codebase is written in pure Python, is fully open-source (under the AGPL-3.0 copyleft license), and is designed to run entirely offline on consumer hardware.

If you're interested in VSAs (vector symbolic architectures) or alternative cognitive architectures, or have feedback on the HDC context-binding math, I'd love for you to check it out!

GitHub Repository: https://github.com/roandejager/Hillock