
Beyond the Standard Model: Introducing the "Cousins" of the Memory-Native Neural Network Family

Author: @hejhdiss (Muhammed Shafin P)


The MNNN experiment expands with four specialized architectures: DTPN, Hyper-AMN, SGW-AMN, and NDM.

In our initial exploration of the Memory-Native Neural Network (MNNN) family, we introduced a world where memory isn't an external database, but an intrinsic property of the neurons themselves. We explored AMRC, PMRC, and the flagship AMN.

But the experiment didn't stop there.

Today, we unveil four "Independent Cousin" architectures—experimental, specialized models that push the memory-native philosophy into radical new territories: from multi-head associative manifolds to networks governed by differential equations.


🧬 The "Cousin" Philosophy

Unlike the standard models found in api.py, these cousins are specialized tools.

  • They are standalone architectures
  • They require dedicated C-compiled backends
  • They are not general-purpose models

Each is designed for specific, complex temporal challenges that demand unique ways of remembering.


1. DTPN — The Universal Persistence Hybrid

Dual-Track Persistence Network

If the standard AMN is a master of context, DTPN is the master of persistence.

It is the most comprehensive memory model in the collection, bridging immediate reaction and long-term knowledge through three distinct retention tracks:

🔹 Track 1: The Echo (Temporal Fluidity)

  • Retains a fraction of the immediately preceding output (the β factor)
  • Ensures smooth transitions between time steps

🔹 Track 2: The State (Stateful Neurons)

  • Individual neurons maintain a decaying internal reservoir (α factor)
  • Acts as a medium-term memory buffer

🔹 Track 3: The Manifold (Global Memory)

  • A shared associative whiteboard
  • Stores long-term contextual information

Best For:

Tracking micro-fluctuations, medium-term states, and long-term facts simultaneously.
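
To make the three tracks concrete, here is a minimal NumPy sketch of how a single DTPN-style update could combine an output echo (β), a decaying per-neuron reservoir (α), and a shared manifold. The class name, shapes, and the γ write rate are illustrative assumptions, not the actual DTPN implementation.

```python
import numpy as np

class DTPNSketch:
    """Toy three-track persistence cell (illustrative, not the real DTPN)."""

    def __init__(self, in_dim, hidden_dim, mem_slots,
                 alpha=0.9, beta=0.3, gamma=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((hidden_dim, in_dim)) * 0.1  # input projection
        self.alpha = alpha    # Track 2: decay of each neuron's internal reservoir
        self.beta = beta      # Track 1: fraction of the previous output echoed forward
        self.gamma = gamma    # Track 3: write rate into the shared manifold (assumed)
        self.prev_out = np.zeros(hidden_dim)               # Track 1: the "echo"
        self.reservoir = np.zeros(hidden_dim)               # Track 2: stateful neurons
        self.manifold = np.zeros((mem_slots, hidden_dim))   # Track 3: global memory

    def step(self, x):
        drive = self.W @ x
        # Track 2: decaying per-neuron reservoir accumulates recent drive
        self.reservoir = self.alpha * self.reservoir + (1 - self.alpha) * drive
        # Track 3: read the manifold slot most similar to the current drive
        slot = np.argmax(self.manifold @ drive)
        recall = self.manifold[slot]
        # Combine input drive, medium-term state, long-term recall, and the echo
        out = np.tanh(drive + self.reservoir + recall) + self.beta * self.prev_out
        # Track 3: slowly write the new state back into the best-matching slot
        self.manifold[slot] += self.gamma * out
        # Track 1: keep the echo for the next time step
        self.prev_out = out
        return out

# Usage: feed a short sequence through the toy cell
cell = DTPNSketch(in_dim=4, hidden_dim=8, mem_slots=16)
for t in range(5):
    y = cell.step(np.random.default_rng(t).standard_normal(4))
print(y.shape)  # (8,)
```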


2. Hyper-AMN — The Multi-Head Specialist

Multi-Head Associative Manifold

While a standard AMN uses a single global memory manifold, Hyper-AMN introduces a multi-head memory system.

Think of it as a brain with specialized compartments.

🧠 Head Gating Mechanism

Information is routed into domain-specific manifolds:

  • Spatial Manifold — Positional and structural patterns
  • Emotional Manifold — Sentiment and tone
  • Logical Manifold — Reasoning and causal links

Best For:

Complex data streams where categorical separation (e.g., how something is said vs what is said) is essential.
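
As a rough illustration of head gating, the sketch below routes a hidden state into named manifolds through a softmax gate. The head labels, the gate, and the read/write rules are assumptions made for illustration only, not the shipped Hyper-AMN code.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class HyperAMNSketch:
    """Toy multi-head associative memory with a gating vector (illustrative)."""

    HEADS = ("spatial", "emotional", "logical")  # assumed domain labels

    def __init__(self, hidden_dim, slots_per_head, write_rate=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.gate_W = rng.standard_normal((len(self.HEADS), hidden_dim)) * 0.1
        self.manifolds = {h: np.zeros((slots_per_head, hidden_dim)) for h in self.HEADS}
        self.write_rate = write_rate

    def step(self, h):
        # The gate decides how strongly this state belongs to each domain manifold
        gate = softmax(self.gate_W @ h)
        recall = np.zeros_like(h)
        for g, name in zip(gate, self.HEADS):
            mem = self.manifolds[name]
            slot = np.argmax(mem @ h)              # nearest slot in this head
            recall += g * mem[slot]                # gated read
            mem[slot] += self.write_rate * g * h   # gated write
        return np.tanh(h + recall), gate

net = HyperAMNSketch(hidden_dim=6, slots_per_head=8)
state = np.random.default_rng(1).standard_normal(6)
out, gate = net.step(state)
print(dict(zip(net.HEADS, gate.round(2))))  # how the state was routed across heads
```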


3. SGW-AMN — The "Conscious" Bottleneck

Sparse Global Workspace

Inspired by Global Workspace Theory, SGW-AMN proposes that memory is strongest when forced through a bottleneck.

Instead of all neurons broadcasting at once:

  • Thousands of neurons compete
  • Only a few enter a tiny global workspace
  • Memory becomes attention by compression

This competitive routing ensures that only the most salient features are stored.

Best For:

Feature extraction and high-noise environments where identifying the signal matters more than raw data volume.
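
The bottleneck can be pictured as a top-k competition: out of many candidate activations, only the k most salient are broadcast into a small shared workspace. The NumPy sketch below assumes activation magnitude as the salience score and a random projection into the workspace; both are stand-ins, not the real SGW-AMN routing.

```python
import numpy as np

def sparse_global_workspace(activations, workspace_dim=4, k=3, seed=0):
    """Route only the top-k most salient neurons into a tiny shared workspace."""
    rng = np.random.default_rng(seed)
    n = activations.shape[0]
    # Salience here is just activation magnitude; a real model could learn this score
    salience = np.abs(activations)
    winners = np.argsort(salience)[-k:]   # indices of the k winning neurons
    mask = np.zeros(n)
    mask[winners] = 1.0                   # everything else is silenced
    gated = activations * mask
    # Compress the sparse broadcast into the workspace (random projection as a stand-in)
    W_in = rng.standard_normal((workspace_dim, n)) * 0.1
    workspace = np.tanh(W_in @ gated)
    return workspace, winners

acts = np.random.default_rng(2).standard_normal(1000)  # "thousands of neurons compete"
ws, winners = sparse_global_workspace(acts, workspace_dim=4, k=3)
print(winners, ws.round(3))
```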


4. NDM — The Fluid Network

Neural Differential Manifolds

NDM abandons discrete, step-wise weight updates in favor of continuous weight evolution governed by ordinary differential equations (ODEs).

  • Weights evolve in real time: dW/dt
  • Learning follows Hebbian traces (“neurons that fire together, wire together”)
  • The network rewires itself dynamically

This is true neuroplasticity—structure and learning are inseparable.

Best For:

Non-stationary environments where rules change faster than traditional training can adapt.
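
One way to read "weights evolve as dW/dt" with a Hebbian trace is the toy dynamics dW/dt = η · (post ⊗ pre) − λ · W, integrated with an explicit Euler step alongside the forward pass. The sketch below shows that reading; the actual NDM equations may differ.

```python
import numpy as np

class NDMSketch:
    """Toy continuously evolving weights: dW/dt = eta * outer(post, pre) - lam * W."""

    def __init__(self, in_dim, out_dim, eta=0.05, lam=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim)) * 0.1
        self.eta = eta   # Hebbian growth rate ("fire together, wire together")
        self.lam = lam   # passive decay keeps the weights bounded

    def step(self, x, dt=0.1):
        post = np.tanh(self.W @ x)                              # forward pass with current weights
        dW = self.eta * np.outer(post, x) - self.lam * self.W   # Hebbian trace minus decay
        self.W += dt * dW                                       # explicit Euler step of the ODE
        return post

net = NDMSketch(in_dim=3, out_dim=5)
for t in range(10):                                             # the network rewires as data flows
    y = net.step(np.random.default_rng(t).standard_normal(3))
print(np.linalg.norm(net.W).round(3))
```

Euler integration is only the simplest possible solver here; the point is that learning happens continuously as data flows, with no separate training phase.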


🛠️ Summary of the Cousins

| Architecture | Key Innovation | Best For |
| --- | --- | --- |
| DTPN | Triple-Track Persistence | Maximum data retention across all time scales |
| Hyper-AMN | Domain-Specific Heads | Logic vs. Emotion vs. Structure separation |
| SGW-AMN | Competitive Bottleneck | Extracting signal from noise |
| NDM | ODE Weight Evolution | Constantly changing environments |

🚀 The Experiment Continues

These cousins live on the fringe of memory-native research.

They prove there is no one-size-fits-all intelligence:

  • Sometimes you need a bottleneck (SGW)
  • Sometimes you need specialization (Hyper-AMN)
  • Sometimes your weights must flow like liquid (NDM)

This remains an open-source experiment:

  • Code is available
  • C-libraries are ready to compile
  • Exploration has only just begun

📝 Note on Development

While these architectures were originally conceptualized with assistance from Claude Sonnet 4.5, they have been manually edited, refined, and tested by me to ensure they function as standalone research-grade models.


🔗 Join the experiment: GitHub Repository
