Beyond “Violent Aesthetics”: A Self-Corrected Modular, Brain-Inspired LLM Architecture
From “synchronous oscillations” to “syntactic skeleton”, from “slips of the tongue” to aphasia evidence – how a thought experiment on decoupling intelligence becomes rigorous
Preface
A month ago, I published an article titled Beyond “Violent Aesthetics”: A Modular, Brain-Inspired LLM Architecture, attempting to replace the monolithic large model paradigm with a decoupled, modular, brain-like design. The article sparked lively discussion but also revealed serious logical gaps and engineering blind spots.
Through repeated debates with peers and AI assistants, I gradually realized that my original idea confused neuroscience hypotheses with established facts, and analogies with implementable designs. However, this does not mean the modular, brain-inspired direction is wrong – provided we extract engineerable principles from how the brain actually works, rather than copying unverified hypotheses.
This article is a complete record of my self‑correction. I will:
Honestly list the disproven parts of the original proposal (and why)
For four key problems, provide rigorous, neuroscience‑grounded solutions
In particular, for entity alignment I will detail the multi‑object scenario, insights from “slips of the tongue”, and aphasia case studies that prove functional separation
Finally present a prototype‑ready modular architecture
If you have ever been attracted to “modular AI” but frustrated by “how to make it work”, I hope this article offers a starting point for discussion.
I. Three Fatal Flaws in the Original Idea (Abandoned)
| Flaw | Why it fails | Replacement |
| --- | --- | --- |
| Synchronous oscillation binding | No natural global phase in digital systems; few distinguishable frequencies (<20); cannot represent nested structures | Structured data passing (JSON/AMR) |
| Scheduler does automatic task decomposition | Equivalent to the AI‑complete planning problem; no existing solution | Scheduler only integrates, never decomposes |
| Serial sub‑modules + independent memory retrieval | Inference time grows linearly; memory redundancy | Parallel broadcast + shared working memory + chunked pipeline |
II. Rethinking Four Critical Problems
Below I address each of the most challenged problems. For each:
① Precise statement of the problem (clarifying previous vagueness)
② How the brain actually solves it (neuroscience consensus, not speculation)
③ Engineering solution
④ Feasibility evidence
2.1 Entity Alignment (The Toughest – previously unclear about multiple objects)
Precise problem statement
My earlier description only said “color module outputs ‘red’, shape module outputs ‘circle’”, but did not specify two different objects. The real challenge is:
Input: “a red circle and a blue square.”
Color module outputs: {red, blue}
Shape module outputs: {circle, square}
Question: How does the scheduler know whether the mapping is red→circle, blue→square or red→square, blue→circle?
This is the core difficulty: with multiple objects, attributes must be correctly matched to their respective individuals.
How does the brain solve this?
The brain does not do post‑hoc matching. Instead, spatial location or syntactic structure serves as the binding skeleton from the start.
Vision: Retinotopic mapping ensures colour and shape information are tagged with location (e.g. “upper‑left”). Thus “red at upper‑left” and “circle at upper‑left” are naturally bound.
Language: Syntactic structure. In “a red circle”, the adjective “red” syntactically modifies the noun “circle” – the modifier relation specifies ownership. For multiple objects, languages use coordination or separate clauses: “a red circle and a blue square”. A parser can identify two independent noun phrases, each with self‑contained modifier relations.
Key insight: The brain does not need an explicit “aligner” – syntactic/spatial structure already implies binding.
Insight from “slips of the tongue”
Our language‑production machinery is not perfect. We sometimes say “red square” when we mean “red circle”. This kind of error (a semantic‑lexical mapping slip) occurs in both healthy speakers and aphasia patients. It shows:
Thought (abstract semantics) and language production (syntax/lexical retrieval) are separate. The prefrontal lobe produced an intention “circle + red”, but Broca’s area retrieved the wrong noun.
Such errors do not disrupt binding itself – even if the wrong noun is said, the listener still knows that “red” modifies that (wrong) noun, because the syntactic position remains. This shows the robustness of the syntactic skeleton.
Aphasia cases: Hard evidence of functional separation
Pure Broca’s area lesion (Broca’s aphasia):
Patient can understand language, has clear intentions (knows what they want to say)
Cannot produce grammatically correct sentences: effortful, telegraphic, missing function words (“red… circle… want”)
Crucially, in non‑language tasks (e.g. sorting red‑circle vs red‑square cards) they perform normally. This means entity alignment (binding) via syntactic comprehension is relatively preserved, while language production is impaired.
Pure Wernicke’s area lesion (Wernicke’s aphasia):
Patient speaks fluently, grammar largely intact, but content is empty, semantic confusion (“the red… well, no, it’s square… I mean…”)
Crucially, they lose the normal binding of semantics to syntactic positions – they may say “red square” while pointing to a circle. This indicates Wernicke’s area is critical for attaching semantic features to correct syntactic slots.
Double dissociation tells us:
Syntactic skeleton construction (Broca) and semantic‑syntactic binding (Wernicke and surrounding areas) are different functions.
But neither requires an explicit alignment algorithm – binding emerges from hierarchical phrase structure.
Engineering solution
Core idea: Mimic the brain’s syntactic skeleton. First run a grammar module to parse the input into a list of noun phrases (NPs). Each NP contains its head noun and modifiers. In multi‑object scenarios, each object corresponds to a distinct NP, with attributes naturally bound inside that NP.
Example:
Input: “a red circle and a blue square.”
Grammar module output:
```json
[
  {
    "np_id": 1,
    "head": "circle",
    "modifiers": ["red"]
  },
  {
    "np_id": 2,
    "head": "square",
    "modifiers": ["blue"]
  }
]
```
The colour module simply looks for colour words within each NP’s modifiers and attaches the colour to that NP – no cross‑NP matching needed.
Handling complexities:
Coreference: “John took an apple. It is red.” → Run a coreference resolution module first, link “it” to “apple”, then inherit attributes under the same entity ID.
Cross‑NP modification: “red circle and blue square” → two independent NPs.
Nesting: “the boy holding a red balloon” → parser produces nested NP structures; attributes are attached hierarchically.
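As a concrete illustration of the NP skeleton, here is a toy stand‑in for the grammar module. A real system would use spaCy or Stanza; the hard‑coded word lists and the function name are purely illustrative, and only flat “ADJ* NOUN” phrases are handled (no nesting or coreference):

```python
# Toy grammar module: build an NP skeleton from simple "ADJ* NOUN" phrases.
# The tiny lexicon below is illustrative; a real parser replaces all of this.
ADJECTIVES = {"red", "blue", "green", "big", "small"}
NOUNS = {"circle", "square", "triangle", "balloon", "boy"}

def np_skeleton(sentence):
    tokens = sentence.lower().strip(".").split()
    skeleton, mods = [], []
    for tok in tokens:
        if tok in ADJECTIVES:
            mods.append(tok)          # accumulate modifiers for the next head
        elif tok in NOUNS:
            skeleton.append({"np_id": len(skeleton) + 1,
                             "head": tok,
                             "modifiers": mods})
            mods = []                 # modifiers are consumed by their NP
        # determiners, conjunctions, etc. are skipped

    return skeleton

print(np_skeleton("a red circle and a blue square."))
# → [{'np_id': 1, 'head': 'circle', 'modifiers': ['red']},
#    {'np_id': 2, 'head': 'square', 'modifiers': ['blue']}]
```

Even this toy version shows why no cross‑NP matching is needed: “red” is attached to “circle” simply because it precedes that head, mirroring how syntactic position carries the binding.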
Feasibility evidence:
Dependency parsers (spaCy, Stanza) achieve NP recognition F1 > 90% on well‑formed text.
Coreference models (FastCoref, NeuralCoref) achieve F1 ≈ 80% on OntoNotes – acceptable.
The grammar module is lightweight (<1 GB) and runs in <10 ms per sentence.
Conclusion: Entity alignment, even with multiple objects, is solvable via the NP skeleton from a grammar module. Aphasia cases prove the brain uses a similar mechanism and that functional separation is feasible.
2.2 Heterogeneous Outputs from Sub‑modules
Problem: Colour module outputs a string, memory module outputs a long text paragraph, numeric module outputs a float… How can the scheduler handle all formats uniformly?
Brain inspiration: Prefrontal working memory uses slots for different modalities. Each slot corresponds to one object, and different attributes fill different fields (Miller & Cohen, 2001).
Engineering solution: The entity skeleton from the grammar module provides a uniform attachment point. Each sub‑module formats its output as {entity_id, attribute_name, value}. The scheduler aggregates by entity_id.
Feasibility: This pattern is widely used in knowledge graph construction. Global attributes (e.g. sentiment) can be attached to a virtual ID global.
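A minimal sketch of this aggregation step. The field names follow the {entity_id, attribute_name, value} convention above; the sample records are made up:

```python
# Scheduler-side aggregation: uniform attribute records grouped into
# per-entity slots, regardless of which sub-module produced them.
from collections import defaultdict

def aggregate(records):
    slots = defaultdict(dict)
    for rec in records:
        slots[rec["entity_id"]][rec["attribute_name"]] = rec["value"]
    return dict(slots)

records = [
    {"entity_id": 1, "attribute_name": "colour", "value": "red"},     # colour module
    {"entity_id": 1, "attribute_name": "shape", "value": "circle"},   # shape module
    {"entity_id": "global", "attribute_name": "sentiment", "value": 0.7},
]
print(aggregate(records))
# → {1: {'colour': 'red', 'shape': 'circle'}, 'global': {'sentiment': 0.7}}
```

Note how the global‑attribute case from the text falls out for free: `"global"` is just another entity ID.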
2.3 Redundant Computation and Interference
Problem: Broadcasting the entire text to all sub‑modules forces each module to process the whole text – redundant compute; distant information may interfere with local decisions.
Brain inspiration: Working memory capacity is limited (7±2 chunks). Reading is done sentence by sentence; only the current local information is kept active (Baddeley, 2003).
Engineering solution: Chunked pipeline. Split the text into sentences (or clauses). Process each sentence sequentially: grammar module → sub‑modules (parallel) → update global working memory. Then move to the next sentence.
Feasibility: Streaming / incremental parsing frameworks exist (e.g., Rasa). Computational complexity drops from O(L²) to O(N·l²), where L is the total text length, N is the number of chunks, and l is the chunk length (so L ≈ N·l).
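The chunked pipeline can be sketched as a plain loop. The regex splitter and the stage signatures are simplifications: a production system would use NLTK's sentence tokenizer, and the sub‑module calls would run genuinely in parallel:

```python
# Chunked pipeline: split into sentences, run the per-sentence stages,
# and update one shared working memory across chunks.
import re

def split_sentences(text):
    # naive splitter on sentence-final punctuation; stands in for NLTK
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def run_pipeline(text, grammar, submodules):
    working_memory = {}                        # shared global WM
    for sentence in split_sentences(text):     # one chunk at a time
        skeleton = grammar(sentence)           # grammar module runs first
        for module in submodules:              # conceptually parallel
            working_memory.update(module(skeleton))
    return working_memory
```

Each sub‑module only ever sees one sentence's skeleton, which is exactly what eliminates both the redundant compute and the long‑distance interference described above.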
2.4 Complexity of the Central Scheduler
Problem: If the scheduler must both integrate information and generate natural language, it essentially becomes a large language model – nullifying the modular advantage.
Brain inspiration: Prefrontal cortex (intention/decision) and Broca’s area (language production) are functionally separated. Broca’s aphasia patients have clear intentions but cannot produce sentences – direct evidence of separation (Geschwind, 1970).
Engineering solution: Split the scheduler into two parts:
Central scheduler (lightweight): Only integrates sub‑module outputs, resolves conflicts, and produces an abstract semantic representation (e.g., JSON, AMR). Can be a small MLP (100–500M params) or even rule‑based.
Language generation module (Broca‑like): Specialised in converting abstract semantics into natural language. Can be a lightweight neural model (e.g., T5‑small, 300M params) or template‑based.
Parameter comparison:
Original (scheduler + generation) : at least 3B parameters
After split: scheduler 100M (or 0 with rules) + generator 300M = 400M → 87% reduction.
Feasibility: Abstract‑semantics‑to‑text is a mature task (AMR‑to‑text, table‑to‑text). T5‑small achieves strong results.
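A template‑based variant of the generation module might look like this. The template table and the extra semantics fields (e.g. "head") are illustrative additions beyond the article's example; a T5‑small fine‑tuned on AMR‑to‑text would fill the same role:

```python
# Broca-like generation module: abstract semantics in, surface text out.
# Templates are keyed by answer_type, matching the scheduler's output format.
TEMPLATES = {
    "colour": "The {head} is {colour}.",
    "count": "There are {count} {head}s.",
}

def generate(semantics):
    template = TEMPLATES[semantics["answer_type"]]
    # str.format ignores unused keys, so extra fields like entity_id are harmless
    return template.format(**semantics)

print(generate({"answer_type": "colour", "entity_id": 1,
                "head": "circle", "colour": "red"}))
# → The circle is red.
```

This is the rule‑based end of the spectrum (0 parameters); swapping in a small neural generator changes nothing upstream, since the scheduler's abstract‑semantics interface stays fixed.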
III. Revised Architecture (Text‑only Version)
```text
Input text (possibly long)
        │
        ▼
Chunker (sentence splitter)
        │
        ▼  loop over each sentence
┌─────────────────────────────────────────────────┐
│  Pipeline for current sentence                  │
│   ┌──────────────┐                              │
│   │ Grammar mod  │ → NP skeleton (JSON)         │
│   │   (spaCy)    │                              │
│   └──────┬───────┘                              │
│          │                                      │
│          ▼  broadcast skeleton to sub‑modules   │
│   ┌──────────┐  ┌───────────┐  ┌──────────┐     │
│   │  Colour  │  │  Memory   │  │   ...    │     │
│   │ (rule/NN)│  │(retrieval)│  │          │     │
│   └────┬─────┘  └─────┬─────┘  └────┬─────┘     │
│        │              │             │           │
│        └──────────────┼─────────────┘           │
│                       ▼                         │
│               ┌─────────────┐                   │
│               │   Update    │                   │
│               │  global WM  │                   │
│               └─────────────┘                   │
└─────────────────────────────────────────────────┘
        │  after all sentences
        ▼
┌─────────────────────────────────────────────────┐
│  Central Scheduler (lightweight / rule‑based)   │
│  Resolve conflicts → output abstract semantics  │
│  e.g. {"answer_type":"colour", "entity_id":1,   │
│        "colour":"red"}                          │
└────────────────────┬────────────────────────────┘
                     ▼
┌─────────────────────────────────────────────────┐
│  Language Generation module (Broca‑like)        │
│  T5‑small / template                            │
│  Abstract semantics → natural language answer   │
└─────────────────────────────────────────────────┘
```
Module list:
| Module | Implementation | Params |
| --- | --- | --- |
| Chunker | NLTK sentence split | 0 |
| Grammar | spaCy `en_core_web_sm` | ~500MB |
| Colour etc. | rule or tiny BERT | 0–100M |
| Global WM | Python dict | 0 |
| Central scheduler | rules (if‑else) | 0 |
| Language generation | T5‑small (300M) or template | 0–300M |
Total parameters (typical): ~300‑500M – one order of magnitude smaller than LLaMA‑7B (7B).
IV. Prototype Plan
Task: Product attribute extraction and QA on Amazon product descriptions (colour, size, material).
Evaluation: Attribute extraction F1, QA accuracy, latency (ms/query), total parameters.
Expectation: On this narrow task, performance close to T5‑small, but with far fewer parameters and much higher interpretability.
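For the evaluation, attribute‑extraction F1 can be computed as micro‑F1 over (entity, attribute, value) triples. A minimal sketch using the standard definition (the metric formulation is mine, not something specified in the plan above):

```python
# Micro-F1 over exact-match (entity, attribute, value) triples.
def f1(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                       # exact-match triples
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred = {(1, "colour", "red"), (1, "material", "cotton")}
gold = {(1, "colour", "red"), (1, "size", "M")}
print(f1(pred, gold))  # → 0.5
```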
V. Conclusion
From “synchronous oscillations” to “syntactic skeleton”, from ignoring multi‑object scenarios to introducing aphasia evidence – this self‑correction has taught me that brain‑inspired AI is not a romantic metaphor but a rigorous cross‑disciplinary endeavour.
Abandon oscillations – digital systems are not neurons.
Abandon scheduler‑as‑orchestrator – that is AI‑complete.
Keep the grammar module – syntactic structure is the most reliable skeleton for entity alignment.
Keep functional separation – aphasia proves its necessity.
This architecture will not replace GPT‑4. But in vertical domains like contract analysis, product attribute extraction, technical document QA, it may offer a lighter, more transparent, and more maintainable alternative.
“Take the best algorithms, generate the best corresponding functions, and combine those best parts.”
The road is long, but every step is more solid now.
April 2026, Suzhou
(Comments and further challenges welcome)
Key references
Friederici, A. D. (2012). The cortical language circuit. Trends Cogn Sci.
Miller, E. K., & Cohen, J. D. (2001). Prefrontal cortex function. Annu Rev Neurosci.
Baddeley, A. D. (2003). Working memory. Nat Rev Neurosci.
Geschwind, N. (1970). Organization of language and the brain. Science.
Goodglass, H., & Kaplan, E. (1972). The assessment of aphasia and related disorders.