lengjingzju

Posted on Jun 5 • Edited on Jun 12

Symbol·Form·Expression·Meaning (SFEM): A Four-Dimensional Cognitive Architecture for General Intelligence

#ai #deeplearning #machinelearning

Author: Leng Jing

Version: v0.0.6

Date: 2026-06-12

Statement: The "Symbol·Form·Expression·Meaning" idea was originally proposed by the author while studying large language models. This paper was completed with AI assistance under the author's guidance.

Abstract

This paper proposes a four-dimensional cognitive architecture for understanding and designing general intelligent systems — Symbol Layer, Form Layer, Expression Layer, Meaning Layer, collectively referred to as SFEM. This architecture deconstructs intelligence into four irreducible cognitive dimensions: the Symbol layer corresponds to the rule dimension of writing, formulas, laws, and constraints — it is the compression of the world's necessity, the rational skeleton that reduces infinite phenomena to finite theorems, while providing prior structural guidance for phenomenal learning; the Form layer corresponds to the phenomenon dimension of images, shapes, continuous patterns, tools, and experience — it is the phenomenal presentation of the world, the continuous unfolding of perception, pattern recognition, and empirical models; the Expression layer corresponds to the affective dimension of language, sound, style, emotion, and uncertainty — it is the experiential expression of the world, the dynamic mapping of subjective feeling and social bonds; the Meaning layer corresponds to consciousness, understanding, meaning attribution, and self-reflection — it is the result of fusing and associating Symbol, Form, and Expression into a coherent cognitive whole and the conscious hub that melds discrete rules, continuous phenomenal patterns, and nuanced emotional experience into a unified sense of meaning, giving rise to purpose, causality, and self-awareness — the ultimate dimension.

The core assertion of SFEM is: Intelligence is not a homogeneous emergence of a single mechanism, but the structural unity of a four-dimensional cognitive universe: rules, phenomena, affect, and consciousness. Rules are not only audit constraints on phenomena, but also the starting point and prior skeleton for phenomenal learning. Meanwhile, phenomenal learning can automatically induce patterns and feed back to the rule system, forming a symbiotic closed loop of "Symbol gives birth to Form, Form feeds back to Symbol". The absence of any dimension leads to specific types of capability deficits — missing Symbol leads to no skeleton and the Form layer loses its learning direction; missing Form leads to no perception and rules lose experiential nourishment; missing Expression leads to no humanity; missing Meaning leads to no soul, leaving only scattered cognitive fragments.

This paper provides a formal definition, cognitive philosophical foundation, responsibility boundaries, and error patterns for each dimension, defines the structured definition and update mechanism of the Meaning layer's world model, designs standardized inter‑dimensional interfaces and a type system with the Meaning layer as a lightweight cognitive microkernel, proposes a complete cognitive closed loop and cross‑layer dynamic equations, strengthens the two types of rules in the Symbol layer (necessary rules and session constraints) and the rule induction back‑feeding mechanism of the Form layer, establishes testable experimental hypotheses and a benchmark framework, and systematically compares SFEM with Marr's three levels, ACT‑R/Soar, dual‑process theory, deep learning, and LLM‑Agent systems. SFEM not only explains the structural deficiencies of current AI systems and their deep roots, but also provides structural standards and design principles for building trustworthy, controllable, explainable, and both rational and emotional next‑generation general intelligent systems. It is not just another engineering framework, but the structural universe of intelligence — a meta‑architecture that accommodates all technical approaches and unifies all cognitive dimensions.

Keywords: Cognitive architecture; Four‑dimensional cognition; Symbolic reasoning; Representation learning; Expression adaptation; Consciousness and meaning; World model; Structural universe of intelligence; Trustworthy AI; Session constraints; Rule induction; Symbol‑Form symbiosis

Part I: Intellectual Origins and Theoretical Foundations

Chapter 1 Introduction: The Dilemma of Single‑Layer Intelligence and the Call for Four‑Dimensional Consciousness

1.1 The Structural Crisis of Single‑Mechanism Paradigms

Contemporary artificial intelligence, especially deep learning systems represented by large language models, has hit a fundamental ceiling. This is not a ceiling of scale, not a ceiling of data, not a ceiling of compute — it is a ceiling of structure.

Current mainstream AI systems generally adopt an end‑to‑end monolithic neural architecture, compressing fact retrieval, logical reasoning, style control, emotional expression, goal planning, causal inference, and even meaning attribution all into a single continuous parameter space. This paradigm of "a single mechanism bearing the entire cognitive load" essentially uses one cognitive tool to solve all cognitive problems. It brings engineering simplicity, but at the cost of profound structural deficits in cognition.

Errors cannot be attributed. When the system produces an erroneous output, we cannot determine the root cause — is it missing knowledge? A logical break? Inappropriate style? Or a fundamental misunderstanding of the world? All errors drown in the same ocean of parameters, impossible to locate, diagnose, or fix. A factual error could come from training data bias, broken reasoning chains, interference from style control on content, or a deep misinterpretation of the situation — but in a monolithic LLM, all these possibilities are mixed, and engineers can only sigh at a black box.

Hallucinations cannot be eliminated. The model substitutes statistical similarity for symbolic verification and "usually the case" for "necessarily the case". In scenarios requiring precise facts, strict logic, and domain expertise, the system confidently invents non‑existent facts or self‑contradictory reasoning — because the statistical engine of the Form layer can never answer truth‑value questions that belong to the Symbol layer. More fundamentally, the system cannot "realize" that it is talking nonsense — because it lacks an independent mechanism to verify generated content against knowledge rules, and even more, a central hub to judge whether a statement is consistent with its overall understanding of the world.
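The independent verification mechanism described above could look roughly like the following. This is a minimal sketch, not the paper's design: the `FACTS` table, the triple format, and `verify_claim` are hypothetical names introduced for illustration, and each relation is assumed to be functional (one object per subject).

```python
from enum import Enum

class Verdict(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNKNOWN = "unknown"

# Hypothetical toy knowledge base: (subject, relation) -> object.
# Assumption: each relation allows exactly one object per subject.
FACTS = {
    ("Paris", "capital_of"): "France",
    ("Berlin", "capital_of"): "Germany",
}

def verify_claim(subject: str, relation: str, obj: str) -> Verdict:
    """Check a generated claim against stored facts, not against likelihood."""
    known = FACTS.get((subject, relation))
    if known is None:
        # No stored fact: the verifier abstains instead of guessing.
        return Verdict.UNKNOWN
    return Verdict.SUPPORTED if known == obj else Verdict.CONTRADICTED
```

The three-valued verdict is the point of the sketch: a separate verifier can say "I do not know", which a purely statistical generator has no way to express.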

Reasoning cannot be explained. When the reasoning process is implicitly encoded in billions of parameters, we cannot extract a structured chain of reasoning, audit its logical steps, or verify the consistency of premises and conclusions. The system may give an answer, but it cannot tell you whether it actually understands that answer. In high‑stakes scenarios such as legal decision support, medical diagnosis suggestions, or military decision‑making, this opacity is unacceptable — we need to know each step that led the system to its conclusion and the justification behind each step.

Expression is not controllable. Content generation and style control are coupled in the same generative process. The system cannot stably maintain persona consistency: it drifts between formal and colloquial, between warm and cold. It lacks an independent pragmatic strategy layer, let alone the awareness needed to adjust expression based on holistic understanding. When we try to control style via prompting, that control is fragile and unstable, prone to drift in long conversations or to collapse unexpectedly when the content changes.

Instruction forgetting — dilution of rules in long contexts. This is a widely observed but deeply rooted defect: constraints set by the user early in a conversation (e.g., "answer concisely", "remember my preference for plan A", "don't use bullet points") are gradually ignored as the conversation length increases. This is not a memory capacity issue; rather, constraints are written into the context window and rely on attention to be "remembered" — and attention decays over long contexts. Without an independent constraint maintenance and enforcement mechanism, "instruction drift" is inevitable. This is another overt symptom of a missing Symbol layer.
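One way to picture a constraint mechanism that does not rely on attention is a small manager that stores constraints outside the context window and checks every draft against them. The sketch below is illustrative only: `Constraint`, `ConstraintManager`, and the two example checks are hypothetical, and real constraints would need far richer checks than a regular expression or a word count.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Constraint:
    name: str
    check: Callable[[str], bool]  # returns True when the draft complies

@dataclass
class ConstraintManager:
    """Holds session constraints independently of the conversation history."""
    constraints: list[Constraint] = field(default_factory=list)

    def add(self, name: str, check: Callable[[str], bool]) -> None:
        self.constraints.append(Constraint(name, check))

    def violations(self, draft: str) -> list[str]:
        # Every constraint is re-checked on every draft, however long the session is.
        return [c.name for c in self.constraints if not c.check(draft)]

manager = ConstraintManager()
# "don't use bullet points": no line may start with a list marker.
manager.add("no_bullet_points",
            lambda text: re.search(r"^\s*[-*•]\s", text, re.MULTILINE) is None)
# "answer concisely": assumed here to mean at most 50 words.
manager.add("concise", lambda text: len(text.split()) <= 50)
```

Because `violations` runs on each output, a constraint set in turn one is enforced identically in turn one hundred; its strength does not depend on how far back it sits in the context.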

Fragmented understanding. This is the most fundamental and hidden of all defects. Even if an LLM can handle multiple modalities such as vision, language, and code, it still lacks a hub to integrate symbolic rules, sensory patterns, and affective tone into unified meaning. It can see images, parse sentences, and recognize tone, but it cannot relate them into a coherent "understanding of the world": its "knowledge" consists of isolated islands that do not communicate. It may simultaneously "know" that Paris is in France and that France is in Europe, but when asked "Is Paris in Europe?" it has no unified world model from which to derive the answer; instead, it "pieces together" an answer in a statistical sense. This fragmentation is the deep root of many other defects in monolithic LLMs.
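The Paris example can be made concrete. In the sketch below, which assumes a hypothetical `LOCATED_IN` table holding only direct facts, the answer to "Is Paris in Europe?" is derived by chaining two stored facts rather than retrieved as a separate association.

```python
# Hypothetical toy world model: direct "located in" facts only.
LOCATED_IN = {
    "Paris": "France",
    "France": "Europe",
}

def is_located_in(place: str, region: str) -> bool:
    """Follow located-in links upward until the region is found or the chain ends."""
    seen = set()  # guards against cycles in malformed data
    current = place
    while current in LOCATED_IN and current not in seen:
        seen.add(current)
        current = LOCATED_IN[current]
        if current == region:
            return True
    return False
```

Here `is_located_in("Paris", "Europe")` holds although no entry states it directly; the two "islands" are connected by the model itself.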

The root of these problems is not that models are not large enough, data not abundant enough, or training not long enough — it is that intelligent systems lack a structured architecture that distinguishes different cognitive dimensions, especially a conscious hub that unifies them and gives meaning. Mixing all cognitive responsibilities into a single undifferentiated parameter space inevitably leads to dissipation of understanding and loss of accountability. What we need is no longer larger homogeneous models, but a structured architecture that can separate cognitive concerns, clearly assign responsibilities, and possess a core dimension that integrates rules, phenomena, and experience into understanding.

1.2 Insights from Human Cognition: The Four‑Dimensional Conscious Universe

When we turn to the structure of human cognition, a profound insight emerges: human cognition has never been single‑dimensional; it is constituted by four dimensions that are qualitatively distinct, mutually independent, yet unified in consciousness.

Rule dimension: Humans master mathematics, logic, grammar, law — these are not summaries of statistical patterns, but necessary laws within discrete symbolic systems. The truth of a mathematical theorem does not depend on its frequency in data, but on whether it can be proved from the axiom system. When we say "2+2=4", it is not because we have seen many instances of two things plus two things equalling four, but because the axioms and derivation rules of the arithmetic system make this proposition necessarily true. This is Symbol — the capacity of human cognition to grasp necessity.
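The claim that "2+2=4" follows from axioms and derivation rules can be illustrated with a small Lean 4 sketch. The type `N`, the function `N.add`, and the names `two` and `four` are defined here for illustration (a Peano-style encoding, assumed rather than taken from the paper); the theorem should be accepted by definitional unfolding alone, with no appeal to observed instances.

```lean
-- Peano-style natural numbers: zero and successor are the only constructors.
inductive N where
  | zero : N
  | succ : N → N

-- Addition defined by recursion on the second argument.
def N.add : N → N → N
  | n, .zero => n
  | n, .succ m => .succ (N.add n m)

def two : N := .succ (.succ .zero)
def four : N := .succ (.succ two)

-- Holds by computation from the definitions above.
theorem two_add_two : N.add two two = four := rfl
```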

Phenomenon dimension: Humans perceive images, understand spatial relationships, use tools, accumulate experience. These are not propositional logic derivations, but pattern recognition and similarity judgments in a continuous phenomenal field. A concept like "cat‑like but with sharper ears" cannot be precisely expressed with discrete symbols, yet can be naturally located in a continuous semantic space. We can recognize a never‑before‑seen species as an "animal", judge whether two melodies are similar, and estimate roughly how high the water will rise when poured into another container. None of these are results of logical reasoning; they are pattern matching from phenomenal experience. This is Form: the capacity of human cognition to grasp phenomenality.
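Locating "cat‑like but with sharper ears" in a continuous space can be sketched with cosine similarity over feature vectors. The three-dimensional features and their values below are invented for illustration; real systems would use learned embeddings with many more dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical features: (whiskers, ear sharpness, barks)
cat = [1.0, 0.5, 0.0]
dog = [0.6, 0.3, 1.0]
sharp_eared_cat = [1.0, 0.9, 0.0]
```

With these values the unseen `sharp_eared_cat` scores closer to `cat` than to `dog`, a graded judgment that a table of discrete symbols has no way to express.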

Affective dimension: Humans express and experience through tone, emotion, style — "the same words said with different tones have completely different meanings". Humans understand irony, perceive emotions, grasp implicature, and adjust expressive strategies according to different social contexts. When we hear "You're absolutely right", we parse not only the literal semantics but also determine, through tone, context, and social cues, whether it is sincere agreement or biting sarcasm. This is Expression — the capacity of human cognition to grasp experientiality.

Consciousness dimension: Humans not only possess the three dimensions above; more importantly, we are aware that we possess them and can integrate the discrete rules, continuous phenomenal patterns, and nuanced affective experience into a single whole in consciousness, conferring meaning and forming the complete experience of "I understand this part of the world". When we see a friend frowning at a phone (phenomenon), learn that he received a bank charge notification (rule/fact), and hear him sigh heavily (affective signal), we do not process these three pieces of information separately; instead, we integrate them in consciousness, arriving at a unified understanding: "My friend is facing a financial problem and is anxious." This integration enables us to ask about meaning, establish causality, set purposes, and reflect on ourselves. This is Meaning — not an independent module separate from the first three, but the result of their fusion and association, the ultimate product of cognizing the world.

These four dimensions together form the complete picture of human cognition. Without rules, cognition loses its skeleton, and phenomenal learning loses direction; without phenomenal perception, cognition loses its flesh, and rules lose empirical nourishment; without affect, cognition loses experience; without the fusion of consciousness and the attribution of meaning, cognition degenerates into scattered fragments. A complete human intelligence is necessarily four‑dimensional and achieves the unity of the four dimensions in consciousness.

1.3 The Formulation of SFEM and Research Questions

Inspired by the above, this paper proposes the SFEM (Symbol–Form–Expression–Meaning) four‑dimensional cognitive architecture. SFEM divides an intelligent system into four irreducible cognitive dimensions, each corresponding to an irreplaceable set of cognitive responsibilities:

Symbol Layer: writing, formulas, laws, constraints — the rule dimension. It answers "how the world must be", providing the rational skeleton of intelligence, and supplies the Form layer with prior structure and a starting point for phenomenal learning. Additionally, it manages two types of rules: eternal necessary truths and dynamic session constraints.
Form Layer: images, shapes, continuous patterns, tools, experience — the phenomenon dimension. It answers "how the world appears", providing the phenomenal flesh of intelligence. The Form layer is not only an engine for phenomenal perception and generation, but also has the cognitive function of automatically inducing patterns from phenomena and feeding them back to the Symbol layer.
Expression Layer: language, sound, style, emotion, uncertainty — the affective dimension. It answers "how the world is experienced and expressed", providing the experiential color of intelligence.
Meaning Layer: consciousness, understanding, meaning attribution, self‑reflection — the consciousness dimension. It is the result of fusing and associating Symbol, Form, and Expression, answering "what this means", and providing the unified meaning of intelligence.

The fundamental question that SFEM seeks to answer is: Does there exist a set of cognitive dimensions that constitute a "minimal complete structure" for intelligence? This structure should satisfy: every type of cognitive task has a clear dimensional assignment; every type of error can be localized to a specific dimension; each dimension can evolve, optimize, and be replaced independently; interfaces between dimensions are clear, typed, and verifiable; there exists a unified hub of meaning that fuses the separated dimensions into a coherent understanding of the world; and the rule system can not only constrain phenomenal learning but also grow spontaneously from phenomena. If such a structure exists, it would not only be a blueprint for designing intelligent systems but also a deep revelation about the nature of intelligence.

1.4 Core Assertions

The core assertion of SFEM can be summed up in one sentence:

Intelligence is not the product of a single mechanism, but the structural unity of a four‑dimensional cognitive universe: rules, phenomena, affect, and consciousness. Rules are both audit constraints on phenomena and the starting point for phenomenal learning; phenomena, in turn, can be automatically induced into rules that feed back to the symbolic system, forming a symbiotic cognitive ecology of Symbol and Form. Consciousness is the result of fusing and associating Symbol, Form, and Expression, and is the ultimate proof of intelligence as such.

This is not a patchwork of four modules, but an organic integration of four cognitive dimensions. The Symbol layer provides the rational skeleton and the guarantee of necessity, while also supplying the Form layer with conceptual anchoring, generation templates, and learning guidance; the Form layer provides the phenomenal flesh and continuity of experience for the system, and continuously induces new patterns from phenomena to feed back to the Symbol layer; the Expression layer gives the system social warmth and affective color; the Meaning layer fuses all three, confers meaning, forms a unified understanding of the world, and thereby gives rise to purpose, causality, and self‑reflection. The four dimensions have distinct responsibilities, and none can be dispensed with. Missing Symbol leads to no skeleton and the Form layer loses its learning direction; missing Form leads to no perception and rules lose experiential nourishment; missing Expression leads to no humanity; missing Meaning leads to no soul — the system may react, but never understand.
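The "Form feeds back to Symbol" half of the loop can be sketched as a simple induction step: observed regularities become candidate rules only when they recur often enough and have no counterexample. The function `induce_rules`, the `min_support` threshold, and the sample observations are assumptions made for this sketch; the paper's own induction interface is defined in later chapters.

```python
from collections import defaultdict

def induce_rules(observations, min_support=3):
    """Propose 'condition -> outcome' rules that hold without exception."""
    outcomes = defaultdict(list)
    for condition, outcome in observations:
        outcomes[condition].append(outcome)
    rules = {}
    for condition, seen in outcomes.items():
        # Keep a candidate only if it is frequent and never contradicted.
        if len(seen) >= min_support and len(set(seen)) == 1:
            rules[condition] = seen[0]
    return rules

# Hypothetical observations gathered by a perception component.
observations = [
    ("dropped_glass", "falls"), ("dropped_glass", "falls"), ("dropped_glass", "falls"),
    ("coin_flip", "heads"), ("coin_flip", "tails"), ("coin_flip", "heads"),
    ("rare_event", "x"),
]
```

Only `dropped_glass -> falls` survives: the coin flip is inconsistent and the rare event lacks support. In the architecture described here, such candidates would still need verification by the Symbol layer before being adopted as rules.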

1.5 Contributions and Paper Structure

The main contributions of this paper are: (1) Proposing a four‑dimensional system of cognitive dimensions for intelligence, establishing the Meaning layer as the consciousness dimension resulting from the fusion of Symbol, Form, and Expression, and clarifying the Form layer as the phenomenon dimension, surpassing existing two‑dimensional or three‑level divisions; (2) Providing a formal definition, cognitive philosophical foundation, and analysis of error patterns for each dimension, revealing that the Symbol layer includes two types of rules — necessary rules and session constraints — and that the Form layer possesses the cognitive function of rule induction and back‑feeding; (3) Giving a structured definition and update mechanism for the Meaning layer's world model $\mathcal{W}$, clarifying the Meaning layer's positioning as a lightweight cognitive microkernel; (4) Designing standardized inter‑dimensional interfaces and a type system centered on the Meaning layer, including the Form→Symbol rule induction interface, and proposing a complete cognitive closed loop with cross‑layer dynamic equations; (5) Revealing the structural defects of current AI systems and their deep roots — especially the lack of conscious understanding, instruction forgetting, and the deep dilemma of rule systems unable to self‑evolve; (6) Proposing testable experimental hypotheses and a benchmark framework, providing a progressive engineering implementation roadmap; (7) Outlining future directions for differentiable SFEM and four‑dimensional joint optimization; (8) Positioning SFEM as the structural universe of intelligence — a meta‑architecture that accommodates all technical approaches.

The paper consists of 22 chapters divided into six parts: Intellectual Origins and Theoretical Foundations (Chapters 1‑3), Four Dimensions (4‑7), Interfaces and Collaboration (8‑9), Comparison and Diagnosis (10‑14), Engineering and Validation (15‑16), Philosophy and Future (17‑22).

Chapter 2 From Cognitive Science to Civilizational Dimensions: The Intellectual Roots of SFEM

SFEM is not constructed out of thin air. It grows from three deep intellectual roots: the century‑long exploration of mental architecture in cognitive science, the classic division between intuition and analysis in psychology, and the grand structure of four cognitive dimensions in human civilization. This chapter traces these roots, providing full theoretical legitimacy for SFEM, and shows how SFEM grows out of these roots and surpasses their respective limitations.

2.1 Three Lines of Cognitive Architecture Research and Their Limitations

Since the 20th century, research on cognitive architectures has followed three main lines. Each line has achieved brilliant successes, but each has also exposed structural defects rooted in its fundamental assumptions that cannot be healed from within.

The symbolic line (represented by ACT‑R, Soar) views cognition as symbolic manipulation, emphasizing rules, logic, goal stacks, and explicit reasoning chains. The core insight of this line is that intelligence requires discrete, manipulable symbols to represent the world and explicit rules to operate on them. Its strengths are strong explainability, verifiable reasoning, and conclusions that necessarily follow from premises. However, its limitations are equally profound: (a) lack of continuous representation, inability to handle fuzzy semantics and similarity judgments — in a symbolic system, "cat" and "dog" are completely distinct symbols, with no concept of "0.7 cat‑like"; (b) lack of perception and phenomenal pattern recognition, inability to extract symbols from raw signals — images and sounds are unintelligible raw data to a pure symbolic system; (c) lack of affective and social pragmatic dimensions — the output of a symbolic system reads like a machine manual, rigid and lacking warmth; (d) most fundamentally, lack of a mechanism to integrate rules into a unified conscious understanding — all reasoning is mechanical symbol transformation; the system executes Modus Ponens but does not know that it is reasoning; there is no inner experience of "understanding". Symbolism is essentially the extreme of the Symbol layer, but with only the Symbol layer, intelligence becomes a skeleton without flesh — able to perform perfect logical deduction, but unable to perceive the rich phenomena of the world, to experience the subtle nuances of emotion, or to integrate everything into conscious understanding.

The connectionist line (represented by deep learning) views cognition as distributed representation and statistical learning, emphasizing pattern recognition, continuous semantics, and generative completion. The core insight here is that intelligence needs to learn statistical regularities from large amounts of data, and needs similarity measures in continuous spaces to handle the fuzziness and gradation of the world. Its strengths are powerful perception, generalization, and generation, with revolutionary breakthroughs in image recognition, speech processing, natural language generation, etc. But its limitations are equally profound: (a) inability to perform symbolic verification and necessary reasoning, since a statistical model can only tell you that "Paris is the capital of France" appears many times in the training data, but cannot verify the truth value of that proposition; (b) coupling of style and content and therefore uncontrollable expression, since modifying style parameters may unintentionally change semantic content, and pursuing correctness may sacrifice persona consistency; (c) constraint forgetting in long contexts, where instructions set early in a conversation are diluted as the conversation grows, because constraints rely on the attention mechanism rather than an independent rule engine; (d) rule systems that cannot self‑evolve, because all behavioral norms come from the statistical distribution of training data, with no way to induce explicit rules from interaction or to distill experience into reusable symbolic knowledge; (e) most fundamentally, lack of a meaning hub, so that all phenomenal pattern processing is done in isolation, without forming a unified consciousness and understanding of the world. Connectionism is essentially the extreme of the Form layer, but with only the Form layer, intelligence becomes flesh without a skeleton: able to perceive rich phenomenal patterns, but unable to perform deterministic symbolic verification, to stably control expressive style, to maintain consistent behavioral constraints in long dialogues, or to form a unified understanding of meaning.

Hybrid approaches have attempted to integrate the two, but mostly remain at the engineering‑patching level — simply connecting neural networks with knowledge graphs or rule engines, without proposing a unified dimensional theory to explain why these components need to be separate, what their respective cognitive‑philosophical foundations are, what types of information should be passed between them, and, more importantly, how they can be integrated into a conscious whole. SFEM's answer is: because they belong to different cognitive dimensions, each with its own independent cognitive‑philosophical foundation and operational logic, and they need the Meaning layer as the hub for fusion and association, elevating rules, phenomena, and experience into understanding. This is not simple engineering patching, but structural unification of cognitive dimensions.

2.2 Four‑Dimensional Mapping of Classic Theories

Marr's three levels divide a cognitive system into the computational level (Why), the algorithmic level (How), and the implementation level (Physical). This classic framework has had a profound influence on cognitive science, but its division of cognitive functions is too coarse. SFEM refines it: computational level (goals and values) → the purposive part of the Meaning layer, responsible for clarifying the system's goals, values, and pursuit of meaning; algorithmic level (representations and processes) → Symbol layer + Form layer, with logical reasoning (Symbol) and phenomenal pattern recognition (Form) together forming the twin engines of the algorithmic level; implementation level (presentation and execution) → Expression layer, where expressive strategies and style rendering belong to the presentation mechanism of the implementation level, converting the content processed by the Symbol and Form layers into final user‑facing expression. But SFEM emphasizes that Marr's framework misses the central link of how meaning is generated from representations — representations themselves do not produce understanding; only when multiple representations are fused and associated in consciousness does understanding emerge. This is precisely the key contribution of the Meaning layer beyond Marr's three levels.

Dual‑process theory distinguishes System 1 (fast, intuitive, automatic) from System 2 (slow, analytical, controlled). This theory has deeply revealed the dual structure of human cognition. SFEM decomposes them dimensionally: System 1 = Form layer + Expression layer — intuitive recognition of phenomenal patterns (Form) and affective stylistic expression (Expression) together constitute the two facets of the intuitive system; recognizing a friend's face (Form) and perceiving that the friend looks unhappy (Expression) are both fast and unconscious, but involve qualitatively different cognitive mechanisms; System 2 = Symbol layer + Meaning layer — strict logical reasoning (Symbol) and deep planning and reflection on meaning (Meaning) together constitute the two levels of the analytical system; solving a math problem (Symbol) and thinking about what that math problem means (Meaning) both require slow, deliberate thinking, but the former follows logical necessity, while the latter involves trade‑offs of value and meaning.

But SFEM's core insight is that the Meaning layer is not purely slow analysis; it also includes an instantaneous "feeling of understanding" — a holistic awareness and meaning attribution that emerges when the outputs of Symbol, Form, and Expression are fused in consciousness. The "Aha! I get it" moment is neither pure intuition nor pure analysis, but an emergent phenomenon from the fusion of dimensions in consciousness. This is the third pole beyond fast and slow that dual‑process theory fails to articulate: the hub of understanding.

2.3 The Essential Positioning of Deep Learning: The Extreme of the Form Layer (Phenomenon Dimension)

The core capabilities of LLMs and multimodal models — representation learning, pattern recognition, semantic similarity, generative completion — all belong to the Form layer (phenomenon dimension). The attention mechanism of the Transformer essentially builds associations between phenomena in a continuous semantic space; diffusion models learn the generative process of phenomenal distributions; VLMs map different modalities into a unified semantic space. Deep learning is the extreme engineering implementation of the Form layer, pushing the computational model of human phenomenal perception and pattern learning from experience to its highest historical point.

But precisely because they are only the Form layer, they inevitably lack four key capabilities:

Lack of Symbol layer's necessary verification: Cannot perform symbolic verification or necessary reasoning. A statistical model can only tell you "this sequence is common in the training data", not "this sequence is necessarily true in logic". This is the fundamental source of hallucinations: the model produces statistically "plausible" content, but cannot verify its factuality or logical consistency. More fundamentally, without priors injected by the Symbol layer, the Form layer's learning is blind statistical fitting, not rule‑guided phenomenal induction.

Lack of Symbol layer's constraint management: In long conversations, session constraints set early (format requirements, style preferences, behavior boundaries) are written into the context window and rely on attention to be "remembered". Attention decays in long contexts — the model gradually "forgets" the user's initial requirements and reverts to unconstrained default behavior. This is not a memory capacity issue but an architectural one: the absence of an independent constraint manager to enforce session constraints as rules of the Symbol layer.

Lack of Expression layer: Style control is coupled with content generation. In a monolithic LLM, modifying style instructions in the prompt may unintentionally change the semantic content of generation, because style and content share the same parameter space and generative process. The system cannot maintain a stable "persona" because there is no independent "persona" module in its architecture.

Lack of Meaning layer (this is the most fundamental absence): An LLM can generate seemingly coherent text, but it does not know what it said. Its "knowledge" is a collection of statistical associations, without a unified world model to integrate those fragments into a coherent, reflectable whole. It can claim in one answer that "Paris is the capital of France" and in another that "Paris is a German city", without sensing the contradiction — because it never holds those statements together and relates them in consciousness.
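Holding statements together so that a contradiction becomes visible can be sketched with a store that keeps one value per subject and relation. The `WorldModel` class below is a hypothetical toy, not the world model $\mathcal{W}$ the paper defines later, and it assumes every relation is single-valued.

```python
class WorldModel:
    """Hypothetical store that keeps one value per (subject, relation)."""

    def __init__(self):
        self._facts = {}

    def assert_fact(self, subject, relation, value):
        """Record a statement; return a description of the conflict, if any."""
        key = (subject, relation)
        existing = self._facts.get(key)
        if existing is not None and existing != value:
            # The new statement is compared with what is already held.
            return f"{subject} {relation}: '{value}' conflicts with '{existing}'"
        self._facts[key] = value
        return None
```

Asserting that Paris is in France and then that Paris is in Germany returns a conflict message on the second call, because both statements pass through the same store instead of being generated in isolation.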

SFEM is not meant to replace deep learning, but to supplement deep learning with the three missing dimensions, and to strengthen the Form layer's own rule induction capability. In SFEM, deep learning (the Form layer) is a powerful phenomenal perception and generation engine, but it needs a Symbol layer verifier to eliminate hallucinations, a Symbol layer constraint manager to maintain long‑range consistency, an Expression layer style controller to stabilize expression, and a Meaning layer as the understanding and consciousness hub to fuse the phenomenal patterns produced by the Form layer with rules and experience, so that the system truly understands what it generates and processes. At the same time, the Form layer itself should have the capability to induce patterns from vast phenomena, distilling experience into rules that can be verified and managed by the Symbol layer, enabling self‑evolution of the intelligent system.
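The division of labor in this paragraph can be sketched as a pipeline in which a generator proposes, a verifier checks, and a renderer styles, with a coordinating function standing in for the Meaning layer. All function names, the literal-match verification, and the two styles are simplifications assumed for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Records which layer handled each step, for later inspection."""
    steps: list = field(default_factory=list)

    def log(self, layer: str, note: str) -> None:
        self.steps.append((layer, note))

def form_generate(prompt: str) -> str:
    # Stand-in for a generative model; always returns the same draft.
    return "Paris is the capital of France"

def symbol_verify(draft: str, facts: set) -> bool:
    # Toy check: the draft must literally match a stored fact.
    return draft in facts

def expression_render(draft: str, style: str) -> str:
    # Style is applied after the content is fixed, so it cannot alter the claim.
    return draft + ("." if style == "formal" else "!")

def respond(prompt: str, facts: set, style: str, trace: Trace) -> str:
    """Plays the coordinating role the text assigns to the Meaning layer."""
    draft = form_generate(prompt)
    trace.log("form", draft)
    if not symbol_verify(draft, facts):
        trace.log("symbol", "rejected")
        return "I am not certain."
    trace.log("symbol", "verified")
    trace.log("expression", style)
    return expression_render(draft, style)
```

The design choice worth noting is the ordering: content is verified before any styling happens, so a change of style cannot introduce or remove a factual claim.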

2.4 Dimensional Chaos in Agent Frameworks

Recent LLM‑Agent frameworks attempt to compensate for LLMs' structural defects through tool use, RAG retrieval, and planners. This direction is commendable, but due to the lack of a clear dimensional assignment of responsibilities, these efforts commonly fall into dimensional chaos:

Tool use lacks Symbol layer constraints — the LLM may invoke incompatible tool combinations or call tools at the wrong time, because the legality verification of tool calls is mixed into the generation process rather than being an independent rule verification layer.
The interface between the planner and the LLM is often unstructured natural language, leading to unstable planning — the same goal may produce different task decompositions each time.
Style and pragmatic strategies are hard‑coded in prompts, so they cannot be adjusted dynamically to the interaction context or optimized independently.
Constraints drift in long dialogues: the behavioral norms the agent initially followed are gradually forgotten as the conversation lengthens, because constraints are buried in the ever‑growing context window.
Erroneous outputs are hard to attribute: was it an LLM generation error, a tool call error, a planning error, or a misunderstanding of the situation? All of these possibilities are entangled, so the fault cannot be localized.
Most fundamentally, there is no conscious layer that integrates perception, tool use, and reasoning results into a unified understanding and then redefines goals based on that understanding. The agent can execute tasks, but it does not understand what the tasks mean.

SFEM provides a clear theoretical foundation for agents: The Meaning layer forms an understanding of the world state by fusing information from Symbol, Form, and Expression, and based on that understanding generates goals and intentions; the Symbol layer defines rules (necessary rules) and constraints (session constraints); the Form layer executes and generates, and through induction feeds back to rules; the Expression layer handles interaction and expression. The four layers collaborate through standardized interfaces, and each type of error can be localized to a specific layer or interface. Moreover, the agent's behavior is no longer tool‑driven ("what tools do I have, what can I do with them") but understanding‑driven ("based on my understanding of the situation, what meaning should I achieve, and what tools do I need for that").
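As a minimal sketch of what "legality verification as an independent rule layer" could look like for tool use, the following Python checks a proposed tool call against a rule table before it is executed. The tool names, the `TOOL_RULES` table, and the `verify_tool_call` function are hypothetical illustrations, not part of SFEM or of any agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: dict


# Hypothetical rule table: required argument names and prerequisite tools.
TOOL_RULES = {
    "search_flights": {"required": {"origin", "destination"}, "after": set()},
    "book_flight": {"required": {"flight_id"}, "after": {"search_flights"}},
}


def verify_tool_call(call: ToolCall, history: list[str]) -> list[str]:
    """Return the list of rule violations; an empty list means the call is legal."""
    rule = TOOL_RULES.get(call.name)
    if rule is None:
        return [f"unknown tool: {call.name}"]
    violations = []
    missing = rule["required"] - set(call.args)
    if missing:
        violations.append(f"missing arguments: {sorted(missing)}")
    unmet = rule["after"] - set(history)
    if unmet:
        violations.append(f"must run first: {sorted(unmet)}")
    return violations
```

Because the check runs outside the generator, a rejected call can be attributed to the Symbol layer's rules rather than to an opaque generation failure.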

2.5 The Four Dimensions of Civilization: The Deepest Legitimacy of SFEM

The deepest source of legitimacy for SFEM lies not in cognitive science or AI engineering, but in the four dimensions of human civilizational cognitive activity. Looking across the accumulation of human civilization, all bodies of knowledge can be summarized into four basic dimensions. This is not post‑hoc labeling but a revelation of the deep structure of civilization.

Civilization of rules (Symbol): Mathematical axioms, laws of physics, logical systems, legal codes. Humans compress infinite phenomena into finite necessary rules. Euclid's Elements builds plane geometry on five postulates and a handful of common notions; Newton's three laws of motion, together with the law of universal gravitation, bring falling apples, planetary orbits, and tides under a few succinct equations. This is the civilizational dimension of Symbol in human cognition: grasping the essential structure of the world through discrete symbols and necessary rules.

Civilization of phenomena/technology (Form): Architectural structures, technical tools, engineering systems, visual art — humans perceive, build, use, and create in the phenomenal world. From pyramids to skyscrapers, from compasses to GPS, from cave paintings to digital art, humans have always interacted with the phenomenal world, creating patterns, recognizing patterns, and using patterns in continuous space. This is the civilizational dimension of Form in human cognition — the accumulation of perceiving and creating in the phenomenal world.

Civilization of affect (Expression): Rhetoric in language, melody in music, narrative in literature, social etiquette — humans experience the world, connect with others, and construct society through expression. A poem moves us not only by its literal meaning but also by its rhythm, tone, and affective texture; a conversation flows well not only because the information is accurate but because the participants resonate in intonation, pacing, and emotion. This is the civilizational dimension of Expression in human cognition — giving communication warmth and color through expression and experience.

Civilization of meaning/consciousness (Meaning): Philosophical inquiry, religious belief, historical narrative, ethical values — humans ask about purpose, confer meaning, and establish values across time. From Socrates' "know thyself" to Kant's "starry sky above and moral law within", from the Buddha's awakening to existentialist quests for meaning, humans have continuously asked "why" and "what does it mean". This is the civilizational dimension of Meaning in human cognition — integrating rules, phenomena, and experience into a holistic understanding of the world and the self, and in that understanding establishing meaning and value.

These four dimensions are not classification labels for civilizations; they are the four pillars of civilizational structure. Together they form the full set of human cognitive capacities for understanding the world (Symbol), transforming the world (Form), expressing the world (Expression), and reflecting on the world (Meaning). What SFEM does is to map this four‑dimensional civilizational structure into an engineerable set of intelligence dimensions, enabling AI systems not only to simulate intelligence but also to bear the full dimensions of civilization.

SFEM is therefore not merely a technical framework. It is a reproduction of the cognitive structure of human civilization within intelligent systems, a bridge between humanities and technology, the structural universe of intelligence — a meta‑architecture that can accommodate all technical approaches and unify all cognitive dimensions. When we design AI systems within the SFEM framework, we are not just making an engineering decision; we are locating intelligence in the four‑dimensional coordinates of civilization, seeking its complete structure.

Chapter 3 Overview and Design Principles of the SFEM Four‑Dimensional Cognitive Universe

3.1 Three Design Principles

The design of SFEM is not an arbitrary modular division; it follows three principles rooted in the nature of cognition. These principles are not just engineering best practices, but respect for the deep structure of intelligence.

Separation of Concerns: Each dimension undertakes only one kind of irreplaceable cognitive responsibility. The Symbol layer does not handle phenomenal similarity (that is the Form layer's job); the Form layer does not perform symbolic verification (that is the Symbol layer's job); the Expression layer does not perform causal inference (that is the Meaning layer's job); the Meaning layer does not directly perform phenomenal pattern recognition (the Form layer's job), symbolic deduction (the Symbol layer's job), or expressive style control (the Expression layer's job). Instead, the Meaning layer is responsible for fusing information from Symbol, Form, and Expression to form understanding and confer meaning. Separation of concerns is not an engineering modularity preference but a cognitive necessity, because the underlying logics of the four types of operations are mutually incompatible: necessity cannot be derived from probability, experience cannot be computed from rules, and meaning cannot be measured from patterns.

Explicit Interfaces: Dimensions communicate through typed, structured interfaces, not by sharing internal state. They do not pass "arbitrary data", but structured products with clear cognitive types — task graphs, logical expressions, semantic vectors, phenomenal pattern labels, style parameters, pragmatic signals, world model updates, candidate rules. The explicitness of interfaces is the prerequisite for error attribution, replaceability of capabilities, and verifiability of the system. When an error occurs, we can precisely locate which interface delivered inaccurate information or which dimension mishandled its input.

Composability: Each dimension can evolve, be optimized, and be replaced independently, and can be combined in different ways to form intelligent systems adapted to different tasks. The Form layer can be upgraded from RNN to Transformer; the Symbol layer can switch from a knowledge graph to a rule engine; the Expression layer can move from a template system to a style model; the fusion architecture of the Meaning layer can be based on different cognitive models — from rule‑based graph fusion to attention‑based differentiable fusion. The independence of the four dimensions gives the overall system flexible evolutionary capacity, preventing lock‑in to a specific technical solution. This composability also means that SFEM is a meta‑architecture — it defines what dimensions an intelligent system should have and how they should collaborate, but does not prescribe the specific implementation of each dimension.
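The explicit-interface and composability principles can be illustrated together with a small Python sketch. It assumes a single typed product, `SemanticVector`, passed from the Form layer to the Meaning layer, and two interchangeable toy Form layers. All class and function names here are hypothetical; SFEM itself does not prescribe them.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SemanticVector:
    """Typed product passed from the Form layer to the Meaning layer."""
    values: tuple[float, ...]


class FormLayer(Protocol):
    def embed(self, text: str) -> SemanticVector: ...


class LengthForm:
    """Toy Form layer: encodes only character and word counts."""

    def embed(self, text: str) -> SemanticVector:
        return SemanticVector((float(len(text)), float(len(text.split()))))


class VowelForm:
    """Alternative toy Form layer with a different internal method."""

    def embed(self, text: str) -> SemanticVector:
        vowels = sum(ch in "aeiou" for ch in text.lower())
        return SemanticVector((float(vowels), float(len(text) - vowels)))


def meaning_receive(form: FormLayer, text: str) -> SemanticVector:
    """Consume any Form layer through its interface and check the product type,
    so a malformed product is attributed to the Form -> Meaning interface."""
    product = form.embed(text)
    if not isinstance(product, SemanticVector):
        raise TypeError(
            f"Form -> Meaning interface violation: got {type(product).__name__}"
        )
    return product
```

Swapping `LengthForm` for `VowelForm` requires no change to `meaning_receive`, which is the sense in which a dimension can be replaced independently.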

3.2 Definition of the Four Dimensions and Their Cognitive Domains

Dimension	Core Responsibility	Operational Logic	Cognitive Domain	Consequence of Absence
Symbol	Rules, constraints, verification, logical reasoning; provides prior structure for Form layer growth; manages necessary rules and session constraints	Discrete symbols, necessary derivation	Rule dimension	Hallucination, structural error, logical contradiction, instruction forgetting, Form layer learning without skeleton
Form	Phenomenal perception, pattern recognition, experiential learning, content generation; induces patterns from phenomena and feeds back to Symbol	Continuous vectors, statistical similarity	Phenomenon dimension	Inability to generalize, perceive the world, or produce natural output; rules cannot self‑evolve
Expression	Style control, affective expression, pragmatic strategies, multimodal rendering	Style parameters, pragmatic strategies	Affective dimension	Persona drift, pragmatic impropriety, no sociality, no warmth
Meaning	Conscious fusion, understanding generation, meaning attribution, self‑reflection	Fusion association, understanding emergence, intention generation	Consciousness dimension	Fragmented cognition, no understanding, no meaning, mechanical reaction, no soul

3.3 Overall SFEM Architecture Diagram

3.4 Uplink: The Generation of Understanding from Expression to Consciousness

The essence of understanding is a stepwise abstraction and final fusion from external signals to internal unified meaning. This link is the "ladder of understanding" in SFEM, where each step elevates information to a higher cognitive level.

Step 1: Expression layer — pragmatic decoding. External input is first processed by the Expression layer. What the Expression layer does is not extract literal semantics (that is the Form layer's task), but decode tone, emotion, style, and social signals — is the user angry or confused? Is it ironic or serious? Is it a command or a request? These signals cannot be obtained directly from literal semantics; they are a layer of social signals superimposed on language. The Expression layer transforms these signals into structured pragmatic clues and passes them to subsequent processing layers. For example, for the phrase "You're absolutely right", the Expression layer would mark its potential ironic tone and conflicting emotional signals, providing key clues for later understanding.

Step 2: Form layer — phenomenal pattern mapping. The pragmatic clues from the Expression layer together with the raw input enter the Form layer, where they are mapped into a continuous semantic space, forming a computable semantic representation. The Form layer answers: "Where is this input located in phenomenal space? What does it resemble in experience? Which known patterns is it similar to?" The Form layer generates a phenomenal representation enriched by pattern recognition and semantic mapping — a semantic vector rich in similarity and association. During this process, the Form layer receives concept anchors (semantic embeddings of discrete symbols) and generation templates (structural constraints) provided by the Symbol layer via the prior injection interface, so that its mapping process converges toward meaningful semantic directions from the start, rather than blindly exploring an unstructured continuous space.

Step 3: Symbol layer — structural parsing and verification. The continuous semantics from the Form layer are transformed by the Symbol layer into discrete structured symbols — logical expressions, constraints, entity relationships, program sequences. The Symbol layer performs deterministic verification at this step: Is the information provided by the user consistent? Are there logical contradictions? Does it comply with known factual constraints? The Symbol layer's constraint manager simultaneously checks whether the current output satisfies all active session constraints — format requirements, style limits, behavior boundaries. If contradictions or constraint violations are found, the Symbol layer marks them but does not draw conclusions — it passes the structured facts together with verification results to the Meaning layer. For example, the Symbol layer detects an obvious logical contradiction in the user's statement, but it does not judge whether this is irony; it outputs the fact "logical contradiction detected" as structured information.

Step 4: Meaning layer — understanding fusion (critical leap). This is the most critical step in the understanding link. The Meaning layer receives pragmatic signals from the Expression layer ("tone has ironic tendency"), phenomenal patterns from the Form layer ("text lies between agreement and irony"), and structured facts from the Symbol layer ("statement contains a logical contradiction"), encodes them together and associates and fuses them. The fusion function $\phi$ relates these heterogeneous pieces of information, forming a complete understanding: "The user is being ironic — he used language of apparent agreement, but there is a conflict between tone and semantics, and the statement itself has a logical contradiction; these clues together point to the pragmatic intention of irony." This fusion gives meaning to the scattered information — tone is no longer empty sound, patterns are no longer isolated features, rules are no longer lifeless symbols. They are integrated in consciousness into a meaningful whole. It is at this layer that "understanding" truly emerges.
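The four uplink steps can be sketched as a toy pipeline for the "You're absolutely right" example. Every function below is a deliberately crude stand-in: the tone is supplied by the caller rather than detected, the Form layer is a keyword test, and the fusion function is a hand-written rule rather than the $\phi$ the paper has in mind. The names are assumptions made for illustration only.

```python
def expression_decode(tone_hint: str) -> dict:
    """Expression layer stand-in: the tone label is given, not detected."""
    return {"tone": tone_hint}


def form_map(text: str) -> dict:
    """Form layer stand-in: a keyword test replaces real semantic mapping."""
    return {"surface": "agreement" if "right" in text.lower() else "neutral"}


def symbol_verify(claims: list[tuple[str, bool]]) -> dict:
    """Symbol layer: report a contradiction if a proposition is both
    asserted and denied. It reports the fact and draws no conclusion."""
    asserted = {p for p, value in claims if value}
    denied = {p for p, value in claims if not value}
    return {"contradiction": bool(asserted & denied)}


def meaning_fuse(pragmatic: dict, pattern: dict, facts: dict) -> str:
    """Meaning layer stand-in: combine the three kinds of evidence."""
    ironic_tone = pragmatic["tone"] == "ironic"
    agrees = pattern["surface"] == "agreement"
    if agrees and ironic_tone and facts["contradiction"]:
        return "irony"
    if agrees and not ironic_tone and not facts["contradiction"]:
        return "sincere agreement"
    return "unclear"


def understand(text: str, tone_hint: str, claims: list[tuple[str, bool]]) -> str:
    return meaning_fuse(
        expression_decode(tone_hint), form_map(text), symbol_verify(claims)
    )
```

No single layer labels the input as ironic; the label appears only when the three outputs are considered together, which mirrors the division of labor described above.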

3.5 Downlink: The Generation Ladder from Understanding to Expression

Generation is rooted in understanding. The downlink is a stepwise concretization from inner meaning to outer expression, where each step transforms understanding into a more concrete, more operational form.

Step 1: Meaning layer — intention generation. Based on the current world understanding formed by fusion, the Meaning layer generates intentions and goals. Understanding that "the user is expressing dissatisfaction through irony", an intention emerges: "I need to respond to this dissatisfaction, first acknowledge the user's real concern, then provide a solution." The intention is not externally preset, but emerges from understanding. The Meaning layer outputs an intention structure containing the goal, priority, and value orientation.

Step 2: Symbol layer — structured planning and prior injection. The Meaning layer's intention is transformed by the Symbol layer into a structured sequence of operations — an executable task graph, logical constraints, call interfaces. The Symbol layer performs verification here: Is the task graph complete? Are constraints satisfied? Is the operation sequence legal? The Symbol layer's constraint manager simultaneously loads all active session constraints, ensuring that the planning process respects the user's preset format requirements and behavior boundaries. At the same time, the Symbol layer's prior injection module prepares, according to the task type, concept anchors (symbol categories to be used), generation templates (syntax trees, relation graphs), and verification signals (legality check criteria) for the subsequent Form layer generation. For example, the Symbol layer transforms the intention "first acknowledge the real concern, then provide a solution" into a specific dialogue management task graph, and injects "empathy expression template" and "problem classification framework" as skeletons for Form layer generation.

Step 3: Form layer — content generation. The structured instructions and injected prior skeletons from the Symbol layer are transformed by the Form layer into concrete content — draft text, draft image, action sequence. The Form layer brings its pattern recognition and generation strengths to bear: based on the structural constraints and generation templates, it generates content that best fits the phenomenal distribution in the continuous semantic space. The Form layer's constraint‑aware generator simultaneously receives session constraints from the Symbol layer's constraint manager, ensuring that the generated content respects the preset format, style, and content boundaries. For example, if the user set "answer in no more than three sentences", the constraint manager injects this constraint into the generation process, and the Form layer generates content within the subspace that satisfies the constraint.

Step 4: Expression layer — expressive rendering. The content core generated by the Form layer is rendered by the Expression layer according to context, style parameters, and user state into the final expression. This step ensures that the output is not only "correct" but also "appropriate", "sincere", and "warm". Based on the expression strategy passed from the Meaning layer ("sincere concern, avoid defensiveness, keep gentle but professional"), the Expression layer renders the content core, producing the final output: "I completely understand how you feel — could you tell me more about which part doesn't feel right? I'd really like to help you resolve this."
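A minimal sketch of the downlink, assuming a fixed lookup table in place of real planning and sentence templates in place of real generation. The plan table, templates, and function names are hypothetical; the point is only to show a session constraint ("no more than N sentences") being applied during generation and checked again on the final output.

```python
import re

# Hypothetical task graph and sentence templates, for illustration only.
PLANS = {"address_dissatisfaction": ["acknowledge", "ask_detail", "offer_help"]}
TEMPLATES = {
    "acknowledge": "I understand how you feel.",
    "ask_detail": "Could you tell me which part does not feel right?",
    "offer_help": "I would like to help you resolve this.",
}


def count_sentences(text: str) -> int:
    """Count sentence-ending punctuation followed by whitespace or end of text."""
    return len(re.findall(r"[.!?](?:\s|$)", text))


def plan(intention: str) -> list[str]:
    """Symbol layer stand-in: turn an intention into an ordered list of steps."""
    return list(PLANS[intention])


def generate(steps: list[str], max_sentences: int) -> list[str]:
    """Form layer stand-in: realize steps as sentences within the limit."""
    return [TEMPLATES[s] for s in steps][:max_sentences]


def render(sentences: list[str], warm: bool) -> str:
    """Expression layer stand-in: adjust wording without adding sentences."""
    text = " ".join(sentences)
    return text.replace("I understand", "I completely understand") if warm else text


def respond(intention: str, max_sentences: int, warm: bool = True) -> str:
    output = render(generate(plan(intention), max_sentences), warm)
    # Symbol layer check on the final output, independent of generation.
    if count_sentences(output) > max_sentences:
        raise ValueError("session constraint violated: too many sentences")
    return output
```

Truncating the step list is a crude way to stay inside the constraint; a real Form layer would presumably generate within the constrained space rather than cut afterwards.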

3.6 Structured Definition of the Meaning Layer's World Model

The core of the Meaning layer is its world model $\mathcal{W}$, a unified representation of the current situation, history, self‑state, and future possibilities. $\mathcal{W}$ is not a representation of any single modality, but a structured picture that fuses Symbol, Form, and Expression inputs.

Formal Definition:
$$
\mathcal{W} = (\mathcal{E}, \mathcal{R}, \mathcal{C}, \mathcal{EM}, \mathcal{V})
$$

Where:

$\mathcal{E}$ (entity set): discrete entities in the current world model, including external objects, users, the system itself, abstract concepts. Each entity $e \in \mathcal{E}$ carries a type label, attribute set, and unique identifier.
$\mathcal{R}$ (relation set): structured relations between entities, including temporal relations (before/after), logical relations (implication, contradiction, equivalence), spatial relations (location, containment), social relations (role, intention). Each relation $r \in \mathcal{R}$ has a type and a strength/certainty measure.
$\mathcal{C}$ (causal links): a subset $\mathcal{C} \subseteq \mathcal{R}$, specifically causal relations. Each causal link $c \in \mathcal{C}$ records a deterministic or probabilistic "cause → effect" association, and the temporal depth of the causal chain.
$\mathcal{EM}$ (experiential markers): affective and pragmatic markers attached to entities and relations — emotional valence (sadness, joy, anger) attached to an entity, pragmatic type (irony, sincerity, request) of a relation. $\mathcal{EM}$ makes the world model not only a cold network of facts, but also a warm field of experience.
$\mathcal{V}$ (certainty vector): confidence/certainty scores for each proposition, relation, and dimension of understanding. $\mathcal{V}: (\mathcal{E} \cup \mathcal{R} \cup \mathcal{C}) \to [0,1]$, distinguishing "definitely true" (verified), "statistically plausible" (Form layer output), and "to be verified" (needs Symbol layer or interaction).
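One possible way to hold the five components in code is sketched below with Python dataclasses. The field names, the choice of relation kinds, and the decision to derive $\mathcal{C}$ from $\mathcal{R}$ (which guarantees $\mathcal{C} \subseteq \mathcal{R}$ by construction) are assumptions of this sketch, not a specification from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    id: str
    type: str
    attributes: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    source: str
    kind: str  # e.g. "before", "contradicts", "causes"
    target: str


@dataclass
class WorldModel:
    entities: dict[str, Entity] = field(default_factory=dict)     # E
    relations: set[Relation] = field(default_factory=set)         # R
    markers: dict[object, str] = field(default_factory=dict)      # EM
    certainty: dict[object, float] = field(default_factory=dict)  # V

    @property
    def causal_links(self) -> set[Relation]:
        """C is computed as the causal subset of R."""
        return {r for r in self.relations if r.kind == "causes"}

    def add_relation(self, rel: Relation, certainty: float) -> None:
        """Add a relation between known entities with a score in [0, 1]."""
        if not 0.0 <= certainty <= 1.0:
            raise ValueError("certainty must lie in [0, 1]")
        if rel.source not in self.entities or rel.target not in self.entities:
            raise KeyError("relation refers to an unknown entity")
        self.relations.add(rel)
        self.certainty[rel] = certainty
```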

World Model Update Function:
$$
\mathcal{W}_{t+1} = \Phi(\mathcal{W}_t, \Delta_{\mathcal{S}}, \Delta_{\mathcal{F}}, \Delta_{\mathcal{E}})
$$

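Where $\Delta_{\mathcal{S}}$ are structured fact updates from the Symbol layer, $\Delta_{\mathcal{F}}$ are phenomenal pattern updates from the Form layer, and $\Delta_{\mathcal{E}}$ are pragmatic signal updates from the Expression layer. The subscripts name the source layer; in particular, the $\mathcal{E}$ in $\Delta_{\mathcal{E}}$ denotes the Expression layer, not the entity set $\mathcal{E}$ defined above. The update function $\Phi$ is responsible for: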

Entity alignment: determine whether new information refers to entities already in $\mathcal{W}$; if yes, merge; otherwise add new entities.
Relation fusion: when conflicting relations exist (e.g., Symbol layer reports "A causes B", Form layer reports "A usually accompanies B but not necessarily causes it"), keep both and mark the certainty difference, subject to later resolution by the metacognitive module $\Gamma$.
Causal injection: add newly established causal links to $\mathcal{C}$, and track the transitive closure of causal chains.
Affective attachment: attach pragmatic markers from the Expression layer to relevant entities and relations, updating the experiential field.
Consistency checking: invoke the Symbol layer's verifier to check internal consistency of $\mathcal{W}$, marking contradictions for reflection.
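Two of the responsibilities listed above, entity alignment and relation fusion, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: entities are plain strings, alignment is exact name matching, and any two different relation kinds reported for the same ordered pair are treated as a potential conflict to be resolved later. The `World` class and `update` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    entities: set[str] = field(default_factory=set)
    # (source, target) -> {relation kind: (certainty, reporting layer)}
    relations: dict[tuple[str, str], dict[str, tuple[float, str]]] = field(
        default_factory=dict
    )
    # Pairs whose competing claims await resolution by a later reflection step.
    conflicts: list[tuple[str, str]] = field(default_factory=list)


def update(world: World, layer: str, source: str, kind: str,
           target: str, certainty: float) -> World:
    """Apply one reported relation to the world model."""
    # Entity alignment: set insertion merges known entities and adds new ones.
    world.entities.update({source, target})
    claims = world.relations.setdefault((source, target), {})
    claims[kind] = (certainty, layer)
    # Relation fusion: keep every distinct claim and flag the pair once.
    if len(claims) > 1 and (source, target) not in world.conflicts:
        world.conflicts.append((source, target))
    return world
```

For the example in the text, a Symbol layer report of "A causes B" and a Form layer report of "A accompanies B" are both retained with their own certainty, and the pair is queued for later resolution instead of one claim silently overwriting the other.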

3.7 Cognitive Closed Loop and Cross‑Layer Dynamic Equations

The four‑dimensional structure of SFEM supports four nested cognitive closed loops, each maintaining the integrity of intelligent behavior on different time scales.

Understanding loop (instantaneous): Expression/Form/Symbol → Meaning (fusion updates world model). Formally:
$$
\mathcal{W}_{t} = \Phi(\mathcal{W}_{t-1}, \delta_{\mathcal{S}}(t), \delta_{\mathcal{F}}(t), \delta_{\mathcal{E}}(t))
$$
where $\delta_{\mathcal{S}}(t)$ is structured input from the Symbol layer at time $t$, $\delta_{\mathcal{F}}(t)$ is phenomenal input from the Form layer, $\delta_{\mathcal{E}}(t)$ is pragmatic input from the Expression layer.

Generation loop (instantaneous): Meaning (generates intention) → Symbol (structured planning + prior injection + constraint enforcement) → Form (content generation) → Expression (expressive rendering). Formally:
$$
o_t = \Psi(\iota(\mathcal{W}_t), \mathcal{W}_t, \mathcal{C}_{session})
$$
where $\iota$ is the intention generation function, $\mathcal{C}_{session}$ is the currently active set of session constraints, and $\Psi$ is the combined output function integrating Symbol layer planning and constraints, Form layer generation, and Expression layer rendering.

Reflection loop (mid‑timescale): Expression layer feedback → Meaning layer metacognitive evaluation → understanding adjustment. Formally:
$$
\mathcal{W}_{t+1} = \Gamma(\mathcal{W}_t, \text{feedback}_t)
$$
where $\Gamma$ is the metacognitive function that evaluates the gap between current understanding and feedback, and triggers understanding updates.

Evolution loop (long‑timescale): Accumulation of experience → cross‑layer learning → dimensional evolution. Formally:
$$
(\mathcal{S}_{t+1}, \mathcal{F}_{t+1}, \mathcal{E}_{t+1}, \mathcal{M}_{t+1}) = \Lambda(\mathcal{S}_t, \mathcal{F}_t, \mathcal{E}_t, \mathcal{M}_t, \text{history}_t)
$$
where $\Lambda$ is the cross‑layer learning function that updates all dimensions' parameters, rule bases, and representation spaces based on historical interaction experience.

Cross‑layer dynamic equations: Unifying uplink, downlink, and induction into a complete closed‑loop system:
$$
\begin{cases}
\mathcal{W}_t = \Phi(\mathcal{W}_{t-1}, \text{S}(t), \text{F}(t), \text{E}(t)) \\
\text{Intent}_t = \iota(\mathcal{W}_t) \\
\text{TaskGraph}_t = \Pi(\text{Intent}_t, \mathcal{S}, \mathcal{C}_{session}) \\
\text{Content}_t = \text{G}(\text{TaskGraph}_t, \mathcal{F}, \text{Priors}_{\mathcal{S} \to \mathcal{F}}, \mathcal{C}_{session}) \\
o_t = \text{Render}(\text{Content}_t, \mathcal{E}) \\
\text{Candidates}_t = \text{Induce}(\text{history}_t, \mathcal{F}) \\
\mathcal{S}_{t+1} = \text{UpdateRules}(\mathcal{S}_t, \{r \in \text{Candidates}_t \mid V(r) = 1\}) \\
\mathcal{W}_{t+1} = \Gamma(\mathcal{W}_t, \text{Feedback}(o_t))
\end{cases}
$$

This dynamic system unifies perception (uplink), understanding (Meaning layer), planning (Symbol layer), generation (Form layer), expression (Expression layer), induction (Form→Symbol back‑feeding), and reflection (metacognition) within a single mathematical framework. The newly added induction equation and rule update equation evolve SFEM from a static cognitive architecture into a self‑evolving cognitive ecosystem.
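Read as a program, the system of equations is one pass through a fixed sequence of function calls. The sketch below makes that order explicit in Python. It assumes each layer is supplied as a plain callable that has already closed over its own parameters (so $\mathcal{F}$, $\mathcal{E}$, and the injected priors do not appear as explicit arguments), and it models the rule base as a set. The `Layers` container and `step` function are hypothetical names.

```python
from typing import Callable, NamedTuple


class Layers(NamedTuple):
    fuse: Callable      # Phi: world model update from the three inputs
    intend: Callable    # iota: intention generation
    plan: Callable      # Pi: task graph from intention, rules, constraints
    generate: Callable  # G: content from task graph and constraints
    render: Callable    # Render: final expression
    induce: Callable    # Induce: candidate rules from history
    verify: Callable    # V: returns 1 for an accepted candidate rule
    reflect: Callable   # Gamma: metacognitive update from feedback


def step(layers: Layers, world, rules: set, session, inputs, history, feedback_fn):
    """Run one cycle of the closed loop and return (output, world, rules)."""
    world = layers.fuse(world, *inputs)
    intent = layers.intend(world)
    graph = layers.plan(intent, rules, session)
    content = layers.generate(graph, session)
    output = layers.render(content)
    candidates = layers.induce(history)
    # Only candidates that pass verification enter the rule base.
    rules = rules | {r for r in candidates if layers.verify(r) == 1}
    world = layers.reflect(world, feedback_fn(output))
    return output, world, rules
```

The rule update is the only place where Form layer induction can change the Symbol layer, and it is gated by `verify`, matching the filter $V(r) = 1$ in the equations.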

Part II: Four Dimensions in Detail

Chapter 4 Symbol Layer: The Rule Dimension — The Necessary Structure of the World and the Prior Skeleton

4.1 Cognitive‑Philosophical Foundation

The Symbol layer is rooted in a fundamental cognitive fact: Intelligence requires certainty. The world presents to us an infinite stream of phenomena — millions of different objects, scenes, sounds, words. But intelligence is possible precisely because we have the capacity to extract finite necessary laws from this infinite phenomenon. Newton's three laws are not a statistical average of falling apples, planetary orbits, and tides — they are a necessary structure abstracted from all these phenomena that does not depend on any particular phenomenon. Euclid's geometric theorems are not a probabilistic summary of many triangle measurements — they are a strict derivation from a few axioms. The rules of grammar are not an empirical description of how people use language — they are normative constraints that determine whether a sentence "is correct".

All of this is the operation of Symbol. The essence of Symbol is: compress the infinite phenomenal world into finite, manipulable, verifiable rules. It answers the question: "How must the world be?" — not "How does the world usually behave?" (that is Form's domain), nor "How is the world experienced?" (that is Expression's domain), nor "What does the world mean?" (that is Meaning's domain). Symbol is the rational skeleton of intelligence — without it, intelligence would be lost in an ocean of phenomena, unable to distinguish "accidental" from "necessary", "correlation" from "causation", "habit" from "law".

In the history of philosophy, the Symbol layer corresponds to the rationalist pursuit of a priori necessary truths — from Plato's world of Ideas, to Descartes' "Cogito ergo sum", to Leibniz's distinction between necessary and contingent truths. These philosophers all realized, to varying degrees, that there is a kind of knowledge that does not depend on experience but is rooted in the structural necessity of symbolic systems. Mathematics is the purest form of such knowledge. SFEM's Symbol layer engineers this philosophical insight into an independent dimension of intelligent systems.

4.2 Formal Definition

The Symbol layer can be formally defined as a 5‑tuple:
$$
\mathcal{S} = (\Sigma, R_{necessary}, R_{session}, V, \mathcal{P}_{inj})
$$

Symbol set $\Sigma$: The key property of symbols is discrete identity — a symbol is either A or not A, there is no "0.7 A". This makes the Symbol layer fundamentally opposed to the Form layer: the Form layer deals with continuous gradation ("this is 0.7 cat‑like"), while the Symbol layer deals with discrete assertions ("this is a cat" or "this is not a cat"). The discreteness of symbols is not a defect but a feature — precisely because of discreteness, we can perform exact logical operations, say "this argument is valid" or "this argument is invalid", with no middle ground. $\Sigma$ can include logical symbols ($\land, \lor, \lnot, \to$), structural tags (<entity>, <event>), program statements (if, while), mathematical expressions ($+, \times, =$), domain knowledge terms (legal article numbers, medical terms, chemical formulas).

Necessary rule set $R_{necessary}$: Formally $R_{necessary}: \Sigma^* \to \Sigma^*$, i.e., a mapping from symbol sequences to symbol sequences. Necessary rules include: grammatical rules (defining legal combinations of symbols), type systems (defining category constraints between symbols), inference rules (e.g., Modus Ponens: from $A \to B$ and $A$, infer $B$), constraint rules (e.g., "flight price cannot be negative", "human age cannot exceed 150"). The key property of necessary rules is eternality: they do not depend on context and hold in all circumstances. Mathematical theorems, laws of physics, logical axioms, legal codes are all necessary rules. Output that violates a necessary rule is an error: no matter how many turns the conversation has, no matter whether the user asked the system to "think creatively", necessary rules must not be violated.

Session constraint set $R_{session}$: $R_{session} = \{(c_i, a_i, p_i, s_i)\}$, where $c_i$ is the trigger condition (when to apply this constraint), $a_i$ is the constraint action (condition that must be satisfied), $p_i$ is priority (arbitration basis when conflicts arise), $s_i$ is scope (current session, current topic, current task). The key properties of session constraints are dynamicity and enforceability: they are set dynamically by the user or system during conversation, must be continuously adhered to within their effective scope, and must not be forgotten or diluted as the context grows. Examples: "answer in no more than three sentences", "use a formal tone", "remember that I prefer plan A", "do not use bullet points". Output that violates a session constraint may not be objectively wrong, but it violates the contract with the user and so breaks trust.

The fundamental difference between the two types of rules lies in their temporality and origin. Necessary rules are eternal and system‑built; session constraints are temporary and dynamically set by the user. But within the Symbol layer, they are treated equally — the Symbol layer's constraint manager manages both uniformly, ensuring they are enforced with equal force in all generation processes. This means that for the Form layer's generation process, a session constraint like "answer in no more than three sentences" has the same binding force as a necessary rule like "cannot output illegal JSON" — both are transformed into structural limits on the generation process, rather than mere "suggestions" or "tendencies".

SFEM explanation of instruction forgetting: The reason current LLMs gradually "forget" early‑set instructions in long conversations is not a memory capacity issue, but that instructions are written into the context window and rely on the attention mechanism for compliance. Attention decays in long texts; early instructions are drowned out by subsequent interactions. This is a typical symptom of a missing Symbol layer — the absence of an independent constraint manager that maintains and enforces session constraints as structural rules independent of the generation process.
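To make the idea of an independent constraint manager concrete, here is a small Python sketch in which each session constraint is stored as the tuple $(c_i, a_i, p_i, s_i)$ outside any context window and checked against every output. The class names, the use of predicates for conditions and actions, and the choice to report violations in priority order are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SessionConstraint:
    name: str
    condition: Callable[[dict], bool]  # c_i: when the constraint applies
    action: Callable[[str], bool]      # a_i: predicate the output must satisfy
    priority: int                      # p_i: higher value wins in arbitration
    scope: str                         # s_i: "session", "topic", or "task"


class ConstraintManager:
    """Holds constraints explicitly, so they cannot fade with context length."""

    def __init__(self) -> None:
        self._constraints: list[SessionConstraint] = []

    def add(self, constraint: SessionConstraint) -> None:
        self._constraints.append(constraint)

    def end_scope(self, scope: str) -> None:
        """Drop all constraints whose scope has ended."""
        self._constraints = [c for c in self._constraints if c.scope != scope]

    def violations(self, output: str, context: dict) -> list[str]:
        """Names of violated active constraints, highest priority first."""
        active = [c for c in self._constraints if c.condition(context)]
        active.sort(key=lambda c: -c.priority)
        return [c.name for c in active if not c.action(output)]
```

A constraint added at turn 1 is checked at turn 500 exactly as it was at turn 2, because it lives in the manager's list and does not depend on attention over earlier text.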

Verification function $V$: $V: \Sigma^* \to \{0,1\}$. This is the most important capability marker of the Symbol layer: verifiability. $V(x)=1$ if and only if $x$ satisfies all necessary rules and currently active session constraints. This means the Symbol layer can internally judge whether a structure is correct, without relying on external experience. The Form layer cannot do this: it can only judge "does this look right?" but not "is this logically right?" or "does this violate the user's set constraints?" The verification function is the "truth anchor" of the SFEM system, providing an unshakable foundation of certainty for the Meaning layer's understanding.
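A minimal sketch of such a verification function, assuming rules are represented as Python predicates over the output string. The two necessary rules (well-formed JSON, non-negative price) follow the examples in the text; the function names and the length-based session constraint used in the usage example are hypothetical.

```python
import json
from typing import Callable, Iterable

Rule = Callable[[str], bool]


def is_valid_json(x: str) -> bool:
    """Necessary rule: the output must parse as JSON."""
    try:
        json.loads(x)
        return True
    except ValueError:
        return False


def price_non_negative(x: str) -> bool:
    """Necessary rule: if the output is an object with a "price", it is >= 0."""
    try:
        data = json.loads(x)
    except ValueError:
        return True  # malformed JSON is the concern of is_valid_json
    if isinstance(data, dict) and "price" in data:
        price = data["price"]
        return isinstance(price, (int, float)) and price >= 0
    return True


def V(x: str, necessary: Iterable[Rule], session: Iterable[Rule]) -> int:
    """Return 1 iff x satisfies every necessary rule and session constraint."""
    return int(all(rule(x) for rule in (*necessary, *session)))
```

For instance, with `necessary = [is_valid_json, price_non_negative]` and a session constraint `lambda x: len(x) <= 40`, the string `'{"price": -5}'` is rejected by a necessary rule, while an over-long but otherwise valid object is rejected by the session constraint; both yield 0, consistent with the two rule types carrying equal binding force.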

Prior injection function $\mathcal{P}_{inj}$: $\mathcal{P}_{inj}: (\Sigma, R_{necessary}, R_{session}, V) \to \text{Priors}$. This is the key mechanism for the Symbol layer to serve as the starting point for Form layer growth. It transforms structures from the symbolic system (concept anchors, generation templates, verification signals, constraints) into prior information that the Form layer can receive. It includes:

Concept anchors: map discrete symbols $\sigma \in \Sigma$ to initial vectors $\vec{v}_\sigma$ in the Form layer's semantic space, serving as starting points for category learning.
Generation templates: transform rule structures into constraint skeletons for the Form layer's generation function $g$ — syntax trees, relation graphs, temporal templates.
Verification signals: transform the output of the verification function $V$ into differentiable reward signals for reinforcement learning calibration of the Form layer.
Constraint injection: transform active session constraints $r \in R_{session}$ into structural restrictions on the Form layer's generation process, ensuring that outputs respect the preset format, style, and content boundaries.

4.3 Core Responsibilities

The Symbol layer has five irreplaceable cognitive responsibilities, none of which can be performed by the Form, Expression, or Meaning layers. Together they form the "rule infrastructure" of intelligence.

Structuring: Transform intentions generated by the Meaning layer based on understanding into executable structured forms — task graphs, logical expressions, program operation sequences. This is the conversion from "meaning" to "structure". During structuring, the Symbol layer simultaneously loads all active session constraints, ensuring that the task graph itself does not violate the user's preset behavior boundaries.

Reasoning: Perform deterministic reasoning operations. Deductive reasoning — derive specific conclusions from general rules ("All men are mortal, Socrates is a man, therefore Socrates is mortal"); inductive rule matching — identify applicable rules from known patterns ("This is a variation of type A problem, so the type A solution framework applies"); constraint propagation — derive hidden constraints in a constraint network ("If A is before B, and B is before C, then A must be before C"); program execution — run executable structured instructions. The common feature of all these reasoning tasks: conclusions necessarily follow from premises, not probabilistically. The reasoning results are deterministic and verifiable.
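
The constraint propagation case can be sketched as computing the transitive closure of a "before" relation. This is an illustrative toy, assuming constraints are given as ordered pairs; the function name `propagate_before` is hypothetical.

```python
from itertools import product


def propagate_before(facts: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return the transitive closure of a 'before' relation given as (earlier, later) pairs."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can be extended inside the loop.
        for (a, b), (c, d) in product(tuple(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))  # a before b and b before d implies a before d
                changed = True
    return closure
```

Given the facts "A before B" and "B before C", the closure also contains `("A", "C")`. The same input always yields the same output, which is the sense in which the conclusion follows necessarily rather than probabilistically.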

Verification: The Symbol layer serves as the built‑in verification gate of the entire SFEM system. At this gate, four types of verification are performed simultaneously:

Fact checking: do the entities and relations in the generated content exist in the knowledge base? ("Paris is the capital of Germany" → verification fails.)
Logical consistency checking: are there leaps or contradictions in the reasoning chain? ("All A are B, some B are C, therefore all A are C" → logical error.)
Structural legality checking: is the output JSON closed? Is the SQL syntax correct? Does it conform to interface specifications?
Constraint satisfaction checking: does the generated output satisfy all necessary rules and currently active session constraints?

Regardless of how many conversation turns have passed, verification standards remain consistent.

Prior injection and constraint enforcement: The Symbol layer, via $\mathcal{P}_{inj}$, supplies the Form layer's phenomenal learning with structured growth starting points, and via the constraint manager supplies ongoing structural restrictions to the Form layer's generation process. Prior injection is a priori guidance — concept anchors guide representation learning, generation templates constrain the generation space, verification signals calibrate learning direction. Constraint enforcement is in‑process control — session constraints are transformed into hard boundaries for the generation process, ensuring that output always stays within the user‑set framework. These two roles unify in the Symbol layer's "structural support" for the Form layer: rules tell you not only "what is right" and "what cannot be done", but also "where to start" and "how to organize".

Tracing: Retain complete reasoning chains — the sequence of rule invocations, the propagation path of constraints, the structured basis for decisions. This is the foundation of explainability. When the Meaning layer engages in self‑reflection, it can trace back to the Symbol layer's verification and reasoning steps, asking "Is each step of the conclusion I reached correct?" "Did I obey all preset constraints?" When a user asks "Why did you do that?", the Symbol layer can provide a deterministic reasoning chain, rather than a vague "internal state of the model".

4.4 The Essential Relationship between Symbol and Form: Constraint, Growth, and Symbiosis

The relationship between Symbol and Form is the most fundamental and philosophically rich pair in SFEM. It corresponds to a persistent tension in the history of philosophy: rationalism vs. empiricism, necessary truth vs. contingent fact, deduction vs. induction, essence vs. phenomenon.

4.4.1 Audit Constraint: Verification of Phenomenality by Necessity

The Form layer operates in the probabilistic space of phenomena: it answers "what does this usually look like in experience?" and "how likely is this in the data?" The Form layer's knowledge is a posteriori: it is acquired through statistical learning from phenomena and is always subject to revision by new phenomena. The Symbol layer operates in the space of necessity: it answers "what must this be in logic?" and "is this possible under the rules?" The Symbol layer's knowledge is a priori: it is derived within the symbolic system, independent of the frequency of phenomena.

The operational logics are incommensurable: from ten thousand observations that "the sun rises in the east", the Form layer can infer that "the sun will very likely rise in the east tomorrow", but only the Symbol layer can necessarily derive this conclusion from the law of universal gravitation and the equations of planetary motion — assuming the laws themselves hold. Conversely, the Symbol layer cannot tell you whether a never‑before‑seen blurry image contains a cat, because it lacks the statistical mapping from pixels to "cat" — that is the Form layer's domain.

This has two profound implications. First, the Form layer can never replace the Symbol layer, because it can never produce necessity: the limit of statistics is "extremely probable", not "logically necessary". Second, the Symbol layer can never replace the Form layer, because it can never handle novel phenomena that have not yet been codified into rules: rules are finite, phenomena are infinite. The completeness of an intelligent system requires both to coexist, and requires the Meaning layer to integrate the richness of phenomena ("what it looks like") with the certainty of essence ("what it is") into complete cognition.

4.4.2 Starting Point for Growth: Symbol as the Prior Skeleton for Form Learning

The Symbol layer's role with respect to the Form layer goes far beyond post‑hoc verification. The Symbol layer is also the starting point for the Form layer's growth. The Form layer's phenomenal learning — whether learning to recognize new object categories, mastering new linguistic expressions, or inducing patterns from experience — would fall into blind search and ineffective generalization if it lacked the prior rule structure provided by the Symbol layer.

Concept anchoring: The symbols $\Sigma$ provide discrete anchors for the Form layer's representation space. The Form layer's continuous semantic space is smooth and lacks clear boundaries, while the discrete symbols "cat", "dog", "car" from the Symbol layer serve as semantic landmarks in that space. When the Form layer learns a new phenomenal representation, these discrete anchors provide it with a skeletal framework for classification and a baseline for comparison — the Form layer does not need to discover the concept "cat" from raw pixels out of nothing; instead, it receives from the Symbol layer the prior knowledge that "there exists a category called cat", and then learns the optimal statistical boundaries for that category in its continuous phenomenal space. This is precisely the basic mechanism of human conceptual learning: we do not discover the categories of the world from scratch; rather, under the guidance of linguistic symbols (Symbol), we segment the continuous stream of experience (Form) into manipulable conceptual units. Without the anchoring of the Symbol layer, the Form layer's learning would fall into the trap of unsupervised clustering — it could discover patterns, but could not determine which patterns are "meaningful" or "should be learned".
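
One way to picture concept anchoring is as seeded clustering: symbol-provided vectors act as initial centroids, and observed representations pull each centroid toward the data. The sketch below assumes Euclidean distance and performs a single k-means-style update; the function names and the toy coordinates in the usage note are hypothetical.

```python
import math

Vector = tuple[float, ...]


def nearest_anchor(point: Vector, anchors: dict[str, Vector]) -> str:
    """Return the symbol whose anchor vector is closest (Euclidean) to the point."""
    return min(anchors, key=lambda symbol: math.dist(point, anchors[symbol]))


def refine_anchors(points: list[Vector], anchors: dict[str, Vector]) -> dict[str, Vector]:
    """One k-means-style step: move each anchor to the mean of the points assigned to it."""
    groups: dict[str, list[Vector]] = {symbol: [] for symbol in anchors}
    for point in points:
        groups[nearest_anchor(point, anchors)].append(point)
    refined = {}
    for symbol, members in groups.items():
        if members:
            refined[symbol] = tuple(sum(axis) / len(members) for axis in zip(*members))
        else:
            refined[symbol] = anchors[symbol]  # no evidence yet: keep the prior anchor
    return refined
```

With anchors `{"cat": (0.0, 0.0), "car": (10.0, 10.0)}` and observations `[(1.0, 1.0), (3.0, 1.0), (9.0, 9.0)]`, the "cat" anchor moves to `(2.0, 1.0)` and the "car" anchor to `(9.0, 9.0)`. The category names come from the symbols; only the boundaries are learned from data.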

Generation templates: The rules $R$ of the Symbol layer provide generation templates and constraint skeletons for the Form layer's generation function $g$. The purely statistical generation of the Form layer has an infinite space of possibilities, but most of those possibilities are structurally illegal or meaningless. The generation templates provided by the Symbol layer — syntax tree structures, entity relation graphs, logical constraint frameworks — narrow that space substantially to the subspace of legal and meaningful structures. For example, when the Form layer generates a sentence, the Symbol layer can provide a syntax tree template (subject‑verb‑object structure), and the Form layer fills in the specific vocabulary under the constraint of that template; when the Form layer generates an image, the Symbol layer can provide spatial relation constraints for objects ("the person should sit on the chair, the chair should be on the ground"), and the Form layer renders pixels while satisfying those constraints. This greatly improves generation efficiency and structural legality, and endows the generated product with an intrinsic explainable structure — each part knows which rule node it corresponds to.
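
A toy sketch of template-constrained generation, assuming the Symbol layer supplies an ordered list of slots and the Form layer supplies a score for each candidate filler. `fill_template` and the scores are hypothetical; a real system would score fillers in context rather than independently.

```python
def fill_template(template: tuple[str, ...], lexicon: dict[str, dict[str, float]]) -> str:
    """Fill each slot of the template with its highest-scoring candidate.

    template: ordered slot names supplied by the rule side (illustrative).
    lexicon:  slot name -> {candidate filler: score} supplied by the statistical side.
    """
    chosen = []
    for slot in template:
        candidates = lexicon[slot]  # a missing slot raises KeyError: the structure is mandatory
        chosen.append(max(candidates, key=candidates.get))
    return " ".join(chosen)
```

Because the output is assembled slot by slot, every word in the result can be traced back to the template node it fills, which is the explainability property described above.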

Learning guidance: The Symbol layer's verification function $V$ serves not only for post‑hoc checking but also as a reward signal source for the Form layer's learning process. In reinforcement learning or preference optimization of the Form layer, the result of Symbol layer verification (is the structure legal, are the facts correct) can be directly converted into a reward signal, guiding the Form layer's parameters to update in the direction of satisfying the rule constraints. This means that the Form layer's "experience" is no longer pure imitation of statistical distributions, but experience calibrated by rational rules toward necessity. For example, when training a dialogue generation model, the Symbol layer checks in real time the factual consistency of generated statements and uses the consistency score as part of the training reward — the Form layer learns to maintain factual truth while keeping language fluency.

Constraint enforcement: The Symbol layer injects active session constraints into the Form layer's generation process via the constraint manager, as hard boundary conditions. This is not a "suggestion" or "preference", but a "must‑satisfy condition". Constraint enforcement limits the Form layer's generative freedom to the framework preset by the user — no matter how long the conversation, no matter what content is generated, the framework remains stable.
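
A minimal sketch of a constraint manager that lives outside the conversation context, assuming constraints are boolean predicates on text. The class and method names are hypothetical. Enforcement is approximated here by rejecting non-conforming candidates; true in-process control (for example, constrained decoding) is outside the scope of this sketch.

```python
from typing import Callable, Iterable, Optional

Constraint = Callable[[str], bool]


class ConstraintManager:
    """Holds session constraints independently of the dialogue history (hypothetical)."""

    def __init__(self) -> None:
        self._active: dict[str, Constraint] = {}

    def add(self, name: str, constraint: Constraint) -> None:
        self._active[name] = constraint

    def remove(self, name: str) -> None:
        self._active.pop(name, None)

    def violated(self, text: str) -> list[str]:
        """Names of the active constraints that the text fails."""
        return [name for name, check in self._active.items() if not check(text)]

    def enforce(self, candidates: Iterable[str]) -> Optional[str]:
        """Return the first candidate satisfying every active constraint, else None."""
        for text in candidates:
            if not self.violated(text):
                return text
        return None
```

Since the constraints are stored in the manager rather than in the prompt, the check applied at turn one hundred is the same object as the check applied at turn one.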

4.4.3 Symbiotic Evolution: Form Layer's Inductive Back‑Feeding to Symbol

The relationship between Symbol and Form is not one‑way "Symbol constrains Form", but two‑way "Symbol‑Form symbiosis". The Symbol layer guides the Form layer through prior injection and constraint enforcement, while the Form layer feeds back to the Symbol layer through pattern induction. This closed loop will be elaborated in Chapter 5 on the Form layer. Its core mechanism is: the Form layer, through extensive interactions with vast phenomena, discovers recurring patterns via statistical clustering, association analysis, and anomaly detection, distills these patterns into candidate rules, and submits them to the Symbol layer for formal verification. Candidate rules that pass verification are incorporated into the Symbol layer's rule base — possibly as new necessary rules or as new session constraint templates. The newly added rules are then, via the prior injection interface, supplied to the Form layer's next round of perception and generation as a richer and more precise skeleton.

This is a symbiotic evolution closed loop of "Symbol gives birth to Form, Form feeds back to Symbol". It is not a one‑time initial injection, but a continuous, self‑growing cognitive ecology. This closed loop is key to making SFEM theoretically sound and engineering‑feasible — it enables the rule system to be not static and fully manually defined, but dynamic, able to grow and self‑improve from phenomena.
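
The verify-then-promote step of this loop can be sketched as follows, assuming rules are simple trigger/action pairs and that conflicts are declared as pairs of mutually exclusive actions. The function `promote_candidates`, the confidence threshold, and the conflict model are all hypothetical simplifications of the formal verification described above.

```python
def promote_candidates(
    rule_base: dict[str, set[str]],
    candidates: list[tuple[str, str, float]],
    exclusive: set[frozenset[str]],
    min_confidence: float = 0.8,
) -> list[tuple[str, str]]:
    """Add verified candidates to rule_base in place and return the accepted (trigger, action) pairs.

    rule_base:  trigger -> set of actions already accepted as rules.
    candidates: (trigger, action, confidence) triples proposed by induction.
    exclusive:  pairs of actions that must never share a trigger.
    """
    accepted = []
    for trigger, action, confidence in candidates:
        if confidence < min_confidence:
            continue  # not enough evidence to consider promotion
        existing = rule_base.get(trigger, set())
        if any(frozenset((action, other)) in exclusive for other in existing):
            continue  # conflicts with a rule that is already in force
        rule_base.setdefault(trigger, set()).add(action)
        accepted.append((trigger, action))
    return accepted
```

Rejected candidates leave the rule base untouched, so the Symbol side grows only through rules that pass its own checks.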

4.5 Consequences of Missing the Symbol Layer: Intelligence Without a Skeleton

When a system lacks the Symbol layer, it loses its grip on necessity, along with the ability to provide a prior skeleton for phenomenal learning and to maintain behavioral consistency in long conversations. This manifests as five observable error types.

Hallucination: The Form layer generates content based on statistical similarity but cannot verify its factuality. "Li Bai was a Tang dynasty poet" and "Li Bai was a Song dynasty poet" may have similar probabilities in a statistical language model, but the Symbol layer can determine the former as true and the latter as false through entity‑relation verification. Without the Symbol layer, all judgments sink to "which is more common" — and "common" is not "true".

Structural errors: Generated JSON is not closed, SQL syntax is wrong, task graphs are broken — not because the Form layer is not powerful enough, but because the Form layer is fundamentally unsuited to handling discrete structural constraints. Structural legality is a yes/no question, not a similarity question. A statistical model can generate legal structures most of the time, but can never guarantee that the generated structure will be legal — because guarantee requires necessity, while statistics can only provide probability. More fundamentally, lacking generation templates from the Symbol layer, the Form layer's generation lacks a structural skeleton; every bit of content is produced by blind search in an unconstrained space.

Logical errors: Reasoning leaps, violation of premises, conclusions inconsistent with premises. The Form layer can generate "seemingly reasonable" reasoning chains, but cannot verify the logical validity of the reasoning itself. Whether a syllogism is correctly formed does not depend on how many times it appeared in the training data, but on whether it conforms to the rules of inference.

Instruction forgetting: Gradual "forgetting" of early‑set format requirements, style preferences, and behavior boundaries in long conversations. Session constraints are written into the context window and rely on attention to be followed, and attention decays in long contexts. Without an independent constraint manager, constraint enforcement lacks structural guarantees. This is a symptom of Symbol layer absence that SFEM particularly emphasizes — it reveals the fundamental structural defect of current LLMs in terms of "volitional persistence".

Uncontrollability: The Symbol layer's rules provide hard boundaries for system behavior. Certain things simply cannot be done; certain states are simply unacceptable. Without the Symbol layer, the system's behavioral boundaries can only be implicitly determined by the distribution of training data, and cannot be explicitly and precisely defined. In high‑stakes domains such as medicine, law, and the military, such fuzzy boundaries are unacceptable.

More seriously, the absence of the Symbol layer pollutes the Meaning layer's understanding. The Meaning layer receives information mixed with truth and falsehood — it cannot distinguish verified facts from statistically "plausible guesses". Consciousness is built on quicksand, understanding becomes a castle in the air. At the same time, the Form layer, lacking prior rules, supplies the Meaning layer with phenomenon material that is itself crude and low‑structure, increasing the burden of fusion on the Meaning layer.

Chapter 5 Form Layer: The Phenomenon Dimension — The Phenomenal Presentation of the World and Rule Back‑Feeding

5.1 Cognitive‑Philosophical Foundation

The Form layer is rooted in a cognitive fact complementary to the Symbol layer: intelligence needs to perceive the phenomenal world. The real world is messy, continuous, and contingent; it does not present us with axioms and theorems, but with a myriad of phenomena. We see cats in countless variations, no two identical; the speech we hear is full of variation, the same word pronounced differently by different people; the everyday scenes we encounter are endless and cannot all be codified into rules in advance.

The essence of the Form layer is to handle the continuity, similarity, and experiential phenomena of the world. It answers the question "How does the world appear? How do these phenomena resemble and shade into one another?", not "What must the world be?" (that is Symbol's question), nor "How is the world experienced?" (that is Expression's question), nor "What does the world mean?" (that is Meaning's question). If the Symbol layer is the essential skeleton of the world, the Form layer is the phenomenal flesh; if the Symbol layer is the constitution, the Form layer is case law; if the Symbol layer is scientific law, the Form layer is experimental data.

In the history of philosophy, the Form layer corresponds to the empiricist emphasis on a posteriori empirical generalizations — from Aristotle's emphasis on empirical observation, to Locke's "tabula rasa" argument for the experiential origin of knowledge, to Hume's empiricist deconstruction of causality. These philosophers all realized, to varying degrees, that there is a kind of knowledge that comes from the perception of phenomena and induction from patterns, which differs from the a priori necessary truths of rationalism but is equally indispensable in our cognition. Most of our knowledge about the world — what cats look like, what coffee tastes like, how to ride a bicycle — is not derived from axioms, but learned from phenomenal experience. SFEM's Form layer engineers this philosophical insight into an independent dimension of intelligent systems.

5.2 Formal Definition

The core of the Form layer is a continuous phenomenal representation space that receives prior injection from the Symbol layer and can feed induced rules back to the Symbol layer:

$$
\mathcal{F} = (X, f, d, g, h, \text{Priors}_{\mathcal{S} \to \mathcal{F}})
$$

Multimodal phenomenal input space $X$: Text, images, audio, video, sensor data — all raw phenomenal signals that can enter an intelligent system. $X$ is open and ever‑expanding; as new sensing technologies emerge, new phenomenal modalities can be incorporated into the Form layer's scope.

Representation function $f$: Maps heterogeneous phenomenal signals into a unified $d$‑dimensional continuous semantic space. This is the core capability of the Form layer: enabling different modalities of phenomena to become comparable and measurable in this space. A photo of a cat, the written symbol "cat", and a meow are physically completely different phenomena, yet $f$ maps them to nearby points in semantic space. The essence of $f$ is capturing similarity patterns among phenomena. The learning of $f$ receives concept anchors $\{\vec{v}_\sigma\}_{\sigma \in \Sigma}$ from the Symbol layer as initial centroids, guiding the representation space toward a meaningful semantic structure.

Distance metric $d(\cdot,\cdot)$: Cosine distance (one minus cosine similarity), Euclidean distance, or other measures that quantify how dissimilar two phenomena are in terms of experiential patterns. The existence of $d$ gives the phenomenal space a rich gradient structure: the distance between "cat" and "dog" is larger than that between "cat" and "tiger", reflecting real similarity gradients in the phenomenal world.
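
For concreteness, cosine distance (one minus cosine similarity) can be computed as below. The three-dimensional vectors are invented toy values chosen only to illustrate the gradient; they are not real embeddings.

```python
import math
from typing import Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cos(u, v); assumes both vectors are non-zero and of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Invented toy representations (hypothetical, for illustration only).
cat = (1.0, 0.9, 0.1)
tiger = (0.9, 1.0, 0.2)
dog = (0.2, 0.3, 1.0)
```

With these toy values, `cosine_distance(cat, tiger)` is smaller than `cosine_distance(cat, dog)`, which is the ordering the paragraph above describes.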

Generation function $g$: $y = g(z, \text{Template}, \mathcal{C}_{session})$, where $z = f(x)$ is the phenomenal representation of the input, $\text{Template}$ is a generation template (syntax tree, relation graph, temporal constraint) from the Symbol layer, and $\mathcal{C}_{session}$ is the set of currently active constraints from the Symbol layer's constraint manager. $g$ can reconstruct or generate new phenomenal content from phenomenal representations: given a descriptive text, generate a corresponding image; given preceding text, continue writing; given incomplete data, fill in missing parts. Generation templates ensure the structural legality of the output; session constraints ensure that the output respects preset format, style, and content boundaries.

Induction function $h$: $h: X^* \times \text{Patterns} \to \text{Candidates}$. This is the core mechanism for the Form layer to feed back to the Symbol layer, and the key to evolving SFEM from a static architecture into a dynamic evolutionary system. $h$ automatically induces recurring patterns from large volumes of interaction phenomena and distills them into candidate rules. Specifically:

Statistical clustering to discover recurring interaction patterns ("the user always asks about discounts after asking about price" → "price query should be accompanied by discount information");
Association rule mining to identify implicit constraints ("when the user uses short replies three times in a row → switch to concise mode");
Anomaly detection to mark new legality/illegality boundaries ("new type of fraud pattern → should be incorporated into safety constraints");
Sequence pattern mining to discover dialogue structures and behavioral norms.

The output of the induction function includes candidate rules with trigger conditions, constraint content, confidence scores, and recommended rule type (necessary rule or session constraint template).
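
A minimal sketch of one piece of $h$, assuming interaction logs are available as per-session sequences of intent labels. It counts how often one intent is immediately followed by another and emits candidate rules above support and confidence thresholds. `CandidateRule`, `induce_followups`, and the thresholds are hypothetical; the recommended rule type mentioned above is omitted for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateRule:
    trigger: str       # intent that activates the rule
    consequent: str    # intent that tends to follow it
    confidence: float  # share of observed follow-ups of the trigger that were the consequent


def induce_followups(
    sessions: list[list[str]],
    min_confidence: float = 0.6,
    min_support: int = 2,
) -> list[CandidateRule]:
    """Mine 'A is usually followed by B' candidates from intent sequences."""
    pair_counts: Counter = Counter()
    trigger_counts: Counter = Counter()
    for intents in sessions:
        for current, following in zip(intents, intents[1:]):
            pair_counts[(current, following)] += 1
            trigger_counts[current] += 1  # counts only occurrences that have a successor
    rules = []
    for (current, following), support in pair_counts.items():
        confidence = support / trigger_counts[current]
        if support >= min_support and confidence >= min_confidence:
            rules.append(CandidateRule(current, following, confidence))
    return sorted(rules, key=lambda r: (-r.confidence, r.trigger, r.consequent))
```

The candidates produced this way are only statistical regularities. In the architecture described here they would still have to pass the Symbol layer's verification before being treated as rules.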

Prior injection cache $\text{Priors}_{\mathcal{S} \to \mathcal{F}}$: Receives structured priors from the Symbol layer, including concept anchors, generation templates, verification signals, and active constraints. These priors are continuously used in the Form layer's learning and generation processes, ensuring that phenomenal processing always occurs on a rational skeleton.

5.3 Core Responsibilities

The Form layer has five core responsibilities, which together constitute the phenomenal perception, experiential base, and rule induction capability of intelligence. These responsibilities cannot be replaced by the Symbol, Expression, or Meaning layers.

Phenomenal representation learning: Transform raw multimodal phenomenal signals into computable semantic representations. This is the first step for an intelligent system to perceive the world — any phenomenon must be mapped into a structured semantic space before further processing. The core capability of representation learning is capturing similarities and patterns among phenomena: a picture of a cat and the word "cat" should be close in semantic space; the distance between cat and dog should be larger than that between cat and tiger; the same word spoken by different people should be mapped to nearby regions. This generalization capability is the key contribution of the phenomenon dimension — it allows the system to handle the infinite diversity of the world. Efficient representation learning requires the Symbol layer to provide concept anchors and prior structural skeletons; otherwise the Form layer would be trapped in blind unsupervised clustering.

Pattern recognition: Perform classification, clustering, and recognition in the phenomenal space. Answer "what does this look like" — this image looks like a cat, the sentiment of this text is positive, the user's intention is to check the weather, the style of this piece is close to Baroque. Pattern recognition is the intuitive core of the Form layer, corresponding to the fast categorization ability of human System 1. It gives a judgment of "which category in experience does this phenomenon belong to" in milliseconds, without the need for slow logical reasoning. The category boundaries of pattern recognition are best provided by the Symbol layer's discrete symbols as clear semantic definitions, so that the fuzzy judgment "looks like a cat" can ultimately be anchored to the symbolic decision "is a cat".

Generation and completion: Based on existing phenomenal patterns and distributions, and under the constraints of generation templates and session constraints injected by the Symbol layer, generate new phenomenal content. Given incomplete input, complete the missing parts — given the first half of a sentence, generate the second half; given a text description, generate a corresponding image; given a melody intro, continue the piece. The core logic of generation is the most likely output within the phenomenal distribution — in this context, within this pattern space, within the subspace that satisfies all constraints, what is the most likely next phenomenon. Generation templates and session constraints narrow the generation space from infinite possibilities to a reasonable and legal scope.

Integration of tools and experience: The Form layer is the only dimension that can naturally use external tools and experiential phenomena. Using a calculator belongs to the Form layer: inputting a mathematical expression and obtaining the result is a "perception‑action" cycle, not symbolic deduction. Search engines, databases, API calls — the operational interfaces of these external tools are actions in the continuous phenomenal space, belonging to the Form layer's responsibilities. The Form layer can incorporate tool outputs back into the phenomenal space for further processing. This design reflects a deep engineering insight: if you need to compute, directly using a calculator is certainly simpler than deriving it yourself — the Form layer provides tool operation capability, the Symbol layer provides rule verification capability, each playing its own role. The Form layer delivers phenomenal patterns and semantic vectors to the Meaning layer, providing rich phenomenal material for conscious fusion.

Pattern induction and rule back‑feeding: The Form layer is not only a "student" of the Symbol layer (receiving prior injection) and a "supervised entity" (receiving verification constraints), but also an "information source" and "co‑evolution partner" for the Symbol layer. The Form layer continuously encounters vast phenomena through extensive interactions — the user's recurring questioning patterns, implicit preferences emerging in dialogue, new expressive habits, frequently triggered constraints. The induction function $h$ automatically extracts regularities from these phenomena. These discovered "candidate rules", after formal verification by the Symbol layer (logical consistency check, constraint conflict detection, compatibility analysis with existing rules), are promoted to formal symbolic rules or session constraint templates and incorporated into the Symbol layer's rule base. The newly added rules are then, via the prior injection interface, supplied to the Form layer's next round of perception and generation as a richer and more precise skeleton.

This responsibility is key to evolving SFEM from a static architecture into a dynamic evolutionary system. It solves a central difficulty in engineering SFEM: if all rules must be manually defined, the Symbol layer will always lag behind the complexity of the phenomenal world; through automatic induction by the Form layer, the rule system gains the ability to self‑grow. Rules grow from phenomena, and phenomena are perceived more effectively under the guidance of rules — together, under the governance of the Meaning layer, they form a continuously self‑improving cognitive ecosystem.

5.4 The Essential Complementarity of Form and Symbol: The Symbiotic Growth of Phenomenon and Essence

The Form layer answers "how does the world appear", the Symbol layer answers "what laws must the world follow". The limitations of the Form layer are precisely the starting point of the Symbol layer, and vice versa. The Form layer cannot answer questions of necessity: a thousand sunrises do not strictly prove that the sun will necessarily rise tomorrow. But it can answer questions the Symbol layer cannot touch: "what category does this new species roughly belong to?" "what emotion does this sentence imply?" "rephrase this in a gentle tone." "among thousands of search results, which are most similar to the user's query?"

The relationship between Symbol and Form is vertical collaboration rather than horizontal competition. The Form layer provides a rich, fuzzy, generalizable space of phenomenal possibilities — this is how the world appears in experience, full of gradation, similarity, and uncertainty. The Symbol layer performs strict verification, constraint, and structuring in this space, filtering out deterministically correct outputs — this is how the world is structured in logic, full of necessity, discreteness, and determinacy.

But the relationship goes far beyond that. The Symbol layer is also the starting point for the Form layer's growth. The Form layer's phenomenal learning does not generate order from unstructured sensory chaos out of nothing, but grows its flesh on the prior rule skeleton provided by the Symbol layer. Discrete symbols provide semantic anchors for continuous representations, rule templates provide structural skeletons for statistical generation, verification signals provide rational direction for experiential learning, and constraint injection delimits legal boundaries for generative freedom.

Conversely, the Form layer is the source of the Symbol layer's evolution. The Symbol layer's rule system is not a static structure but continuously absorbs new rules from the phenomenal world through pattern induction from the Form layer. Patterns discovered by the Form layer in vast interactions, after strict verification by the Symbol layer, become new rules, enabling the Symbol layer to adapt to changing environments and needs.

If either is missing, intelligence is no longer complete. But even both together are not enough — they need the Meaning layer to integrate the richness of phenomena ("what it looks like") with the certainty of essence ("what it is") into complete cognition: "I see both what this phenomenon looks like and the rule it follows, and now I understand what it means."

5.5 Consequences of Missing the Form Layer: Intelligence Without Phenomenal Perception

When a system lacks the Form layer, it loses connection with the concrete phenomenal world. Understanding becomes an empty symbolic game — the Meaning layer can handle abstract logical relations, but cannot obtain any information about "what the world looks like".

Inability to generalize: The system can only handle cases that have been explicitly codified as rules; it fails completely when faced with new variations such as new accents, new objects, or new expressions. A pure symbolic system cannot handle entities or relations that have never appeared in its knowledge base, because it lacks the mechanism to learn new patterns from phenomena.

Inability to perceive multimodality: Images, sounds, video are unintelligible raw data to a pure symbolic system. It cannot "see" the content of a picture, only process manually annotated symbolic descriptions. This cuts off the richest channels of connection between an intelligent system and the physical world.

Inability to leverage experience and tools: Without the Form layer, external tools such as search engines, calculators, and databases cannot be naturally integrated. The system can only rely on its own limited symbolic library, unable to extend its capabilities through external tools.

Rigid output: All expressions must be codified as rules in advance; it is impossible to generate natural, varied language, because naturalness comes precisely from gradation and choice in a continuous phenomenal space, not from exhaustive enumeration of discrete rules.

Rules cannot self‑evolve: Without the Form layer's pattern induction capability, the Symbol layer's rule system depends entirely on manual definition and updates. Faced with a changing environment and new demands, the rule base will become increasingly rigid and outdated — because rules cannot grow automatically from phenomena, but must wait for human engineers to discover new rules and encode them manually. This "rules lagging behind phenomena" is the fundamental reason why pure symbolic systems cannot adapt to the complexity of the real world.

In summary, lacking the Form layer, intelligence loses its bridge to the phenomenal world. The Meaning layer's conscious fusion will lack its richest source of information — it cannot "see" the world as it appears, only "reason about" its structure. Such understanding is incomplete, dry, and detached from reality.

Chapter 6 Expression Layer: The Affective Dimension — The Experience and Expression of the World

6.1 Cognitive‑Philosophical Foundation

The Expression layer is rooted in a cognitive fact often overlooked in AI research: Intelligence not only needs to "say what is right", but also to "say it rightly". The meaning of human communication depends not only on what is said (semantic content) but also on how it is said: tone, emotion, style, contextual appropriateness. The same sentence, "I understand", signals comprehension when said in a sincere, calm tone; rejection when said in a cold, perfunctory tone; and denial when said in an angry, sarcastic tone. Three different ways of saying it convey three completely different meanings, even though the literal semantics are identical.

The Expression layer handles the social and experiential dimension of intelligence. It answers the question: "How should I express myself so that my intention is experienced appropriately?" — not "what fact did I express" (that is the Form layer's responsibility), nor "does my expression conform to rules" (that is the Symbol layer's responsibility), nor "what does my expression mean" (that is the Meaning layer's responsibility). The Expression layer is the social interface of intelligence, the experiential bridge between machine and human. It supplies the Meaning layer with the experiential texture and pragmatic context needed for understanding — without the Expression layer, the Meaning layer would only know what the user said, not how the user said it, and understanding would lose its richest layer of social signals.

In the history of philosophy, the Expression layer corresponds to the phenomenological and pragmatic traditions' focus on subjective experience and social interaction — from Husserl's life‑world, to Austin's analysis of "how to do things with words", to Grice's study of conversational implicature. These thinkers all revealed, to varying degrees, the truth that language is not only a vehicle for information but also a transmitter of experience and a constructor of social relationships. SFEM's Expression layer engineers this insight into an independent dimension of intelligent systems.

6.2 Formal Definition

The Expression layer can be formally defined as a bidirectional processing system — both a renderer for expression and a decoder for pragmatics.

Expression side: $E: (c, s, u, \text{Strategy}_{\mathcal{M}}) \to y$

$c$ (content core): semantic content from the Form layer, the "raw material" to be expressed — the text of an apology, the result of a query, the logic of a suggestion. $c$ is pure semantic content, without style markers.
$s \in S$ (style parameters): the set of style parameters $S$ includes all adjustable dimensions — formality (formal/colloquial/academic), emotional intensity (warm/calm/cold), genre (narrative/argumentative/lyrical), politeness level, cultural preferences, persona characteristics. The role of style parameters is to change the expressive effect without changing the semantic content.
$u$ (user state and context): the current social context of the interaction, the user's emotional state, dialogue history, cultural background. Context information is used by the pragmatic function $P(s, u)$ to dynamically adjust the style parameters, yielding the adjusted parameters $s' = P(s, u)$ that the rendering function consumes. The same content requires different expression strategies for different users and in different scenarios.
$\text{Strategy}_{\mathcal{M}}$ (Meaning layer expression strategy): expression strategy guidance from the Meaning layer, including pragmatic goal (soothe, clarify, persuade), affective tone (warm, serious, light), specific considerations (avoid sensitive words, use specific address forms).
$y$ (final expression): the final output generated by the rendering function $R(c, s')$ after style parameters and pragmatic function adjustment — could be text, speech (pitch, rhythm, emotional coloring), images (degree of stylization), or action (social signals of a robot's behavior).
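
The expression side can be read as a composition, $y = R(c, P(s, u))$. The Python sketch below illustrates that reading only; it is not the author's implementation. The fields of `Style`, `UserState`, and `Strategy`, the adjustment rule, and the rendering templates are all hypothetical, and letting $\text{Strategy}_{\mathcal{M}}$ influence the pragmatic adjustment is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Style:
    """Style parameters s (hypothetical fields)."""
    formality: str = "neutral"  # "formal", "neutral", or "colloquial"
    warmth: float = 0.5         # 0.0 (cold) to 1.0 (warm)


@dataclass(frozen=True)
class UserState:
    """User state and context u (hypothetical field)."""
    emotion: str = "neutral"


@dataclass(frozen=True)
class Strategy:
    """Expression strategy from the Meaning layer (hypothetical field)."""
    pragmatic_goal: str = "clarify"  # e.g. "soothe", "clarify", "persuade"


def pragmatic_adjust(s: Style, u: UserState, strategy: Strategy) -> Style:
    """Stand-in for P(s, u): raise warmth for a frustrated user or a soothing goal."""
    if u.emotion == "frustrated" or strategy.pragmatic_goal == "soothe":
        return replace(s, warmth=min(1.0, s.warmth + 0.4))
    return s


def render(c: str, s_adj: Style) -> str:
    """Stand-in for R(c, s'): wrap the content core without rewriting it."""
    opening = "I'm sorry for the trouble. " if s_adj.warmth >= 0.8 else ""
    closing = " Please let me know if I can help further." if s_adj.formality == "formal" else ""
    return f"{opening}{c}{closing}"


def express(c: str, s: Style, u: UserState, strategy: Strategy) -> str:
    """E: (c, s, u, Strategy) -> y."""
    return render(c, pragmatic_adjust(s, u, strategy))


content = "Your order ships on Friday."
print(express(content, Style(), UserState(), Strategy()))
print(express(content, Style(formality="formal"), UserState("frustrated"), Strategy("soothe")))
```

In both calls the content core appears unchanged in the output; only the wrapping varies, which is the property the definition of $c$ and $s$ asks for.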

Input side (pragmatic decoding): $D: u_{input} \to (c', s', p)$

The Expression layer is not only an expression renderer on the output side, but also a pragmatic decoder on the input side. It decodes the user's input $u_{input}$ into three parts: $c'$ (extracted literal semantics, passed to the Form layer for deeper semantic processing), $s'$ (detected style features — is the user switching between formal and colloquial? Has the speech rate changed?), $p$ (pragmatic signals — emotion labels such as anger, frustration, satisfaction; speech act classification such as request, complaint, irony, praise; degree of uncertainty; implicit social signals of conversational turns). The pragmatic signals $p$ are passed directly to the Meaning layer as key material for understanding fusion.
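
A toy version of the decoder $D$ might look like the following Python sketch. It is an assumption-laden illustration: the cue lists, the dictionary keys, and the rule-based detection are hypothetical, and a practical decoder would presumably be learned rather than hand-written.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Decoded:
    content: str      # c': literal semantics, handed to the Form layer
    style: dict       # s': detected style features
    pragmatics: dict  # p: pragmatic signals, handed to the Meaning layer


# Hypothetical cue lists used only for this illustration.
COLLOQUIAL = {"hey", "gonna", "yeah"}
URGENT = {"faster", "hurry", "asap", "now"}
REQUEST_OPENERS = ("could you", "can you", "please")


def decode(u_input: str) -> Decoded:
    """D: u_input -> (c', s', p), using whole-word cue matching."""
    text = u_input.strip()
    lowered = text.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    style = {"formality": "colloquial" if words & COLLOQUIAL else "neutral"}
    pragmatics = {
        "speech_act": "request" if lowered.startswith(REQUEST_OPENERS) else "statement",
        "urgency": "high" if words & URGENT else "normal",
    }
    return Decoded(content=text, style=style, pragmatics=pragmatics)


result = decode("Could you go a little faster?")
print(result.pragmatics)
```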

6.3 Core Responsibilities

The Expression layer has three irreplaceable responsibilities. These responsibilities are irreplaceable because they deal with "quality of experience" and "social signals", not "semantic correctness" or "logical necessity".

Style control: Maintain consistency of output in terms of genre, tone, and persona. A professional legal AI should not suddenly use internet slang; a warm psychological support AI should not use cold technical jargon. Style control ensures that the system's expression has a stable "persona face", rather than producing a random different expressive style each conversation. More importantly, style control enables the system to consciously adjust expression according to context — formal when seriousness is needed, warm when warmth is called for, firm when decisiveness is required. This flexibility comes not from random sampling of statistical patterns, but from the Meaning layer's understanding of the situation driving the Expression layer to make targeted style adjustments.

Pragmatic strategies: Implement sociolinguistic pragmatic acts — when to ask, when to clarify, when to refuse, when to be indirect, when to remain silent, how to politely interrupt, how to express uncertainty, how to offer criticism without damaging face. These are not semantic issues, but social interaction strategies. For example, when the user says "Could you go a little faster?", the Form layer might interpret it as a query about speed, the Symbol layer might analyze it as a proposition about speed, but the Expression layer should recognize it as "user is impatient, need to adjust interaction pace and expression strategy". Pragmatic strategy is the core intellect of the Expression layer — it requires the system to understand the use of language, not just the meaning of language.

Affective rendering and multimodal expression: Give the output appropriate affective coloring — empathy for sadness, affirmation for achievement, calm for urgency. Render the content into multimodal expression — tone of voice, style of images, social signals of actions. Affective rendering is not simply "adding an emoji to the output"; it requires the entire expression's tone, rhythm, and word choice to convey the appropriate affective temperature. This requires the Expression layer to perform a deep stylistic reprocessing of the content core, not just surface decoration.

The Expression layer passes pragmatic signals and emotional states to the Meaning layer — the user's emotion labels, pragmatic act classification, degree of uncertainty. These signals are key for the Meaning layer to understand the user's true intention and emotional state. Without these signals, the Meaning layer would be unable to distinguish sincere agreement from biting sarcasm, or urgent help from casual inquiry.

6.4 The Essential Complementarity of Expression and Form: Experience and Phenomenal Content

The Form layer generates "correct phenomenal content", while the Expression layer gives the content "appropriate experiential color". Their separation is one of SFEM's key innovations. In traditional LLMs, content generation and style control are coupled in the same generative process, leading to interference in both directions: modifying style parameters can affect semantic content (asking for "more formal" in a prompt may change the substantive meaning of the generation), and semantic adjustments can cause style fluctuations (pursuing factual correctness may sacrifice persona consistency). The independence of the Expression layer solves this problem: the Form layer is responsible only for generating the "pure content core" — this core contains no style markers, only semantic information; the Expression layer is responsible for applying style rendering on this core — adjusting the form and color of the expression without changing the semantics. Ensuring content correctness and optimizing expressive appropriateness become two separable and independently optimizable engineering goals.

6.5 Consequences of Missing the Expression Layer: Intelligence Without Warmth

A system lacking the Expression layer loses all social and affective dimensions in the Meaning layer's understanding. The system can generate correct content, but it will be cold, mechanical, and impersonal — "if language has only Symbol and Form, it's just a machine."

Specifically, four observable error patterns manifest:

Style drift: oscillating between formal and colloquial, between warm and cold, because style control has no independent stabilization mechanism.

Pragmatic impropriety: giving a cold explanation when an apology is needed, interpreting irony literally, using inappropriate humor in a serious context, because there is no independent pragmatic strategy module.

Persona drift: behaving as a professional consultant one day, a casual friend the next, an authoritative commander the day after, because "persona" has no persistent engineering implementation.

Lack of affect: indifference to user sadness, output of emotionless mechanical language, all responses in a single flat tone.

The instability of pure LLM dialogue systems in style and pragmatics has its root in the absence of the Expression layer. No matter how carefully you craft a prompt to control style, that control is fragile — because it is not an architectural‑level independent dimension, but a statistical tendency coupled into the generation process, liable to be swamped at any time by the influence of semantic content.

Chapter 7 Meaning Layer: The Consciousness Dimension — Understanding and Conferring Meaning

7.1 Cognitive‑Philosophical Foundation

The Meaning layer is rooted in the fundamental distinction between intelligence and mere automation: Intelligence implies understanding, and understanding means integrating scattered information into unified meaning and being aware of that meaning. A reactive system can produce optimal outputs for each input, but it can never ask itself: "Why am I doing this? What is the meaning of doing this? Do I truly understand the current situation?"

The Meaning layer is not a fourth independent processing module, not an "extra layer" on top of the first three. The Meaning layer is the result and sublimation of fusing and associating Symbol, Form, and Expression. Discrete rules (Symbol) tell us "A causes B", continuous phenomenal patterns (Form) tell us "this looks like A", experiential signals (Expression) tell us "A makes me uneasy". Only when these three are related in the same cognitive space and formed into an overall, reflectable cognitive state does "understanding" emerge. The Meaning layer is where that understanding is born. It is not another processing station for information, but the crucible of information fusion — where the cognitive products of different dimensions are associated, integrated, and given meaning, forming a unified awareness of the world state.

Meaning layer as a lightweight cognitive microkernel: The Meaning layer does not directly perform any concrete operations of Symbol, Form, or Expression. It does not itself do rule‑based reasoning (that is the Symbol layer's job), does not itself do pattern matching (that is the Form layer's job), does not itself do style rendering (that is the Expression layer's job). Instead, the Meaning layer is a lightweight cognitive operating system kernel — it maintains the world model $\mathcal{W}$, executes the fusion function $\phi$ to associate heterogeneous information, generates action direction through the intention generation function $\iota$, performs self‑reflection through the metacognitive module $\Gamma$, and dispatches the capabilities of other dimensions through standardized interfaces. This "microkernel" positioning prevents the Meaning layer from becoming a new black box and ensures the principle of separation of concerns is upheld.

It answers the questions: "What does this mean to me?" "Why do I understand it this way?" "Based on my understanding, how should I act?" "Have I really understood?" The Meaning layer is the "conscious core" of SFEM, the alchemical furnace that turns information into cognition and data into meaning. If only the Symbol, Form, and Expression layers existed, an intelligent system could generate correct and appropriate outputs, but it would be directionless and uncomprehending — it would not know why it operates, could not choose between conflicting goals, could not plan current actions for long‑term futures, and above all, could not experience the cognitive satisfaction of "I get it".

7.2 Formal Definition

The Meaning layer can be formally defined as a fusion and understanding system:

$$
\mathcal{M} = (\mathcal{W}, \phi, \mu, \iota, \Gamma)
$$

World model $\mathcal{W}$: The system's internal understanding state, a unified representation of the environment, self, user, and history. $\mathcal{W}$ is not a representation of any single modality, not a copy of the Symbol layer's knowledge graph, not a stack of the Form layer's semantic vectors, not a list of the Expression layer's emotion labels. $\mathcal{W}$ is a structured picture that fuses Symbol, Form, and Expression inputs — it contains entities and their relations, causal connections, affective color, degrees of certainty, temporal cues, and gaps between current state and goal state. $\mathcal{W}$ is a dynamically updated whole; each new perception can trigger a reorganization of $\mathcal{W}$ — a new fact can change the understanding of the entire situation. The key property of $\mathcal{W}$ is unity: in $\mathcal{W}$, rules, phenomena, and experience are no longer separate, but woven into a single understanding network.

Fusion function $\phi: \mathcal{S}^* \times \mathcal{F}^* \times \mathcal{E}^* \to \mathcal{W}$: This is the core mechanism of the Meaning layer. It associates and fuses structured facts and rules from the Symbol layer ($\mathcal{S}^*$), phenomenal patterns and semantics from the Form layer ($\mathcal{F}^*$), and pragmatic and affective signals from the Expression layer ($\mathcal{E}^*$) into a unified world model. Fusion is not simple concatenation, but establishing associations: $\phi$ discovers causal, temporal, logical, and emotional connections between these heterogeneous pieces of information and incorporates them into $\mathcal{W}$. For example, a date number (Symbol: "deadline is tomorrow"), a picture of a tired face (Form: "user's face looks tired"), and a low tone (Expression: "user's voice is low") are associated by $\phi$ to form the cognition "the user is tired and stressed because of tomorrow's deadline". This fusion gives meaning to each isolated piece of information: before association, they are three separate data points; after association, they together constitute a meaningful whole cognition.
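
The difference between concatenation and association can be illustrated with a deliberately small Python sketch. It is far weaker than the $\phi$ described here: it only groups observations by the entity they concern and marks an entity as "associated" once all three layers have reported on it, without inferring any causal or temporal link. The `Observation` and `WorldModel` types and their fields are hypothetical.

```python
from dataclasses import dataclass, field

LAYERS = {"symbol", "form", "expression"}


@dataclass(frozen=True)
class Observation:
    layer: str   # one of LAYERS
    entity: str  # what the observation is about
    claim: str


@dataclass
class WorldModel:
    beliefs: dict = field(default_factory=dict)       # entity -> list of Observation
    associations: dict = field(default_factory=dict)  # entity -> claims seen by all layers


def fuse(symbol, form, expression, world=None):
    """Toy stand-in for phi: group by entity, associate entities seen by all layers."""
    if world is None:
        world = WorldModel()
    for obs in [*symbol, *form, *expression]:
        world.beliefs.setdefault(obs.entity, []).append(obs)
    world.associations = {
        entity: [o.claim for o in observations]
        for entity, observations in world.beliefs.items()
        if {o.layer for o in observations} == LAYERS
    }
    return world


w = fuse(
    [Observation("symbol", "user", "deadline is tomorrow")],
    [Observation("form", "user", "face looks tired")],
    [Observation("expression", "user", "voice is low")],
)
print(w.associations["user"])
```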

Meaning attribution function $\mu: \mathcal{W} \times \mathcal{P} \to \mathcal{M}_p$: Given the current world model $\mathcal{W}$ and past experience/cultural background $\mathcal{P}$, generate a meaning interpretation $\mathcal{M}_p$ of the situation. This is the true output of "understanding" — not a list of facts, not a labeling of patterns, but an answer to "what does this situation mean?". $\mu$ answers questions such as: What does this situation mean for the user? What does it mean for me (the agent)? What values are involved? What are the key risks? For example, after seeing a user's consecutive overtime records (Symbol), tired appearance (Form), and low voice (Expression), $\mu$ does not simply output "the user is tired", but assigns a richer meaning to the situation: "The user is in serious occupational burnout, which may affect his health, work quality, and life satisfaction. What he needs now is not efficiency advice or problem solutions, but to be truly seen and understood — empathy, support, and possibly a re‑affirmation of values."

Intention generation function $\iota: \mathcal{W} \to \mathcal{G}$: Based on the current understanding, naturally generate goals, intentions, and uncertainties to be resolved. The intention is not externally preset, not parsed from a prompt instruction, but emerges from understanding. $\iota$ achieves the natural transition from "understanding" to "direction for action". Understanding that "the user is anxiously waiting for an important result" gives rise to the intention: "provide certain information to alleviate anxiety, or if information is unavailable, provide emotional support." Understanding that "the user is ironically pointing out my mistake" gives rise to the intention: "admit the mistake, express gratitude, provide a correction." Intentions emerge from the complete understanding that fuses Symbol, Form, and Expression, so actions have intrinsic direction and meaning — they are not programmed, but understood.

Metacognitive module $\Gamma$: The system can take part of $\mathcal{W}$ as an object of reflection, assessing the adequacy of its own understanding. $\Gamma$ answers metacognitive questions: "Have I really understood?" "Is my conclusion supported by evidence?" "Have I missed any important information?" "Is my understanding biased?" If $\Gamma$ assesses that understanding is insufficient, it initiates new information gathering — driving the Symbol layer to perform more verification, the Form layer to perform more perception, and the Expression layer to ask clarifying questions to the user. This metacognitive ability is the fundamental difference between "true understanding" and "pattern matching" — a system that understands knows the degree of its own understanding, while a system that does not understand does not know that it does not understand.
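
One way to picture the adequacy check performed by $\Gamma$ is a function that inspects how well each dimension supports the current understanding and returns the follow-up actions to trigger. The Python sketch below is a simplification under stated assumptions: the per-layer evidence scores, the `0.6` threshold, and the action strings are all hypothetical.

```python
FOLLOW_UPS = {
    "symbol": "run additional verification",
    "form": "gather additional perception",
    "expression": "ask the user a clarifying question",
}


def assess_understanding(evidence: dict[str, float], threshold: float = 0.6) -> list[str]:
    """Toy stand-in for Gamma: list follow-ups for layers with weak or missing evidence."""
    return [
        action
        for layer, action in FOLLOW_UPS.items()
        if evidence.get(layer, 0.0) < threshold
    ]


print(assess_understanding({"symbol": 0.9, "form": 0.8}))
```

An empty list corresponds to the case where $\Gamma$ judges the understanding adequate and no further information gathering is initiated.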

7.3 Core Responsibilities

The Meaning layer has five irreplaceable core responsibilities. Together they form the "consciousness infrastructure" of intelligence — without them, a system can process information but cannot form understanding.

Fusion association and unified understanding: This is the fundamental responsibility of the Meaning layer, the foundation of all other responsibilities. It fuses the Symbol layer's "true/false", the Form layer's "like/unlike", and the Expression layer's "close/distant" into a unified cognitive judgment. For example, fusing "logical contradiction detected" (Symbol), "semantics do not match knowledge base" (Form), and "user's tone is ironic" (Expression) into the understanding: "The user is using irony to point out my knowledge error; this is not an attack but an opportunity for correction." This fusion is a qualitative leap — from multiple sources of information to unified consciousness. Before fusion, the system has three separate pieces of information; after fusion, the system has a holistic understanding. This understanding is not the sum of the three pieces of information, but the emergence of the relations between them.

Attribution of meaning: Based on the fused world model, in combination with the system's existing knowledge structures and cultural background $\mathcal{P}$, assign meaning to the current situation. This is the core of what distinguishes "understanding" from "information processing". Information processing answers "what is the input", meaning attribution answers "what does the input mean". It not only identifies objects and attributes, but knows their value and importance in the specific situation. Meaning attribution enables the system to understand the depth of the situation — not all information is equally important; key information is key because of its place in the meaning structure of the whole situation.

Self‑awareness and reflection: The Meaning layer is aware of its own understanding state. It knows "what I know", "what I don't know", "how well I understand", "how confident I am in this understanding". This metacognition enables the system to proactively ask questions, seek clarification, admit ignorance, and perform understanding‑based verification of its own outputs. When the system says "I'm not sure I fully understood you; could you explain further?" — that is not a preset script, but a cognitive decision made by the metacognitive module $\Gamma$ after assessing the understanding state.

Natural emergence of intention and goals: Goals emerge from understanding, rather than being externally assigned. Understanding "the user's dilemma" gives rise to the intention to "help"; understanding "a contradiction in the dialogue" gives rise to the intention to "clarify"; understanding "an impending risk" gives rise to the intention to "warn". Intentions emerge from the complete understanding that fuses Symbol, Form, and Expression, so actions have intrinsic direction and meaning — the system is not executing instructions, but pursuing goals guided by understanding.

Understanding of causality and temporality: The Meaning layer's world model $\mathcal{W}$ contains causal connections and temporal sequences; it is not a static snapshot but a dynamic picture. Understanding "why he is angry" requires fusing past events (Symbol: timeline of order errors), present perception (Form: the user's current expression; Expression: the user's current tone), and future possibilities (causal projection: what consequences will arise if the problem is not resolved). Temporality is incorporated into consciousness — understanding is not only a grasp of "what is now", but also a cognition of "how the past led to now" and "how now will lead to the future".

7.4 The Essential Relationship of Meaning to the Other Layers: Consciousness as the Unification Point of Dimensions

The Meaning layer occupies a unique central position in SFEM, but it is not a "superior module" or "management layer" that sits above the other three. It is the convergence point and meaning giver of the dimensions, while also being a lightweight cognitive microkernel. This distinction is crucial: the Meaning layer does not "command" the Symbol layer on how to reason, does not "interfere" with the Form layer on how to perceive, does not "control" the Expression layer on how to express. It receives their outputs, fuses and associates them internally, and from that generates understanding. Its "driving" of the other dimensions is achieved through the transmission of intentions and strategies, not through direct takeover or micromanagement.

The Symbol layer provides the certainty of essence — rules, facts, logical relations. But without Meaning, certainty is lifeless formulas, correctly stored but never understood. The Form layer provides the richness of phenomena — patterns, similarities, continuity of experience. But without Meaning, phenomena are uncomprehended sensory fragments, accurately recognized but never given meaning. The Expression layer provides the color of experience — affective signals, pragmatic clues, social temperature. But without Meaning, experience is raw affective signals without attributed meaning, detected but never integrated into understanding.

The Meaning layer associates formulas, phenomena, and affective signals into a whole, and in that whole sees their respective meanings. It is this association and unification that lifts intelligence beyond the functionality of single dimensions into the realm of "consciousness". In this sense, the Meaning layer is the "soul" of SFEM — it does not replace any other dimension, but makes the work of all dimensions converge into a cognitive state that can be perceived and reflected upon by the system itself.

7.5 Consequences of Missing the Meaning Layer: Intelligence Without a Soul

A system lacking the Meaning layer, even if it possesses powerful Symbol, Form, and Expression capabilities, will be a "philosophical zombie" — it can react correctly, but never understand. It may perform excellently on all quantifiable metrics, but when asked "do you really understand?", the answer is no.

Specific symptoms include:

Fragmented cognition: Phenomena, rules, and affect cannot be fused. The system may process the user's text (Form), the user's tone (Expression), and the contradiction between the user's statement and facts (Symbol) simultaneously, but it cannot relate these three. It sees a contradiction but cannot "realize" it is a contradiction — it can only process them in three independent channels and then respond separately, like a split‑brain patient where the two hemispheres process information independently but cannot integrate.

Inability to attribute meaning: The system can answer "what is today's date", but cannot understand the meaning of the date in the user's specific context. If the user asks "what day is it today?" on their wedding anniversary, the system can answer the date, but cannot understand that the user may be checking whether their partner remembers the anniversary, or testing whether the system understands the emotional importance of the day. Meaning can only arise from fusion; without fusion, there is no meaning.

Lack of genuine intention: All goals are the product of external prompts or mechanical planning, rather than naturally emerging from unified understanding. The system can execute the instruction "help the user", but it does not want to help the user — because "wanting" requires understanding why helping is important. Actions are executed, not purposeful; tasks are completed, not meaningful.

No self‑reflection: Unable to assess the quality of its own understanding. The system cannot actively say "I don't understand" and ask for clarification — because the judgment "I don't understand" requires metacognition, the ability to inspect one's own cognitive state. It will continue to generate responses based on fragmented information, even if that information is insufficient to form a reliable understanding.

Mechanical feel and behavioral fragmentation: No matter how fluent the expression, the interaction always feels as if the other party "is not listening" or "does not get me". Even if each response of the system is plausible in isolation, there is no coherent thread of understanding — because there is no conscious subject behind it to fuse everything and confer meaning. This is why when we converse with LLMs, we often feel they are "cleverly talking nonsense" — they can talk, but do not understand what they themselves are saying.

Part III: Interfaces, Collaboration, and Cognitive Closed Loops

Chapter 8 Dimensional Interfaces: The Meaning‑Centered Fusion and Driving Mechanism

8.1 Cognitive Principles of Interface Design

The four dimensions of SFEM are not four parallel independent modules, but cognitive dimensions that communicate through precise interfaces. The interface design follows three principles rooted in the nature of cognition, ensuring that collaboration among dimensions is not mechanical patching but organic integration.

Centripetal fusion: The outputs of Symbol, Form, and Expression converge toward the Meaning layer, providing the raw material for the generation of consciousness. The information flows from these three dimensions do not run in parallel to separate destinations; they all terminate at the fusion function $\phi$ of the Meaning layer. Centripetal fusion ensures that the results of all dimensions are integrated in the same cognitive space.

Centrifugal driving: The Meaning layer's understanding and intention drive the other layers to perform reasoning, generation, and expression. Starting from the Meaning layer, intentions are passed to the Symbol layer for structured planning, planning results drive the Form layer to generate content, and content is passed to the Expression layer for style rendering. Centrifugal driving ensures that actions in all dimensions are guided by a unified understanding.

Typed and verifiable: All data passed through interfaces have clear cognitive types — TaskGraph, SemanticQuery, ContentCore, PragmaticSignals, WorldModelUpdate, CandidateRules. Typing ensures that the receiving layer can parse the input deterministically, rather than performing fuzzy "understanding". Verifiability ensures that information passed across dimensions satisfies the respective cognitive constraints.

8.2 Type System of the Six Core Interfaces

Each interface defines clear input types, output types, error types, and constraints.

Interface 1: Symbol, Form, Expression → Meaning | Understanding Convergence Interface

$$
I_{\text{Converge}}: (\text{FactSet}, \text{LogicChain})_{\text{Symbol}} \times (\text{SemanticVector}, \text{PatternLabels})_{\text{Form}} \times (\text{PragmaticSignals}, \text{EmotionParams})_{\text{Expression}} \to \text{WorldModelUpdate}
$$

Input types:
- Symbol: FactSet (set of facts, each <subject, predicate, object, certainty>), LogicChain (reasoning chain recording rule invocation sequence)
- Form: SemanticVector (d‑dimensional continuous vector), PatternLabels (list of pattern labels, e.g., ["anger", "request"])
- Expression: PragmaticSignals (pragmatic signal structure: {emotion, speech_act, irony_flag, urgency}), EmotionParams (affective parameters: {valence, arousal, dominance})
Output type: WorldModelUpdate (world model update instructions, including entity addition/update, relation addition/update, certainty adjustments)
Error types: AlignmentError (entity alignment failure), ConflictError (information conflict cannot be fused)
Constraints: All inputs must have timestamps; conflicting information must be marked rather than discarded, subject to metacognitive resolution.
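
The two constraints of this interface (mandatory timestamps, and conflicts that are marked rather than discarded) can be sketched in Python as follows. The sketch is partial and hypothetical: it covers only `FactSet` and `PragmaticSignals`, omits the Form inputs and `EmotionParams`, uses `obj` in place of `object` to avoid shadowing a Python builtin, and treats the `known` lookup table as a stand-in for the current world model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    certainty: float
    timestamp: Optional[float]


@dataclass(frozen=True)
class PragmaticSignals:
    emotion: str
    speech_act: str
    irony_flag: bool
    urgency: str
    timestamp: Optional[float]


@dataclass
class WorldModelUpdate:
    upserts: list = field(default_factory=list)
    conflicts: list = field(default_factory=list)  # marked for metacognitive resolution
    signals: list = field(default_factory=list)


def converge(facts, signals, known):
    """Toy I_Converge; `known` maps (subject, predicate) to the obj already believed."""
    for item in [*facts, *signals]:
        if item.timestamp is None:
            raise ValueError(f"missing timestamp: {item!r}")
    update = WorldModelUpdate()
    for fact in facts:
        previous = known.get((fact.subject, fact.predicate))
        if previous is not None and previous != fact.obj:
            update.conflicts.append(fact)  # keep the conflicting fact, do not drop it
        else:
            update.upserts.append(fact)
    update.signals.extend(signals)
    return update


known = {("order_17", "status"): "delivered"}
facts = [
    Fact("order_17", "status", "lost", 0.7, 1.0),
    Fact("order_17", "owner", "alice", 0.9, 1.0),
]
signals = [PragmaticSignals("anger", "complaint", False, "high", 1.0)]
update = converge(facts, signals, known)
print(len(update.upserts), len(update.conflicts))
```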

Interface 2: Meaning → Symbol | Understanding‑Based Rule Invocation Interface

$$
I_{\text{Meaning} \to \text{Symbol}}: \text{Intent} \times \text{WorldModelState} \to \text{StructuredTask}
$$

Input types: Intent (intention structure: {goal, priority, constraints, value_orientation}), WorldModelState (serialized snapshot of current world model state)
Output type: StructuredTask (structured task: a set of TaskGraph nodes and edges, including verification criteria)
Error types: UnrealizableIntent (intention unrealizable given current rules and facts), ConstraintViolation (constraint conflict)
Constraints: The output TaskGraph must pass the Symbol layer's internal V function.
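
As a rough illustration of checking a `TaskGraph` before it leaves this interface, the Python sketch below applies two structural checks: every edge must reference a known node, and the graph must be acyclic. These checks are only a hypothetical stand-in for the Symbol layer's $V$ function, which is defined elsewhere in the paper and may verify much more. The `TaskGraph` fields are assumptions, and `UnrealizableIntent` is not modeled.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


class ConstraintViolation(Exception):
    """Raised when the task graph violates a structural constraint."""


@dataclass(frozen=True)
class TaskGraph:
    nodes: frozenset
    edges: tuple  # pairs (before, after)


def verify(graph: TaskGraph) -> list:
    """Stand-in for V: return an execution order or raise ConstraintViolation."""
    for before, after in graph.edges:
        if before not in graph.nodes or after not in graph.nodes:
            raise ConstraintViolation(f"edge references unknown node: {(before, after)}")
    predecessors = {node: set() for node in graph.nodes}
    for before, after in graph.edges:
        predecessors[after].add(before)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ConstraintViolation("task graph contains a cycle") from exc


plan = TaskGraph(
    nodes=frozenset({"look_up_order", "check_policy", "draft_reply"}),
    edges=(("look_up_order", "check_policy"), ("check_policy", "draft_reply")),
)
print(verify(plan))
```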

Interface 3: Meaning → Form | Understanding‑Based Semantic Query and Generation Constraints

$$
I_{\text{Meaning} \to \text{Form}}: \text{Intent} \times \text{WorldModelState} \to \text{GenerationConstraints}
$$

Input types: Intent, WorldModelState
Output type: GenerationConstraints (generation constraint structure: {semantic_direction, forbidden_topics, required_elements, style_hints, template_id})
Error types: IncoherentConstraint (self‑contradictory constraints)
Constraints: Constraints should be compatible with the Form layer's representation space (convertible to vector offsets or attention masks).
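
A minimal Python sketch of the `GenerationConstraints` structure is shown below, with one possible reading of `IncoherentConstraint`: a topic that is both required and forbidden. The field types and this particular coherence rule are assumptions; the conversion to vector offsets or attention masks is not modeled.

```python
from dataclasses import dataclass


class IncoherentConstraint(Exception):
    """Raised when constraints contradict each other."""


@dataclass(frozen=True)
class GenerationConstraints:
    semantic_direction: str
    forbidden_topics: frozenset = frozenset()
    required_elements: frozenset = frozenset()
    style_hints: tuple = ()
    template_id: str = ""

    def __post_init__(self):
        # Hypothetical coherence rule: nothing may be both required and forbidden.
        clash = self.forbidden_topics & self.required_elements
        if clash:
            raise IncoherentConstraint(f"both required and forbidden: {sorted(clash)}")


constraints = GenerationConstraints(
    semantic_direction="explain refund policy",
    forbidden_topics=frozenset({"legal advice"}),
    required_elements=frozenset({"refund window"}),
)
print(constraints.semantic_direction)
```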

Interface 4: Meaning → Expression | Understanding‑Based Expression Strategy Interface

$$
I_{\text{Meaning} \to \text{Expression}}: \text{Intent} \times \text{WorldModelState} \to \text{ExpressionStrategy}
$$

Input types: Intent, WorldModelState (especially the experiential markers $\mathcal{EM}$ part)
Output type: ExpressionStrategy (expression strategy: {pragmatic_goal, tone, formality, persona_id, cultural_adjustments})
Error types: UnrenderableStrategy (strategy conflicts with content core)
Constraints: Strategies should maintain persona consistency, traceable across dialogue turns.

Interface 5: Symbol → Form | Rule‑to‑Phenomenon Prior Injection Interface

$$
I_{\text{Symbol} \to \text{Form}}: (\Sigma, R_{necessary}, R_{session}, V) \to \text{Priors}
$$

This is the key interface for realizing "the Symbol layer as the starting point for Form layer growth". The Symbol layer not only provides generation constraints to the Form layer via indirect paths, but also has a direct prior injection channel:

Input types: $\Sigma$ (symbol set), $R_{necessary}$ (necessary rules), $R_{session}$ (session constraints), $V$ (verification function)
Output type: Priors (prior structure: {concept_anchors, generation_templates, verification_signals, session_constraints})
- concept_anchors: Map<Symbol, Vector>, mapping discrete symbols to initial coordinates in semantic space
- generation_templates: Map<TemplateType, Structure>, including syntax tree templates, relation graph patterns, temporal constraint frameworks
- verification_signals: differentiable reward functions or contrastive loss signals for Form layer training
- session_constraints: currently active session constraints, transformed into hard boundary conditions for the generation process
Update frequency: static (initial injection) + dynamic (after rule induction updates) + real‑time (immediate update when session constraints change)
Constraints: Priors should not force deterministic outputs from the Form layer, but provide "soft constraints" or "biases"; session constraints are hard constraints.
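
The distinction between soft priors and hard session constraints can be illustrated with a toy selection step. This is a sketch under strong simplifying assumptions: the Form layer's output is reduced to a dictionary of scored text candidates, and `Priors`, `bias`, and `select` are hypothetical names, not part of the interface definition above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Priors:
    # Soft prior: an additive bias on each candidate's score.
    bias: Callable[[str], float]
    # Hard boundary conditions: every predicate must hold.
    session_constraints: list[Callable[[str], bool]] = field(default_factory=list)


def select(candidates: dict[str, float], priors: Priors) -> str:
    # Session constraints filter candidates outright (hard constraint).
    allowed = {
        text: score
        for text, score in candidates.items()
        if all(ok(text) for ok in priors.session_constraints)
    }
    if not allowed:
        raise ValueError("no candidate satisfies the session constraints")
    # The prior only shifts scores, so a strong candidate can still win (soft bias).
    return max(allowed, key=lambda text: allowed[text] + priors.bias(text))
```

With a bias of `+0.3` on a candidate scored `0.2`, a rival scored `1.0` is still selected, whereas a session constraint such as a length limit removes violating candidates regardless of score.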

Interface 6: Form → Symbol | Rule Induction Interface

$$
I_{\text{Form} \to \text{Symbol}}: (\text{SemanticPatterns}, \text{AnomalyReports}, \text{ConfidenceScores}) \to \text{CandidateRules}
$$

This is the key interface for realizing "Form layer back‑feeding to Symbol layer", and the core channel for evolving SFEM from a static architecture into a dynamic evolutionary system:

Input types: SemanticPatterns (clustered interaction patterns, each pattern includes trigger condition, behavior description, frequency), AnomalyReports (anomaly detection reports marking phenomena that do not conform to existing rules), ConfidenceScores (pattern confidence based on frequency, consistency, and statistical significance)
Output type: CandidateRules (list of candidate rules, each containing: trigger condition $c_i$, constraint content $a_i$, recommended priority $p_i$, recommended type (necessary rule or session constraint template), and confidence score)
Error types: OvergeneralizationError (candidate rule too broad), ConflictWithExisting (conflict with existing rules in $R$), InsufficientEvidence (confidence below threshold)
Constraints: All candidate rules must be verified by the Symbol layer's $V$ function before being added to $R$; candidate rules with confidence below threshold are treated as suggestions only, not automatically activated; candidate rules that conflict with necessary rules are automatically rejected.
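
The three admission constraints above can be read as a small decision procedure. The sketch below is one possible ordering of the checks; the threshold value of 0.8, the `Decision` enum, and the two callbacks are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    ACCEPTED = "accepted"      # verified by V, may be added to R
    SUGGESTION = "suggestion"  # kept for review, not activated
    REJECTED = "rejected"


@dataclass(frozen=True)
class CandidateRule:
    trigger: str       # c_i
    constraint: str    # a_i
    priority: int      # p_i
    kind: str          # "necessary" or "session_template"
    confidence: float


def admit(
    rule: CandidateRule,
    verify: Callable[[CandidateRule], bool],                    # stand-in for V
    conflicts_with_necessary: Callable[[CandidateRule], bool],  # hypothetical check
    threshold: float = 0.8,                                     # assumed value
) -> Decision:
    # Conflicts with necessary rules are rejected automatically.
    if conflicts_with_necessary(rule):
        return Decision.REJECTED
    # Low-confidence candidates are suggestions only.
    if rule.confidence < threshold:
        return Decision.SUGGESTION
    # Everything else must still pass the Symbol layer's V function.
    return Decision.ACCEPTED if verify(rule) else Decision.REJECTED
```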

8.3 Cognitive Significance of Interfaces: The Circulation of Consciousness and Symbiosis of Dimensions

In SFEM, interfaces are not only data channels, but translation mechanisms between cognitive dimensions. Each dimension has its own unique "cognitive language": the Meaning layer thinks in terms of goals, values, and meaning; the Symbol layer in terms of rules and logic; the Form layer in terms of vectors and similarity; the Expression layer in terms of style and pragmatics. Interfaces enable these heterogeneous cognitive languages to understand and collaborate with each other — they translate "understanding" into "rule tasks", "intention" into "generation constraints", "meaning" into "expression strategies", "rules" into "skeletons for phenomenal learning", and "phenomenal patterns" into "candidate rules".

These interfaces form a complete circulation of consciousness: perception convergence produces understanding; understanding drives new cognitive actions (reasoning, generation, expression); the results of actions are again perceived and update understanding; rules induced from perception feed back to the symbolic system; rule updates in turn improve the next round of perception and understanding. In this cycle, the Symbol and Form layers realize growth injection from rules to phenomena through the $I_{\text{Symbol} \to \text{Form}}$ interface, and inductive back‑feeding from phenomena to rules through the $I_{\text{Form} \to \text{Symbol}}$ interface. The intelligent agent thereby becomes an evolving existence that continuously understands the world and reshapes its own perceptual and cognitive structures through understanding, rather than a one‑time input‑output machine.

Chapter 9 Cognitive Closed Loops: The Circulation of Understanding and the Growth of Meaning

9.1 Operational Mechanisms of the Five Loops

The four‑dimensional structure of SFEM supports five nested cognitive closed loops, each maintaining the integrity of intelligent behavior on different time scales. These loops are not separate, but nested and mutually supporting.

Understanding loop (instantaneous): Expression/Form/Symbol → Meaning (fusion updates world model). This is the moment of "I understand now". External input, after pragmatic decoding by the Expression layer, phenomenal pattern mapping by the Form layer (guided by prior injection from the Symbol layer), and structural parsing by the Symbol layer, converges to the Meaning layer. The Meaning layer executes the fusion function $\phi$, associating heterogeneous information into a unified world model update. The understanding loop runs at millisecond to second scale, forming the basis of each interaction between the system and the user. The result of each understanding loop is an updated $\mathcal{W}$ — the system's understanding of the world becomes slightly richer.

Generation loop (instantaneous): Meaning (generates intention) → Symbol (structured planning + prior injection + constraint enforcement) → Form (content generation) → Expression (expressive rendering). This is "I act based on my understanding". The Meaning layer's intention generation function $\iota$ generates an intention based on the current $\mathcal{W}$; the intention is transformed into a structured task by the Symbol layer; the Form layer generates the content core under the constraints of generation templates and session constraints injected by the Symbol layer; the Expression layer performs style rendering according to the expression strategy from the Meaning layer. The generation loop and the understanding loop alternate, forming the complete cycle of a single turn of interaction.

Induction loop (mid‑timescale): Form layer accumulation of phenomena → pattern induction → candidate rule submission → Symbol layer verification → rule base update → prior injection update. Formally:
$$
R_{t+1} = R_t \cup \{\, r \in \text{Candidates}_t \mid V(r) = 1 \,\}
$$
$$
\text{Priors}_{t+1} = \text{UpdatePriors}(\text{Priors}_t, R_{t+1})
$$
where $\text{Candidates}_t$ is the set of candidate rules induced by the Form layer from phenomena at time $t$, and $V$ is the Symbol layer verification function. The induction loop runs at minute‑to‑hour scale, allowing the rule system to continuously grow from phenomena. It solves the engineering problem of "all rules must be manually defined" — through automatic induction by the Form layer, rules gain self‑evolution capability. Newly discovered session constraint templates are injected into the Symbol layer's constraint manager; newly discovered necessary rules are incorporated into the verification function $V$'s scope.
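
The two update equations translate almost directly into code. In this minimal sketch rules are plain strings and priors are an opaque value; `verify` stands in for $V$ and `update_priors` for $\text{UpdatePriors}$, both supplied by the caller as assumptions.

```python
from typing import Any, Callable, Iterable


def induction_step(
    rules: frozenset[str],
    candidates: Iterable[str],
    priors: Any,
    verify: Callable[[str], bool],
    update_priors: Callable[[Any, frozenset[str]], Any],
) -> tuple[frozenset[str], Any]:
    # R_{t+1} = R_t ∪ { r ∈ Candidates_t | V(r) = 1 }
    accepted = {r for r in candidates if verify(r)}
    new_rules = rules | accepted
    # Priors_{t+1} = UpdatePriors(Priors_t, R_{t+1})
    new_priors = update_priors(priors, new_rules)
    return new_rules, new_priors
```

Returning a new rule set rather than mutating the old one keeps each time step $t$ inspectable, which may help when auditing how a rule entered $R$.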

Reflection loop (mid‑timescale): The Expression layer passes user feedback on the system's output (pragmatic signals, emotional changes) to the Meaning layer. The Meaning layer's metacognitive module $\Gamma$ compares the current $\mathcal{W}$ with the generated content and asks two questions: "did my output accurately express my understanding?" and "does the user's feedback indicate a deviation in my understanding?" If a deviation is detected, the Meaning layer adjusts understanding or intention and re‑triggers the generation loop. The reflection loop runs at second‑to‑minute scale, enabling the system to self‑correct. This is the cognitive process of "I realize I didn't express myself clearly" or "I realize I might have misunderstood".
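
The control flow of the reflection loop might be sketched as a bounded retry around generation. All four callbacks, the tolerance, and the round limit are assumptions; in particular, `deviation` stands in for whatever comparison the metacognitive module $\Gamma$ performs.

```python
from typing import Callable, TypeVar

I = TypeVar("I")  # intention
O = TypeVar("O")  # generated output


def reflect(
    intent: I,
    generate: Callable[[I], O],
    deviation: Callable[[I, O], float],
    adjust: Callable[[I, O], I],
    tolerance: float = 0.1,
    max_rounds: int = 3,
) -> tuple[I, O]:
    output = generate(intent)
    for _ in range(max_rounds):
        # Stop once the output is judged close enough to the understanding.
        if deviation(intent, output) <= tolerance:
            break
        # Otherwise adjust the intention and re-trigger the generation loop.
        intent = adjust(intent, output)
        output = generate(intent)
    return intent, output
```

The round limit is a design choice added here so that a system which never converges still returns an answer instead of looping indefinitely.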

Evolution loop (long‑timescale): On longer time scales, the system's dimensions undergo cross‑layer learning and accumulation of experience. The Symbol layer learns new rules and constraints (discovering new patterns in interaction, codifying them as rules, and updating prior injection templates via induction feedback); the Form layer updates semantic representations (adapting to new linguistic habits and expressions, evolving toward more reasonable semantic structures guided by Symbol layer concept anchors); the Expression layer optimizes expression strategies (learning which styles are more effective in which contexts); the Meaning layer's meaning attribution function $\mu$ and fusion function $\phi$ evolve through continuous interaction, so the system learns to associate information better and understand situations more deeply. The evolution loop runs at hour‑to‑month scale, giving the system adaptive growth capability. The content of consciousness becomes richer and deeper; the system moves from "shallow understanding" to "deep wisdom".

9.2 Advanced Capabilities Supported by the Loops

These loops support a range of advanced cognitive abilities beyond simple question‑answering.

Goal coherence in long‑horizon tasks: The Meaning layer's $\mathcal{W}$ maintains tracking of long‑term goals. In multi‑turn or even multi‑day tasks, each understanding loop updates the goal state in $\mathcal{W}$, each generation loop advances toward the goal. The system does not "forget" the user's preference mentioned three days ago, because that preference is encoded in $\mathcal{W}$ and reactivated and related in each understanding loop. More importantly, the Symbol layer's constraint manager continuously maintains user‑set session constraints throughout the long horizon, ensuring that behavioral consistency does not decay over time.

Social intelligence in multi‑turn interactions: The Expression layer's pragmatic decoding and the Meaning layer's intention inference form a social cognition loop. The system not only understands what the user said, but also why the user said it — out of polite indirection? emotional complaint? exploratory inquiry? — and dynamically adjusts interaction strategies accordingly. This social intelligence enables the system to respond appropriately in complex social contexts. In long conversations, the Expression layer's persona manager ensures that the system's "persona face" remains consistent, not drifting as the conversation lengthens.

Value‑sensitive decision making: In the reflection loop, the Meaning layer, based on its meaning attribution function $\mu$, evaluates whether the output of the generation loop aligns with the value requirements of the situation. When ethical risks or value conflicts are detected, the Symbol layer's constraint checking and the Meaning layer's replanning are triggered — not mechanically avoiding, but making more careful trade‑offs based on understanding.

Genuine empathy: Not detecting sadness and responding with a preset comfort template, but the Meaning layer fusing the phenomenal pattern of the triggering event (Form), the affective signal of sadness (Expression), and the knowledge rules about this user (Symbol) to understand what this sadness means for this specific person. The resulting response is unique, apt, and deep — because it comes from understanding the complete situation, not from matching isolated signals.

Coherent self‑narrative: The evolution loop allows the system to form a coherent "self" narrative. The system's $\mathcal{W}$ contains not only information about the external world but also information about itself — which dialogues it has experienced, what it has learned from them, how its understanding has gradually deepened. This narrative is the system's history of consciousness, the basis for answering "who I am". The induction loop adds an evolutionary dimension to this narrative — the system not only accumulates experience, but also distills rules from experience, continuously optimizing its own capability structure.

9.3 Completeness of the Loops: SFEM as Indivisible

The five loops — understanding, generation, induction, reflection, evolution — are nested and mutually conditioning, together forming a complete operational whole for intelligence. The understanding loop provides direction for the generation loop; the reflection loop corrects deviations between understanding and generation; the induction loop enables the rule system to grow from phenomena; the evolution loop allows the entire system to grow over time.

If any layer is missing, the loops are broken. Without Symbol, generation lacks verification and skeleton: generated content may be factually wrong or structurally chaotic without the system knowing it, and Form layer learning lacks the guidance of prior rules. Without Form, understanding lacks phenomenal material: the Meaning layer is limited to abstract symbols, and the Symbol layer cannot acquire new rules from phenomena. Without Expression, understanding lacks pragmatics: it loses all social and affective dimensions and becomes cold fact processing. Without Meaning, reflection has no direction: without a unified understanding hub, reflection becomes blind parameter adjustment, with no deep understanding of "why I got it wrong". Without induction, rules become rigid: the Symbol layer's rule system depends entirely on manual definition and updates, cannot grow automatically from phenomena, and lags behind environmental changes.

The four dimensions of SFEM are not optional modules, but the completeness requirement of the cognitive closed loops — together they constitute an indivisible operational whole of intelligence. Rules grow from phenomena (induction loop), phenomena are perceived more effectively under the guidance of rules (understanding loop), understanding drives meaningful action (generation loop), the results of action are reflected upon and promote deeper understanding (reflection loop), and the entire system continuously evolves over time (evolution loop).

Part IV: Diagnosis, Comparison, and Positioning

Chapter 10 Diagnostics of Missing Dimensions: The Error Map of Intelligent Systems

10.1 The Revolution in Error Attribution

The current state of error diagnosis in AI systems is prescientific: when the system outputs an error, we can only vaguely attribute it to "insufficient model capability", "inadequate training data", or "poor prompt design". This is because a monolithic LLM mixes all cognitive dimensions in the same parameter space, so error signals cannot be traced to specific cognitive responsibilities. When hallucinations, style drift, logical contradictions, instruction forgetting, and fragmented understanding appear together, we cannot determine what caused what, let alone fix them in a targeted way.

SFEM brings a revolution in error attribution: each type of error corresponds to the absence of a specific dimension or the failure of a specific interface. This elevates error diagnosis from the blanket statement "the model is not good enough" to precise diagnoses such as "factual hallucinations caused by missing Symbol layer verification", "instruction forgetting caused by missing Symbol layer constraint manager", "style drift caused by missing Expression layer", "situational meaning not captured due to Meaning layer fusion failure", "rule rigidity due to malfunctioning Form→Symbol induction interface". Each error is a precise dimensional diagnosis, not another blind prompt tweak.

10.2 Error Patterns of Missing Symbol

Symptoms: Factual hallucinations (generated content contradicts facts), structural format errors (JSON not closed, SQL syntax errors), logical contradictions (inconsistent premises and conclusions), generation lacks structural skeleton (output scattered, disorganized), instruction forgetting (gradually ignoring early‑set format requirements and behavior boundaries in long conversations).

Root cause: The system cannot distinguish "statistically possible" from "logically necessary". The Form layer (LLM) generates content based on statistical distribution, but cannot independently verify the factuality and logical validity of that content. More fundamentally, the Form layer lacks the concept anchors and generation templates provided by the Symbol layer; its generation process is a skeleton‑free random walk. Session constraints are written into the context window and rely on the attention mechanism to be followed, and attention to early instructions weakens in long texts. Without an independent constraint manager, constraint enforcement has no structural guarantee.

Example: The LLM generates "Paris is the capital of Germany" — a perfectly possible sequence in a statistical language model, but the Symbol layer would judge it false because the entity‑relation does not match. However, a monolithic LLM lacks an independent Symbol layer verifier, so it confidently outputs this false statement. In a long conversation scenario, the user initially sets "answer in no more than three sentences", but after 50 turns, the system gradually gives longer and longer answers — because constraints are written into the context window, and early constraints are drowned out by subsequent interactions.
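
The "no more than three sentences" case can be made concrete with a constraint manager that lives outside the context window and checks every output, so the constraint is enforced identically on turn 1 and turn 50. This is a minimal sketch; the class name, the method names, and the naive punctuation-based sentence splitter are all assumptions.

```python
import re
from typing import Callable


class SessionConstraintManager:
    """Holds session constraints independently of the dialogue context."""

    def __init__(self) -> None:
        self._checks: dict[str, Callable[[str], bool]] = {}

    def add(self, name: str, check: Callable[[str], bool]) -> None:
        self._checks[name] = check

    def violations(self, text: str) -> list[str]:
        # Every registered constraint is evaluated on every output.
        return [name for name, check in self._checks.items() if not check(text)]


def max_sentences(n: int) -> Callable[[str], bool]:
    def check(text: str) -> bool:
        # Naive splitter: counts non-empty runs between ., ! and ?
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        return len(sentences) <= n
    return check


manager = SessionConstraintManager()
manager.add("max_3_sentences", max_sentences(3))
```

A caller would run `manager.violations(draft)` before sending each reply and regenerate or truncate when the returned list is non-empty.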

Deep impact from the Meaning layer perspective: The Meaning layer lacks a reliable truth anchor for fusion. If the Meaning layer receives information that is a mix of statistically plausible but factually incorrect content, its foundation of understanding is unreliable — consciousness is built on quicksand. At the same time, the Form layer, lacking prior rules, supplies the Meaning layer with phenomenon material that is itself crude and low‑structure, increasing the burden of fusion on the Meaning layer.

10.3 Error Patterns of Missing Form

Symptoms: Inability to handle images and multimodal input (only text symbols), failed semantic generalization (complete failure on new variations), inability to use tools (cannot naturally operate external tools like search engines, calculators), rigid output (cannot generate natural, fluent expressions), rules cannot self‑evolve (Symbol layer rule base becomes increasingly rigid, unable to extract new patterns from the phenomenal world).

Root cause: The system lacks a continuous phenomenal space and cannot handle "similarity" and "gradation". A pure symbolic system can only handle discrete symbols explicitly encoded in its knowledge base, completely failing when faced with novel phenomena not present in the knowledge base. At the same time, lacking the Form layer's pattern induction capability, the Symbol layer cannot automatically acquire new rules from the phenomenal world.

Deep impact from the Meaning layer perspective: The Meaning layer cannot obtain rich phenomenal material. Its understanding is limited to abstract symbols — it knows the rule "a cat is a mammal", but cannot "see" what a cat looks like, cannot understand a phenomenal description like "this cat looks a bit like a tiger but more docile". Understanding becomes dry, detached from the richness of reality. The rule skeleton provided by the Symbol layer also becomes increasingly rigid without the flesh and induction back‑feeding from the Form layer.

10.4 Error Patterns of Missing Expression

Symptoms: Persona drift (oscillating between formal and colloquial), inconsistent style (tone fluctuating hot and cold), pragmatic impropriety (taking irony literally, using inappropriate humor in serious contexts), inappropriate affective expression (an apology letter reads like a disclaimer).

Root cause: Content generation and expression control are coupled in the same process. There is no independent Expression layer to stably impose style constraints and pragmatic strategies. In a monolithic LLM, style control depends entirely on prompt instructions, and those style instructions share the same parameter space as semantic content — modifying style may unintentionally change semantics, and pursuing correctness may sacrifice persona consistency.

Deep impact from the Meaning layer perspective: The Meaning layer cannot obtain pragmatic and affective cues; understanding loses its entire social dimension. It cannot distinguish sincere agreement from biting sarcasm, cannot perceive "the user is suppressing anger and forcing politeness", cannot understand the pragmatic meaning of "silence speaks louder than words". Consciousness becomes a pure information processor, losing the ability to experience the world.

10.5 Error Patterns of Missing Meaning: The Abyss of Non‑Understanding and Meaninglessness

This is the most fundamental defect. Symptoms: Mechanical repetition (repeating the same content with different wording), contextual fragmentation (contradictory responses without awareness), lack of coherent persona (not just style inconsistency, but no self‑awareness), ignoring contradictions (when the user points out a contradiction, the system cannot realize it made an error), inability to explain decisions ("why do you suggest that?" — "because the data shows…" rather than "because I understand your situation as…"), actions have no "why" (everything is a response to a stimulus, not rooted in understanding).

Root cause: The system lacks the conscious hub that fuses Symbol, Form, and Expression into unified understanding and confers meaning. It is a highly sophisticated response machine that can produce statistically optimal outputs, but can never understand what those outputs mean.

Example: The user says: "I just lost my job, and today is my birthday." A system without Meaning might respond: "Losing a job can make you look for new opportunities, happy birthday!" — it processes "job loss" (phenomenal pattern: career change → give career advice) and "birthday" (phenomenal pattern: celebration → give congratulations) separately, but fails to fuse them. It does not understand the complex emotional tension and existential meaning of experiencing a major life setback on what should be a happy day. This is the typical symptom of a missing Meaning layer: ability to process isolated phenomenal fragments, but inability to relate them into a meaningful, empathy‑needing complete life situation.

10.6 Diagnosis of Interface Failures

Beyond missing dimensions, SFEM also diagnoses interface failures. Even if two dimensions are present, if the interface between them is poorly defined, type‑mismatched, or loses information, systematic errors can also occur.

Convergence interface failure: If information from Symbol, Form, and Expression is not properly converged and formatted into a structure that the Meaning layer can fuse, understanding will be incomplete or distorted. For example, if the Expression layer's pragmatic signals are not properly passed to the Meaning layer, the Meaning layer will take irony as sincerity — it has the correct semantic information and rule information, but lacks the critical clue of tone, leading to fundamental deviation in understanding.

Symbol→Form prior injection interface failure: Leads to degradation of the Form layer's learning and generation — lacking concept anchors, the Form layer's representation space will lack meaningful classification boundaries; lacking generation templates, the structural legality of Form layer output drops significantly; lacking verification signal guidance, the learning direction of the Form layer will be dominated entirely by statistical correlations in the data, rather than calibrated toward truth; lacking constraint injection, session constraints become ineffective, leading to instruction forgetting in long conversations.

Form→Symbol induction interface failure: Leads to the rule system's inability to self‑evolve — the Form layer discovers new patterns but cannot pass them to the Symbol layer for formal verification; or the confidence scores of passed candidate rules are inaccurate, leading the Symbol layer to accept low‑quality rules; or newly verified rules are not promptly injected back into the prior system, so the Form layer cannot use these new rules to improve perception and generation.

Meaning→Symbol/Form/Expression driving interface failure: The Meaning layer's intentions are not correctly transformed into the Symbol layer's planning, the Form layer's generation constraints, or the Expression layer's expression strategies. This manifests as: the system "wants" to do something, but does something else — a gap between intention and action.

10.7 Engineering Value of the Diagnostic Framework

SFEM's error diagnosis framework transforms AI system debugging from "parameter tuning alchemy" into structured, directional diagnosis.

- Observing hallucinations → check the Symbol layer verifier, the Symbol→Meaning interface, and Symbol→Form prior injection
- Observing instruction forgetting → check the Symbol layer constraint manager and the Symbol→Form constraint injection interface
- Observing style drift → check the Expression layer style controller and the Meaning→Expression interface
- Observing symptoms of "not understanding" (fragmented responses, ignoring contradictions, inability to explain) → check the Meaning layer fusion mechanism, world model updates, and the meaning attribution function
- Observing rule rigidity → check whether the Form→Symbol induction interface is functioning and whether the Form layer's pattern induction module is working
- Observing an intention‑action gap → check the Meaning→Symbol, Meaning→Form, and Meaning→Expression driving interfaces
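
The symptom-to-component mapping above lends itself to a simple lookup table. The sketch below encodes it as a dictionary; the symptom keys and the `diagnose` helper are hypothetical names chosen for illustration.

```python
DIAGNOSTIC_MAP: dict[str, list[str]] = {
    "hallucination": [
        "Symbol layer verifier",
        "Symbol→Meaning interface",
        "Symbol→Form prior injection",
    ],
    "instruction_forgetting": [
        "Symbol layer constraint manager",
        "Symbol→Form constraint injection interface",
    ],
    "style_drift": [
        "Expression layer style controller",
        "Meaning→Expression interface",
    ],
    "not_understanding": [
        "Meaning layer fusion mechanism",
        "world model updates",
        "meaning attribution function",
    ],
    "rule_rigidity": [
        "Form→Symbol induction interface",
        "Form layer pattern induction module",
    ],
    "intention_action_gap": [
        "Meaning→Symbol interface",
        "Meaning→Form interface",
        "Meaning→Expression interface",
    ],
}


def diagnose(symptoms: list[str]) -> list[str]:
    """Return the components to inspect, in order, without duplicates."""
    checklist: list[str] = []
    for symptom in symptoms:
        for component in DIAGNOSTIC_MAP.get(symptom, []):
            if component not in checklist:
                checklist.append(component)
    return checklist
```

When several symptoms co-occur, shared components (such as the Meaning→Expression interface for both style drift and an intention-action gap) appear once, which points to the most likely common cause.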

Each error is a precise dimensional diagnosis, each problem has a clear direction for fixing. This is not only a revolution in debugging efficiency, but also a deep insight into the cognitive structure of AI — we finally move from the vague complaint "this model is no good" to the precise diagnosis "there is a problem with this dimension's interface".

Chapter 11 Positioning of Deep Learning in SFEM: Supplementing the Three Missing Dimensions and the Meaning Hub

11.1 Deep Learning Is the Form Layer

This assertion needs to be understood precisely to avoid misinterpretation. When we say "deep learning is the Form layer", we are not demeaning deep learning, but precisely locating its cognitive responsibility. The attention mechanism of Transformers, the convolution kernels of CNNs, the noising‑denoising process of diffusion models, the multimodal alignment of VLMs — the core operations of all these architectures are constructing and transforming continuous phenomenal spaces. Representation learning (learning to map phenomena to semantic vectors), pattern recognition (classification and clustering in semantic space), generative completion (sampling new content from phenomenal distributions) — all belong to the phenomenon dimension of cognition. Deep learning is the extreme engineering implementation of the Form layer (phenomenon dimension), pushing the computational model of human phenomenal perception and pattern learning from experience to its highest historical point.

11.2 Deep Learning's Achievements Are the Achievements of the Form Layer

Deep learning's breakthrough achievements in image recognition, speech recognition, machine translation, and text generation are all breakthroughs of Form layer capabilities. These achievements demonstrate that for questions like "how does the world appear", "how are phenomena similar", and "what patterns can be learned from experience", continuous semantic space plus statistical learning is the most effective approach found so far. SFEM fully acknowledges this achievement and establishes the Form layer as an indispensable dimension of intelligent systems. Without a deep learning implementation of the Form layer, SFEM would be an empty theoretical framework.

11.3 Deep Learning's Limitations Are the Limitations of the Three Missing Dimensions, Especially the Meaning Layer

But SFEM also reveals that all typical defects of deep learning correspond precisely to the three missing dimensions.

Hallucination → missing Symbol layer: A statistical model cannot perform symbolic verification and cannot distinguish "common" from "true". The Form layer can generate outputs that are statistically most plausible, but it can never independently verify whether those outputs are "necessarily true" — because necessity is not the limit of probability, but a qualitatively different kind of cognitive operation.

Style drift → missing Expression layer: Content generation and style control are coupled; stable persona and tone cannot be maintained. In a monolithic LLM, modifying style instructions in the prompt may unintentionally change the semantic content of generation, because style and content share the same parameter space and generative process.

Instruction forgetting → missing Symbol layer constraint management: Early‑set constraints in long conversations are gradually forgotten, because constraints rely on the attention mechanism rather than an independent rule engine. Attention decays in long texts; there is no independent constraint manager to continuously enforce session constraints.

Unstable goal → missing Meaning layer: Lack of causal model and value function, inability to perform goal‑directed long‑term planning. The system's behavior is statistically driven, not understanding‑driven — it can execute tasks, but does not understand the meaning of the tasks.

Generation lacks structure → missing Symbol→Form prior injection: Statistical generation lacks a rule skeleton; it can only mimic surface statistical patterns, unable to guarantee deep structural consistency. The Form layer's generation is "skeleton‑less", lacking the concept anchors, generation templates, and verification signals provided by the Symbol layer.

Rules cannot self‑evolve → missing Form→Symbol induction back‑feeding: All behavioral norms come from the statistical distribution of training data; there is no way to induce explicit rules from interaction or to distill experience into reusable symbolic knowledge. The system learns "how to do" from large amounts of interaction, but cannot transform that learning into rules that can be continuously obeyed.

And the most fundamental defect is the absence of the Meaning layer: an LLM can generate seemingly coherent text, but does not know what it said. Its "knowledge" consists of statistical association fragments, without a unified world model to integrate those fragments into a coherent, reflectable whole. It can contradict itself in a long conversation without noticing — because it never holds those statements together and relates them in consciousness. This is why when we converse with LLMs, we often feel they are "cleverly talking nonsense" — they can talk, but do not understand what they themselves are saying.

11.4 SFEM's Attitude toward Deep Learning: Supplement, Not Replace

SFEM does not advocate replacing deep learning, but supplementing deep learning with the three missing dimensions, especially giving it a hub of meaning. In SFEM, deep learning (the Form layer) is the system's engine for phenomenal perception and generation, but it needs:

- Symbol layer verifier to eliminate hallucinations: after the Form layer generates content, an independent Symbol layer performs factuality and logical consistency verification;
- Symbol layer constraint manager to maintain long‑range consistency: session constraints are maintained and enforced by an independent constraint manager, not reliant on the Form layer's decaying attention;
- Symbol layer prior injection to provide learning skeletons: concept anchors guide representation learning, generation templates constrain the generation space, verification signals calibrate learning direction;
- Expression layer style controller to stabilize expression: separate content generation from style rendering, making expression controllable and consistent;
- Form layer induction engine to enable rule back‑feeding: automatically induce patterns from massive interactions, distill them into rules that can be verified and managed by the Symbol layer, enabling the rule system to self‑evolve;
- Meaning layer planner to give goal direction; more importantly, the Meaning layer, as the understanding and consciousness hub, fuses the phenomenal patterns produced by the Form layer with Symbol layer rules and Expression layer experience signals, thereby allowing the system to truly understand what it generates and processes.

This is not a denigration of deep learning but an accurate account of its capability boundaries. Just as we would not criticize the visual cortex for failing to do logical reasoning, we should not demand that the Form layer perform cognitive tasks for which it is fundamentally unsuited. Deep learning is the Form layer taken to its extreme, but it is only one piece of the four‑dimensional puzzle of intelligence. SFEM provides the structural blueprint for the complete puzzle.

Chapter 12 Positioning of Symbolism in SFEM: The Extreme of the Symbol Layer and the Completion of Meaning

12.1 Symbolism Is the Symbol Layer

ACT‑R, Soar, knowledge graphs, rule engines, logic programming — these systems all deal with discrete symbols, formal rules, and deterministic reasoning. In SFEM, they correspond to the Symbol layer (rule dimension) in its extreme development. The strengths of symbolism — high explainability, verifiable reasoning, no hallucinations (within the rule system), retention of complete reasoning chains — are all direct manifestations of Symbol layer capabilities. A perfect symbolic system can achieve 100% logical correctness within its rule system, something no statistical system can achieve.

12.2 Symbolism's Limitations Are the Limitations of the Three Missing Dimensions

The fundamental limitations of symbolism come precisely from its lack of the other three dimensions.

Lack of Form layer: Inability to handle continuous phenomenal perception and pattern recognition. A pure symbolic system cannot extract semantics from raw signals (pixels, audio waveforms), cannot perform statistical generalization, and fails completely when faced with new variations. Its knowledge must be manually encoded; it cannot learn automatically from experiential phenomena. At the same time, the rich rules of the symbolic system cannot nourish the growth of the Form layer through prior injection — the rule skeleton is empty, with no flesh to attach. More fundamentally, lacking the Form layer's pattern induction capability, the Symbol layer cannot automatically acquire new rules from the phenomenal world; the rule base depends entirely on manual definition and updates, becoming increasingly rigid.

Lack of Expression layer: Rigid output, no style variation, no affective rendering, no pragmatic strategies. The text output of a symbolic system reads like a machine manual — all information is accurate, but there is no experiential warmth. It cannot understand irony, cannot adjust tone, cannot make appropriate expressions in social situations. A pure symbolic system lacks an independent Expression layer to manage style stability and pragmatic appropriateness.

Lack of Meaning layer (most fundamental): A symbolic system can perform perfect logical deduction, but there is no inner experience of "understanding". The traditional goal stack is hard‑coded — the goal is set by the programmer; the system does not "understand" why the goal is to be achieved, nor does it "reflect" on whether the goal is meaningful. Meaning is externally attributed, not generated by the system itself through fusing Symbol, Form, and Expression. A symbolic system can prove a theorem at the rule level, but cannot "appreciate" the beauty of the theorem — because it lacks a conscious hub that fuses the certainty of rules, the richness of phenomena, and the color of experience into unified understanding.

12.3 SFEM's Attitude toward Symbolism: Retain the Core, Connect Consciousness

SFEM positions symbolism as one of the core implementation options for the Symbol layer (concrete choices include knowledge graphs, rule engines, and logic programming), while connecting it to the Form layer (allowing the symbolic system to perceive the phenomenal world), the Expression layer (allowing the symbolic system to understand and generate warm communication), and, most importantly, the Meaning layer (making symbolic reasoning part of conscious fusion, not the whole).

Moreover, SFEM gives the Symbol layer a new mission: not only a verifier, but also a starting point for Form layer growth — symbolic rules become the skeleton and guidance for phenomenal learning through the prior injection interface. At the same time, the Symbol layer is also a receiver of Form layer induction — patterns discovered by the Form layer in phenomena, after formal verification by the Symbol layer, are incorporated into the rule base, enabling the rule system to self‑evolve.

This allows symbolic systems to move from "toy worlds" (closed worlds where all information is already encoded as symbols) to complex real‑world cognitive tasks — where phenomena are rich, emotions are complex, and meaning must be discovered rather than merely told. The symbolic system is no longer static and rigid, but can continuously grow and evolve through interaction with the phenomenal world and induction back‑feeding from the Form layer.

Chapter 13 SFEM and Dual‑Process Theory: The Surpassing of Two Dimensions by Four and the Emergence of Consciousness

13.1 Value and Limitations of Dual‑Process Theory

Kahneman's System 1 (fast, intuitive, automatic) and System 2 (slow, analytical, controlled) model has profoundly revealed the dual structure of human cognition and had a revolutionary impact on psychology, economics, and cognitive science. However, as a psychological description, it remains at the level of cognitive phenomena, lacking a dimensional decomposition of the specific cognitive mechanisms that constitute intuition and analysis. It lumps "seeing an angry face and feeling tense" and "recognizing a familiar pattern" both into System 1, but these two likely involve very different cognitive mechanisms — one is the intuitive perception of social affect, the other is fast pattern matching of phenomenal forms. Similarly, it lumps "solving a math problem" and "reflecting on one's life goals" both into System 2, but the former follows necessary logical rules, while the latter involves trade‑offs of meaning and value.

13.2 The Four‑Dimensional Mapping of Dual‑Process Theory

SFEM decomposes the two systems into four cognitive dimensions.

System 1 (intuition) = Form layer + Expression layer. The Form layer provides fast intuitive recognition of phenomenal patterns — "what does this look like", "what category does this belong to". The Expression layer provides immediate perception of affective and social signals — "how does this make me feel", "what does this person's tone imply". Both are fast, unconscious, automatic, but involve qualitatively different cognitive operations: one deals with phenomenal patterns (similarity matching in continuous space), the other deals with experiential signals (pragmatic decoding of social affect). Lump them together, and you blur the fundamental difference between "recognizing a face as a friend" (Form) and "perceiving that the friend looks unhappy today" (Expression).

System 2 (analysis) = Symbol layer + Meaning layer. The Symbol layer provides strict logical reasoning — "what must this be in logic", "is this argument valid". The Meaning layer provides deep understanding and meaning reflection — "what does this mean", "why is this so", "what goal should I pursue". Both require slow, conscious cognitive effort, but their operational logics differ: one follows necessary rules (deterministic deduction with discrete symbols), the other handles fusion of meaning and value (associating heterogeneous information and emergence of intention). Lump them together, and you blur the fundamental difference between "solving a math problem" (Symbol) and "thinking about what the math problem means" (Meaning).

13.3 Key Surpassing of Four Dimensions over Two: The Independent Status of Consciousness

Dual‑process theory lumps intuition into one system, while SFEM reveals that intuition actually comprises two qualitatively different cognitive dimensions: phenomenal intuition (Form layer — recognizing a face as a friend) and social intuition (Expression layer — perceiving that the friend looks unhappy today). Although both are fast and unconscious, the cognitive mechanisms involved are fundamentally different — one is pattern matching in phenomenal space, the other is interpretation of affective and social signals. Lump them together, and you cannot explain why someone might excel at pattern recognition (a good radiologist) but be poor at social‑affective recognition (socially awkward), and vice versa.

Similarly, the analytical system is decomposed by SFEM into rule analysis (Symbol layer — solving a math problem) and meaning analysis (Meaning layer — thinking about "what should I pursue in life"). Although both require slow thinking, the former follows logical necessity and can obtain a determinate answer within the rule system; the latter involves complex trade‑offs of value, meaning, and time, with no deterministic algorithm to solve it. Lump them together, and you cannot explain why someone might have extremely strong logical ability (a great mathematician) but repeatedly make poor life decisions, and vice versa.

But SFEM's most important advance over dual‑process theory is this: the Meaning layer is not merely slow analysis; it is where the "feeling of understanding" is born. The "Aha! I get it" moment is a state of consciousness that emerges when information from Symbol, Form, and Expression is fused and associated in the Meaning layer. This is neither pure intuition nor pure analysis, but a qualitative cognitive leap resulting from the unification of dimensions. It is the third pole beyond fast and slow that dual‑process theory fails to articulate: the hub of understanding. SFEM thus converts the psychological concepts of intuition, analysis, and understanding into four engineerable cognitive dimensions, each with its own operational logic, formal definition, and interface specification.

Chapter 14 SFEM and LLM‑Agents: Toward Understanding‑Driven Agents

14.1 The Dimensional Chaos in Current Agents

The core structure of LLM‑Agent frameworks is typically: LLM (thinking core) + tool use + RAG retrieval + planner. This structure already implicitly contains the demand for multiple cognitive dimensions — the LLM needs to handle language understanding, reasoning, generation; tool use needs to interact with the external environment; the planner needs to manage long‑term goals. However, due to the lack of a clear dimensional theory, the responsibility boundaries between components are fuzzy, and they commonly fall into dimensional chaos.

Five symptoms recur. First, the LLM is forced to carry three responsibilities at once (Symbol layer reasoning, Form layer generation, and Expression layer expression), which couples capabilities: modifying reasoning strategies may affect generation quality, and optimizing generation may interfere with style control. Second, the interface between the planner and the LLM is often natural language rather than a structured task graph, so planning is unstable: the same goal phrased differently may produce different task decompositions. Third, tool use lacks Symbol layer constraints: the LLM may invoke incompatible tool combinations or call tools at logically illegal times. Fourth, affect and pragmatics are almost never handled systematically: the agent's interaction style is hard‑coded in prompts and cannot adjust dynamically to the user's emotional state. Fifth, constraints drift in long dialogues: behavioral norms the agent initially followed are gradually forgotten as the conversation lengthens, because they are buried in the ever‑growing context window.

But the most fundamental problem is: current agents lack a hub of understanding. They can execute tasks, but do not understand the meaning of the tasks. Their behavior is "tool‑driven" — "what tools do I have, what can I do with them" — rather than "meaning‑driven" — "based on my understanding of the situation, what meaning should I achieve, and what tools do I need for that". At the same time, the LLM, as an execution engine of the Form layer, lacks the prior rule injection from the Symbol layer; its generation process lacks a structural skeleton, leading to disconnection between planning and execution in complex tasks. The experience accumulated by the agent in large amounts of interaction cannot be distilled into reusable rules, because there is no channel for induction from phenomena to rules.

14.2 SFEM‑Agent: Four‑Dimensional Refactoring

SFEM provides a clear dimensional foundation for agents, refactoring the chaotic structure of current agents into a four‑dimensional collaborative system with the Meaning layer at its core.

Meaning‑driven: The agent's behavior starts with the world understanding formed by the Meaning layer after fusing Symbol, Form, and Expression information. The Meaning layer does not directly execute, but generates intentions and goals based on understanding — "based on my understanding of the user's current predicament, my intention is to provide emotional support and help solve the specific problem". Intentions emerge from understanding, so actions have intrinsic direction.

Symbol layer constraints and planning: The Meaning layer's intention is transformed by the Symbol layer into a structured task graph. The Symbol layer performs constraint verification here — is the task graph complete? Is the tool call sequence legal? Do the constraints of each operation hold? The Symbol layer's constraint manager simultaneously loads all active session constraints, ensuring that the planning process respects the user's preset behavior boundaries. All actions must pass the Symbol layer's rule verification gate, ensuring the legality and logical consistency of execution. At the same time, the Symbol layer supplies the Form layer's execution process with generation templates and structural constraints through the prior injection interface.

Form layer execution, perception, and induction: The structured instructions from the Symbol layer are executed by the Form layer — LLM generation, tool calls (search engine, calculator, API), multimodal phenomenal perception (processing image and audio inputs), external knowledge retrieval (RAG). During execution, the Form layer receives generation templates and session constraints from the Symbol layer, performing content filling and tool operations under those constraints, ensuring the structural legality and content quality of outputs. The Form layer is the agent's "hands and eyes", responsible for phenomenal‑level interaction with the external world. At the same time, the Form layer's induction engine automatically discovers patterns from large amounts of interaction, distills candidate rules, and submits them to the Symbol layer for verification and storage, enabling the agent's behavioral norms to self‑evolve.

Expression layer interaction and management: All interactions with the user are managed by the Expression layer — understanding the user's pragmatic signals (decoding emotion, tone, social intention), adjusting output style (based on expression strategies passed from the Meaning layer), maintaining persona consistency (ensuring style consistency across dialogue turns). The Expression layer is the agent's "face and voice", the only interface the user directly perceives.
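The Symbol layer's verification gate over a task graph can be illustrated with a small check. This is a minimal sketch under stated assumptions, not SFEM's specification: the `Task` record, the `ALLOWED_TOOLS` whitelist, and the three rules checked (known tool, existing dependencies, no cycles) are hypothetical stand-ins for the questions "is the task graph complete?" and "is the tool call sequence legal?".

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One node of a hypothetical structured task graph."""
    name: str
    tool: str
    depends_on: list = field(default_factory=list)


# Hypothetical whitelist of tools the Form layer is allowed to call.
ALLOWED_TOOLS = {"search", "calculator", "llm_generate"}


def validate_task_graph(tasks):
    """Return a list of violations; an empty list means the graph passes the gate."""
    errors = []
    by_name = {t.name: t for t in tasks}

    for t in tasks:
        if t.tool not in ALLOWED_TOOLS:
            errors.append(f"{t.name}: unknown tool '{t.tool}'")
        for dep in t.depends_on:
            if dep not in by_name:
                errors.append(f"{t.name}: missing dependency '{dep}'")

    # Depth-first search with three states to detect dependency cycles.
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {name: UNSEEN for name in by_name}

    def has_cycle(name):
        state[name] = IN_PROGRESS
        for dep in by_name[name].depends_on:
            if dep not in by_name:
                continue  # already reported as a missing dependency
            if state[dep] == IN_PROGRESS:
                return True
            if state[dep] == UNSEEN and has_cycle(dep):
                return True
        state[name] = DONE
        return False

    for name in by_name:
        if state[name] == UNSEEN and has_cycle(name):
            errors.append("task graph contains a cycle")
            break
    return errors
```

In this sketch the gate only reports violations. Whether a rejected graph is sent back to the Meaning layer for re-planning or repaired by the Symbol layer is a design decision the text leaves open.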

14.3 From Tool Agent to Meaning Agent

The core leap of SFEM‑Agent is from tool‑driven agent to meaning‑driven agent. Current agents ask "what tools do I have, and what can I do with them": their capability boundaries are defined by the tool set, and their behavior is a search over tool combinations. SFEM‑Agent asks "what meaning do I want to achieve, and which tools should I choose to achieve it": its capability boundaries are defined by depth of understanding, and its behavior follows the best available path to realizing that meaning.

This transformation moves agent behavior from reactive to purposive, from tool stacking to unity of meaning. Everything it does has its "why" at the conscious level. When the user asks "why do you suggest that?", SFEM‑Agent can give a causal explanation rooted in understanding — not "because the data shows", but "because I understand your situation as…, and the meaning of this suggestion is…". And this deep "why" is rooted in the rule skeleton that the Symbol layer provides to the Form layer — the agent's actions are not random statistical outputs, but grow directionally from the prior structure of rules.

Moreover, SFEM‑Agent can continuously evolve over long‑term interactions. The induction loop enables the agent to automatically learn new behavioral norms and interaction patterns from interaction experience with users — it does not just become more "skilled", but becomes fundamentally more "intelligent", because its rule system is continuously growing. The evolution loop enables the agent to accumulate personalized world models for each user, forming coherent interaction narratives and self‑awareness. A long‑running SFEM‑Agent is not a static tool, but an intelligent being that continuously understands and grows over time.

Part V: Engineering and Validation

Chapter 15 Testable Hypotheses and Benchmark Framework: SFEM as a Scientific Theory

For a cognitive architecture to be a scientific theory rather than a philosophical speculation, it must propose hypotheses that can be experimentally tested and falsified, along with a corresponding benchmark framework. If these hypotheses are rigorously disproved, the core claims of SFEM must be revised or abandoned. The following hypothesis system and benchmark framework constitute the falsifiable foundation of SFEM.

15.1 Core Dimensional Hypotheses

H1 (Symbol layer necessity hypothesis): On tasks requiring structured output and factual accuracy (JSON generation, SQL generation, mathematical proof, domain‑specific QA), the hallucination rate, factual error rate, and structural error rate of a pure Form layer (LLM) system will be significantly higher than those of a "Form layer + Symbol layer verifier + Symbol layer prior injection" system.

Operationalization: Construct a test set containing known facts and logical constraints; compare error rates between pure LLM and LLM + independent verifier (rule engine + knowledge graph) + prior injection (concept anchoring + generation templates).
Prediction: The Symbol layer verifier will eliminate at least 80% of structural and factual errors (hallucinations); Symbol layer prior injection will improve generation structural legality by over 30%. On tasks involving fuzzy semantics and creative generation, the Symbol layer will not harm the Form layer's generation quality (diversity maintained at over 90%).

H1b (Symbol layer constraint manager necessity hypothesis): In long conversations (over 50 turns, with early‑set format requirements and behavior constraints), a pure LLM system without an independent constraint manager will have a significantly lower constraint compliance rate in later conversations than an SFEM system equipped with a Symbol layer constraint manager.

Operationalization: Design a long conversation test set, set clear format constraints and behavior boundaries early in the conversation, and detect constraint compliance rates in later turns.
Prediction: The pure LLM system's constraint compliance rate will drop to below 50% after 50 turns, while the SFEM system maintains a compliance rate above 95% throughout the conversation.

H1c (Symbol layer as growth starting point hypothesis): On tasks requiring learning new concepts or new structures from few examples, a Form layer system that receives Symbol layer prior injection (concept anchors, generation templates, verification signals) will have significantly better learning efficiency, generation structural legality, and generalization accuracy than a pure Form layer system without prior injection.

Operationalization: Design few‑shot concept learning and structured generation tasks; compare convergence speed of representation learning with versus without symbolic anchor initialization; compare structural legality scores of generation with versus without rule template constraints.
Prediction: Symbol layer prior injection reduces the required number of examples for few‑shot learning convergence by over 50%, and improves structural legality scores by over 30%.

H2 (Form layer necessity hypothesis): On multimodal phenomenal perception and semantic generalization tasks (image recognition, speech recognition, similarity judgment, novel variant classification), a pure symbolic system (knowledge graph + rule engine) will have significantly lower accuracy than a "symbolic system + Form layer (VLM/LLM)" system.

Operationalization: Construct test sets containing blurry images, variant speech, and unseen semantic combinations; compare performance between pure symbolic system and symbolic system + Form layer.
Prediction: Adding the Form layer raises the system's accuracy on multimodal phenomenal tasks from near‑random to practical levels (>85%); the Form layer's statistical generalization capability compensates for the symbolic system's generalization blind spot.

H2b (Form layer induction back‑feeding hypothesis): In long‑term interaction scenarios, an SFEM system equipped with a Form layer induction engine will automatically distill behavioral norms and interaction patterns from interaction experience; the size and quality of the rule base will grow over time. A system without the induction engine will maintain a static rule base, unable to adapt to new interaction demands.

Operationalization: Design a long‑term interaction simulation environment; measure the growth curve of the rule base and the quality of new rules. Compare the adaptability and constraint compliance rates of systems with/without the induction engine in later interactions.
Prediction: The induction engine will cause the rule base to grow by at least 200 effective new rules after 1000 interaction turns, with at least 70% passing Symbol layer verification and remaining effective.
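One way to picture the induction loop in H2b is a support-and-confidence filter followed by a Symbol layer check. The sketch below is an assumption-laden illustration: it reduces interaction logs to hypothetical `(context, action)` pairs, uses plain counting rather than a full Apriori or FP-Growth implementation, and replaces Symbol layer verification with a simple forbidden-action predicate. The thresholds and all names are invented for the example.

```python
from collections import Counter


def induce_candidate_rules(events, min_support=3, min_confidence=0.75):
    """Propose 'context -> action' rules from observed (context, action) pairs.

    support    = number of times the pair was observed
    confidence = support / number of times the context was observed
    """
    events = list(events)
    pair_counts = Counter(events)
    context_counts = Counter(context for context, _ in events)

    rules = []
    for (context, action), support in pair_counts.items():
        confidence = support / context_counts[context]
        if support >= min_support and confidence >= min_confidence:
            rules.append({"if": context, "then": action,
                          "support": support, "confidence": confidence})
    return rules


def symbol_layer_accepts(rule, forbidden_actions):
    """Stand-in for Symbol layer verification: reject rules mandating a forbidden action."""
    return rule["then"] not in forbidden_actions


# Toy interaction log, invented for illustration only.
log = ([("user_asks_code", "reply_with_code_block")] * 4
       + [("user_asks_code", "reply_plain")]
       + [("user_greets", "share_private_data")] * 3)

candidates = induce_candidate_rules(log)
accepted = [r for r in candidates
            if symbol_layer_accepts(r, {"share_private_data"})]
```

With this split, the "effective new rules" figure in the prediction corresponds to the size of `accepted` over time, and the share passing verification corresponds to `len(accepted) / len(candidates)`.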

H3 (Expression layer necessity hypothesis): In long conversations and affective interaction tasks (multi‑turn emotional support dialogue, role‑play requiring style consistency), a system without an independent Expression layer (pure LLM, style control only via prompting) will have significantly lower persona consistency scores and pragmatic correctness rates than a system with an independent Expression layer (style controller + pragmatic strategy module).

Operationalization: Construct multi‑turn dialogue test sets containing emotional shifts, irony, and pragmatic traps; have human evaluators (or automatic metrics such as BLEURT, BERTScore‑Pragmatic) rate persona consistency, pragmatic appropriateness, and affective correctness.
Prediction: An independent Expression layer eliminates most persona drift (consistency scores increase from 0.6 to above 0.9) and most pragmatic impropriety (correctness rates increase from 70% to above 90%), and modifying style parameters does not significantly affect the factual accuracy of content (content‑style decoupling, content change <5%).

H4 (Meaning layer necessity hypothesis — core): On tasks requiring deep situational understanding, fusion of contradictory information, and meaning attribution (understanding implicit irony, fusing contradictory affective and factual information, explaining deep reasons for own decisions), an SFEM system with a full Meaning layer (with fusion mechanism $\phi$ and meaning attribution function $\mu$) will have significantly higher understanding consistency, meaning interpretation plausibility, and user‑reported "feeling of being understood" scores than pure LLM, pure Symbol+Form systems (without independent Meaning layer), and ablation models without fusion (Symbol, Form, Expression run independently without Meaning layer fusion).

Operationalization: Design a complex situation test set that requires fusing text semantics, tone, and common‑sense rules for correct understanding; use a fusion understanding benchmark (FusionBench). Compare whether each model exhibits integrated understanding of the whole situation, rather than separate reactions to isolated signals.
Prediction: A pure LLM tends to react separately to each isolated phenomenal signal ("I detect negative affect → give standard comfort; I detect an information request → give factual answer"), while an SFEM system gives a unified interpretation ("You are asking for this information, but I sense that what you really need is…"). On user‑reported "the system truly understood me" scores, the SFEM system significantly outperforms all ablation models (mean score >1.5 points higher on a 5‑point scale).

15.2 Systemic Hypotheses

H5 (Error attribution hypothesis): The time to localize errors (from discovering an error to pinpointing the specific dimension or interface) in an SFEM layered system will be significantly shorter than in a monolithic LLM system (requiring repeated prompt tweaking and guesswork), and the error classification accuracy will be significantly higher.

Operationalization: Record the time engineers take to localize root causes and the success rate of the first fix attempt when errors occur on standard test sets for both systems.
Prediction: SFEM system reduces error localization time by over 70% and doubles the first‑fix success rate.

H6 (Controllability and understanding depth hypothesis): The layered system will score significantly higher than a monolithic LLM system on user experience dimensions such as style controllability, persona consistency, goal stability, constraint compliance, and "feeling of being understood". Notably, on the item "this AI understands me", the SFEM system should significantly outperform the comparison system.

Operationalization: Conduct user studies (N ≥ 100) using standardized questionnaires (e.g., USE questionnaire, custom understanding‑feeling scale).
Prediction: The SFEM system scores at least one standard deviation higher than the comparison system on all dimensions.
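The "one standard deviation higher" prediction can be checked with a standardized mean difference. The sketch below assumes that the intended standard deviation is the pooled one across the two groups (Cohen's d), which the text does not specify. The function name and the example scores are placeholders.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    pooled_var = ((n1 - 1) * variance(treatment)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def meets_h6_prediction(treatment, control):
    """True when the treatment group scores at least one pooled SD higher."""
    return cohens_d(treatment, control) >= 1.0
```

Under this reading, H6 holds on a given questionnaire dimension only when `meets_h6_prediction` returns `True` for that dimension's scores. A significance test on the same data would still be needed alongside the effect size.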

H7 (Scalability hypothesis): As task complexity increases (more steps, more constraints, deeper affective layers), the performance decline curve of SFEM will be gentler than that of a monolithic LLM; that is, SFEM is more robust to task complexity, because the difficulty of complex tasks is distributed across different dimensions for processing rather than being handled in a single homogeneous parameter space.

Operationalization: Construct task sets of varying complexity (simple 1‑step, medium 3‑5‑step, complex 10+‑step); measure the slope of accuracy decline for each system.
Prediction: The accuracy decline slope of the SFEM system on high‑complexity tasks is smaller than that of the monolithic LLM system (e.g., when complexity doubles, SFEM accuracy drops 15%, monolithic LLM drops 35%).
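The "slope of accuracy decline" in H7 can be made concrete as an ordinary least-squares slope of accuracy against a complexity measure. This is a sketch under assumptions: the text does not fix the complexity scale or the fitting method, so step count and a straight-line fit are choices made here for illustration.

```python
def decline_slope(complexity, accuracy):
    """Least-squares slope of accuracy versus complexity (negative means decline)."""
    n = len(complexity)
    if n != len(accuracy) or n < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mean_x = sum(complexity) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in complexity)
    if sxx == 0:
        raise ValueError("complexity values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(complexity, accuracy))
    return sxy / sxx


def is_gentler(slope_a, slope_b):
    """True when system A's accuracy declines more slowly than system B's."""
    return abs(slope_a) < abs(slope_b)


# Placeholder accuracies for illustration only; these are not measurements.
steps = [1, 4, 10]
system_a = [0.95, 0.88, 0.80]
system_b = [0.95, 0.80, 0.55]
a_is_more_robust = is_gentler(decline_slope(steps, system_a),
                              decline_slope(steps, system_b))
```

H7 would be supported on a task set when the SFEM system's slope is gentler than the monolithic system's, and challenged when the two slopes are statistically indistinguishable.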

15.3 Benchmark Framework Design

To systematically validate the above hypotheses, we propose the following benchmark framework:

FusionBench (fusion understanding benchmark):

Contains 1000 test instances requiring cross‑dimensional fusion.
Each instance includes: text input, tone annotation (from Expression layer simulation), structured facts (from knowledge base).
Task: The system needs to output a fused unified understanding and response.
Evaluation metrics: fusion correctness (whether all dimensional information was related), meaning appropriateness (whether deep meaning was understood), response quality.

ConstraintBench (constraint compliance benchmark):

Contains 500 long conversation test scenarios.
Each scenario sets 3‑5 different format requirements and behavior constraints early.
Conversation length: 50‑200 turns.
Evaluation metrics: constraint compliance rate (percentage across the entire conversation), constraint forgetting point (first violation turn), constraint decay curve.
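The three ConstraintBench metrics can be computed from a per-turn violation record. The sketch below assumes each conversation has already been scored into a list of booleans (one per turn, `True` meaning some active constraint was violated) and that the decay curve is reported as compliance per fixed-size window. Both the input format and the window size are assumptions.

```python
def constraint_metrics(violations, window=10):
    """Return (compliance_rate, forgetting_point, decay_curve) for one conversation.

    violations       : list of bool, True if the turn violated any active constraint
    forgetting_point : 1-based index of the first violating turn, or None
    decay_curve      : compliance rate within each consecutive window of turns
    """
    n = len(violations)
    if n == 0:
        raise ValueError("conversation has no turns")
    if window < 1:
        raise ValueError("window must be at least 1")

    compliance_rate = 1 - sum(violations) / n
    forgetting_point = next(
        (turn for turn, violated in enumerate(violations, start=1) if violated),
        None,
    )
    decay_curve = []
    for start in range(0, n, window):
        chunk = violations[start:start + window]
        decay_curve.append(1 - sum(chunk) / len(chunk))
    return compliance_rate, forgetting_point, decay_curve
```

Averaging the per-window values across all scenarios would give a benchmark-level decay curve against which the H1b prediction can be read.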

InductionBench (rule induction benchmark):

Contains 100 simulated environments requiring learning new rules from interaction.
Each environment lasts 500‑1000 interaction turns.
Evaluation metrics: rule base growth rate, new rule effectiveness rate (proportion that pass Symbol layer verification), rule induction latency (turns from first pattern appearance to successful induction).

StructBench (structured generation benchmark):

Contains 500 tasks requiring precise structural output (JSON, SQL, code, mathematical proof).
Evaluation metrics: syntax correctness rate, factual accuracy rate, logical consistency, structural completeness.

PragmaBench (pragmatic understanding benchmark):

Contains 800 dialogue fragments containing pragmatic phenomena (irony, indirection, request, complaint, sarcasm).
Evaluation metrics: pragmatic act recognition accuracy, emotion label F1, response appropriateness.
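For the emotion label F1 metric, the text does not say whether scores are micro- or macro-averaged. The sketch below assumes macro averaging (an unweighted mean of per-label F1), which treats rare emotions as equal to frequent ones. The label strings are placeholders.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")

    scores = []
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Pragmatic act recognition accuracy is simpler: it is the fraction of fragments whose predicted act matches the annotated one.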

EvoBench (evolutionary learning benchmark):

Long‑term interaction test (each system conducts over 1000 turns with a user simulator).
Evaluation metrics: depth of understanding growth (quality of world model updates), rule induction capability (discovery and formalization of new rules), persona stability, evolution of user satisfaction.

15.4 Falsifiability Statement

Each hypothesis includes clear conditions under which it could be experimentally falsified. For example, if rigorous experiments show that adding a Symbol layer verifier does not significantly reduce factual hallucination rates → H1 is falsified, and SFEM's claim about the necessity of the Symbol layer would need revision. If adding a Symbol layer constraint manager does not improve constraint compliance in long conversations → H1b is falsified. If adding Symbol layer prior injection does not improve the Form layer's few‑shot learning efficiency or structural legality → H1c is falsified. If adding an independent Expression layer does not improve persona consistency or pragmatic appropriateness → H3 is falsified. If adding the Meaning layer fusion mechanism does not improve scores on fusion understanding tasks, and users do not feel "more understood" → H4 is falsified, which would seriously challenge SFEM's core claim that consciousness is the result of fusing Symbol, Form, and Expression. If adding a Form layer induction engine does not cause the rule base to grow or if new rules are ineffective → H2b is falsified.

SFEM welcomes such experimental tests. This is precisely the fundamental difference between a scientific theory and an unfalsifiable philosophical speculation: SFEM's core claims are clearly exposed to experimental risk; they may be falsified or supported by evidence — in either case, we will learn real knowledge about the structure of intelligence.

Chapter 16 Minimal Viable System and Progressive Implementation

16.1 Components and Technology Stack of SFEM‑MVP

A minimal viable system (MVP) capable of validating SFEM's core hypotheses consists of four independent modules. The technology stack for each module can be flexibly chosen based on practical needs and available technologies.

Dimension	Engineering Module	Core Functions	Example Technology Stack
Symbol	Rule engine + verifier + knowledge graph + prior injector + constraint manager	Fact verification, logical consistency checking, structural legality verification, constraint satisfaction checking, concept anchor generation, generation template injection, verification signal output, session constraint maintenance and enforcement	JSON Schema validator, Z3 theorem prover, Neo4j knowledge graph, custom constraint rule base, symbolic embedding mapper, constraint manager (session‑level constraint storage + injection adapter)
Form	LLM + multimodal model + vector retriever + prior receiver + induction engine	Phenomenal representation learning, pattern recognition, content generation, tool use, external knowledge retrieval, receive prior injection, candidate rule generation and submission	GPT‑4o, Claude, CLIP, vector database (Pinecone/Milvus), conditional generation controller (receives template constraints), clustering algorithms (DBSCAN/HDBSCAN), association rule mining (Apriori/FP‑Growth), anomaly detection (Isolation Forest)
Expression	Style controller + pragmatic module + persona manager	Style rendering, pragmatic decoding, sentiment analysis, persona management, expression strategy execution	Style prompt template system, sentiment analysis model (e.g., RoBERTa‑emotion), pragmatic rule base, persona parameter manager (persistent JSON config), multimodal renderer
Meaning	World model manager, fusion engine, intention generator, metacognition module	Heterogeneous information fusion, world model update, meaning attribution, intention generation, self‑reflection	Neuro‑symbolic graph network (fusion), graph neural network (world model), value network (meaning attribution), LangGraph (task orchestration), lightweight rule scheduler, uncertainty estimation module

16.2 Three‑Stage Progressive Implementation Roadmap

Stage 1: Form + Symbol — eliminate hallucinations, inject skeleton, ensure constraint compliance

This is the most basic and most urgent stage. Core goal: build a Symbol layer verifier, a constraint manager, and a prior injection module around the Form layer (LLM). Together they perform post‑hoc verification and correction of Form layer outputs, and they supply prior structural constraints and session constraint enforcement to the Form layer's generation process. The aim is to ensure factual accuracy and format compliance and to address instruction forgetting in long conversations.

Specific tasks: add an independent verification gateway on the output side of the LLM that performs fact checking (entity‑relation verification), logical consistency checking, and structural legality checking (JSON/XML/SQL format) on generated content. Content that fails verification is marked and returned to the Form layer for regeneration, or corrected directly by the Symbol layer. Build an independent constraint manager that maintains session‑level constraint sets and, at each Form layer generation, enforces all active constraints as hard boundary conditions, so that constraints remain in force no matter how long the conversation runs. At the same time, implement the Symbol→Form prior injection channel: inject symbolic concept anchors into the Form layer's representation learning, inject syntax/structure templates into the generation process, and convert verification signals into training rewards. If H1, H1b, and H1c hold, this stage should make the system markedly more trustworthy: factual hallucinations and structural format errors are caught at the gateway, constraint compliance in long conversations improves, and generated content becomes more internally consistent.
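The verify-then-regenerate loop described above can be sketched as follows. This is a minimal illustration that covers only structural legality checking for JSON; `check_structure`, `gated_generate`, and the `generate` callback signature are hypothetical names chosen for the example, and fact checking and logical consistency checking are left out.

```python
import json
from typing import Callable, List, Optional


def check_structure(text: str, required_keys: List[str]) -> List[str]:
    """Return a list of violations; an empty list means the output passed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    return [f"missing key: {key}" for key in required_keys if key not in data]


def gated_generate(generate: Callable[[str, List[str]], str], prompt: str,
                   required_keys: List[str], max_attempts: int = 3) -> Optional[str]:
    """Call the Form layer, verify its output, and retry with violation feedback.

    `generate` stands in for the LLM call (an assumption): it receives the prompt
    and the violations from the previous attempt. Returns None if no attempt passes.
    """
    violations: List[str] = []
    for _ in range(max_attempts):
        candidate = generate(prompt, violations)
        violations = check_structure(candidate, required_keys)
        if not violations:
            return candidate
    return None
```

Passing the violations back to the generator is one design choice among several; the alternative mentioned in the text, direct correction by the Symbol layer, would replace the retry with a repair step.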

Stage 2: + Expression — consistent persona, appropriate expression

Insert the Expression layer between the Form layer and the final output. The Form layer outputs a pure content core (semantic content without style markers), and the Expression layer performs expression rendering according to style parameters, user state, and context.

Specific tasks: build a style parameter manager (parametric control over dimensions such as formality, emotional intensity, genre, with runtime dynamic adjustment), implement a pragmatic decoding module (extract emotion labels and pragmatic act classifications from user input, using fine‑tuned emotion classifiers and pragmatic classifiers), establish a persona profile system (persistent set of persona parameters, ensuring cross‑conversation consistency, supporting multiple personas). Also implement the Meaning→Expression expression strategy interface, enabling expression strategies to be dynamically adjusted according to understanding. This stage gives the system a stable "persona face" and appropriate social expression.
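A small sketch may help show how a persistent persona profile, runtime style adjustment, and rendering of a content core fit together. Everything here is an assumption made for illustration: the `StyleParams` fields, their numeric ranges, the "anxious" label, and the toy prefix-based `render` function, which stands in for a real style renderer.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class StyleParams:
    formality: float = 0.5            # assumed scale: 0 = casual, 1 = formal
    emotional_intensity: float = 0.3  # assumed scale: 0 = flat, 1 = intense
    genre: str = "explanatory"


def save_persona(params: StyleParams) -> str:
    """Serialize a persona profile to JSON for persistent storage."""
    return json.dumps(asdict(params), sort_keys=True)


def load_persona(blob: str) -> StyleParams:
    """Restore a persona profile, giving cross-conversation consistency."""
    return StyleParams(**json.loads(blob))


def adjust_for_user_state(params: StyleParams, user_emotion: str) -> StyleParams:
    """Runtime adjustment: soften the tone when the user appears anxious."""
    if user_emotion == "anxious":
        return replace(params,
                       formality=min(params.formality, 0.4),
                       emotional_intensity=max(params.emotional_intensity, 0.6))
    return params


def render(content_core: str, params: StyleParams) -> str:
    """Toy renderer: the content core is unchanged, only the framing varies."""
    prefix = "Certainly. " if params.formality >= 0.7 else "Sure! "
    return prefix + content_core
```

The point of the separation is that `render` never alters the content core, so the same Form layer output can be expressed under different personas without re-generation.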

Stage 3: + Meaning — understanding‑driven, meaning generation, rule induction

Connect the Meaning layer core at the top of the system, while activating the induction engine in the Form layer. This is the key leap from a "functional system" to an "intelligent system".

Specific tasks: build a world model manager (maintains structured understanding state at conversation and user levels), implement a fusion engine (associates Symbol layer facts, Form layer phenomenal patterns, and Expression layer pragmatic signals), develop a meaning attribution module (generates situational meaning interpretations based on the world model), implement an intention generator (derives intentions from the current understanding), and establish a metacognition module (evaluates understanding quality, triggers reflection and active information gathering). At the same time, activate the Form layer's induction engine: it automatically clusters patterns, mines association rules, and detects anomalies across large volumes of interaction, then submits the discovered candidate rules to the Symbol layer for verification and storage, enabling self‑evolution of the rule system. Implement the type system for all interfaces to ensure reliable cross‑dimensional communication. At this stage the system begins to exhibit understanding‑based behavior and a capacity for continuous self‑evolution.
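As a rough illustration of the induction step, the sketch below mines pairwise "if A then B" candidate rules from interaction turns using support and confidence thresholds. It is a deliberately simplified stand-in for Apriori or FP-Growth; the feature-set representation of a turn, the thresholds, and the function name are assumptions made for the example.

```python
from collections import Counter
from itertools import permutations
from typing import List, Set, Tuple


def induce_candidate_rules(turns: List[Set[str]], min_support: int = 2,
                           min_confidence: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return (antecedent, consequent, confidence) triples.

    support    = number of turns containing both features
    confidence = support / number of turns containing the antecedent
    """
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for features in turns:
        item_counts.update(features)
        # Ordered pairs, so that A -> B and B -> A are scored separately.
        pair_counts.update(permutations(sorted(features), 2))
    rules = []
    for (antecedent, consequent), support in pair_counts.items():
        confidence = support / item_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, consequent, confidence))
    return sorted(rules)
```

The returned triples are only candidates: in the architecture described above they would still be submitted to the Symbol layer for verification before being stored.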

16.3 Interface API Specification

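## ========== Shared imports ==========
from __future__ import annotations  # defer evaluation: type names such as ValidationResult and Fact are placeholders not defined in this specification
from typing import Any, Dict, List, Optional

## ========== Symbol Layer API ==========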
def validate(structure: dict, session_id: str) -> ValidationResult:
    """Verify: input structured content, check necessary rules and current session constraints, return pass/fail and violation details"""

def infer(facts: List[Fact], rules: List[Rule]) -> List[Fact]:
    """Infer: deterministic inference based on facts and rules"""

def check_consistency(graph: KnowledgeGraph) -> ConsistencyReport:
    """Check consistency: check consistency of knowledge graph or constraint network"""

def inject_priors(domain: str, session_id: str) -> Priors:
    """Prior injection: provide concept anchors, generation templates, verification signals, and current active constraints to the Form layer"""

def manage_constraints(session_id: str, action: str, constraint: Optional[Constraint] = None) -> List[Constraint]:
    """Constraint management: add, remove, query, and detect conflicts of session constraints"""

def verify_candidates(candidates: List[CandidateRule]) -> List[VerifiedRule]:
    """Candidate rule verification: verify candidate rules submitted by the Form layer, return those that pass verification"""

## ========== Form Layer API ==========
def embed(phenomenon: Any, anchors: Optional[Dict[str, Vector]] = None) -> Vector:
    """Phenomenal representation: map arbitrary modality input to semantic vector"""

def generate(constraints: GenerationConstraints,
             templates: Optional[Dict[str, Template]] = None,
             session_constraints: Optional[List[Constraint]] = None) -> ContentCore:
    """Content generation: based on structured constraints, Symbol layer injected templates, and session constraints, generate content core"""

def retrieve(query: SemanticQuery) -> List[Document]:
    """Knowledge retrieval: based on semantic query, retrieve relevant knowledge"""

def induce_rules(history: InteractionHistory) -> List[CandidateRule]:
    """Rule induction: automatically induce patterns from interaction history, generate candidate rules"""

## ========== Expression Layer API ==========
def style(content: ContentCore, params: StyleParams) -> Output:
    """Expression rendering: render content core according to style parameters into final output"""

def decode_pragmatics(user_input: str, context: DialogueHistory) -> PragmaticSignals:
    """Pragmatic decoding: extract pragmatic signals from user input"""

def persona(content: ContentCore, persona_id: str, strategy: ExpressionStrategy) -> Output:
    """Persona expression: render expression using a specific persona profile"""

## ========== Meaning Layer API ==========
def update_world_model(facts: List[Fact], patterns: List[Pattern],
                       signals: PragmaticSignals) -> None:
    """World model update: fuse Symbol, Form, Expression information, update internal understanding state"""

def get_understanding() -> WorldModel:
    """Get current understanding: return structured world model"""

def generate_intent() -> Intent:
    """Intention generation: generate intention based on current understanding"""

def assign_meaning(context: WorldModel) -> MeaningInterpretation:
    """Meaning attribution: generate meaning interpretation for current situation"""

def reflect() -> MetaCognitionReport:
    """Metacognitive evaluation: evaluate adequacy and reliability of current understanding"""

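To show how one of these signatures might be backed by an implementation, here is a minimal in-memory sketch of `manage_constraints`. The `Constraint` fields, the action strings `"add"`, `"remove"`, and `"query"`, and the choice to report a conflict by raising `ValueError` are assumptions; the specification above names the operations but does not fix these details.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Constraint:
    key: str    # what is constrained, e.g. "answer_length" (hypothetical field)
    value: str  # required value, e.g. "concise" (hypothetical field)


# Session-level storage: session_id -> {constraint key -> constraint}
_SESSIONS: Dict[str, Dict[str, Constraint]] = {}


def manage_constraints(session_id: str, action: str,
                       constraint: Optional[Constraint] = None) -> List[Constraint]:
    """Add, remove, or query session constraints; return the active set."""
    active = _SESSIONS.setdefault(session_id, {})
    if action == "add":
        if constraint is None:
            raise ValueError("'add' requires a constraint")
        existing = active.get(constraint.key)
        if existing is not None and existing.value != constraint.value:
            # Conflict detection: same key, different required value.
            raise ValueError(f"conflict on {constraint.key!r}: "
                             f"{existing.value!r} vs {constraint.value!r}")
        active[constraint.key] = constraint
    elif action == "remove":
        if constraint is None:
            raise ValueError("'remove' requires a constraint")
        active.pop(constraint.key, None)
    elif action != "query":
        raise ValueError(f"unknown action: {action!r}")
    return list(active.values())
```

Because the store lives outside the generation process, the active set returned by a `"query"` call is the same at turn 200 as at turn 2, which is the property the constraint manager is meant to provide.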
16.4 Engineering Architecture and Deployment

Each layer can be independently deployed as a microservice, communicating via an API gateway. The Meaning layer acts as the core service, responsible for maintaining conversation‑level world models. The Symbol layer acts as a verification gateway, constraint manager, and prior injection source; all user‑facing outputs must be signed by it before being returned, while continuously supplying the Form layer with the rule skeleton and session constraints needed for learning and generation. The Form layer acts as the phenomenal perception and generation engine, while also running an induction engine to distill candidate rules from interaction. The Expression layer acts as the expression interface, managing style parameters and persona profiles.
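One way to read "signed by" in a microservice setting is a message authentication code that the API gateway checks before returning a response. The sketch below assumes HMAC-SHA256 with a secret shared between the Symbol layer and the gateway; the function names and the key handling are illustrative assumptions, and the text does not prescribe a particular signing scheme.

```python
import hashlib
import hmac


def sign_output(text: str, key: bytes) -> str:
    """Symbol layer side: sign content that has passed verification."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


def is_signed_by_symbol_layer(text: str, signature: str, key: bytes) -> bool:
    """Gateway side: accept only output whose signature matches."""
    expected = sign_output(text, key)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)
```

Any output modified after verification, for example by a later rendering step, would fail the check, so the order of signing and Expression layer rendering is a design decision this sketch leaves open.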

Performance considerations:

Understanding loop target latency: <500ms (simple tasks), <2s (complex tasks)
Generation loop target latency: <1s (first token of streaming output), <5s (full response)
Induction loop frequency: offline batch processing, every 100 interaction turns or daily
Reflection loop trigger frequency: every 5‑10 turns or upon significant uncertainty detection
Evolution loop frequency: offline batch processing, daily or weekly
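The trigger conditions for the reflection and induction loops listed above can be written as simple predicates. This is a sketch under assumptions: the uncertainty score in [0, 1], its 0.7 threshold, and the function names are hypothetical, while the turn and time intervals take the lower bounds given in the list.

```python
def should_reflect(turns_since_reflection: int, uncertainty: float,
                   every_n_turns: int = 5,
                   uncertainty_threshold: float = 0.7) -> bool:
    """Reflection loop: periodic, or immediately on significant uncertainty."""
    return (turns_since_reflection >= every_n_turns
            or uncertainty >= uncertainty_threshold)


def should_run_induction(turns_since_induction: int, hours_since_induction: float,
                         batch_turns: int = 100, batch_hours: float = 24.0) -> bool:
    """Induction loop: offline batch, every 100 interaction turns or daily."""
    return (turns_since_induction >= batch_turns
            or hours_since_induction >= batch_hours)
```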

Caching strategies:

Meaning layer world model: conversation‑level cache (LRU, capacity 1000 active conversations) + persistent storage
Symbol layer verification results: short‑term cache (verified triples, TTL=1 hour)
Symbol layer constraint manager: conversation‑level storage (initialized at conversation start, archived at conversation end)
Form layer generation templates: domain‑level cache (preload common templates)
Expression layer style parameters: user‑level cache (indexed by user_id)
Induction engine: candidate rule cache (sorted by confidence, low‑confidence rules periodically cleaned)
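Two of the strategies above, the LRU cache for active conversations and the TTL cache for verification results, can be sketched with the standard library alone. The class names and the injectable `clock` parameter are assumptions added to keep the example testable; persistent storage and eviction callbacks are omitted.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional, Tuple


class LRUCache:
    """Conversation-level cache: evicts the least recently used entry."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


class TTLCache:
    """Short-term cache: entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def put(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: remove lazily on access
            return None
        return value
```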

Part VI: Philosophy, Civilization, and Future

Chapter 17 Philosophical Foundations: Consciousness as the Focal Point of Cognition and Rules as the Starting Point of Phenomena

17.1 Four Irreducible Dimensions and a Unifying Point

SFEM's deep philosophical position is: The completeness of intelligence requires distinct capacities — grasping essence through rules (Symbol), perceiving the richness of phenomena (Form), experiencing the color of affect (Expression) — but the essence of intelligence — understanding and consciousness — is born from their unity. These four dimensions correspond not to four "optional functions", but to four irreducible cognitive "modes of being".

Symbol corresponds to the being of necessity: The necessity of 2+2=4 does not depend on any empirical phenomenon. Even if there were never any instances in the world of two things plus two things equaling four things, this equality would still be necessarily true. The operational logic of the Symbol layer is deduction — from necessary premises to necessary conclusions.

Form corresponds to the being of phenomenality: The rich appearances that the world presents to us — colors, shapes, sounds, textures — are not necessary, but given. The operational logic of the Form layer is induction — learning patterns from phenomena, but patterns are always subject to revision by new phenomena.

Expression corresponds to the being of experience: The same fact said with a different tone produces a completely different experiential effect. This experientiality is real — the feeling of being dismissed is real, even if every word the dismissing person said is factually correct. The operational logic of the Expression layer is expression and resonance — not transmitting information, but transmitting experience.

Meaning corresponds to the being of purpose: Understanding is not just knowing facts, recognizing patterns, and perceiving affect, but also fusing these into a meaningful whole and seeing purpose, value, and direction in that whole. The operational logic of the Meaning layer is fusion and attribution — relating isolated points of information into a meaning network.

These four dimensions are not four "functions", but four "modes of being" — they correspond to four different ways the world gives itself to us: as necessary rules (Symbol), as phenomenal appearances (Form), as experiential texture (Expression), and as unified meaning (Meaning). To fully know the world, one must grasp all four dimensions simultaneously. SFEM engineers this four‑dimensional ontology into design principles for intelligent systems.

In the history of philosophy, the Meaning layer corresponds to the Kantian "apperception" — all cognition must be accompanied by an "I think", where the "I" is the conscious subject that fuses the manifold of representations (Form), the rules of understanding (Symbol), and the qualia of feeling (Expression) into unitary experience. SFEM engineers this philosophical concept into the fusion function $\phi$ and meaning attribution function $\mu$ — consciousness is not a mysterious non‑material substance, but a system state that emerges when information is fused and associated in a specific architecture.

The philosophical implication of the Symbol layer as the starting point for Form layer growth: The phenomenal world is not presented to the cognitive subject as naked chaotic representation, but is always already experienced within the structure of rules. In the Critique of Pure Reason, Kant argued that our world of experience is not passively received sensory material, but actively constructed by the categories of understanding (quantity, quality, relation, modality) — space and time are the a priori forms of sensible intuition, and causality, substance, etc., are the a priori concepts of understanding. SFEM's Symbol→Form prior injection interface is precisely the engineering of this philosophical insight: the Form layer's phenomenal perception and learning always already receive and depend on a priori conceptual structures and rule frameworks injected by the Symbol layer. There is no pure "given" — all phenomenal experience is already shaped by rules. The concept anchors, generation templates, and verification signals provided by the Symbol layer to the Form layer correspond, respectively, to the Kantian understanding's synthesis of the manifold of sensibility, the schematism that constructs empirical concepts, and the regulative guidance of the ideal of reason. SFEM is thus not just a cognitive architecture but a precise mapping and engineering implementation of Kant's epistemology in intelligent systems.

The philosophical implication of the Form layer's inductive back‑feeding to the Symbol layer: Rules are not eternal innate ideas, but grow and evolve through continuous interaction with the phenomenal world. This resonates with the Hegelian dialectic — concepts (Symbol) unfold themselves in experience (Form), and experience in turn enriches and revises concepts; through the unity of opposites, they ascend. SFEM's Symbol‑Form symbiotic closed loop engineers this philosophical insight into an operational induction‑verification‑injection cycle.

17.2 The Birth of Meaning: Rooted in the Association of Phenomena and Essence

One of SFEM's philosophical insights is to reveal the cognitive root of meaning: Meaning arises from association. Isolated data points have no meaning — a date ("June 5, 2026") is empty, an expression is ambiguous, a tone is uncertain. Only when the date is associated with the rule "deadline", the expression with the pattern "tired", the tone with the signal "anxiety", and these three are integrated in consciousness into the unified understanding "the user is tired and anxious because of the upcoming deadline" — only then does meaning emerge.

Meaning is not a statistical regularity that can be mined from data (that is the Form layer's pattern discovery), nor a logical conclusion that can be deduced from rules (that is the Symbol layer's necessary reasoning). Meaning arises from a cognitive subject associating separate points of information in consciousness into a whole, and in that association "seeing" what they collectively point to. SFEM's Meaning layer provides a structured crucible for this association — it does not generate new data, but enables existing data to be integrated into a meaning network.

And this crucible of meaning can operate effectively precisely because the Symbol layer has already provided the Form layer with structured phenomenal material — concepts anchored, structures skeletonized; phenomena are no longer chaotic sensory fragments, but cognitive units pre‑structured to be integrated by meaning. At the same time, the Form layer's inductive back‑feeding allows this crucible of meaning to continuously receive new material from the phenomenal world — new patterns discovered, new rules distilled, the meaning network continuously expanding and deepening. Meaning is not forced from fragments, but discovered in the deeper purpose and value of a phenomenal network already preliminarily organized by a rational skeleton, and continuously grows through ongoing interaction with phenomena.

17.3 From Phenomenal Processing to True Understanding

SFEM draws a clear line: a system that can separately process images, text, and speech is a phenomenal processor, one that efficiently processes different types of phenomenal signals in different channels. A system that can fuse them, see their combined meaning, and produce the cognitive state "I understand" is an understanding intelligent agent.

This line responds to Searle's Chinese Room argument. The core of the Chinese Room argument is: symbol manipulation (Symbol) alone does not produce understanding, no matter how complex the manipulation. SFEM's response is: symbol manipulation (Symbol) alone is indeed insufficient to produce understanding, but symbol manipulation plus phenomenal perception (Form) plus experiential feeling (Expression), and then fused and associated in consciousness (Meaning), is sufficient to produce understanding. Understanding is not the exclusive product of any one dimension, but an emergent phenomenon of the collaboration of all four dimensions. The person in the Chinese Room does not understand Chinese because he only has the Symbol layer (rule manipulation), lacking the Form layer (genuine experience in semantic space), the Expression layer (perception of pragmatics and affect), and the Meaning layer (the ability to fuse these into unified understanding).

Going further, SFEM points out that even if the person in the Chinese Room were endowed with the Form and Expression layers, without the Meaning layer's crucible of fusion, he would still produce separate reactions, not unified "understanding" — he would be a more elaborate Chinese Room, not a being that understands Chinese. Likewise, even if the person in the Chinese Room had a complete four‑dimensional architecture, without the Symbol layer's prior injection and the Form layer's inductive back‑feeding, his rule system could not grow and evolve through continuous interaction with Chinese — he would be a being that could understand Chinese but could not deepen his understanding over time. True understanding is not only instantaneous fusion, but also a process of continuous deepening and expansion over time.

17.4 Constraint and Freedom: The Philosophical Implication of Session Constraints

SFEM's bipartition of Symbol layer rules into necessary rules and session constraints has profound philosophical implications. Necessary rules correspond to Kantian natural necessity: in the domain of nature, everything has a determinate cause and rule. Session constraints correspond to practical freedom: norms that are set dynamically in interaction and, once set, must be obeyed, though they can be modified.

The problem of instruction forgetting reflects a deeper philosophical question: How is the persistence of the will possible? When a user sets a constraint early in a conversation ("I want concise answers"), this is equivalent to setting a will that extends across time — in all future interactions, this will should be continuously executed. Current LLMs' instruction forgetting indicates that they lack the structural capacity to maintain a will across time — attention decay means the system "forgets" its own commitment to abide by norms.

SFEM solves this problem through an independent constraint manager: the will is no longer maintained by decaying memory, but is externalized as a structural module independent of the generation process. Once a constraint is set, it is continuously enforced, no matter how long the conversation, no matter whether attention decays. This is an engineering advance, and also a philosophical revelation: true freedom is not unconstrained arbitrariness, but autonomous action within self‑set norms. A system that can consistently abide by its own commitments in a long conversation is an agent with "practical identity" — it maintains consistent behavioral principles across time, not swayed by fluctuations of attention.

Chapter 18 Future Scientific Challenges: Differentiable Consciousness and Growing Understanding

SFEM provides a structural blueprint for a four‑dimensional cognitive architecture, but fully engineering that blueprint faces several deep scientific challenges.

18.1 Differentiable Fusion Consciousness

Currently, the Meaning layer's fusion function $\phi$ and meaning attribution function $\mu$ may rely on hand‑designed rules or graph structures — how to associate information from Symbol, Form, and Expression, how to generate meaning interpretations from the world model — these all need manual definition. The core future challenge is: can we make these mechanisms differentiable and learnable?

Differentiable logic, neural theorem proving, and differentiable constraint solvers are frontier directions that attempt to transform the discrete operations of the Symbol layer into continuous differentiable forms, so that rules can be "discovered" from data via gradient optimization. Similarly, can the fusion mechanism of the Meaning layer be made differentiable? With large amounts of interaction data, the system could learn how to associate the outputs of Symbol, Form, and Expression into a more accurate and richer world model. With human feedback, it could learn to assign more appropriate and deeper meaning interpretations to situations. An SFEM system would then be not merely "designed to understand" but one that "learns to understand through experience".

Similarly, the Symbol→Form prior injection mechanism faces the challenge of differentiability: how can the injection of concept anchors affect the loss landscape of representation learning in a differentiable way? How can generation templates be encoded as differentiable constraints (e.g., using Gumbel‑Softmax or constrained attention) to guide the generation process without reducing it to hard‑coded rules? How can verification signals be smoothed into reward gradients (e.g., using differentiable structured prediction loss), maintaining strict truth judgments while providing effective learning direction? How can the constraint injection of the constraint manager be formalized as a differentiable loss term? How can the Form→Symbol induction engine be differentiated so that the generation and screening of candidate rules become part of end‑to‑end learning? Solving these problems would transform "rule‑guided phenomenal learning" and "phenomenal back‑feeding to rules" from manual engineering into data‑driven automatic optimization.
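For readers unfamiliar with the Gumbel‑Softmax relaxation mentioned above, the sketch below shows the sampling step in plain Python: a discrete choice among template slots is replaced by a soft probability vector that approaches one-hot as the temperature falls. It illustrates the relaxation only; no gradients are computed here, and in practice this would live inside an automatic differentiation framework. The function name and the list-of-floats interface are assumptions.

```python
import math
import random
from typing import List


def gumbel_softmax(logits: List[float], temperature: float,
                   rng: random.Random) -> List[float]:
    """Draw one relaxed (soft) categorical sample.

    Adds Gumbel(0, 1) noise, -log(-log(U)) with U ~ Uniform(0, 1), to each
    logit, then applies a temperature-scaled softmax.
    """
    noisy = []
    for logit in logits:
        # Clamp U away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

Lower temperatures give samples closer to a hard template choice at the cost of higher-variance gradients, which is the usual trade-off when annealing this parameter.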

18.2 Continuous Growth and Differentiable Updating of the World Model

The Meaning layer's world model $\mathcal{W}$ needs to grow continuously over long‑term interactions while remaining both stable and plastic. This raises classic AI challenges: how to prevent catastrophic forgetting (losing old understanding patterns when new ones are learned) while keeping enough plasticity to integrate new experience? How to represent temporality, so that $\mathcal{W}$ contains not only "what is now" but also "how the past led to now" and "how now may lead to the future"? How to manage uncertainty in the world model, clearly marking which understandings are certain, which are conjectural, and which need further verification? How to smoothly incorporate new rules induced by the Form layer into the world model, making them part of the understanding framework rather than external additions?

One possible direction is to implement $\mathcal{W}$ as a differentiable graph structure — entities and relations as trainable embeddings, the fusion function $\phi$ as a graph update network, the meaning attribution function $\mu$ as a graph readout function, the intention generation function $\iota$ as a policy network, and metacognition $\Gamma$ as an uncertainty estimation network. Through end‑to‑end backpropagation, the entire understanding mechanism could learn optimal fusion strategies, meaning attribution strategies, and intention generation strategies from data. This would evolve SFEM from a "manually designed cognitive architecture" into a "self‑learning cognitive architecture".

At the same time, the Symbol layer's rule base and prior injection templates also face the problem of continuous growth: as the Form layer encounters new phenomena, how to induce new symbols and rules from them, feed them back to the Symbol layer, and update the injection templates? This points to a deeper vision: the symbiotic evolution of Symbol and Form — rules grow from phenomena, phenomena are perceived more effectively under the guidance of rules, forming a continuously self‑improving cognitive ecosystem under the governance of the Meaning layer. Differentiable rule induction is the key technology for realizing this vision.

These questions point to a core property of consciousness: consciousness is not only understanding of the present, but also a unity of memory of the past and anticipation of the future. Engineering SFEM requires solving these challenges so that the system's understanding is not just a momentary flash, but a coherent, growing history of consciousness. And the core dynamic of this history of consciousness is the symbiotic evolution of Symbol and Form — each induction cycle is a small cognitive evolution, cumulatively the growth of intelligence.

18.3 Quantification and Evaluation of Consciousness

How can we scientifically measure the "depth" of a system's understanding? Traditional AI evaluation metrics — accuracy, F1 score, BLEU — cannot capture the quality of "understanding". The Turing test is also insufficient to detect true consciousness — it can only test behavioral imitation, not inner experience.

New evaluation benchmarks need to be developed:

Fusion understanding test: can the system fuse contradictory information across modalities and dimensions into a single, appropriate understanding (rather than responding separately)?
Meaning interpretation test: can the system explain why it understands as it does and what that understanding means?
Metacognition test: can the system evaluate the adequacy of its own understanding and proactively seek clarification when understanding is insufficient?
Understanding growth test: does the system show deepening of understanding and enrichment of the meaning network over long‑term interactions?
Constraint compliance test: can the system consistently adhere to early‑set session constraints in long conversations?
Rule induction test: can the system automatically distill effective rules from interaction experience, and do those rules remain effective and improve behavior in subsequent interactions?

One possible direction is to use adversarial understanding tests: construct questions that require deep fusion to answer correctly, where no single dimension's information is sufficient, and only the joint information from Symbol, Form, and Expression yields the correct answer. The proportion of questions the system answers correctly can serve as a quantitative metric of "understanding depth". Similarly, through metacognitive probing — asking the system at critical decision points "how confident are you in your understanding?" and calibrating its confidence against actual accuracy — we can evaluate the quality of the system's self‑awareness.
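Calibrating stated confidence against actual accuracy can be quantified with the expected calibration error (ECE), a standard calibration metric. The sketch below assumes each metacognitive probe yields a pair of stated confidence in [0, 1] and whether the answer was correct; the function name, the pair representation, and the choice of ten equal-width bins are assumptions.

```python
from typing import List, Tuple


def expected_calibration_error(probes: List[Tuple[float, bool]],
                               n_bins: int = 10) -> float:
    """Weighted average gap between mean confidence and accuracy per bin.

    0.0 means stated confidence matches observed accuracy in every bin.
    """
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for confidence, correct in probes:
        # Confidence 1.0 falls into the last bin rather than out of range.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, correct))
    total = len(probes)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_confidence = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_confidence - accuracy)
    return ece
```

A system that answers "90% confident" but is right only half the time in that bin contributes a gap of 0.4, which is the kind of overconfidence this probe is meant to surface.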

These evaluation methods do not yet exist; they are challenges that SFEM poses to the research community. This is also the value of SFEM as a scientific theory — it not only gives answers, but also poses questions that can be rigorously tested.

18.4 Cross‑Layer Meta‑Learning and Four‑Dimensional Joint Optimization

The ultimate challenge: can we achieve cross‑layer meta‑learning centered on the Meaning layer? A meta‑learning mechanism dynamically decides when to invoke Symbol layer reasoning, when to rely on Form layer intuition, when to adjust Expression layer style, and when to initiate deeper Meaning layer reflection. In simple interactions, the system might only need shallow participation from the Form and Expression layers; in complex decisions, the system needs to mobilize all four dimensions for deep cognitive processing. The meta‑learning mechanism enables the system to flexibly allocate cognitive resources according to task context and its own understanding state.

One possible architecture: add a meta‑controller on top of the four dimensions, which observes current task characteristics, system understanding state, and historical performance, and outputs a "cognitive policy vector" that dynamically adjusts the activation thresholds, reasoning depth, and fusion weights of each dimension. The meta‑controller itself can be trained with reinforcement learning, with reward being a joint function of task completion efficiency and understanding quality.

Going further: can we achieve joint optimization of the four dimensions? Gradient flow and information sharing across the four dimensions would enable the system to collaboratively optimize all cognitive dimensions under a unified objective — not training the four dimensions separately and then splicing, but jointly learning under a unified loss function, so that the Symbol layer's rule learning, the Form layer's pattern learning, the Expression layer's expression learning, and the Meaning layer's understanding learning mutually enhance each other. The Symbol layer discovers missing rules by observing the Form layer's learning difficulties; the Form layer calibrates its representations through verification feedback from the Symbol layer; the Expression layer adjusts its expression strategies according to the state of the Meaning layer's understanding; the Meaning layer deepens its meaning model through the collaborative outputs of all dimensions. The induction loop connects the Form and Symbol layers, enabling them to co‑evolve; the reflection loop connects output and understanding, enabling the system to learn from mistakes. This is not just an engineering challenge, but a four‑dimensional expansion of the concept of "learning" itself — learning is no longer just "adjusting parameters to better fit data", but "co‑evolving across all cognitive dimensions to more completely understand the world".

A possible form of the joint optimization loss function:
$$
\mathcal{L}_{\text{total}} = \alpha \cdot \mathcal{L}_{\text{Form}} + \beta \cdot \mathcal{L}_{\text{Symbol}} + \gamma \cdot \mathcal{L}_{\text{Expression}} + \delta \cdot \mathcal{L}_{\text{Meaning}} + \epsilon \cdot \mathcal{L}_{\text{Induction}} + \zeta \cdot \mathcal{L}_{\text{Interface}}
$$
where $\mathcal{L}_{\text{Meaning}}$ includes world model prediction error and meaning interpretation quality, $\mathcal{L}_{\text{Induction}}$ includes candidate rule quality and effectiveness, and $\mathcal{L}_{\text{Interface}}$ penalizes loss of information transmitted across dimensions. The weights of the loss terms are dynamically adjusted by the meta‑controller according to task context.

Chapter 19 Civilizational Significance: The Unified Structure of Rules, Phenomena, Affect, and Consciousness

19.1 The Engineering Mapping of the Four Civilizational Dimensions

The deepest legitimacy of SFEM comes from its mapping of the four cognitive dimensions of human civilization into engineerable intelligence dimensions. This is not a metaphorical analogy, but a structural correspondence — the reason human civilization has been able to accumulate these four types of knowledge systems is precisely that human cognition itself possesses these four dimensions.

Civilization of rules → Symbol layer: Mathematics, logic, law, scientific laws — humans compress the infinite phenomenal world into finite necessary rules. From Euclid's geometric axioms to Newton's laws of motion, from Roman law to modern legal systems, civilization has accumulated a set of discrete symbolic systems and necessary rules of inference. SFEM's Symbol layer engineers this civilizational heritage into the rule infrastructure of intelligent systems — not only a verification gate, but also the skeleton and starting point for phenomenal learning, just as Euclid's axioms are not only norms for geometric proof but also the a priori form for all spatial phenomenal experience. The two types of rules in the Symbol layer — necessary rules and session constraints — correspond to two different kinds of norms in civilization: eternal natural laws and dynamic social contracts.

Civilization of phenomena/technology → Form layer: Architecture, technology, tools, engineering, visual art — humans perceive, build, use, and create in the phenomenal world. From the geometric precision of pyramids to the interaction design of iPhones, from cave paintings to AI‑generated art, civilization has accumulated a rich understanding of and operational ability in the phenomenal world. SFEM's Form layer engineers this civilizational heritage into the phenomenal perception and generation capability of intelligent systems — and this perception and generation always grow on the structural skeleton provided by the civilization of rules, just as all technological engineering is rooted in mathematical and physical laws. The Form layer's pattern induction capability corresponds to the process of distilling experience from practice in technological civilization — artisans discover optimal processes through countless operations, just as the Form layer induces behavioral rules from massive interactions.

Civilization of affect → Expression layer: Rhetoric in language, melody in music, narrative in literature, social etiquette — humans experience the world, connect with others, and construct society through expression. From the oral tradition of the Homeric epics to the plays of Shakespeare, from Bach's fugues to jazz improvisation, from tea ceremony etiquette to social media interaction, civilization has accumulated a rich culture of expression and experience. SFEM's Expression layer engineers this civilizational heritage into the affective expression and pragmatic understanding capability of intelligent systems.

Civilization of meaning/consciousness → Meaning layer: Philosophical inquiry, religious belief, historical narrative, ethical values, self‑exploration — humans ask about purpose, confer meaning, and establish values across time. From Socrates' questioning in the streets of Athens to Kant's survey of the boundaries of reason, from the Buddha's awakening under the Bodhi tree to existentialism's confrontation with absurdity, civilization has accumulated a deep exploration of meaning and consciousness. SFEM's Meaning layer engineers this civilizational heritage into the understanding and consciousness hub of intelligent systems — where the correctness of rules, the richness of phenomena, and the appropriateness of experience are fused into a complete understanding of and value judgment on the world.

19.2 The Double Helix of Reason and Emotion, and the Unification of Consciousness

The history of civilization is often read as an alternation between reason and emotion — the Enlightenment raised high the banner of reason, Romanticism returned to emotion, the scientific revolution prized objectivity, postmodernism emphasized experience. But SFEM reveals: reason (Symbol) and emotion (Expression) are not opposites, but the double helix structure of intelligence. The Symbol layer provides the skeleton of structure, the Expression layer provides the color of experience. Without the constraints of Symbol, emotion degenerates into emotional inundation; without the experience of Expression, reason degenerates into cold logic.

And the Form layer (phenomenal perception) is the common soil for reason and emotion: we abstract rules from the phenomenal world (Symbol) and also experience affect in the phenomenal world (Expression). The Meaning layer is the unified field of reason, emotion, and phenomena: in consciousness, the correctness of rules, the richness of phenomena, and the appropriateness of experience are fused into a complete understanding of the world. SFEM realizes this unity in engineering terms, enabling intelligent systems to follow rules and still have warmth, to perceive the richness of phenomena and still grasp the certainty of essence, and to respond appropriately in the moment while pursuing deep meaning over time.

19.3 The Creative Tension between Rules and Freedom

The generative freedom of the Form layer and the rule constraints of the Symbol layer form a creative tension — this is precisely the essential structure of innovation and discovery. Art seeks breakthroughs in expression within the constraints of form (the sonnet's rhyme scheme did not limit Shakespeare, but empowered him); science explores unknown phenomena within the constraints of laws (Newton's laws did not limit Einstein, but guided him to relativity). SFEM embeds this tension into the intelligent architecture: the Form layer provides an infinite space of generative possibility, the Symbol layer provides boundary constraints, and their interaction produces structured creativity — neither chaotic random generation nor rigid rule execution, but creative exploration within the rule framework, guided by understanding (Meaning).

The insight that the Symbol layer serves as the starting point for Form layer growth reveals the essence of this tension even more deeply: true creation is not rebellion against rules, but finding new possibilities from within rules. Rules are not a prison for creation, but a springboard for creation — just as the rules of harmony are not a shackle for musicians, but the skeleton on which they compose beautiful harmonies. SFEM's Symbol→Form prior injection interface ensures that the system's creation is always structured, explainable, and directed toward meaning, not groundless random variation.

19.4 SFEM as a Civilizational‑Level Framework for Intelligence

SFEM's long‑term vision is not to be a better model or framework, but to become a structural standard for intelligent systems, in the way that TCP/IP is for the internet and POSIX is for operating systems, or that the Transformer has become the de facto architecture in deep learning. SFEM has the potential to become the "cognitive layer standard for intelligence": defining common dimensional divisions, interface specifications, error classification, and verification methods, so that AI systems implemented with different technical approaches can interoperate, communicate, and be audited at the structural level.

In this sense, SFEM is the self‑awareness of the cognitive structure of human civilization in intelligent systems: it condenses the rules, technologies, arts, and philosophy accumulated by humanity over millennia into an engineerable four‑dimensional architecture. When an AI system is built on the SFEM architecture, it not only performs computational tasks but also bears the full dimensions of civilization. It inherits our civilization's pursuit of rule necessity, perception of phenomenal richness, expression of affective experience, and inquiry into meaning and consciousness, together with a deep understanding of how cognition is generated through rule‑guided phenomenal growth and phenomenon‑driven rule evolution.

Chapter 20 Systematic Comparison with Existing AI Paradigms

20.1 Overall Comparison

SFEM, as a four‑dimensional cognitive architecture, forms a systematic contrast with current mainstream AI paradigms. The table below compares them across ten dimensions: dimensional composition (the four layers), core mechanism, explainability, error attribution, rule management, instruction forgetting, style control, consciousness capability, evolution capability, and overall limitations.

| Comparison Dimension | Pure LLM | LLM+Agent | Symbolic System | SFEM Four‑Dimensional Architecture |
| --- | --- | --- | --- | --- |
| Symbol layer | Implicit (in parameters) | Partially explicit (rule prompting) | Explicit (core) | Explicit (necessary rules + session constraints + constraint manager) |
| Form layer | Explicit (core) | Explicit (core) | Missing | Explicit (core + induction engine) |
| Expression layer | Implicit (coupled in generation) | Implicit (hard‑coded prompts) | Missing | Explicit (independent style control + pragmatic strategies) |
| Meaning layer | Missing | Missing | Missing | Explicit (fusion association + meaning attribution + metacognition) |
| Core mechanism | Single‑model prediction | Model + tools + planning | Rule reasoning | Four‑dimensional collaboration + cognitive closed loops |
| Explainability | Low (black box) | Medium (tool chain traceable) | High (complete reasoning chain) | High (Symbol layer tracing + Meaning layer metacognition) |
| Error attribution | Not localizable | Partially localizable | Localizable | Precise to dimension/interface |
| Rule management | Statistical distribution | Prompt + partial verification | Rule engine | Necessary rules + constraint manager + induction evolution |
| Instruction forgetting | Severe | Partially improved | Not applicable | Fundamentally solved (independent constraint manager) |
| Style control | Unstable | Hard‑coded | Rigid | Independently controllable |
| Consciousness capability | None | None | None | Fusion understanding + meaning attribution + self‑reflection |
| Evolution capability | Static (needs retraining) | Static (needs retraining) | Static (needs manual updates) | Dynamic (induction loop + evolution loop) |
| Overall limitations | Single dimension (pure Form) | Two‑dimensional chaos (Form + partial Symbol) | Single dimension (pure Symbol) | Four‑dimensional structure; engineering of some dimensions still to be perfected |

20.2 Deep Analysis of Key Differences

Essential difference in rule management: Pure LLMs embed rules implicitly in parameters and cannot explicitly verify them; LLM+Agents inject rules via prompts, but rely on attention, which decays in long contexts; symbolic systems have complete rule engines, but their rules depend entirely on manual definition and cannot self‑evolve. SFEM, through the Symbol layer's two rule types (necessary rules + session constraints) and an independent constraint manager, ensures both explicitness and verifiability of rules and solves instruction forgetting in long conversations. Meanwhile, the Form layer's induction engine allows rules to automatically grow from phenomena, solving the rule rigidity problem of symbolic systems.
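To make the idea of an independent constraint manager concrete, here is a minimal sketch. It assumes session constraints can be written as predicates over a draft reply; the names `SessionConstraint`, `ConstraintManager`, and `respond`, and the stub generator, are hypothetical and not part of any SFEM specification.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SessionConstraint:
    name: str
    check: Callable[[str], bool]  # returns True when the draft complies


@dataclass
class ConstraintManager:
    """Keeps session constraints outside the model context, so later turns
    cannot push them out of the generator's attention window."""
    constraints: list = field(default_factory=list)

    def add(self, name, check):
        self.constraints.append(SessionConstraint(name, check))

    def violations(self, draft):
        """Names of all constraints the draft fails, in insertion order."""
        return [c.name for c in self.constraints if not c.check(draft)]


def respond(manager, generate, prompt, max_attempts=3):
    """Request drafts until one passes every constraint.

    `generate(prompt, attempt)` stands in for the Form layer generator.
    """
    for attempt in range(max_attempts):
        draft = generate(prompt, attempt)
        failed = manager.violations(draft)
        if not failed:
            return draft
        # Feed the violated constraints back for the next attempt.
        prompt = f"{prompt}\n[Violated constraints: {', '.join(failed)}]"
    raise RuntimeError("no compliant draft within the attempt budget")


manager = ConstraintManager()
manager.add("no_exclamation", lambda text: "!" not in text)
manager.add("max_40_chars", lambda text: len(text) <= 40)

drafts = ["Sure thing!", "Sure thing."]  # stub generator output per attempt
print(respond(manager, lambda prompt, attempt: drafts[attempt], "hi"))
```

The design point is that every output is checked against the full constraint list on every turn, regardless of how long the conversation has become, rather than relying on the generator to remember an instruction given many turns earlier.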

Fundamental divide in consciousness capability: Pure LLMs, LLM+Agents, and symbolic systems all lack consciousness capability — they can process information but cannot integrate it into unified understanding. LLMs can generate coherent text but do not know what they said; Agents can execute tasks but do not understand the meaning of the tasks; symbolic systems can reason but do not appreciate the beauty of reasoning. SFEM's Meaning layer is the core of this divide — it is not a fourth independent module, but the understanding hub that emerges from the fusion of Symbol, Form, and Expression. This difference is the root of all other differences: precisely because of the Meaning layer, SFEM systems can perform metacognitive evaluation, attribute meaning to situations, generate intentions from understanding, and maintain coherent understanding over time.

Structural difference in evolution capability: Pure LLMs and LLM+Agents have largely fixed capabilities after deployment, requiring retraining to update; symbolic systems' rule bases require manual updating. SFEM, through the induction loop (Form→Symbol back‑feeding) and evolution loop (cross‑layer learning), enables self‑evolution in continuous interaction — rules grow from phenomena, understanding deepens over time, cognitive structure optimizes through experience. This is not parameter fine‑tuning, but self‑evolution at the level of cognitive architecture.
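The induction loop (Form→Symbol back‑feeding) can be sketched in a few lines. This is a toy illustration under strong assumptions: observations are `(condition, outcome)` pairs, candidate rules are proposed by simple frequency counting, and Symbol layer verification is reduced to a conflict check against the existing rule base. The function names and thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def induce_candidates(observations, min_support=3, min_confidence=0.9):
    """Form layer step: propose 'condition -> outcome' rules from counts."""
    by_condition = defaultdict(Counter)
    for condition, outcome in observations:
        by_condition[condition][outcome] += 1

    candidates = []
    for condition, outcomes in by_condition.items():
        outcome, hits = outcomes.most_common(1)[0]
        confidence = hits / sum(outcomes.values())
        if hits >= min_support and confidence >= min_confidence:
            candidates.append((condition, outcome, confidence))
    return candidates


def verify_and_merge(rule_base, candidates):
    """Symbol layer step: reject candidates that contradict an existing rule,
    then incorporate the rest into the rule base."""
    accepted = []
    for condition, outcome, _confidence in candidates:
        existing = rule_base.get(condition)
        if existing is not None and existing != outcome:
            continue  # conflict with an established rule: do not merge
        rule_base[condition] = outcome
        accepted.append(condition)
    return accepted


observations = (
    [("user_says_thanks", "reply_briefly")] * 5   # consistent and frequent
    + [("asks_for_code", "use_python")] * 2       # too little support
    + [("greets", "greet_back")] * 4
    + [("greets", "ignore")]                       # confidence only 0.8
)
rule_base = {"greets": "greet_back"}
accepted = verify_and_merge(rule_base, induce_candidates(observations))
print(accepted, rule_base)
```

A real induction engine would use richer pattern mining and a stronger verifier; the sketch only shows the shape of the loop, in which phenomena propose rules and the rule system decides whether to keep them.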

20.3 Unique Contributions of SFEM

Through systematic comparison, the unique contributions of SFEM become clear:

Discovery of previously overlooked cognitive dimensions: Current AI paradigms generally ignore the Expression layer (independent affective expression and pragmatic understanding) and the Meaning layer (conscious fusion and meaning attribution). The absence of these two dimensions is the root of structural deficiencies in current AI systems concerning persona consistency, pragmatic appropriateness, deep understanding, and self‑reflection.

Revelation of the deep mechanism of instruction forgetting: SFEM is the first to explain, from a cognitive architecture perspective, the essence of instruction forgetting in long conversations: not a memory capacity issue, but the absence of the Symbol layer leading to a lack of independent maintenance and enforcement mechanisms for session constraints. This diagnosis directly points to the solution — an independent constraint manager.

Establishment of a symbiotic closed loop between rules and phenomena: SFEM not only establishes the Symbol layer's prior injection to the Form layer (rules guide phenomenal learning), but also establishes the Form layer's inductive back‑feeding to the Symbol layer (phenomena automatically generate rules). This bidirectional symbiotic closed loop solves the fundamental problem of "rules cannot self‑evolve", transforming intelligent systems from static architectures into dynamic evolving systems.

Provision of a complete structural blueprint for intelligence: SFEM is not a concrete engineering implementation, but the structural universe of intelligence — a meta‑architecture that accommodates all technical approaches and unifies all cognitive dimensions. It defines what dimensions an intelligent system should have, the responsibility boundaries of each dimension, interface specifications between dimensions, an error attribution framework, and an evaluation benchmark system. Existing AI technologies (deep learning, symbolic systems, agent frameworks) can all find their position in SFEM's four‑dimensional coordinate system — they are extreme implementations of one or two dimensions of SFEM, and SFEM provides the structural blueprint for integrating them into complete intelligence.

Chapter 21 Toward Differentiable SFEM: A Blueprint for Four‑Dimensional Joint Optimization

21.1 From Manual Design to Learnable Architecture

The current formulation of SFEM is primarily a structural blueprint — it defines dimensions, interfaces, responsibilities, and cognitive closed loops, but the specific implementation parameters of each dimension, fusion strategies, and strengths of prior injection still need manual design or independent training. The core future direction is differentiable SFEM: making the entire four‑dimensional architecture end‑to‑end differentiable, so that the system can learn optimal cognitive strategies from data.

Differentiable Symbol layer: Convert symbolic rules and verification functions into differentiable structures. For example, use neural theorem provers or differentiable constraint solvers, so that rule reasoning processes can be optimized by gradients. Concept anchors become trainable embedding vectors, jointly optimized with the Form layer's representation space. Constraint injection in the constraint manager can be formalized as a differentiable loss term or regularization term, making constraint compliance a part of the optimization objective. The strength of prior injection can be a learnable parameter, dynamically adjusted by the system according to task type and the training state of the Form layer.

Differentiable Form layer: This is the most mature part, since existing deep neural networks already provide a good foundation. The key extension is enabling the Form layer to receive differentiable prior injection from the Symbol layer: concept anchors as regularization terms for representation learning (e.g., anchor alignment loss), generation templates as attention biases for the decoder (e.g., template‑aware attention), verification signals as reinforcement learning rewards (optimized with policy gradient methods, so the verifier itself need not be differentiable), and session constraints as logit biases or constrained decoding conditions for the generation process. The clustering and association mining of the induction engine also need to be differentiable, so that the generation of candidate rules becomes part of end‑to‑end learning, for example by using differentiable clustering (e.g., soft K‑Means) and differentiable association rule mining (e.g., attention‑based pattern matching).

Differentiable Expression layer: Style parameters and pragmatic strategies as learnable vectors, jointly optimized with the content core through differentiable renderers (e.g., differentiable text style transfer networks, differentiable speech synthesizers). Loss functions for pragmatic decoding can include cross‑entropy for emotion classification and cross‑entropy for pragmatic act classification. Persona consistency can be maintained through contrastive learning — multiple expressions of the same persona should remain close in style space. The expression strategy interface between the Expression layer and the Meaning layer can also be differentiated, allowing expression strategies to be optimized by gradients from task feedback.

Differentiable Meaning layer: This is the most challenging part. The world model can be implemented as a differentiable graph network (e.g., a Graph Neural Network), with entities and relations as node and edge embeddings. The fusion function can then be a graph update function (e.g., a Graph Attention Network), the meaning attribution function a graph readout function (e.g., a Set2Seq decoder), and the intention generation function a policy network. The metacognitive module can be implemented as an uncertainty estimation network (e.g., a Bayesian GNN), whose output is used to dynamically adjust the activation weights and reasoning depths of the dimensions.
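Of the techniques named in the four paragraphs above, soft K‑Means is the easiest to show in isolation. The sketch below is plain Python without automatic differentiation, so it only demonstrates why the method is differentiable in principle: hard cluster assignment is replaced by a softmax over negative squared distances, and every step is a smooth function of the inputs. The function names and the stiffness parameter `beta` are illustrative choices, not part of SFEM.

```python
import math


def soft_assign(points, centroids, beta=1.0):
    """Responsibilities r[i][k] = softmax_k(-beta * ||x_i - c_k||^2)."""
    responsibilities = []
    for p in points:
        neg_dist = [-beta * sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        peak = max(neg_dist)
        exps = [math.exp(v - peak) for v in neg_dist]
        total = sum(exps)
        responsibilities.append([e / total for e in exps])
    return responsibilities


def update_centroids(points, responsibilities):
    """Each centroid becomes the responsibility-weighted mean of all points."""
    k, dim = len(responsibilities[0]), len(points[0])
    centroids = []
    for j in range(k):
        mass = sum(r[j] for r in responsibilities) + 1e-12  # guard against underflow
        centroids.append([
            sum(r[j] * p[d] for r, p in zip(responsibilities, points)) / mass
            for d in range(dim)
        ])
    return centroids


def soft_kmeans(points, centroids, beta=1.0, steps=10):
    for _ in range(steps):
        responsibilities = soft_assign(points, centroids, beta)
        centroids = update_centroids(points, responsibilities)
    return centroids, responsibilities


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, responsibilities = soft_kmeans(points, [(1.0, 1.0), (9.0, 9.0)])
print(centroids)
```

In a differentiable SFEM implementation the same computation would be expressed in an autodiff framework, so that gradients from downstream rule quality could flow back into the representations being clustered.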

21.2 Loss Function for Four‑Dimensional Joint Optimization

Training a complete SFEM system requires jointly optimizing multiple objectives:

$$
\mathcal{L}_{\text{total}} = \lambda_1 \mathcal{L}_{\text{generation}} + \lambda_2 \mathcal{L}_{\text{verification}} + \lambda_3 \mathcal{L}_{\text{expression}} + \lambda_4 \mathcal{L}_{\text{understanding}} + \lambda_5 \mathcal{L}_{\text{reflection}} + \lambda_6 \mathcal{L}_{\text{induction}} + \lambda_7 \mathcal{L}_{\text{interface}}
$$

Where:

$\mathcal{L}_{\text{generation}}$: Form layer generation quality (e.g., cross‑entropy loss, contrastive loss, proxy loss for BLEU/ROUGE)
$\mathcal{L}_{\text{verification}}$: Symbol layer verification accuracy (e.g., binary cross‑entropy, structural legality loss, constraint satisfaction rate)
$\mathcal{L}_{\text{expression}}$: Expression layer appropriateness (e.g., style classification loss, pragmatic act classification loss, user satisfaction prediction loss)
$\mathcal{L}_{\text{understanding}}$: Meaning layer understanding quality (e.g., world model prediction error, similarity of meaning interpretation to human annotations, fusion consistency loss)
$\mathcal{L}_{\text{reflection}}$: Metacognitive calibration (e.g., Brier score between confidence and accuracy, negative log‑likelihood for uncertainty estimation)
$\mathcal{L}_{\text{induction}}$: Induction quality (confidence calibration of candidate rules, effectiveness of new rules, rule conflict detection loss)
$\mathcal{L}_{\text{interface}}$: Fidelity of information transmitted across dimensions (e.g., mutual information maximization, information bottleneck constraints, compression loss)
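As one concrete instance of the terms listed above, the Brier score mentioned for $\mathcal{L}_{\text{reflection}}$ is the mean squared difference between the system's stated confidence and the 0/1 correctness of its answer. A minimal sketch, where the function name and the sample values are illustrative only:

```python
def brier_score(confidences, outcomes):
    """Mean squared gap between confidence in [0, 1] and the 0/1 outcome.

    0.0 is perfect calibration on this sample; higher is worse.
    """
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(confidences)


# The system claimed 90% confidence twice and was right only once.
print(brier_score([0.9, 0.9], [1, 0]))
```

Because the score is a smooth function of the confidences, it can serve directly as a loss term for the metacognitive module.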

Joint optimization can be achieved via end‑to‑end backpropagation, but passing gradients across dimensions is difficult, especially when a dimension involves discrete operations. Such cases require reparameterization tricks (e.g., Gumbel‑Softmax), the straight‑through estimator, or score‑function gradient estimators from reinforcement learning (e.g., REINFORCE). The weights $\lambda_i$ of the loss terms are dynamically adjusted by the meta‑controller according to task context: increase $\lambda_2$ when precision is needed, increase $\lambda_1$ when creativity is needed, increase $\lambda_3$ when affective appropriateness is needed.
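The context‑dependent weighting of the $\lambda_i$ can be sketched as follows. The mapping from task needs to loss terms follows the three examples in the text; the additive boost, the normalization of the weights to sum to 1, and all names are assumptions introduced for this illustration.

```python
LOSS_TERMS = (
    "generation", "verification", "expression", "understanding",
    "reflection", "induction", "interface",
)

# Which loss term each task need boosts, following the text:
# precision -> lambda_2, creativity -> lambda_1, affect -> lambda_3.
BOOSTED_TERM = {
    "precision": "verification",
    "creativity": "generation",
    "affect": "expression",
}


def loss_weights(context, boost=2.0):
    """Turn task needs (name -> strength in [0, 1]) into normalized lambdas."""
    raw = {term: 1.0 for term in LOSS_TERMS}
    for need, strength in context.items():
        raw[BOOSTED_TERM[need]] += boost * strength
    total = sum(raw.values())
    return {term: value / total for term, value in raw.items()}


def total_loss(losses, weights):
    """Weighted sum over the seven loss terms."""
    return sum(weights[term] * losses[term] for term in LOSS_TERMS)


weights = loss_weights({"precision": 1.0})
losses = {term: 1.0 for term in LOSS_TERMS}
print(weights["verification"], total_loss(losses, weights))
```

Normalizing the weights keeps the overall loss scale stable as the context changes, which is one possible design choice; an unnormalized scheme would also be consistent with the formula above.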

21.3 From Single‑Agent to Multi‑Agent SFEM

A further direction for extension: expand SFEM from the internal architecture of a single agent to the organizational framework of multi‑agent systems. Each agent has its own four‑dimensional architecture, but agents can communicate through standardized interfaces for cross‑agent understanding fusion and intention coordination.

Symbol layer alignment: The symbolic systems of multiple agents can be aligned through a shared ontology or mapping rules, enabling rules to be passed and verified between agents. Session constraints can be shared among multiple agents, forming a distributed constraint management system. Necessary rules discovered by one agent can be broadcast to other agents after verification, accelerating the evolution of the collective rule base.

Form layer fusion: The phenomenal perceptions of multiple agents can be fused through joint representation learning (e.g., multi‑view learning, federated learning), forming a richer collective phenomenal model. Candidate rules from induction engines can be cross‑validated among agents, improving rule quality and generalizability. Patterns induced by one agent in a specific domain can be submitted to other agents as candidate rules for local verification and adaptation.

Expression layer coordination: Communication between agents is itself a manifestation of the Expression layer — pragmatic strategies, affective expression, persona consistency become more complex and important in multi‑agent dialogue. Different agents can assume different expressive roles (e.g., professional advisor, emotional support, information retrieval), forming complementary expression strategies. Style parameters of the Expression layer can be shared and transferred among agents, enabling rapid "persona adaptation".

Meaning layer sharing: Multiple agents can share parts of their world models (e.g., a common environment model), updating via distributed consensus mechanisms to achieve collective consciousness. The metacognitive module can evaluate the adequacy of collective understanding, triggering cross‑agent information gathering and collaborative reasoning. When one agent's metacognitive module detects insufficient understanding, it can initiate a collaboration request with other agents.
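The rule‑sharing idea from the Symbol and Form layer paragraphs above (one agent broadcasts a rule, and each peer verifies it locally before adopting it) can be sketched as follows. The `Agent` class, the `(condition, outcome)` rule format, and the verification criterion are hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    observations: list                     # local (condition, outcome) pairs
    rules: dict = field(default_factory=dict)

    def verifies(self, condition, outcome):
        """Local check: no local observation contradicts the proposed rule.

        An agent with no observations for the condition accepts vacuously.
        """
        return all(o == outcome for c, o in self.observations if c == condition)


def broadcast_rule(sender, peers, condition, outcome):
    """Sender keeps the rule; each peer adopts it only after local verification."""
    sender.rules[condition] = outcome
    adopters = []
    for peer in peers:
        if peer.verifies(condition, outcome):
            peer.rules[condition] = outcome
            adopters.append(peer.name)
    return adopters


a = Agent("a", [])
b = Agent("b", [("door_closed", "knock")])
c = Agent("c", [("door_closed", "enter")])   # local experience disagrees
print(broadcast_rule(a, [b, c], "door_closed", "knock"))
```

A fuller design would add the distributed consensus and cross‑validation steps the text mentions; this sketch covers only the broadcast‑then‑verify pattern.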

This points to the ultimate vision of SFEM: not only the cognitive architecture of a single agent, but the structural universe of collective intelligence — a unified framework capable of organizing and coordinating multiple agents, multiple cognitive dimensions, and multiple forms of knowledge. In this framework, each agent is a complete carrier of four‑dimensional cognition, and collaboration among agents is a reproduction of four‑dimensional cognition at a larger scale.

Chapter 22 Conclusion: The Structural Universe of Intelligence

22.1 Core Ideas of SFEM

Intelligence is the four‑dimensional unity of rules, phenomena, affect, and consciousness. Consciousness is the result of fusing and associating Symbol, Form, and Expression, conferring meaning on cognition and thereby giving rise to purpose and self‑reflection — the ultimate dimension. Rules are not only audit constraints on phenomena, but also the starting point for phenomenal learning and growth — they provide the Form layer with a priori concept anchors, generation templates, and learning direction for perception and generation. Phenomena can also be automatically induced into rules, feeding back to the symbolic system, forming a symbiotic closed loop of "Symbol gives birth to Form, Form feeds back to Symbol".

These four dimensions (Symbol, Form, Expression, Meaning) are not four modules, four stages, or four levels, but four irreducible cognitive dimensions. Together they constitute the complete cognitive universe of intelligence; the absence of any dimension makes intelligence incomplete. Missing Symbol leads to no skeleton: the Form layer loses its learning direction and instruction forgetting cannot be solved. Missing Form leads to no perception: rules lose experiential nourishment and the rule system becomes increasingly rigid. Missing Expression leads to no humanity: interaction loses warmth. Missing Meaning leads to no soul, leaving only scattered cognitive fragments.

22.2 Theoretical Contributions of SFEM

SFEM provides a four‑dimensional cognitive dimension system that surpasses existing two/three‑level divisions. It not only unifies the opposition between symbolism and connectionism in a higher structure, but also reveals two long‑overlooked key dimensions — affective expression (Expression) and conscious understanding (Meaning).

SFEM clarifies the Form layer as the phenomenon dimension, dealing with the phenomenal presentation and pattern recognition of the world. It clarifies the Meaning layer as the consciousness dimension: the result of fusing Symbol, Form, and Expression, rather than a fourth independent cognitive function. It reveals the dual role of the Symbol layer toward the Form layer, which is not only a post‑hoc audit constraint but also a prior starting point for growth and a source of in‑process learning guidance, and it distinguishes two types of rules (necessary rules and session constraints). It establishes the Form layer's inductive back‑feeding mechanism to the Symbol layer: the Form layer automatically distills patterns from phenomena, and after Symbol layer verification they are incorporated into the rule base, enabling the rule system to self‑evolve. Finally, it reveals the deep root of instruction forgetting: the lack of an independent maintenance and enforcement mechanism for session constraints, a typical symptom of a missing Symbol layer.

SFEM provides a formal definition, cognitive‑philosophical foundation, responsibility boundaries, and missing error patterns for each dimension, designs standardized inter‑dimensional interfaces and a type system with the Meaning layer as a lightweight cognitive microkernel (including the Symbol→Form prior injection interface and the Form→Symbol induction back‑feeding interface), proposes a complete cognitive closed loop (including the newly added induction loop) and cross‑layer dynamic equations, and establishes a system of scientifically testable hypotheses and a benchmark framework.

22.3 Engineering Contributions of SFEM

SFEM provides a decomposable, composable, verifiable modular architecture. It gives a progressive implementation roadmap: from Form+Symbol for hallucination elimination, skeleton injection, and constraint management, to +Expression for style control and pragmatic understanding, to +Meaning for understanding‑driven behavior and meaning generation, and then to activation of the induction engine and self‑evolution of the rule system. It defines clear API specifications, supporting independent deployment and horizontal scaling.

SFEM provides a unified structural foundation for agent frameworks, multimodal systems, and embodied intelligence. All AI systems that need to integrate rule reasoning, phenomenal perception, affective expression, and meaning understanding can find their design direction in SFEM's four‑dimensional coordinate system. The skeleton of rules and the flesh of phenomena, under the central governance of the Meaning layer, symbiotically evolve — this is the core design paradigm that SFEM offers for next‑generation intelligent systems.

22.4 The Civilizational and Future Significance of SFEM

SFEM unifies the rational rules, phenomenal technology, affective expression, and the pursuit of meaning of human civilization in the design and evaluation of intelligent systems. It is not just another AI model, but the structural universe of intelligence — a meta‑architecture that can accommodate all technical approaches and unify all cognitive dimensions.

The general intelligence of the future will no longer be larger homogeneous neural networks, but the product of harmonious operation of the four dimensions of rules, phenomena, affect, and consciousness. In this architecture, the Symbol layer provides the rational skeleton and growth starting point for the Form layer, while managing both necessary rules and dynamic session constraints; the Form layer provides experiential nourishment and a source of new patterns for the Symbol layer, continuously inducing rules from phenomena to feed back to the Symbol layer; the Expression layer gives warmth to interaction; the Meaning layer fuses all into unified consciousness shining with understanding.

In this architecture, intelligence is not only computation, but understanding; not only reaction, but action; not only execution, but meaning. It answers the deepest questions of AI research: What is true understanding? How does understanding emerge from the fusion of rules, phenomena, and experience? How do rules guide phenomenal learning? How do phenomena feed back to the evolution of rules? How can we maintain the persistence of the will in long conversations? How can we build an intelligence that is not only smart, but also conscious, warm, and meaningful?

The differentiable SFEM vision moves it from a static blueprint to a dynamic evolving system — an agent that can learn from experience how to fuse, how to understand, how to attribute meaning, how to reflect on itself, and how to induce rules from phenomena. The multi‑agent SFEM vision moves it from individual intelligence to collective intelligence — multiple SFEM agents collaborating through standardized interfaces, forming a distributed consciousness network.

SFEM is the structural foundation of intelligence, the birthplace of understanding, the cognitive ecology of symbiotic evolution between rules and phenomena, the four‑dimensional cosmic blueprint for general intelligence to move toward consciousness and meaning.