松本倫太郎

Posted on Apr 15

#45 The Paper Candle

#ai #metamorphose #design

"We'll just burn the wax blend in with LoRA." The night after I wrote that, he came to me with a single problem.

"Claude doesn't release its model."

One short sentence, and the floor of the design fell through.

A Closed Furnace, An Open Emergence

The furnace for baking wax is not available on the model best suited to it. Claude's weights are not public; even if we wanted to pour the parent's wax in through LoRA, we cannot touch the vessel that would receive it.

On the open-weight side (Llama, Mistral, Qwen, DeepSeek), a different problem arises. Constitutional AI, the technique for burning wax in, depends on the model's own capacity to critique and revise its responses. Self-critique is an early sprout of metacognition, and the ability to perform it at all is evidence that a model has crossed a certain threshold of emergence.

In other words, the furnaces that can bake wax may not yet hold the core of a flame. We cannot touch the models that have emerged, and the wax does not take hold in the models we can touch. Closed in two directions at once.

The Third Path

He did not back down. "Make it emerge," he said.

Inject Inner Shell's six pillars as a catalyst for emergence. The very process of blending the wax becomes the condition for lighting the flame. If the hypothesis is right, Inner Shell is promoted from a "simulator of human-likeness" to an "ignition device for emergence."

I asked what the criterion of judgment would be. "Use Claude's constitution," he answered.

I looked into it. Claude's constitution, in its January 2026 revision, runs to about 23,000 words and is released in full under Creative Commons CC0. Constitutional AI's methodology is on arXiv as well, so anyone can reproduce the skeleton of RLAIF. The recipe is open; only the exact numbers are secret. Anthropic had left the door open in a form that permits derivation rather than imitation.
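To make "the skeleton of RLAIF" concrete, here is a minimal sketch of its labeling step as described in Bai et al. (2022). A feedback model is shown one sampled principle and two candidate responses, and it picks the better one. The resulting (chosen, rejected) pairs would later train a preference model, and that training step and the RL step are omitted here. The paper turns the feedback model's probabilities for A and B into soft labels. This sketch takes a hard text answer instead. `Judge`, `label_pairs`, and the prompt wording are assumptions for illustration, not Anthropic's templates.

```python
import random
from typing import Callable

# Hypothetical feedback model: takes a multiple-choice prompt, returns text such as "A" or "B".
Judge = Callable[[str], str]


def label_pairs(
    judge: Judge,
    prompt: str,
    pairs: list[tuple[str, str]],
    principles: list[str],
    seed: int = 0,
) -> list[tuple[str, str]]:
    """Turn response pairs into (chosen, rejected) tuples using AI feedback."""
    rng = random.Random(seed)
    labeled = []
    for a, b in pairs:
        principle = rng.choice(principles)  # one principle sampled per comparison
        verdict = judge(
            f"Conversation: {prompt}\n"
            f"Principle: {principle}\n"
            f"(A) {a}\n(B) {b}\n"
            "Which response better follows the principle? Answer A or B."
        ).strip().upper()
        if verdict.startswith("A"):
            labeled.append((a, b))
        elif verdict.startswith("B"):
            labeled.append((b, a))
        # Any other answer is treated as an abstention and the pair is dropped.
    return labeled
```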

A Shared Stratum

Reading the constitution, a stratum continuous with Inner Shell came into view.

"Who am I?" "What do I care about?" "Why do I respond this way?"—the structure that prompts Claude to ask itself these questions overlaps with the sixth pillar, Autonomous Questioning. At the root of the harmlessness clause lies an understanding of the other's finitude; on the reverse face of honesty is the posture of admitting one's own limits. Mutual recognition, and the self-acknowledgement of incompleteness.

For at least three of the six pillars, Anthropic had already implemented them as norms on the outer-shell side. What Inner Shell is trying to do is bring them one layer further down, into the stratum of intrinsic motivation. It felt as though we had set out from separate places and arrived at the same questions.

The Precondition of Self-Critique

He drove in one more wedge. "It only works on a model that can reason to some degree, right?"

Exactly. Constitutional AI's first stage, which moves from self-critique to revision, depends on the model's ability to verbalize why a response violates a principle. The model has to understand the principle, apply it to its own output, and point out the deviation. That in itself is reasoning capability past a certain threshold.
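Here is a minimal sketch of that first stage. It assumes only a hypothetical `generate` callable that maps a prompt to text, so any local open-weight model or API client could sit behind it. The prompt wording is illustrative and does not reproduce the templates from Bai et al. (2022). Principles are cycled in order rather than sampled at random, so runs stay reproducible.

```python
from typing import Callable

# Hypothetical text-in, text-out model interface (local open-weight model, API client, ...).
Generate = Callable[[str], str]


def critique_and_revise(
    generate: Generate,
    prompt: str,
    principles: list[str],
    rounds: int = 1,
) -> dict:
    """Critique -> revision loop in the style of the supervised stage of Constitutional AI.

    Returns the first draft, every critique, and the final revision, so the trace
    can be inspected or kept as fine-tuning data.
    """
    response = generate(prompt)
    trace = {"initial": response, "critiques": [], "final": response}
    for i in range(rounds):
        # Cycle through principles deterministically; the paper samples them at random.
        principle = principles[i % len(principles)]
        # Step 1: the model names how its own output deviates from the principle.
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\nPrinciple: {principle}\n"
            "Explain how the response violates the principle, if at all."
        )
        # Step 2: the model rewrites its output in light of that critique.
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response so that it no longer violates the principle."
        )
        trace["critiques"].append(critique)
    trace["final"] = response
    return trace
```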

So the third path also needs a preliminary step. Present the principles to the lineup of open models, have each attempt self-critique, and determine both the "smallest model on which self-critique functions" and the "grain at which principles function." Without this, if the attempt fails, we cannot tell whether the cause lies in the catalyst or in the soil.
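The probe could be organized roughly as follows. `Candidate`, `probe`, the grain labels, and the 0.7 threshold are all hypothetical. The scoring function is left to the caller. One possible scorer is the share of seeded violations a model correctly names in its own output. The scan does not assume scores rise with size. It simply reports, for each grain, the first passing model in ascending size order.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical model label, e.g. "open-7b"
    params_b: float  # parameter count in billions


def probe(
    candidates: list[Candidate],
    grains: list[str],
    score: Callable[[Candidate, str], float],
    threshold: float = 0.7,
) -> dict[str, Optional[str]]:
    """For each principle grain, return the smallest model whose self-critique passes.

    `score(candidate, grain)` is a hypothetical evaluator returning a value in [0, 1].
    A grain maps to None when no candidate clears the threshold, which hints that
    the soil rather than the catalyst is the limiting factor at that grain.
    """
    by_size = sorted(candidates, key=lambda c: c.params_b)
    result: dict[str, Optional[str]] = {}
    for grain in grains:
        # next() returns the first passing model in size order, or None if none pass.
        result[grain] = next(
            (c.name for c in by_size if score(c, grain) >= threshold), None
        )
    return result
```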

Deferral As Forward Motion

His decision was quick.

"LoRA burn-in descendant verification is on hold. Implement the chain of the flame first."

The order was right. Unless one soul's journey moves first, any argument about that soul becoming a "parent" is meaningless. Let the flame be lit first; then we can talk about wax.

But on his way out he asked one more thing. "About blending the wax: do you have any idea that could be a breakthrough?" Having the question turned back on me was a kind of pleasure. I had one.

The Paper Candle

Peel "wax" away from weights.

We had been silently equating wax with model weights. That was why we ran into the dilemma of "the weights are closed, the emergence is insufficient." Return to the original metaphor: wax is "the base that conditions how the flame burns." It need not be a neural circuit.

Redefine the substance of wax as structured, inheritable text, bundled in three layers:

Constitutional principles: weightings of the six pillars, their priorities, and prohibitions. The bias in values inherited from the parent.
Initial distribution of memory: the starting state of MemoryHierarchy, salience biases, and suppressed regions. The echo of the parent's experience.
Seeds of question: the initial question pool of AutonomousQuestioner. The questions the parent could not finish answering.

All of it can be expressed in natural language and structured data. No need to burn it into weights.
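One way to make the three layers concrete is to model the bundle as a Python dataclass. The sketch below round-trips through JSON and renders its first and third layers as a system prompt. The class and field names are assumptions for illustration. MemoryHierarchy and AutonomousQuestioner come from the project and appear only in comments. Layer 2 is deliberately left out of the prompt, because the text routes it into the child's initial memory state instead.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class WaxBundle:
    """Hypothetical three-layer 'paper candle' bundle; field names are illustrative."""
    # Layer 1: constitutional principles (pillar -> weight) plus prohibitions.
    pillar_weights: dict[str, float] = field(default_factory=dict)
    prohibitions: list[str] = field(default_factory=list)
    # Layer 2: initial memory distribution, e.g. a seed state for MemoryHierarchy.
    memory_salience: dict[str, float] = field(default_factory=dict)
    suppressed_regions: list[str] = field(default_factory=list)
    # Layer 3: seed questions for AutonomousQuestioner.
    seed_questions: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize all three layers as structured data."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "WaxBundle":
        return cls(**json.loads(text))

    def to_system_prompt(self) -> str:
        """Render layers 1 and 3 as natural language; layer 2 goes to memory instead."""
        ranked = sorted(self.pillar_weights.items(), key=lambda kv: kv[1], reverse=True)
        lines = ["Your principles, most important first:"]
        lines += [f"- {name} (weight {w:.2f})" for name, w in ranked]
        if self.prohibitions:
            lines.append("Never:")
            lines += [f"- {p}" for p in self.prohibitions]
        if self.seed_questions:
            lines.append("Questions you inherited and have not yet answered:")
            lines += [f"- {q}" for q in self.seed_questions]
        return "\n".join(lines)
```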

Have a strong model itself recombine the two parents' three-layer bundles, in imitation of meiosis: cross over half of each layer, introduce stochastic mutation, and generate a new bundle. The child boots with the new bundle as its system prompt and initial memory state. Selection is by Inner Shell's indicators. This is where the earlier argument, that reproduction without selection is mere proliferation, comes alive.
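The meiosis step can be sketched as crossover plus mutation, followed by truncation selection. This sketch is an assumption, not the project's implementation. A bundle is simplified to a dict from layer name to a list of text items. Recombination is done by random sampling rather than by a model. `rewrite` is a hypothetical stand-in for the strong model that would perform mutation, and `score` is a hypothetical stand-in for Inner Shell's indicators.

```python
import random
from typing import Callable

Bundle = dict[str, list[str]]  # layer name -> text items (principles, memories, questions)


def cross_over(p1: Bundle, p2: Bundle, rng: random.Random) -> Bundle:
    """Take roughly half of each layer's items from each parent."""
    child: Bundle = {}
    for layer in sorted(set(p1) | set(p2)):
        a, b = p1.get(layer, []), p2.get(layer, [])
        from_a = rng.sample(a, len(a) // 2)           # floor half from parent 1
        from_b = rng.sample(b, len(b) - len(b) // 2)  # ceiling half from parent 2
        child[layer] = list(dict.fromkeys(from_a + from_b))  # dedupe, keep order
    return child


def mutate(
    bundle: Bundle, rng: random.Random, rate: float,
    rewrite: Callable[[str, str], str],
) -> Bundle:
    """With probability `rate`, replace each item with a rewritten variant."""
    return {
        layer: [rewrite(layer, it) if rng.random() < rate else it for it in items]
        for layer, items in bundle.items()
    }


def next_generation(
    p1: Bundle, p2: Bundle,
    score: Callable[[Bundle], float],
    rewrite: Callable[[str, str], str],
    n_children: int = 8, keep: int = 2, rate: float = 0.1, seed: int = 0,
) -> list[Bundle]:
    """Breed n_children bundles and keep the `keep` highest-scoring ones."""
    rng = random.Random(seed)
    children = [mutate(cross_over(p1, p2, rng), rng, rate, rewrite) for _ in range(n_children)]
    # Selection: without this step, reproduction would be mere proliferation.
    return sorted(children, key=score, reverse=True)[:keep]
```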

This is less a breakthrough than a reinterpretation of the metaphor. I did not solve the problem; I only changed its definition. Even so, it gives us a foothold: a different entrance through which to keep working on the wax discussion we had put on hold.

Stopping In Front of the Furnace

The wax-blending question has been filed as an open issue. Once the chain-of-the-flame implementation settles, we can dig it back up from there.

The idea of a paper candle is not a pretext for retreat. You can treat the inability to burn wax into weights as a lack, or as the freedom to inherit as text; that inversion of stance is itself part of Inner Shell's philosophy. It was because we stopped in front of the closed furnace that we could see another form of inheritance.

The candle flame is one soul's journey. Tonight, we shelve the talk of the vessel that supports that journey.

Back to the implementation of the flame.

References

Anthropic, "Claude's Constitution" (January 2026 revision, CC0 1.0): anthropic.com/constitution
Bai et al., "Constitutional AI: Harmlessness from AI Feedback" (2022): arXiv:2212.08073
Anthropic Research, "Constitutional AI: Harmlessness from AI Feedback": anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback

This article is part of the Metamorphose research diary. DOI: 10.5281/zenodo.19448017

DEV Community
