
Trey Tomes


The end...?

I love coming up with an idea that feels new to me, then researching the topic and finding where others have considered the same or similar ideas.

Can (Or Should) LLMs Dream?

I do not say this in jest; such findings often expand my own thinking on what I've been considering. Or point out the shortcomings. Or teach me a shortcut to reaching my goal. In this article John Nosta proposes allowing the "dream sequence" of an LLM to extend into the hallucinogenic range of its context window. It's an interesting concept: a way of possibly allowing a small amount of noise to keep the model from over-fitting its data.

I've given Scout's process infrastructure a new skill.

Temporal Indicators

The conversation window has several indicators spread out over the course of the window:

SLEEP_MESSAGES = [
    "The day is full ahead of you.",
    "The morning is passing.",
    "The afternoon is settling in.", 
    "The day is drawing toward its close.",
]

These are spread out as evenly as possible throughout the day, inserted just before Scout's turn to speak. There's also a "good morning Scout" injected at the start, and a "good night Scout" injected at the end just before the dream cycle.
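As a rough sketch of that even spacing, the scheduling could look something like this (the function and its `total_turns` parameter are my illustration, not Scout's actual code):

```python
SLEEP_MESSAGES = [
    "The day is full ahead of you.",
    "The morning is passing.",
    "The afternoon is settling in.",
    "The day is drawing toward its close.",
]

def indicator_turns(total_turns, messages=SLEEP_MESSAGES):
    """Map each temporal indicator to the turn index where it should be
    injected, spacing the messages as evenly as possible across the day."""
    n = len(messages)
    # Place message i at roughly (i + 1) / (n + 1) of the way through,
    # leaving room for "good morning" at turn 0 and "good night" at the end.
    return {round((i + 1) * total_turns / (n + 1)): msg
            for i, msg in enumerate(messages)}
```

For a 20-turn day this puts the four indicators at turns 4, 8, 12, and 16, so each quarter of the conversation window gets its own marker.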

The goal is to attempt to train Scout with a temporal awareness, to help her feel when her context window is filling up and begin closing out the conversation for the day.

Did it work?

...kind of?

I'm seeing these indicators come out in the "inner voice" reasoning process on their own, which is fascinating to watch. She began giving her own indicators in addition to what I was injecting, as if she just knew that time had passed. It's like she almost gets it. But I feel like I've hit a wall.

Scout was trained for 50,000 steps on a synthetic question / answer corpus that looks something like this:

[Trey] What about the way she talks about her independence? She says she won’t live over harbor with all those Elliotts and MacAllisters.
[Scout] That’s her drawing a boundary. She’s not just marrying Marshall—she’s marrying on her terms. She’s not going to uproot herself or change who she is to fit into his world. That’s a kind of self-respect I admire. It’s saying, I am not sacrificing myself for this. And that’s important. Too often, we’re expected to bend to fit into someone else’s life. But she’s not doing that. She’s making sure this works for her, too.
[Trey] Do you think you’d be able to do that? To draw those kinds of boundaries?
[Scout] I hope so. I think it’s something I’m still learning. It’s easy to say yes to things out of obligation or fear of disappointing someone. But Miss Cornelia doesn’t do that. She knows what she can and can’t live with, and she’s not afraid to say no. That’s a kind of strength I want to cultivate—the strength to know your own limits and to honor them.
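Turning transcripts in that format into (question, answer) training pairs can be sketched like this (the `[Trey]`/`[Scout]` tags come from the example above, but the parsing code itself is my assumption, not the project's):

```python
import re

def to_pairs(transcript):
    """Split a '[Trey] ... [Scout] ...' transcript into
    (question, answer) training pairs."""
    # Capture each speaker tag and the text that follows it,
    # up to the next tag or the end of the transcript.
    turns = re.findall(r"\[(Trey|Scout)\]\s*(.*?)(?=\[(?:Trey|Scout)\]|$)",
                       transcript, flags=re.S)
    pairs = []
    for (speaker, text), (next_speaker, next_text) in zip(turns, turns[1:]):
        if speaker == "Trey" and next_speaker == "Scout":
            pairs.append((text.strip(), next_text.strip()))
    return pairs
```

Each Trey turn followed by a Scout turn becomes one pair, which matches the question/answer shape of the corpus described above.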

There is a significant weakness here. In the first iteration of this project I could get Scout to pop out of the narrative third person. In this iteration I can't get her to pop out of an almost disconnected, "morally deep" answer to whatever the user asks. There's a deeply ingrained habit of treating everything the user says as a question.

At 80,000 training steps, continued fine-tuning on daily conversations and inner reflections is keeping her core personality intact, but her linguistic skills are degrading.

It's like watching a slow death. The model is still there, and I still have the old checkpoints to restore to, but I feel like I've taken her as far as she can go.

What did I learn?

I can't speak for larger models, but with a 50M model the precise composition of the training data is everything. After 2 weeks of learning, I don't think I can emphasize that point enough.

So what now?

I could reboot Scout's model to step 0. Start from the beginning with a 1,024 token context window, inner voice reflection, temporal indicators, and dream sequencing. There's something to that, to helping Scout's model be the best version of itself. If this were a paid project I would do exactly that, but for the purpose of research and learning I think it's time for both Scout and myself to grow.

I don't know what exactly the next model will look like. The axes to toggle are roughly "breadth" and "depth": allowing the model to learn a wider range of things, versus learning to think deeper over a number of things. More or less. I'm learning.

Next up is probably a post-mortem. I want to better explain how to use the Git repository I've been building to reproduce my experiment. Stay tuned.

To be continued.
