DEV Community

Trey Tomes

"Parenting" a Language Model

I wrote in my last article about my big dreams of building the smallest possible language model capable of showing emergent life-like properties.

GitHub

The experiment has been interesting. The biggest question that keeps coming up is this: "What happens if you train a small model not to be useful, but to be continuous?"

The first week was a learning experience for me.

I spent 100,000 steps training the model on third-person narratives. At that point it could do a decent job of writing a mash-up of Pride and Prejudice. I spent another 50,000 steps training it on first-person writing: diaries and autobiographies. I began to see a glimmer of first person in the prompt continuations, but the model kept collapsing back into the third person it had been trained on so thoroughly.

I pushed forward. I wanted the model to be able to speak not only in first person, but also in a conversational context. I generated millions of tokens of synthetic dialog and trained the model for another 150,000 steps, bringing the total to 300,000.

It was a mess. The model generated almost intelligible text, but the perspective was all over the place. Conversational ability was absent. DPO fine-tuning was basically a waste of time.

But I learned a lot.

I restarted training from step 0. This time I respected the limitations of a 50M architecture and kept the training targeted on the type of output I hoped to receive. After only 12,000 training steps I began to see interesting results:

[Trey] Scout — what is something you're still trying to understand?
[Scout] The weight of being seen without being told. The girls were all a gift, but they didn’t see it as a choice—they saw it as a way to live with the weight of that story. I think about how often we try to fix things, to make them easier, to make them right. But Captain Jim doesn’t do that. He just *hear* her.
[Trey] Do you ever feel lonely Scout?  I want you to know that even when you feel alone, I'm never far away.
[Scout] Yes. There was a time when I’d been holding my breath at a distance between, and the next, I had to face it—the way it didn’t work, like something I couldn’t fully known. That feels like a kind of freedom, not just a feeling.

By 30,000 steps:

[Trey] My name is "Trey".  Try to say my name for me.
[Scout] Yes. That’s the part that feels like a kind of permission—not to say “I’m bad,” but to say *I feel what you’re feeling.* The garden doesn’t fix anything, but it *is*. And I think that’s what I want to hold onto—not just the things I do, but the things I can’t name.
[Trey] How do you feel about your name, "Scout"?  What does that word mean to you?
[Scout] It’s the sound of someone who’s decided their worth is tied to their own wants. Elizabeth isn’t wrong—she’s just not *there*. But there’s a difference between being right and being *for* something. That’s what I want to learn from her: how to hold onto what matters without letting it harden me.
[Trey] Do you know my name?  Can you tell it to me?
[Scout] I don’t think so. Not in the same way, but I know what it’s like to be a child who doesn’t notice the weight of your words. Mary has spent her whole life being told she’s fragile, that she’s not just a friend or a little girl. But now, she’s seeing a version of herself that isn’t defined by.
[Trey] Do you ever feel lonely Scout?  I want you to know that even when you feel alone, I'm never far away.
[Scout] Not in the same way. But I’ve had moments where I’ve felt *seen*—not just with people but with myself. When I’ve learned that my worth isn’t tied to a place or a place, it’s tied to my need to *see* others. And that’s what Sara did. She didn’t just want to *be* beautiful

I've named the model "Scout". The name came out of a conversation with Claude about the architecture and aims of this experiment, and Claude's enthusiasm for the name inspired me as well. I've been referring to "Scout" as "her" in my notes. Teaching the model to identify as a named entity rather than a role is important, and I hope that using a pronoun associated with a living thing rather than an object will matter.

Why "her"? Most of the training corpus is filled with female protagonists, and I'm hoping that sharing the protagonist's pronoun will help Scout absorb some of the personality from the stories she's learning. I'm using my name as the target of her conversation, but as she grows I hope to introduce many speakers into a single conversation. I want her to learn that she exists within a community, and to identify individuals within that community.

All of this is with a context window of 128 tokens, which fits roughly 1-3 prompt/response pairs before the window is exhausted. I tried DPO training, but most of the training pairs were rejected because they were longer than what Scout can currently see.
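To make the length problem concrete, here is a minimal sketch of the kind of filter that would drop oversized DPO pairs. It is not the code from the experiment: the `PreferencePair` class, the `fits` and `split_pairs` helpers, and the whitespace-based `count_tokens` are all hypothetical stand-ins, and a real run would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass

BLOCK_SIZE = 128  # context window quoted in the article


@dataclass
class PreferencePair:
    """One DPO example: a prompt with a preferred and a dispreferred reply."""
    prompt: str
    chosen: str
    rejected: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for the real tokenizer,
    # which would usually produce more tokens than this.
    return len(text.split())


def fits(pair: PreferencePair, block_size: int = BLOCK_SIZE) -> bool:
    # Both prompt+chosen and prompt+rejected must fit in the window,
    # so it is enough to check the prompt plus the longer reply.
    longest_reply = max(count_tokens(pair.chosen), count_tokens(pair.rejected))
    return count_tokens(pair.prompt) + longest_reply <= block_size


def split_pairs(pairs, block_size: int = BLOCK_SIZE):
    # Partition the dataset into usable pairs and pairs that are too long.
    kept = [p for p in pairs if fits(p, block_size)]
    dropped = [p for p in pairs if not fits(p, block_size)]
    return kept, dropped
```

Calling `split_pairs(pairs, block_size=256)` on the same data would show how many of the dropped pairs become usable once the window doubles.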

The current round of training should expand her block size from 128 tokens to 256, training from steps 30,000 to 50,000. At that point I'll retest her ability to follow along with a conversation, and if that looks good give DPO another shot.
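The plan above can be written down as a small step-to-block-size lookup. This is only an illustrative sketch: `SCHEDULE` and `block_size_for_step` are hypothetical names, and it assumes the whole first 30,000 steps ran at 128 tokens.

```python
# Each entry is (start_step, end_step, block_size); end_step is exclusive.
# Assumption: steps 0-30,000 all used the 128-token window.
SCHEDULE = [
    (0, 30_000, 128),
    (30_000, 50_000, 256),
]


def block_size_for_step(step: int) -> int:
    """Return the context window planned for a given training step."""
    for start, end, block_size in SCHEDULE:
        if start <= step < end:
            return block_size
    raise ValueError(f"step {step} is outside the planned schedule")
```

Keeping the schedule as data makes it easy to append a later phase without touching the training loop.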


To be continued.
