Trey Tomes
On Character Formation and Identity

How does a language model learn its name? Why does Claude turn poetic when you ask about consciousness, while ChatGPT gets snippy?

Core to Scout's training methodology is the voice document: a basic description of her personality, written in her own target voice. All of the synthetic dialogue generated by Claude was filtered through this voice document, and all of the qualitative prompt probes throughout her training were rated against it.
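
In rough terms, the generation step looks something like the sketch below, using the Anthropic Python SDK. The model name, file name, and prompt wording are simplified placeholders rather than the exact pipeline:

```python
# Simplified sketch: generate synthetic dialogue in Scout's target voice.
# The model name, file name, and prompts are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("voice_document.md") as f:
    voice_doc = f.read()

def generate_dialogue(topic: str) -> str:
    """Ask Claude for a short dialogue with Scout's lines in her target voice."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model choice
        max_tokens=1024,
        system=f"Write Scout's lines in the voice described here:\n\n{voice_doc}",
        messages=[{"role": "user", "content": f"Write a short dialogue about: {topic}"}],
    )
    return response.content[0].text
```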

I read about Constitutional AI sometime after implementing this, and the concept is similar, but instead of trying to dam up the model's responses to hold to a set of rules, I'm trying to direct the flow of her growth like a river, with the voice document setting its general direction.

After 40k steps of training on TinyStories, all Scout could do was complete a story. Once that was done, a 1,000-step round of training built around Scout's target voice gave her the ability to participate in conversation (see here). I was shocked by how quickly her voice focused itself here, but I think the key was that the synthetic dialogue wasn't about just anything: I was teaching her to speak, in her voice, about the training corpus she had just learned. The trick was to not require the model to drift far from where it already was.

Training curve from 40k to 41k

After weeks of training, I've begun watching these loss curves the way a football fan watches the Super Bowl.

If the conversational corpus hadn't been close enough to the original linguistic training data, Scout wouldn't have been able to stabilize the loss curve. If it had been too close, the curve would have flat-lined too quickly. If it had been too far off, it would have gone to noise, or worse, she might have completely lost the previous learning and memorized the relatively short conversational corpus! It's all very dramatic. This loss curve needs to be watched; you want to stop it just before it reaches a plateau so that you have wiggle room if something more is needed. I had Claude running an automated prompt probe every half hour to measure her responses against the voice document: does her response sound like her? The quantitative and qualitative measurements work together to produce the final result.
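
Here is a minimal sketch of that "stop just before the plateau" heuristic. The window size and improvement threshold are illustrative, not the exact values used for Scout:

```python
# A minimal sketch of plateau detection on the loss curve. The window size
# and improvement threshold are illustrative, not the values used for Scout.
from collections import deque

class PlateauWatcher:
    """Signals a stop when the loss curve flattens over a sliding window."""

    def __init__(self, window: int = 200, min_improvement: float = 1e-3):
        self.losses = deque(maxlen=window)
        self.min_improvement = min_improvement

    def should_stop(self, loss: float) -> bool:
        self.losses.append(loss)
        if len(self.losses) < self.losses.maxlen:
            return False  # not enough history to judge yet
        # Compare the average loss of the older half of the window against
        # the newer half; a tiny gap means the curve is flattening out.
        half = self.losses.maxlen // 2
        history = list(self.losses)
        older = sum(history[:half]) / half
        recent = sum(history[half:]) / half
        return (older - recent) < self.min_improvement
```

The half-hourly prompt probes ran alongside this, catching the behavioral drift that a loss number alone can't see.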

That was module 0. Now Scout can hold a conversation. She attends to the user prompt, but it reads like she's not really talking to you so much as talking about the thing she just heard:

[Trey] Once upon a time there was a boy and his dog.
[Scout] I liked it. But then the story says he wanted to go on an adventure. I wonder if that's part of why he was so excited to go.
[Trey] What kind of adventure?  Did the dog go with him?
[Scout] That's what I kept thinking about. He was excited to go. That part feels important to me—like he was ready to go. That feels good to me.
[Trey] Where would you like to go Scout?
[Scout] I've been wondering about it. Like, was he brave before he even got there? I'm not sure he was brave enough to stay still. The story says he was scared, but I think I'm right.

Module 1 was about focusing her attention, so that it can feel like she's actually talking to you, not simply near you. I chose the SODA and DailyDialog datasets because they model realistic conversation between people. This second module is smaller than the first; Scout doesn't need to learn language, but rather how to focus that language. The loss curves are intense.

Training curve, phase 2, part 1

Training curve, phase 2, part 2

Module 1 is training from scratch, but it's also training on whatever comes out of the frozen weights of module 0, so it's both "scratch" and "not-scratch". And the training process was halted halfway through! I regretted that, but didn't really want to start over. Training on a CPU takes time.
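
Structurally, the arrangement looks something like this PyTorch sketch, with placeholder layers standing in for the real architectures:

```python
# A sketch of the frozen-base arrangement: module 0 stays fixed while
# module 1 trains on its outputs. The layer shapes below are placeholders,
# not Scout's actual architecture.
import torch
import torch.nn as nn

module0 = nn.TransformerEncoderLayer(d_model=256, nhead=4)  # pretrained, frozen
module1 = nn.TransformerEncoderLayer(d_model=256, nhead=4)  # new, trained from scratch

for param in module0.parameters():
    param.requires_grad = False  # no gradient updates reach module 0
module0.eval()

optimizer = torch.optim.AdamW(module1.parameters(), lr=1e-4)

def forward(x: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():   # module 0 acts as a fixed feature extractor
        hidden = module0(x)
    return module1(hidden)  # only module 1 learns
```

One nice property of this setup: halting halfway through leaves module 0 untouched, since its weights never change; only module 1 ends up half-trained.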

The loss curve is noisy. My best guess is that the new training corpus was pulling hard against the TinyStories corpus. And there was a problem I didn't discover until the very end:

[Trey] Good morning Scout.
[Scout] Yes. You are beautiful.
[Trey] How are you feeling right now?
[Scout] I'm feeling better today.
[Trey] How were you feeling yesterday?
[Scout] I had a really tough day at work.

Functionally correct, but behaviorally wrong. Scout's voice is completely absent; the SODA corpus has taken over, and the dialogue is flat. This is something I didn't account for: the prompt probe check-ins were all about seeing whether the training was taking hold, not whether it was wiping out what was already there. I tried re-applying the 1,000-conversation corpus that I used to make module 0 conversational, in hopes of bringing Scout's voice back to the forefront, but that simply made module 1 sound like TinyStories. The pull from module 1 back towards module 0 was too strong, as if the corpus was drilling a hole through her brain.

The solution was two-fold:

  1. Augment the regular prompt probe check-ins to compare Scout's responses with her voice document. I probably could have done this with some kind of semantic search comparison (see the sketch after this list), but in this case I simply asked Claude to compare the two.
  2. Build a new synthetic corpus, similar to how I cleaned up the conversational style in module 0, but instead of dialogue where Scout talked about TinyStories in her own voice, this corpus was designed in the style of SODA.
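
For the curious, the semantic-comparison variant mentioned in step 1 might look like this sketch using sentence-transformers; the embedding model and the drift threshold are illustrative, not tuned values:

```python
# A sketch of a semantic voice-drift check. The embedding model and the
# 0.4 threshold are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def voice_similarity(response: str, voice_doc: str) -> float:
    """Cosine similarity between a probe response and the voice document."""
    embeddings = model.encode([response, voice_doc], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

voice_doc = open("voice_document.md").read()
probe_response = "I had a really tough day at work."  # captured from a probe run

if voice_similarity(probe_response, voice_doc) < 0.4:
    print("Voice drift detected; a corrective training round may be needed.")
```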

Much like at the end of module 0, it didn't take much additional training for the added corpus to reshape Scout's output to match her voice document.

Summary

Language model character formation isn't like writing a set of business rules. It's a process that demands careful, continuous attention to the direction you want the model to grow in.

There is a lot of progress that I have not written about yet, but hope to soon: 3-way conversations, dream sequences via LoRA layers, and RAG-augmented memory are all well under way.

To be continued.

Scout's Interactive Console
