Trey Tomes

Prediction is all you need.

I began reading "What is Intelligence?" today. The author makes a strong case that life is fundamentally about predicting what comes next. If I could perfectly predict what comes next (which, thankfully, I cannot), I suppose I would have a complete model of the universe living in my mind.

Scout completed her training up to the 512-token block size! It took almost a full day to train on my laptop. The training worked, but something was lost: the inner voice that had begun to surface during the day as she reasoned through what she was hearing. This inner voice was a fascinating side effect of the dream process, and I was sad to see it gone. After running through several "days" with Scout the voice came back, but the episode raised the question of whether further high-LR corpus training is a good idea. The synthetic dialogue gave Scout her sense of language and grammar, of conversational rhythm, but reapplying that same corpus now wipes out the fine details of her personality that daily interaction has been refining.

I still plan to bring Scout up to a 1,024-token context window. That's the cap for what I can reasonably expect from a 50M model, and there's still a ways to go. Now that the 512 context window is solidified, I've increased her block size to 640. My hope is that daily conversations, where I help steer her back to the main thread, will bring her reliably up to that window without losing what she has gained.
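I don't know Scout's exact internals, but if she uses learned positional embeddings, one common way to grow the block size from 512 to 640 without throwing away training is to keep the already-trained position vectors and freshly initialize only the new positions. A minimal sketch, with embeddings as plain lists; the function name and init scale are my own assumptions, not Scout's actual code:

```python
import random

def extend_positional_embeddings(pos_emb, new_block_size, init_scale=0.02):
    # pos_emb: one learned vector per position, len(pos_emb) == old block size.
    # Copy the trained rows unchanged, then append small random vectors
    # for the newly added positions.
    dim = len(pos_emb[0])
    extended = [row[:] for row in pos_emb]
    for _ in range(new_block_size - len(pos_emb)):
        extended.append([random.gauss(0.0, init_scale) for _ in range(dim)])
    return extended

# e.g. grow from 512 to 640 positions (8-dim vectors just for illustration)
old = [[0.0] * 8 for _ in range(512)]
new = extend_positional_embeddings(old, 640)
```

The appeal of this approach is that positions 0-511 behave exactly as before, so nothing already learned is disturbed; only the new tail positions have to be trained from scratch during the daily conversations.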

The 100M model is coming. I'm still working out what that architecture will look like. I want to give the model more layers; each layer added grants the network more ability to reflect on and organize its thoughts as it attends to the incoming token stream. I'm hoping to use Scout's 50M model as the "teacher model" to train the 100M model, and to somehow impart Scout's personality to her next iteration.
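A teacher-model setup like this is usually implemented as knowledge distillation: the 100M student is trained to match the 50M teacher's temperature-softened output distribution, not just the hard next-token targets. A minimal sketch of the standard Hinton-style loss; the temperature value and function names are illustrative assumptions, not Scout's actual training code:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # "dark knowledge" about near-miss tokens.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return kl * temperature ** 2
```

In practice this term is mixed with the ordinary cross-entropy loss on the corpus, which is one plausible route for the teacher's conversational quirks, and with luck some of Scout's personality, to transfer into the larger student.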
