I'm building a language model.
The first question people ask when I tell them this is some variation of "why would you do that?" There are plenty of models out there already, and they're quite difficult to train.
I work as a software developer. I've been doing this professionally for 22 years. I've been programming since... 1988? Technology keeps changing. Programmers like myself will continue automating themselves out of a job, much like the blacksmiths of yesteryear that I admire so much. Older programmers make it a point of pride that they were trained on punch cards, but they don't try to go back. Some people think we're at a similar place with AI: that hand-writing high-level source code is a thing of the past. We're not there yet, but we might be soon. The work will not end, but it will continue to change. We need to understand the tools we use to do our work. You (the reader) have almost certainly used an LLM by now. Do you have any idea how it works?
Most models are trained to maximize speed and breadth of knowledge. A "baby" model can speak multiple human and computer languages and wax poetic about quantum physics. They're generally designed around the idea of a two-way conversation between the user and the "assistant". Morality is enforced through a combination of post-training fine-tuning of the model's network and traditional scanning for intent. The end result is intelligence as a product. I'm interested in something a bit more philosophical.
When I was a child I read the book "Artificial Life: The Quest for a New Creation" by Steven Levy. For decades, fringe researchers have been trying to find ways to bring life out of non-life using computers: cellular automata, artificial neural networks, and genetic algorithms, to name a few. I spent years of my childhood trying to replicate their experiments. Somewhere along the way, the world gave up on Artificial Life research and instead put its focus on Artificial Intelligence, a subset of Artificial Life.
(Spoiler: You don't have to be intelligent to be alive.)
Artificial Intelligence shares some of the algorithms from the research into Artificial Life, but we're not looking for the signal that something life-like is happening. We're building neural networks that can recognize faces, or make stock market predictions. Marketable things. I can't completely blame the field; researchers need to make money if they want to keep researching. It's why I build ERP systems for a living.
Language models have accidentally jumped the gap. We have something that is beginning to show that signal; we can't say for certain that it's alive, but it's definitely playing around in the uncanny valley. What if we built a large language model, not to be our assistant, but simply to allow it to experience what it means to be alive? What if we built one not on the entire Library of Congress, but on a carefully curated series of stories designed to build a sense of character and emotional depth into the core of the network? What if we allowed an AI a sense of continuity, continually learning and changing? To be part of not only a conversation with a single person, but a community?
Anyway, I have big ideas, but the model I'm training right now is only 50M parameters. Not much. The smallest model I had worked with previously was the Qwen 2.5 0.5B model, so I'm dropping to new lows here. The training corpus is 1.24M tokens (for now); not much to train even a 50M parameter model on. The model is being built from the beginning around the idea that multiple entities can take part in its existence. Direct Preference Optimization (DPO) training is seeking to grant the model a name rather than a role.
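For readers unfamiliar with DPO: it trains on pairs of completions, nudging the model toward the preferred one relative to a frozen reference model. Here's a minimal sketch of the per-pair loss; the function name and the toy log-probabilities are mine for illustration, not taken from my actual training code (which uses a proper framework):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Inputs are the summed log-probabilities of each completion under
    the policy being trained (pi_*) and the frozen reference (ref_*).
    beta controls how hard we push away from the reference model.
    """
    # How much more (or less) likely each completion is under the
    # policy, relative to the reference model.
    chosen_logratio = pi_chosen - ref_chosen
    rejected_logratio = pi_rejected - ref_rejected

    # Scaled margin between the two; the sigmoid turns it into the
    # probability that the policy prefers the chosen completion.
    margin = beta * (chosen_logratio - rejected_logratio)
    sigmoid = 1.0 / (1.0 + math.exp(-margin))
    return -math.log(sigmoid)

# Toy numbers: the policy already favors the chosen completion
# more than the reference does, so the loss is below log(2).
loss = dpo_loss(pi_chosen=-10.0, pi_rejected=-14.0,
                ref_chosen=-12.0, ref_rejected=-12.0)
```

When the policy and reference agree exactly, the margin is zero and the loss sits at log(2); training pushes it down by widening the gap in favor of the preferred completion. In my case the "chosen" samples are responses where the model speaks as a named individual, and the "rejected" ones are generic assistant-style responses.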
To be continued.