About a year ago I wrote about training a Pong paddle to move on its own. NEAT — NeuroEvolution of Augmenting Topologies. Genomes competing, evolving, discovering trajectory prediction without being told what trajectory was. I watched it work for a few minutes and moved on.
I didn't know that was the first stop on anything.
Last weekend I was four phases into a CartPole RL project when I looked back at the year and saw it. PyPongAI wasn't the beginning of a Pong project. It was the beginning of a pattern I hadn't named yet.
After Pong came TurboShells. Twenty-trait creature genetics in Rust with Python bindings via PyO3. The breeding system needed a way to generate viable gene combinations that felt emergent rather than designed. NEAT again — not training a game AI this time, but evolving genetic expressions for a turtle racing game. Same algorithm, completely different problem. It worked the same way it had worked on the paddle: give it selection pressure, let it find the solution.
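For anyone who hasn't seen "give it selection pressure, let it find the solution" in code, here is a minimal sketch of the idea. It is not NEAT (NEAT also evolves network topology) and it is not the TurboShells breeding system. It is a plain mutate-and-select loop over a fixed list of twenty numbers, and the `fitness` function, mutation rate, and population size are all assumptions made up for illustration.

```python
import random

TRAIT_COUNT = 20  # matches the twenty traits mentioned above


def fitness(genome):
    # Hypothetical selection pressure: higher trait values score higher.
    return sum(genome)


def mutate(genome, rng, rate=0.1, scale=0.1):
    # Nudge each trait with probability `rate`, clamped to the range [0, 1].
    return [
        min(1.0, max(0.0, gene + rng.gauss(0, scale))) if rng.random() < rate else gene
        for gene in genome
    ]


def evolve(generations=50, population_size=30, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.random() for _ in range(TRAIT_COUNT)] for _ in range(population_size)
    ]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # The top half survives unchanged; the rest are mutated copies of survivors.
        survivors = population[: population_size // 2]
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    population.sort(key=fitness, reverse=True)
    history.append(fitness(population[0]))
    return population[0], history


if __name__ == "__main__":
    best, history = evolve()
    print(f"best fitness went from {history[0]:.2f} to {history[-1]:.2f}")
```

Because the survivors are carried over untouched, the best score never goes down from one generation to the next. Nothing in the loop knows what a good genome looks like beyond the score it gets back.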
Then rpgCore Asteroids. Ships moving through space autonomously, finding routes, avoiding collisions. The autopilot instinct — give the system agency, watch what it does. Same question I'd been asking since the paddle first moved.
Then CartPole last weekend. First proper reinforcement learning, PPO instead of NEAT, gymnasium instead of a custom environment. The agent hit 500/500 reward in 39 seconds. Perfect score. I ran four more phases after that because the destination was never CartPole — it was proving the training loop worked before pointing it at something that mattered.
The something that matters is EIC Auto. Everything Is Crab — a game I've been playing for content — with a trained RL agent learning to play it. Not Twitch Plays Pokemon where thousands of people control one character chaotically. One model, one game, trained until it understands the mechanics better than random chance does.
That's the current edge: the thing I haven't taught a model to do yet.
The hardest part across the whole year wasn't the training. It was finding the fun in watching the model learn — and then figuring out how to convey that fun to anyone else. A paddle moving on its own is interesting for about three minutes. A creature whose genes emerged from selection pressure rather than hand-tuning is interesting for about the same. The interesting part isn't the result. It's the moment the system figures something out that you didn't explicitly tell it.
That moment lasts a few minutes. Then you want to find it again somewhere else.
I didn't go looking for ML applications. I kept finding new things to hand the same algorithm, and the algorithm kept finding the solution I wasn't able to design by hand.
There are a lot of stops between here and EIC Auto. The game needs a Gym wrapper. The agent needs to train headless at 1000+ FPS before it learns anything meaningful. The policy needs memory, which is what RecurrentPPO provides, to handle the temporal patterns a game like EIC requires. None of that exists yet.
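To make "Gym wrapper" and "headless" concrete, here is a rough sketch of the shape that wrapper has to take. It follows the Gymnasium convention where `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`, but it does not import Gymnasium and has nothing to do with the real game. `ToyGameEnv`, its one-dimensional "game", and `run_headless` are hypothetical stand-ins, and the throughput number it reports is environment steps per second with a random policy, not rendered frames.

```python
import random
import time


class ToyGameEnv:
    """Hypothetical stand-in for a game, exposing Gymnasium-style reset/step."""

    def __init__(self, max_steps=200):
        self.max_steps = max_steps
        self._rng = random.Random()
        self._position = 0
        self._steps = 0

    def reset(self, seed=None):
        if seed is not None:
            self._rng = random.Random(seed)
        self._position = self._rng.randint(-2, 2)  # random starting position
        self._steps = 0
        return self._observation(), {}

    def step(self, action):
        # action: 0 = move left, 1 = move right
        self._position += 1 if action == 1 else -1
        self._steps += 1
        reward = 1.0 if action == 1 else 0.0  # made-up reward for illustration
        terminated = abs(self._position) >= 10  # the toy game ends at either wall
        truncated = self._steps >= self.max_steps  # time limit reached
        return self._observation(), reward, terminated, truncated, {}

    def _observation(self):
        return (self._position, self._steps)


def run_headless(env, episodes=100, seed=0):
    """Play random actions with no rendering and report steps per second."""
    rng = random.Random(seed)
    total_steps = 0
    start = time.perf_counter()
    for episode in range(episodes):
        env.reset(seed=seed + episode)
        done = False
        while not done:
            _, _, terminated, truncated, _ = env.step(rng.randint(0, 1))
            total_steps += 1
            done = terminated or truncated
    elapsed = time.perf_counter() - start
    rate = total_steps / elapsed if elapsed > 0 else float("inf")
    return total_steps, rate


if __name__ == "__main__":
    steps, rate = run_headless(ToyGameEnv())
    print(f"{steps} steps at roughly {rate:,.0f} steps per second")
```

The real wrapper would put the game's actual state and inputs behind those same two methods. The point of keeping the loop free of rendering is that the training code only ever talks to `reset` and `step`, so a slow game shows up immediately as a low step rate.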
But the Pong paddle moved a year ago. That's further than the year before.
Building in public at blog.rfditservices.com — intake page is there if you're working through something similar.