A small, very personal story from the Build Small Hackathon

Let me start with something honest: I built this whole thing while on parental leave, with a toddler who never stops and a baby who is just figuring out the world. People assume parental leave is rest. It is not. It is beautiful, it is loud, and it is a little bit of a pandemonium.
So LoFinity became my small escape. One hour here, twenty minutes there, always between nap times. And honestly? It has been one of the best things for my head: a way to disconnect from the chaos without leaving the house, and to build something that is mine, slowly, piece by piece.
Why lofi, and why me
The idea is not new for me. I have been carrying it around in my head for more than a year. I love lofi music. It is genuinely one of my favorite genres. And not only because it sounds nice. I am neurodivergent, and focusing is not always easy for me. Lofi has been a real game-changer: those warm, repetitive, slightly imperfect beats are the thing that finally lets my brain settle down and work while giving me that 90s childhood nostalgia. So building a little machine that generates endless lofi felt almost personal.
What LoFinity is
LoFinity is a vending machine for lofi. You land in a cozy low-poly, anime-ish little street; you walk up to the machine; you insert a coin; you type a vibe ("studying late in a snowy cabin"); and out pops a cassette tape with a freshly generated song. Everything chill, pleasing and without triggering your dopamine.
The honest reason I started now
Here is the funny part. I had the vision very clearly in my head. What I did not have was the comfort with Three.js to actually build it. 3D on the web always intimidated me. Then Anthropic dropped Fable 5 and I just HAD to try it. It was genuinely impressive: it helped me go from "I have this in my head" to a real, living 3D world. It worked beautifully... right until it got banned. But hey, shit happens. 🤷 I am grateful for the 3 days, enough to get me kickstarted.
A little under the hood
The whole point of the Build Small Hackathon is small, open weights you can read, run and reshape: everything under 32B parameters, deployed as a Gradio app, running on ZeroGPU. So here is my stack:
-
Frontend: a fully custom Three.js world (not default Gradio components), served from a custom
gradio.server.Server. The vending machine, the cassettes, the day/night toggle, the little Game Boy. All hand-built. - The brain: MiniCPM5-1B from OpenBMB (~1B parameters). It takes your free-text wish and turns it into a structured recipe: instruments, mood, BPM, and an ambience tag. (Locally I use Ollama with llama3.2:3b for the same job.)
- The music: MusicGen (the medium model on the Space, ~1.5B) generates the actual audio.
- The atmosphere: a separate ambience layer (rain, ocean, crickets, café murmur, fireplace, vinyl crackle) mixed gently under the beat.
And the part I am most proud of: all of it runs on ZeroGPU, in a single GPU acquisition per song. Two open models, orchestrated together, both comfortably small. No giant cloud API. That is the spirit of the hackathon, and making it actually work inside that constraint was half the fun.
What I actually learned (the struggles)
I went in thinking "I'll just prompt a model and get a song." Reality laughed at me.
Lesson 1: 30 seconds is a wall. MusicGen generates about 30 seconds in one shot, full stop. That is its training window. The moment I wanted a longer tape, I had to stitch chunks together: take the last few seconds of audio, feed them back as a seed, and continue. Simple in theory. In practice the continuations slowly degraded into a weird noise. I chased that bug for hours. Was the ambience leaking into the seed? Was it the guidance scale? Then I understood the real cause: each continuation was generating seed + 30 seconds, which pushed the tail past the model's 30-second comfort zone, where it falls apart. The fix was to cap every shot so the total never crosses that window. A very humbling "go read how the model actually works" moment.
Lesson 2: one model cannot do everything, so orchestrate. A music model is great at music and completely deaf to texture words like "rain" or "vinyl crackle." So I stopped fighting it: a small LLM plans the recipe, MusicGen plays the notes, and a separate ambience layer adds the rain and the crackle on top. Each model does the one thing it is good at. Small models plus smart orchestration beat one big model trying to do it all.
Lesson 3: constraints make you creative. Running on ZeroGPU means the GPU lives in a separate, forked process, so my nice progress bar could not see what the worker was doing. Instead of fighting it, I turned the bar into a smooth time-based estimate. Another fun one: the runtime sets SPACES_ZERO_GPU='1', not 'true', and my exact-string check silently sent everything to the CPU for a while. 8 years in the industry and still getting the classic humbling ✌️😅
So, was it worth it?
A thousand times yes. I shipped an idea I had been carrying for over a year, I finally got comfortable with Three.js, and I did it in the little cracks of new-parent life. LoFinity is small, it runs on small open models, and it is genuinely mine. For someone whose brain works better with lofi in the background, building the machine that makes it has been weirdly healing.
If you want to make your own tape, the door is open. Insert a coin. ☕🎵
Badges I'm going for 🏅
🌳 Thousand Token Wood (the whimsical track) + Community Choice: LoFinity is pure cozy whimsy.
🎨 Off Brand: the UI is a fully custom Three.js world, miles past the default Gradio components.
🐣 Tiny Titan: every model I ship is ≤4B (MiniCPM5-1B ~1B + MusicGen-medium ~1.5B).
🧩 MiniCPM sponsor prize: OpenBMB's MiniCPM5-1B is the brain that plans every single song.
🎬 Best Demo: once my demo video and social post are up (that is literally next on my list).
🤖 Best Agent: the multi-model orchestration, a small LLM planning, an audio model performing, an ambience layer dressing the set. More pipeline than autonomous agent, but the multi-step model collaboration is real.
🏆 Bonus Quest Champion: if stacking several bonus criteria counts for anything.
🃏 Judges' Wildcard: well... a 3D lofi vending machine is nothing if not a wildcard.

Top comments (0)