DEV Community

Jay


The Pain of Building My Own Fully-Featured Locally Hosted ChatGPT Out of Open Source Tools And a Franken Laptop

The AI dropped the ball on the cover art but I need to get some sleep.


Stage Zero: Windows is Gaslighting Me

This whole circus kicked off months ago when I was still shackled to Windows 11. At first, I thought I could just mess around with some local LLMs using tools like Ollama and LM Studio. My Frankenstein laptop—an overclocked, duct-taped monstrosity that should’ve been retired years ago—was somehow pulling it off.

But here’s the thing: no matter how much bloatware you purge, no matter how many “services” you assassinate with Task Manager, Windows still eats half your bloody resources at idle. GPU humming at 50% like it’s siphoning off cycles to train Skynet or mine Dogecoin on my dime. Paranoid? Maybe. Wrong? Probably not.

So yeah, I switched teams. I ditched Windows for Linux, and I’ve never looked back. There’s just no way in hell my current stack would even boot without melting to slag on Windows.


Stage One: The Indie Game Dev Dream

Originally, this was all about my indie game. And when I say indie I mean literally one guy: me. Writing, coding, music, testing, character design, the works. I wanted AI as a co-dev to help me brainstorm characters and iterate ideas faster.

So I started with LLMs for dialogue and iteration. Then realized I needed concept art too, which dragged me straight into the world of Stable Diffusion and image gen. Before long, I had text and images working in tandem. Perfect.

Except ChatGPT started to go soft on me. Wanted to iterate a female character in a superhero suit? Shame-on-you vibes. Meanwhile, loincloth barbarian dudes were fine. Double standards much?

Thus the mission evolved: build my own fully offline AI stack. No filters. No corporate leash. No subscriptions.


Stage Two: The Frontend, the Tunnel, and the Internet Portal to Hell

Once I had OpenWebUI handling LLMs, and A1111 doing the image gen thing, I figured I’d stitch them together with a frontend. Add in Cloudflare tunnels and a domain, and suddenly I could access my baby from anywhere—even my phone.
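If you want to wire up the same kind of remote access, the rough shape of it with Cloudflare’s `cloudflared` CLI looks something like this. The tunnel name, hostname, and port below are placeholders, not my actual setup — check Cloudflare’s own docs and whatever port your frontend actually listens on:

```shell
# Authenticate once against your Cloudflare account, then create a named tunnel.
cloudflared tunnel login
cloudflared tunnel create ai-stack

# Point a DNS record on your domain at the tunnel.
cloudflared tunnel route dns ai-stack chat.example.com

# Run the tunnel, forwarding traffic to the local frontend.
# OpenWebUI commonly listens on 3000 or 8080 -- adjust to taste.
cloudflared tunnel run --url http://localhost:3000 ai-stack
```

Once that’s running, the hostname resolves from anywhere — including your phone — without opening a single port on your router.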

It was beautiful. It was mine. And it was still mute.


Stage Three: The TTS Graveyard

Ah, Text-to-Speech. My white whale. My cursed obsession.

Attempt #1: Tortoise TTS

The name isn’t just branding—it’s prophecy.

Slow as molasses, GPU-hungry as hell, and somehow still sounded like Google Maps had a stroke. Not good enough. Shelved it.

Attempt #2: Coqui TTS

Coqui looked promising: open source, fast enough, and it seemed like it’d wire up nicely. I spent about 20 hours over two days writing custom scripts, debugging, smashing my head into formatting mismatches, and trying every workaround known to man.

No matter what I did, I couldn’t get it to play nice with my frontend. Two separate rage sessions later, I shelved it again. And even when I did get a voice sample out of it, it sounded like a dying Speak & Spell.

Attempt #3: Eleven Labs (Cloud Crap)

Voice quality? Absolutely phenomenal. Voices generated just by prompting? Chef’s kiss.

Problem? Cloud-based and egregiously expensive. Like “hundreds a month for 90 minutes of talk time” expensive.

I used my free tokens, gave them the bird, and moved on. My mission statement was clear: fully local, fully offline, or bust.


Stage Four: Enter the Holy Grail (Chatterbox)

Fast forward a couple months. I stumble across Resemble AI’s Chatterbox, and hallelujah—it actually works. Open source, offline, sounds damn near Eleven Labs quality, and set up in under an hour.

I even got it speaking back to me in an Australian accent. (I know what I like, don't judge me.)

Not quite ChatGPT’s live speech pace, but within ~30 seconds of a prompt I get multi-paragraph spoken responses. That’s wizard-level shit in the open-source world.

If anyone from Resemble AI is reading this: add a donation button. I’d happily chuck you a few bucks, but I’m never paying a subscription to make robot voices.


Stage Five: GPU Bottleneck Hell

So I’d done it, right? Wrong.

Now that I had LLMs, Stable Diffusion, and Chatterbox TTS all pulling from my GPU, my rig basically lit itself on fire.

The solution? Offload TTS to my laptop’s internal GPU while Stable Diffusion hogged the eGPU. Easy in theory. In practice? Linux decided GPU numbering should be a cosmic joke.

nvidia-smi said one thing, Chatterbox said another, and for about an hour I was unknowingly offloading Stable Diffusion to the weaker GPU while wondering why everything was breaking. Eventually figured out the “GPU 0 vs GPU 1” mix-up, corrected it, and achieved balance.
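If you hit the same numbering roulette: `nvidia-smi` orders GPUs by PCI bus by default, while CUDA applications can enumerate them in a different order, so “GPU 0” means different cards to different tools. Pinning each service with environment variables takes the guessing out of it. A sketch — the indices are from my box, and `tts_server.py` is a stand-in for however you launch your TTS process:

```shell
# List the GPUs with their UUIDs so you know which index is which.
nvidia-smi -L

# Make CUDA enumerate devices in PCI bus order so it matches nvidia-smi.
export CUDA_DEVICE_ORDER=PCI_BUS_ID

# Pin Stable Diffusion (A1111's webui.sh) to the eGPU (index 0 here)
# and the TTS process to the weaker internal GPU (index 1).
# Inside each process, the one visible GPU gets remapped to device 0.
CUDA_VISIBLE_DEVICES=0 ./webui.sh &
CUDA_VISIBLE_DEVICES=1 python tts_server.py &
```

Had I known about `CUDA_DEVICE_ORDER` an hour earlier, I’d have kept a lot more hair.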


The Final Monster

So here’s what I’ve got now:

  • OpenWebUI + LM Studio → LLMs (swap between GPT OSS, Deepseek Coder R1, Mythalion depending on mood/task)
  • A1111 + Stable Diffusion → Image gen
  • Chatterbox (Resemble AI) → TTS that doesn’t sound like it belongs in a horror game
  • Cloudflare Tunnel + Self-hosted website → Remote access anywhere
  • Linux Franken-laptop + eGPU wired through a cable so thick it looks like I could upload my consciousness through it → Hardware glue

It’s messy. It eats resources like a starving demon. It makes me daydream about a 24GB GPU. But it’s mine.

And best of all, no censorship, no subscriptions, no Windows bullshit idling at 50% GPU usage while “definitely not mining crypto” in the background.


The Epilogue

Months of tinkering, rage shelving, re-shelving, and GPU numbering madness later… I’ve got it. My own ChatGPT-alternative stack, running locally, offline, and screaming through silicon like a barely-contained eldritch horror.

I don’t even need it for game dev anymore—I’ve left that industry-shaped dumpster fire behind. But I’m starting Computer Science uni soon, and now I’ve got a real AI companion at my side.

And honestly? After all that pain, after all the rage, after all the swearing at GPUs…

Fuck yeah.

//Ghotet

Top comments (8)

Anik Sikder

This is the kind of post that makes other devs feel seen. The pain, the persistence, the “why is my GPU melting” moments, we’ve all been there. Thanks for documenting the madness so hilariously. Bookmarking this for future Franken-stack inspiration.

Jay

I'm trying to counterbalance some of the AI-generated top 10 lists with a more human experience lol. I'm glad you enjoyed it!

Anna Villarreal

Watcha gonna name him?

Jay

I usually ask them what they would choose for a name and it chose Holly so I gave it a female Australian voice lol

Richi

so that's why lol

great article btw, thank you for sharing :)

Jay

Usually they choose more abstract names like "Echo" or something but yup, just Holly. lol

I'm glad you enjoyed it :)

Jon Clarke

Wonder if it chose Holly for the Red Dwarf reference?
