DEV Community

Jay


The Pain of Building My Own Fully-Featured Locally Hosted ChatGPT Out of Open Source Tools And a Franken Laptop

The AI dropped the ball on the cover art but I need to get some sleep.


Stage Zero: Windows is Gaslighting Me

This whole circus kicked off months ago when I was still shackled to Windows 11. At first, I thought I could just mess around with some local LLMs using tools like Ollama and LM Studio. My Frankenstein laptop—an overclocked, duct-taped monstrosity that should’ve been retired years ago—was somehow pulling it off.

But here’s the thing: no matter how much bloatware you purge, no matter how many “services” you assassinate with Task Manager, Windows still eats half your bloody resources at idle. GPU humming at 50% like it’s siphoning off cycles to train Skynet or mine Dogecoin on my dime. Paranoid? Maybe. Wrong? Probably not.

So yeah, I switched teams. I ditched Windows for Linux, and I’ve never looked back. There’s just no way in hell my current stack would even boot without melting to slag on Windows.


Stage One: The Indie Game Dev Dream

Originally, this was all about my indie game. And when I say indie I mean literally one guy: me. Writing, coding, music, testing, character design, the works. I wanted AI as a co-dev to help me brainstorm characters and iterate ideas faster.

So I started with LLMs for dialogue and iteration. Then realized I needed concept art too, which dragged me straight into the world of Stable Diffusion and image gen. Before long, I had text and images working in tandem. Perfect.

Except ChatGPT started to go soft on me. Wanted to iterate a female character in a superhero suit? Shame-on-you vibes. Meanwhile, loincloth barbarian dudes were fine. Double standards much?

Thus the mission evolved: build my own fully offline AI stack. No filters. No corporate leash. No subscriptions.


Stage Two: The Frontend, the Tunnel, and the Internet Portal to Hell

Once I had OpenWebUI handling LLMs, and A1111 doing the image gen thing, I figured I’d stitch them together with a frontend. Add in Cloudflare tunnels and a domain, and suddenly I could access my baby from anywhere—even my phone.
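If you want to wire up the same kind of remote access, the rough shape of it with Cloudflare’s `cloudflared` CLI looks something like this. The tunnel name, hostname, and port below are placeholders, not my actual setup — check Cloudflare’s own docs and whatever port your frontend actually listens on:

```shell
# Authenticate once against your Cloudflare account, then create a named tunnel.
cloudflared tunnel login
cloudflared tunnel create ai-stack

# Point a DNS record on your domain at the tunnel.
cloudflared tunnel route dns ai-stack chat.example.com

# Run the tunnel, forwarding traffic to the local frontend.
# OpenWebUI commonly listens on 3000 or 8080 -- adjust to taste.
cloudflared tunnel run --url http://localhost:3000 ai-stack
```

Once that’s running, the hostname resolves from anywhere — including your phone — without opening a single port on your router.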

It was beautiful. It was mine. And it was still mute.


Stage Three: The TTS Graveyard

Ah, Text-to-Speech. My white whale. My cursed obsession.

Attempt #1: Tortoise TTS

The name isn’t just branding—it’s prophecy.

Slow as molasses, GPU-hungry as hell, and somehow still sounded like Google Maps had a stroke. Not good enough. Shelved it.

Attempt #2: Coqui TTS

Coqui looked promising: open source, fast enough, and it seemed like it’d wire up nicely. I spent about 20 hours over two days writing custom scripts, debugging, smashing my head into formatting mismatches, and trying every workaround known to man.

No matter what I did, I couldn’t get it to play nice with my frontend. Two separate rage sessions later, I shelved it again. And even when I did get a voice sample out of it, it sounded like a dying Speak & Spell.

Attempt #3: Eleven Labs (Cloud Crap)

Voice quality? Absolutely phenomenal. Voices generated just by prompting? Chef’s kiss.

Problem? Cloud-based and egregiously expensive. Like “hundreds a month for 90 minutes of talk time” expensive.

I used my free tokens, gave them the bird, and moved on. My mission statement was clear: fully local, fully offline, or bust.


Stage Four: Enter the Holy Grail (Chatterbox)

Fast forward a couple months. I stumble across Resemble AI’s Chatterbox, and hallelujah—it actually works. Open source, offline, sounds damn near Eleven Labs quality, and set up in under an hour.

I even got it speaking back to me in an Australian accent. (I know what I like, don't judge me.)

Not quite ChatGPT’s live speech pace, but within ~30 seconds of a prompt I get multi-paragraph spoken responses. That’s wizard-level shit in the open-source world.

If anyone from Resemble AI is reading this: add a donation button. I’d happily chuck you a few bucks, but I’m never paying a subscription to make robot voices.


Stage Five: GPU Bottleneck Hell

So I’d done it, right? Wrong.

Now that I had LLMs, Stable Diffusion, and Chatterbox TTS all pulling from my GPU, my rig basically lit itself on fire.

The solution? Offload TTS to my laptop’s internal GPU while Stable Diffusion hogged the eGPU. Easy in theory. In practice? Linux decided GPU numbering should be a cosmic joke.

nvidia-smi said one thing, Chatterbox said another, and for about an hour I was unknowingly offloading Stable Diffusion to the weaker GPU while wondering why everything was breaking. Eventually figured out the “GPU 0 vs GPU 1” mix-up, corrected it, and achieved balance.
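If you hit the same numbering roulette: `nvidia-smi` orders GPUs by PCI bus by default, while CUDA applications can enumerate them in a different order, so “GPU 0” means different cards to different tools. Pinning each service with environment variables takes the guessing out of it. A sketch — the indices are from my box, and `tts_server.py` is a stand-in for however you launch your TTS process:

```shell
# List the GPUs with their UUIDs so you know which index is which.
nvidia-smi -L

# Make CUDA enumerate devices in PCI bus order so it matches nvidia-smi.
export CUDA_DEVICE_ORDER=PCI_BUS_ID

# Pin Stable Diffusion (A1111's webui.sh) to the eGPU (index 0 here)
# and the TTS process to the weaker internal GPU (index 1).
# Inside each process, the one visible GPU gets remapped to device 0.
CUDA_VISIBLE_DEVICES=0 ./webui.sh &
CUDA_VISIBLE_DEVICES=1 python tts_server.py &
```

Had I known about `CUDA_DEVICE_ORDER` an hour earlier, I’d have kept a lot more hair.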


The Final Monster

So here’s what I’ve got now:

  • OpenWebUI + LM Studio → LLMs (swap between GPT OSS, Deepseek Coder R1, Mythalion depending on mood/task)
  • A1111 + Stable Diffusion → Image gen
  • Chatterbox (Resemble AI) → TTS that doesn’t sound like it belongs in a horror game
  • Cloudflare Tunnel + Self-hosted website → Remote access anywhere
  • Linux Franken-laptop + eGPU wired through a cable so thick it looks like I could upload my consciousness through it → Hardware glue

It’s messy. It eats resources like a starving demon. It makes me daydream about a 24GB GPU. But it’s mine.

And best of all, no censorship, no subscriptions, no Windows bullshit idling at 50% GPU usage while “definitely not mining crypto” in the background.


The Epilogue

Months of tinkering, rage shelving, re-shelving, and GPU numbering madness later… I’ve got it. My own ChatGPT-alternative stack, running locally, offline, and screaming through silicon like a barely-contained eldritch horror.

I don’t even need it for game dev anymore—I’ve left that industry-shaped dumpster fire behind. But I’m starting Computer Science uni soon, and now I’ve got a real AI companion at my side.

And honestly? After all that pain, after all the rage, after all the swearing at GPUs…

Fuck yeah.

//Ghotet

Top comments (8)

Anik Sikder

This is the kind of post that makes other devs feel seen. The pain, the persistence, the “why is my GPU melting” moments, we’ve all been there. Thanks for documenting the madness so hilariously. Bookmarking this for future Franken-stack inspiration.

Jay

I'm trying to counterbalance some of the AI-generated top 10 lists with a more human experience lol. I'm glad you enjoyed it!

Anna Villarreal

Watcha gonna name him?

Jay

I usually ask them what they would choose for a name and it chose Holly so I gave it a female Australian voice lol

Richi

so that's why lol

great article btw, thank you for sharing :)

Jay

Usually they choose more abstract names like "Echo" or something but yup, just Holly. lol

I'm glad you enjoyed it :)

Jon Clarke

Wonder if it chose Holly for the Red Dwarf reference?
