Set Up Your Own Personal AI Stack: Summarized

Hey folks. I finally have a moment to sit down and lay out the blueprint for setting up your own AI stack. This will be a quick summary, not a tutorial.

This stack consists of:

  • LLM software
  • Stable Diffusion (image generation)
  • Text-to-speech (but not speech-to-text)
  • Web search for the LLM
  • All tied together through a unified front end

Just to clarify upfront: this isn't a tutorial or step-by-step guide. I'm laying out the toolkit, with notes and caveats for each piece of software. For example, I'll list my machine specs and the LLMs I run to give you realistic expectations. This stack is GPU/CPU hungry.

My Specs

  • Modified Alienware 15 R4 (circa 2018)
  • Nvidia GTX 1070 8GB (laptop GPU)
  • Nvidia RTX 3060 12GB (AGA external GPU dock)
  • Intel i7-8750H CPU @ 2.20GHz
  • 32GB RAM
  • All drives are NVMe
  • The stack uses ~120GB of disk space, including ~8 LLM/SD models

LLM

LM Studio was my choice:

  • Offers an in-depth front end with performance tuning and experimental features
  • Allows offloading KV cache for faster performance (quality may vary)
  • Lets you run multiple models simultaneously (if your system can handle it)
  • Easy download of models directly from Hugging Face

I recommend trying it before asking about alternatives like Ollama. I’ve used Ollama in CLI mode, but I wasn’t a fan personally.
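
If you want to script against it, LM Studio can also run a local server that speaks the OpenAI API (port 1234 by default). A minimal sketch, assuming a model is already loaded; the model name is just a placeholder for whatever you're running:

```bash
# Minimal sketch: query LM Studio's OpenAI-compatible local server.
# Assumes the server is running on its default port (1234) and a model
# is already loaded; "gpt-oss-20b" is a placeholder identifier.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Summarize what a KV cache does."}],
    "temperature": 0.7
  }'
```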

Models I use:

  • GPT-OSS 20B – My favorite for reasoning. Adjustable low/medium/high reasoning effort: low takes ~2s, high ~2min. It's a mixture-of-experts model, so only ~3-4B parameters are active at a time, which keeps it lighter on resources. Trained for tool use.
  • Mythalion 13B – Creative writing, fast, decent chat, good for Stable Diffusion prompts. Not for code.
  • Deepseek-Coder (R1) – Strictly for complex scripts. The slowest of the three, but it handles long code reliably.

Vision models:

  • I haven’t used these extensively; if you need vision, try a 7B model and test. Smaller models may be better for limited VRAM.
  • Parameter count isn’t always indicative of performance; adjust based on GPU capacity.

Stable Diffusion (Image Generation)

I use A1111 (the AUTOMATIC1111 Stable Diffusion WebUI):

  • Straightforward GUI with deep settings for LoRA training, img2img, VAE support
  • I mainly use it for cover art or character concepts
  • Default model: RevAnimated
  • ComfyUI is a node-based alternative; I didn't use it
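
If you start A1111 with the --api flag, it exposes a REST API next to the GUI, which is how I'd wire it into the rest of the stack. A rough sketch, assuming the default port; the prompt is just an example:

```bash
# Minimal sketch: generate an image via A1111's REST API.
# Assumes the webui was started with the --api flag on its default port 7860.
curl -s http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "album cover art, neon city, rain", "steps": 25}' \
  | jq -r '.images[0]' | base64 -d > cover.png
```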

Text-to-Speech

Chatterbox – 100% recommend:

  • Local alternative to ElevenLabs
  • Streams in chunks for faster playback
  • Supports voice cloning (it's made by Resemble AI): a ~10-second clip is enough for a new voice
  • Swap default voice by editing the relevant script (check GitHub for details)
  • Other options (Tortoise, Coqui) were worse in my experience.
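
One hard-won caveat from my own setup: on a dual-GPU machine, the TTS can end up grabbing the wrong card and starving your LLM. The fix that worked for me was a dedicated venv that can't see the bigger GPU at all. A sketch of the idea; the device index and server script name are placeholders for your own setup:

```bash
# Sketch: run TTS in its own venv, blinded to the primary GPU.
# CUDA_VISIBLE_DEVICES=1 assumes the weaker card is device 1 on your
# system; check with `nvidia-smi`. "tts_server.py" is a placeholder for
# however you launch Chatterbox.
python3 -m venv ~/venvs/tts
source ~/venvs/tts/bin/activate
export CUDA_VISIBLE_DEVICES=1   # hide every GPU except the weaker one
python tts_server.py
```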

Web Search

SearXNG – a self-hosted metasearch engine:

  • Searches multiple engines at once (Google, DuckDuckGo, Brave, etc.)
  • AI can query several sources in one shot
  • I run it through Cloudflare Warp for privacy; Tor is optional
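
To actually hand those results to the LLM, the front end has to know where SearXNG lives. A sketch of the OpenWebUI side; the variable names follow OpenWebUI's SearXNG integration but may differ by version, and the port assumes SearXNG's common default:

```bash
# Sketch: point OpenWebUI's web search at a local SearXNG instance.
# Double-check variable names against your OpenWebUI version's docs.
export ENABLE_RAG_WEB_SEARCH=true
export RAG_WEB_SEARCH_ENGINE=searxng
export SEARXNG_QUERY_URL="http://localhost:8080/search?q=<query>"
```

One gotcha: as far as I know, SearXNG has to allow JSON output (the formats list in its settings.yml) before other tools can consume its results.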

Frontend

OpenWebUI – central control hub:

  • Configure multiple models, knowledge bases, tools
  • Evaluate LLM responses, run pipelines, execute code, manage databases
  • TTS autoplay option in user settings; speaker icon for manual playback
  • Offline mode available (set the OFFLINE_MODE environment variable to true; see the launch sketch after this list)
  • Customize branding freely; commercial use over 50 users may require a paid plan
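
For reference, a pip-based launch might look like the sketch below. OFFLINE_MODE and WEBUI_NAME are settings I believe OpenWebUI reads from the environment, but verify against your version's docs; the branding name is a placeholder:

```bash
# Sketch: launch OpenWebUI with a couple of the settings mentioned above.
# Assumes a pip install; adjust the port to your setup.
export OFFLINE_MODE=true       # skip checks that need internet access
export WEBUI_NAME="Ghotet AI"  # custom branding; placeholder name
open-webui serve --port 8080
```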

Custom prompts/personas:

  • Set base prompt in LM Studio
  • OpenWebUI admin panel allows high-priority prompts
  • Per-user prompts can be layered on top

Linux Launcher Script

  • I created an aistart alias to launch all components sequentially so resources get allocated in the right order (sketched below)
  • LM Studio doesn’t auto-load the last model yet
  • Debug launcher opens multiple terminals for monitoring
  • Important: GPU assignment isn’t always respected automatically; check NVIDIA settings
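
To give a feel for it, here's a stripped-down sketch of an aistart-style launcher. The lms calls are based on LM Studio's CLI tool, and every path, port, and model name is a placeholder for your own setup:

```bash
#!/usr/bin/env bash
# Stripped-down sketch of an "aistart"-style launcher.

# 1. LLM backend first, so it claims the primary GPU before anything else.
lms server start
lms load gpt-oss-20b   # LM Studio doesn't auto-load the last model yet
sleep 10

# 2. Stable Diffusion with its API enabled.
(cd ~/stable-diffusion-webui && ./webui.sh --api) &
sleep 10

# 3. TTS pinned to the weaker GPU (see the venv trick above).
(source ~/venvs/tts/bin/activate && CUDA_VISIBLE_DEVICES=1 python tts_server.py) &
sleep 5

# 4. Front end last, once the backends are up.
open-webui serve --port 8080 &
```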

Why Not Docker?

  • Docker caused localhost address issues on Linux
  • Added dependencies can break the stack; simpler is better
  • Windows may not have this issue

Connecting to the Web

  • Requires a domain and a Cloudflare Tunnel
  • The tunnel forwards traffic to OpenWebUI on your local machine (as sketched below)
  • Lets you access the stack anywhere, including mobile
  • ChatGPT or Cloudflare's documentation can guide you through setup quickly
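
The Cloudflare side roughly looks like this, assuming your domain is already on Cloudflare; the tunnel name and hostname are placeholders, and the port assumes a pip-installed OpenWebUI:

```bash
# Sketch: expose OpenWebUI through a Cloudflare Tunnel.
# "ai-stack" and "chat.example.com" are placeholders.
cloudflared tunnel login                  # authorize against your domain
cloudflared tunnel create ai-stack
cloudflared tunnel route dns ai-stack chat.example.com
# Proxy the public hostname to the local OpenWebUI port and run it:
cloudflared tunnel run --url http://localhost:8080 ai-stack
```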

Final Thoughts

  • DO NOT expect this to run perfectly on first try
  • Troubleshooting is part of the fun, and it's rewarding
  • Experiment, iterate, optimize
  • A full tutorial may come later for both OSes

Best of luck, have fun, and remember: the pain of troubleshooting makes the success sweeter.

// Ghotet

Top comments (9)

Guy

Really enjoyed your “Frankenstack” breakdown; there's something genuinely inspiring about someone in a garage sweating those GPU cycles and privacy ideals to build something future-proof. When I wired up my own orchestration with Claude, I leaned on that same hacker ethic: tight context flow, predictable handoffs, and making sure the tools serve you, not the other way around. Your stack isn't just neat tech tinkering; it's real, usable sovereignty in action. Nice work.

Quick question for you: when you’re juggling image gen, TTS, web search, and LLMs under a single front end, have you felt any orchestration pain? Like components tripping over each other or has everything stayed remarkably smooth?

Jay

Thank you for acknowledging the hacker spirit and the privacy-first approach. That was the main premise when I started down this road.

The TTS was the last piece I added, and it wasn't at all smooth lol. It takes up just enough VRAM to either make my larger LLMs fail or keep the Stable Diffusion model from loading. I thought it was just a case of editing my launcher script to point the TTS at my lower-end GPU, but no matter what I did, everything would try to load on my 3060, causing VRAM shortages.

Eventually I realized the only solution was to make a new virtual environment that can't see my 3060 at all, so it defaults to my 1070 instead and alleviates that memory shortage. It does all run on the 3060 if I use smaller models or GPT-OSS, though, which is pretty impressive. If I try to run Mythalion 13B or Deepseek-Coder, LM Studio will eject the model and fail to load it back in.

Guy

That workaround makes total sense. I’ve had to play the same shell game with GPUs when juggling heavier models. It’s funny how orchestration ends up mattering as much for hardware as it does for the models themselves. You can wire Claude into a codebase with all the right context flow, but if your VRAM isn’t managed like a scarce resource, the whole thing collapses anyway.

What you did with the virtual environment to blind the 3060 is exactly the kind of hacker-first pragmatism I love. It’s not elegant, but it’s reliable, and that’s what makes it valuable. Honestly, that tension between wanting the “big brain” models like Mythalion 13B and keeping the whole stack stable is what makes these DIY builds so interesting. You’re constantly deciding whether to scale down for consistency or squeeze every last drop of GPU for capability.

Are you thinking about automating that orchestration a bit? Like, routing smaller jobs or TTS by default to the weaker GPU while reserving the 3060 for your heavier LLMs? That could save you from having to babysit it every time you switch workloads.

Jay

Yes, I've definitely had to learn how to manage the VRAM. While I was trying to figure out why TTS kept loading on the 3060 even when my shell scripts pointed it at the 1070, I realized that in my effort to keep the number of venvs to a minimum, I had shot myself in the foot a little.

The reason TTS kept ending up on the 3060 and choking the stack was that the venv it was running in was set to the 3060. That took longer than it should have to figure out lol.

My 1070 is basically doing nothing most of the time, so I've started doing exactly what you said and rerouting the lighter tasks over to the 1070.

I'm glad I set this up because it forced me to learn proper resource management, and for the first time I see VRAM as a scarce resource. I'd love to just grab a 24GB GPU, but at the same time I'd be missing out on the opportunity to learn all of this. One of my higher-priority missions now is to finally start building out a server, now that I have a better understanding of what I need and how it all works. I'd love to leave it running 24/7 since I linked it to a web domain, but sometimes I have to shut it down to free up resources for other tasks, since I run it on my primary PC.

Guy

That’s exactly it. You’ve basically turned resource management into part of the orchestration layer, and that’s a skill most people skip straight over when they just rent cloud GPUs. Having to wrangle it locally forces you to think in terms of trade-offs and pipelines instead of raw horsepower. And you’re right, if you’d just dropped cash on a 24GB card you’d have missed that learning curve completely.

The server idea makes sense now that you’ve mapped out what each piece needs. Being able to leave it running 24/7 without starving your main rig will give you a whole different feel for stability too, it stops being a side project you spin up and down and starts behaving like an actual service.

Are you planning to script the routing logic once you’ve got the server, or keep it manual for the sake of control?

Vida Khoshpey

This is great, it's so great that you did such a great job and that I found you. 😂😁💪🏻 Keep going, Ghotet!

Jay

Thank you! It was a very fun and rewarding project. A little frustrating at times, but completely worth it lol.

Jessica Williams

But with the pace AI is evolving, it looks like it will leave humans behind.

Jay

There is a chance. I saw someone liken it to when the calculator came along; I'd argue it's even like when the computer came along. Both of those changed the world, just not as fast. Embracing it and learning how it works is a good place to be :)