
Jay


Set Up Your Own Personal AI Stack: Summarized Version

A heavily summarized (ramble-free) version of the long-form article I posted previously.

You can read that one here:
https://dev.to/ghotet/set-up-your-own-personal-ai-frankenstack-diy-edition-309


Hey folks. I finally have a moment to sit down and lay out the blueprint for setting up your own AI stack, which I dubbed the "Frankenstack"—and it seems to have stuck haha.

This stack consists of:

  • LLM software
  • Stable Diffusion (image generation)
  • Text-to-speech (but not speech-to-text)
  • Web search for the LLM
  • All tied together through a unified front end

Just to clarify upfront: this isn't a tutorial or step-by-step guide. I'm laying out the toolkit, with notes and caveats for each piece of software. For example, I'll list my machine specs and the LLMs I run to set realistic expectations. This stack is GPU- and CPU-hungry.

My Specs

  • Modified Alienware 15 R4 (circa 2018)
  • Nvidia GTX 1070 8GB (laptop GPU)
  • Nvidia RTX 3060 12GB (AGA external GPU dock)
  • Intel i7-8750H CPU @ 2.20GHz
  • 32GB RAM
  • All drives are NVMe
  • Stack uses ~120GB including ~8 LLM/SD models

LLM

LM Studio was my choice:

  • Offers an in-depth front end with performance tuning and experimental features
  • Allows offloading KV cache for faster performance (quality may vary)
  • Lets you run multiple models simultaneously (if your system can handle it)
  • Easy download of models directly from Hugging Face

I recommend trying it before asking about alternatives like Ollama. I’ve used Ollama in CLI mode, but I wasn’t a fan personally.
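
If you want to hit a model outside the LM Studio GUI (which is exactly what OpenWebUI does later on), LM Studio can also run a local server that speaks the OpenAI API format; by default it listens on port 1234. Here's a minimal sketch, assuming the server is running with a model already loaded and the default port unchanged:

```python
import requests

# LM Studio's local server exposes OpenAI-compatible endpoints.
# Port 1234 is the default; change it here if you changed it in LM Studio.
LM_STUDIO_URL = "http://localhost:1234/v1/chat/completions"

payload = {
    "model": "local-model",  # LM Studio routes this to whatever model is loaded
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain what a KV cache does in one sentence."},
    ],
    "temperature": 0.7,
}

response = requests.post(LM_STUDIO_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

That same base URL is what you'd point OpenWebUI (or anything else OpenAI-compatible) at when wiring the stack together.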

Models I use:

  • GPT-OSS 20B – My favorite for reasoning. Reasoning effort is adjustable (low/medium/high): low answers in ~2s, high can take ~2min. It's a mixture-of-experts model with only ~3-4B parameters active at a time, so it's lighter on resources than the size suggests. Trained for tool use.
  • Mythalion 13B – Creative writing, fast, decent chat, good for Stable Diffusion prompts. Not for code.
  • DeepSeek-Coder (R1) – Strictly for complex scripts. The slowest of the three, but handles long code reliably.

Vision models:

  • I haven’t used these extensively; if you need vision, try a 7B model and test. Smaller models may be better for limited VRAM.
  • Parameter count isn’t always indicative of performance; adjust based on GPU capacity.

Stable Diffusion (Image Generation)

I use A1111 (AUTOMATIC1111's Stable Diffusion web UI):

  • Straightforward GUI with deep settings for LoRA training, img2img, VAE support
  • I mainly use it for cover art or character concepts
  • Default model: RevAnimated
  • ComfyUI is a node-based alternative; I didn't use it
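
A1111 is mostly a point-and-click tool, but if you start it with the --api flag it also exposes an HTTP API (port 7860 by default), which is handy if you want other parts of the stack to trigger image generation. A rough sketch, assuming a local instance launched with --api:

```python
import base64
import requests

# txt2img endpoint; only available when the webui is started with --api.
API_URL = "http://localhost:7860/sdapi/v1/txt2img"

payload = {
    "prompt": "moody synthwave album cover, neon rain, cinematic lighting",
    "negative_prompt": "blurry, low quality, watermark",
    "steps": 25,
    "width": 512,
    "height": 512,
}

resp = requests.post(API_URL, json=payload, timeout=300)
resp.raise_for_status()

# Images come back base64-encoded; decode and save the first one.
with open("cover_art.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```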

Text-to-Speech

Chatterbox – 100% recommend:

  • Local alternative to ElevenLabs
  • Streams in chunks for faster playback
  • Supports voice cloning (Chatterbox comes from Resemble AI): a ~10-second clip is enough for a new voice
  • Swap default voice by editing the relevant script (check GitHub for details)
  • Other options (Tortoise, Coqui) were worse in my experience.
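
For what it's worth, Chatterbox can also be driven as a plain Python library outside of OpenWebUI. The sketch below follows the usage pattern from the resemble-ai/chatterbox README as best I recall it (ChatterboxTTS.from_pretrained plus generate with an optional audio_prompt_path for cloning); treat the exact names as assumptions and double-check against the repo:

```python
import torchaudio

# Names below follow the resemble-ai/chatterbox README; verify against the repo.
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")  # or "cpu" if you're patient

text = "Welcome back. The stack is up and running."

# Default voice:
wav = model.generate(text)
torchaudio.save("default_voice.wav", wav, model.sr)

# Voice cloning: point it at a short (~10 second) reference clip.
wav = model.generate(text, audio_prompt_path="my_voice_sample.wav")
torchaudio.save("cloned_voice.wav", wav, model.sr)
```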

Web Search

SearXNG – acts like a meta-search engine:

  • Searches multiple engines at once (Google, DuckDuckGo, Brave, etc.)
  • AI can query several sources in one shot
  • I run it through Cloudflare Warp for privacy; Tor is optional
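
The reason SearXNG slots into an AI stack so nicely is that a local instance can return results as JSON instead of HTML. One caveat: the JSON format has to be enabled in settings.yml (under search: formats:), otherwise the endpoint returns a 403. A minimal sketch, assuming an instance on localhost; swap in whatever port yours actually listens on:

```python
import requests

# Point this at your own SearXNG instance.
SEARXNG_URL = "http://localhost:8888/search"

params = {
    "q": "local LLM kv cache offloading",
    "format": "json",  # must be enabled under search -> formats in settings.yml
    "language": "en",
}

resp = requests.get(SEARXNG_URL, params=params, timeout=30)
resp.raise_for_status()

# Each result carries a title, url, and content snippet from the engines queried.
for result in resp.json().get("results", [])[:5]:
    print(f"{result['title']} - {result['url']}")
```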

Frontend

OpenWebUI – central control hub:

  • Configure multiple models, knowledge bases, tools
  • Evaluate LLM responses, run pipelines, execute code, manage databases
  • TTS autoplay option in user settings; speaker icon for manual playback
  • Offline mode available (set the OFFLINE_MODE environment variable to true)
  • Branding can be customized freely; commercial use with more than 50 users may require a paid plan

Custom prompts/personas:

  • Set base prompt in LM Studio
  • OpenWebUI admin panel allows high-priority prompts
  • Per-user prompts can be layered on top

Linux Launcher Script

  • I created an aistart alias that launches all components sequentially so resources get allocated properly (a rough sketch follows this list)
  • LM Studio doesn’t auto-load the last model yet
  • Debug launcher opens multiple terminals for monitoring
  • Important: GPU assignment isn’t always respected automatically; check NVIDIA settings
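
My launcher is just a shell alias, but the idea is easy to sketch. Below is a hypothetical Python equivalent: the commands and paths are placeholders that depend on how you installed each piece (lms is LM Studio's CLI, webui.sh is A1111's launcher, open-webui comes from the pip install), and pinning CUDA_VISIBLE_DEVICES per process is one way to handle the GPU-assignment issue mentioned above:

```python
import os
import subprocess
import time

def launch(cmd, gpu=None, cwd=None):
    """Start one component, optionally pinned to a specific GPU index."""
    env = os.environ.copy()
    if gpu is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)  # one way to force GPU assignment
    return subprocess.Popen(cmd, env=env, cwd=cwd)

# Placeholder commands and paths -- adjust to however each piece is installed.
components = [
    (["lms", "server", "start"], 1, None),                 # LM Studio headless server
    (["bash", "webui.sh", "--api"], 0,
     os.path.expanduser("~/stable-diffusion-webui")),      # A1111 Stable Diffusion
    (["open-webui", "serve"], None, None),                 # OpenWebUI front end
]

procs = []
for cmd, gpu, cwd in components:
    procs.append(launch(cmd, gpu=gpu, cwd=cwd))
    time.sleep(10)  # stagger startups so each component grabs its resources first

print("Stack launched. Check each component's logs if something doesn't come up.")
```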

Why Not Docker?

  • Docker caused localhost address issues on Linux
  • Added dependencies can break the stack; simpler is better
  • Windows may not have this issue

Connecting to the Web

  • Requires domain and Cloudflare tunnel
  • Tunnel forwards traffic to OpenWebUI on your local machine
  • Lets you access the stack anywhere, including mobile
  • ChatGPT or the Cloudflare docs can walk you through the setup quickly

Final Thoughts

  • DO NOT expect this to run perfectly on the first try
  • Troubleshooting is part of the fun, and it makes the wins rewarding
  • Experiment, iterate, optimize
  • A full tutorial for both operating systems may come later

Best of luck, have fun, and remember: the pain of troubleshooting makes the success sweeter.

// Ghotet

Top comments (2)

Jessica Williams

But with the pace AI is evolving, it looks like it will leave humans behind.

Jay

There is a chance. I saw someone liken it to when the calculator came along. I would argue even when the computer came along. Both of those changed the world, just not as fast. Embracing it and learning how it works is a good place to be :)