A heavily summarized (ramble-free) version of the long-form article I posted previously.
You can read that one here:
https://dev.to/ghotet/set-up-your-own-personal-ai-frankenstack-diy-edition-309
Hey folks. I finally have a moment to sit down and lay out the blueprint for setting up your own AI stack, which I dubbed the "Frankenstack"—and it seems to have stuck haha.
This stack consists of:
- LLM software
- Stable Diffusion (image generation)
- Text-to-speech (but not speech-to-text)
- Web search for the LLM
- All tied together through a unified front end
Just to clarify upfront: this isn't a tutorial or step-by-step guide. I'm laying out the toolkit, with notes and caveats for each piece of software. For example, I'll list my machine specs and the LLMs I run to set realistic expectations. This stack is GPU/CPU hungry.
My Specs
- Modified Alienware 15 R4 (circa 2018)
- Nvidia GTX 1070 8GB (laptop GPU)
- Nvidia RTX 3060 12GB (in an Alienware Graphics Amplifier external GPU dock)
- Intel i7-8750H CPU @ 2.20GHz
- 32GB RAM
- All drives are NVMe
- The stack uses ~120GB of disk space, including ~8 LLM/SD models
LLM
LM Studio was my choice:
- Offers an in-depth front end with performance tuning and experimental features
- Allows offloading KV cache for faster performance (quality may vary)
- Lets you run multiple models simultaneously (if your system can handle it)
- Easy download of models directly from Hugging Face
I recommend trying it before asking about alternatives like Ollama. I’ve used Ollama in CLI mode, but I wasn’t a fan personally.
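A big reason LM Studio fits this stack: it can expose a local OpenAI-compatible server (via its Developer tab), which is what OpenWebUI talks to later. A minimal sketch, assuming the server is on its default port 1234; the model name `gpt-oss-20b` is just a placeholder for whatever you've actually loaded:

```bash
# Query LM Studio's local OpenAI-compatible endpoint.
# Port 1234 is the default; swap the model name for one you've loaded.
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Summarize what SearXNG does in one line."}
    ]
  }'
```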
Models I use:
- GPT-OSS 20B – My favorite for reasoning. Reasoning effort is adjustable (low/medium/high): low answers in ~2s, high can take ~2min. It's a mixture-of-experts model, so only ~3-4B parameters are active at a time, which keeps it lighter on resources. Trained for tool use.
- Mythalion 13B – Creative writing, fast, decent chat, good for Stable Diffusion prompts. Not for code.
- Deepseek-Coder (R1) – Strictly for complex scripts. Slowest model, but handles long code reliably.
Vision models:
- I haven’t used these extensively; if you need vision, try a 7B model and test. Smaller models may be better for limited VRAM.
- Parameter count isn’t always indicative of performance; adjust based on GPU capacity.
Stable Diffusion (Image Generation)
I use A1111 (AUTOMATIC1111's Stable Diffusion web UI):
- Straightforward GUI with deep settings for LoRA training, img2img, VAE support
- I mainly use it for cover art or character concepts
- Default model: RevAnimated
- ComfyUI is an alternative but more node-based; I didn’t use it
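Worth knowing if you want the rest of the stack (or your own scripts) to request images without touching the GUI: A1111 can be started with its API enabled. A rough sketch, assuming the default port 7860 and the `--api` launch flag; the prompt and settings are placeholders, and `jq` is used to pull the base64 image out of the response:

```bash
# Start A1111 with the API enabled (run from its install directory):
#   ./webui.sh --api
# Then request an image and decode the base64 result to a PNG.
curl -s http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "album cover art, neon-lit city in the rain, cinematic lighting",
    "negative_prompt": "blurry, low quality",
    "steps": 25,
    "width": 512,
    "height": 512
  }' | jq -r '.images[0]' | base64 -d > cover.png
```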
Text-to-Speech
Chatterbox – 100% recommend:
- Local alternative to ElevenLabs
- Streams in chunks for faster playback
- Supports voice cloning (Chatterbox is built by Resemble AI): a ~10-second clip is enough for a new voice
- Swap default voice by editing the relevant script (check GitHub for details)
- Other options (Tortoise, Coqui) were worse in my experience.
Web Search
SearXNG – a self-hosted metasearch engine:
- Searches multiple engines at once (Google, DuckDuckGo, Brave, etc.)
- AI can query several sources in one shot
- I run it through Cloudflare Warp for privacy; Tor is optional
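Anything in the stack that wants search results programmatically (OpenWebUI's web search does exactly this) can just hit the instance over HTTP. A sketch, assuming SearXNG is listening on port 8888 and JSON output has been enabled under `search: formats:` in `settings.yml`; both of those depend on how you deployed it:

```bash
# Ask the local SearXNG instance for results as JSON.
# Requires "json" to be listed under search -> formats in settings.yml.
curl -s -G "http://localhost:8888/search" \
  --data-urlencode "q=local LLM quantization" \
  --data-urlencode "format=json" \
  | jq '.results[:3][] | {title, url}'
```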
Frontend
OpenWebUI – central control hub:
- Configure multiple models, knowledge bases, tools
- Evaluate LLM responses, run pipelines, execute code, manage databases
- TTS autoplay option in user settings; speaker icon for manual playback
- Offline mode available (set the OFFLINE_MODE environment variable to true)
- Branding can be customized freely; commercial deployments with more than 50 users may require a paid license
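Most of this lives in the admin panel, but if you installed OpenWebUI through pip, a handful of environment variables at launch cover the basics above. A sketch, assuming the pip install and LM Studio's server on port 1234; exact variable names can shift between releases, so double-check against the docs for your version:

```bash
# Launch OpenWebUI (pip install open-webui) pointed at LM Studio,
# with offline mode and custom branding. Serves on port 8080 by default.
OFFLINE_MODE=true \
WEBUI_NAME="Frankenstack" \
OPENAI_API_BASE_URL="http://localhost:1234/v1" \
OPENAI_API_KEY="lm-studio" \
open-webui serve
```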
Custom prompts/personas:
- Set base prompt in LM Studio
- OpenWebUI admin panel allows high-priority prompts
- Per-user prompts can be layered on top
Linux Launcher Script
- I created an `aistart` alias that launches all the components sequentially so resources get allocated properly (LM Studio doesn't auto-load the last model yet); there's a sketch of it after this list
- Debug launcher opens multiple terminals for monitoring
- Important: GPU assignment isn’t always respected automatically; check NVIDIA settings
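I won't paste my exact script, but the shape is simple: start each piece in the background, give it a few seconds, then start the next, and pin GPUs explicitly since assignment isn't always respected. A rough sketch with placeholder paths and GPU indices, assuming LM Studio's `lms` CLI is on your PATH:

```bash
#!/usr/bin/env bash
# aistart - launch the Frankenstack components in order so each one
# grabs its resources before the next starts. Paths/indices are examples.

# LM Studio headless server (load your model manually for now,
# since it doesn't auto-load the last one yet).
lms server start
sleep 5

# A1111 pinned to the second GPU (the external RTX 3060 in my case).
CUDA_VISIBLE_DEVICES=1 ~/stable-diffusion-webui/webui.sh --api &
sleep 10

# SearXNG and Chatterbox go here, however you run them locally.

# OpenWebUI last, once the backends are up.
open-webui serve &
```

Stick it somewhere on your PATH and alias it (`alias aistart='~/bin/aistart.sh'`) and you get the one-word launch.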
Why Not Docker?
- Docker caused localhost address issues on Linux (inside a container, localhost points at the container itself, not the host)
- Added dependencies can break the stack; simpler is better
- Windows may not have this issue
Connecting to the Web
- Requires a domain and a Cloudflare Tunnel
- Tunnel forwards traffic to OpenWebUI on your local machine
- Lets you access the stack anywhere, including mobile
- Cloudflare's documentation (or ChatGPT) can walk you through the setup quickly
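The short version, assuming your domain is already on Cloudflare, `cloudflared` is installed, and OpenWebUI is on port 8080; the hostname and tunnel name below are placeholders, and a config file with ingress rules works just as well as the `--url` flag:

```bash
# One-time setup: authenticate, create the tunnel, map a hostname to it.
cloudflared tunnel login
cloudflared tunnel create frankenstack
cloudflared tunnel route dns frankenstack chat.example.com

# Run it, forwarding public traffic to OpenWebUI on this machine.
cloudflared tunnel run --url http://localhost:8080 frankenstack
```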
Final Thoughts
- DO NOT expect this to run perfectly on the first try
- Troubleshooting is part of the fun, and it's rewarding
- Experiment, iterate, optimize
- A full tutorial may come later, for both operating systems
Best of luck, have fun, and remember: the pain of troubleshooting makes the success sweeter.
// Ghotet
Top comments (2)
But with the pace AI is evolving, it looks like it will leave humans behind.
There is a chance. I saw someone liken it to when the calculator came along. I would argue even when the computer came along. Both of those changed the world, just not as fast. Embracing it and learning how it works is a good place to be :)