Build a Private Windows AI Assistant with LM Studio and AnythingLLM
A fully private AI stack for Windows that never touches the cloud. LM Studio serves as your local model server with a visual interface — browse, download, and run models from HuggingFace without typing a single command. AnythingLLM adds document RAG, workspace isolation, and agent skills on top.
This stack is built for Windows users who prefer a graphical interface — no Docker, no terminal commands beyond the basics.
What you'll build
- Visual model browser — search HuggingFace models inside LM Studio, download with one click
- Drop-in document Q&A — PDF, DOCX, TXT, CSV, code files. Drag them into AnythingLLM and ask questions
- No data leaves your PC — all inference and embedding runs locally, works completely offline
- No Docker, no WSL, no CLI — both apps are native Windows desktop installers
- $0/month — the only cost is the GPU you already own
Prerequisites
- Windows 11 (64-bit)
- GPU with 4GB+ VRAM (6GB+ preferred), CPU works but slower
- 16GB RAM minimum
- 10-30GB free disk for models
Step 1: Install LM Studio
Go to lmstudio.ai and download the Windows installer. Run it — default path is fine.
LM Studio is both a model manager and a local OpenAI-compatible API server. You search models from Hugging Face visually and serve them over a local HTTP endpoint.
Step 2: Download a model
In LM Studio, go to the Discover tab and search for Qwen2.5-14B. Look for a Q4_K_M quantized version — best balance of quality and size. Click Download and wait (~8 GB).
If you have 8GB VRAM or less, search for Qwen2.5-7B or Llama 3.2 3B instead.
Step 3: Start the local server
Go to the Developer tab in LM Studio, select your model, and click Start Server. You should see: Server listening on http://localhost:1234.
Step 4: Install AnythingLLM
Go to anythingllm.com/desktop and download the Windows installer. Install for Current User only — not All Users — to avoid a known spawn error.
Step 5: Connect AnythingLLM to LM Studio
In AnythingLLM Settings > LLM Preference, select LM Studio as the provider and set the base URL to http://localhost:1234. Save changes. Go to Embedding Model and set to AnythingLLM built-in.
Step 6: Chat and upload documents
Create a workspace, then drag files into the chat area. AnythingLLM creates embeddings locally and lets you ask questions about your documents. Workspaces are isolated — perfect for keeping work and personal contexts separate.
Performance by GPU
| GPU | Max model | Speed |
|---|---|---|
| RTX 3060 12GB | 14B at Q4 | 15-20 tok/s |
| RTX 4060 8GB | 7B at Q4 | 20-30 tok/s |
| CPU-only 16GB | 3B at Q4 | 3-5 tok/s |
Cost comparison
Local stack: $0/month + $200 for used RTX 3060. ChatGPT Plus: $20/month with no privacy guarantees. The GPU pays for itself in 10 months.
Originally published on everylocalai.com
Top comments (0)