DEV Community

EveryLocalAI
EveryLocalAI

Posted on

Build a Private Windows AI Assistant with LM Studio and AnythingLLM

Build a Private Windows AI Assistant with LM Studio and AnythingLLM

A fully private AI stack for Windows that never touches the cloud. LM Studio serves as your local model server with a visual interface — browse, download, and run models from HuggingFace without typing a single command. AnythingLLM adds document RAG, workspace isolation, and agent skills on top.

This stack is built for Windows users who prefer a graphical interface — no Docker, no terminal commands beyond the basics.

What you'll build

  • Visual model browser — search HuggingFace models inside LM Studio, download with one click
  • Drop-in document Q&A — PDF, DOCX, TXT, CSV, code files. Drag them into AnythingLLM and ask questions
  • No data leaves your PC — all inference and embedding runs locally, works completely offline
  • No Docker, no WSL, no CLI — both apps are native Windows desktop installers
  • $0/month — the only cost is the GPU you already own

Prerequisites

  • Windows 11 (64-bit)
  • GPU with 4GB+ VRAM (6GB+ preferred), CPU works but slower
  • 16GB RAM minimum
  • 10-30GB free disk for models

Step 1: Install LM Studio

Go to lmstudio.ai and download the Windows installer. Run it — default path is fine.

LM Studio is both a model manager and a local OpenAI-compatible API server. You search models from Hugging Face visually and serve them over a local HTTP endpoint.

Step 2: Download a model

In LM Studio, go to the Discover tab and search for Qwen2.5-14B. Look for a Q4_K_M quantized version — best balance of quality and size. Click Download and wait (~8 GB).

If you have 8GB VRAM or less, search for Qwen2.5-7B or Llama 3.2 3B instead.

Step 3: Start the local server

Go to the Developer tab in LM Studio, select your model, and click Start Server. You should see: Server listening on http://localhost:1234.

Step 4: Install AnythingLLM

Go to anythingllm.com/desktop and download the Windows installer. Install for Current User only — not All Users — to avoid a known spawn error.

Step 5: Connect AnythingLLM to LM Studio

In AnythingLLM Settings > LLM Preference, select LM Studio as the provider and set the base URL to http://localhost:1234. Save changes. Go to Embedding Model and set to AnythingLLM built-in.

Step 6: Chat and upload documents

Create a workspace, then drag files into the chat area. AnythingLLM creates embeddings locally and lets you ask questions about your documents. Workspaces are isolated — perfect for keeping work and personal contexts separate.

Performance by GPU

GPU Max model Speed
RTX 3060 12GB 14B at Q4 15-20 tok/s
RTX 4060 8GB 7B at Q4 20-30 tok/s
CPU-only 16GB 3B at Q4 3-5 tok/s

Cost comparison

Local stack: $0/month + $200 for used RTX 3060. ChatGPT Plus: $20/month with no privacy guarantees. The GPU pays for itself in 10 months.


Originally published on everylocalai.com

Top comments (0)