GitHub Copilot costs $10-19/month. Cursor is $20/month. Every AI coding assistant I know of has one thing in common: your code goes to a remote server for inference.
That bothers me more than the cost.
When you paste a function into Copilot to ask for a refactor, that code leaves your machine and goes to Microsoft's servers. For most hobby projects that's fine. For anything with sensitive business logic, proprietary algorithms, or customer data — it's a problem many developers quietly accept.
I built guIDE to be different.
The core promise: your code never leaves your machine
guIDE runs AI inference locally using llama.cpp under the hood. The models run on your CPU or GPU. No API calls. No outbound requests for inference. The code you write and the questions you ask stay on your machine.
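For the curious, the general shape of local inference with llama.cpp looks like the commands below. These are standard llama.cpp invocations, not guIDE's internals; the model filename is a placeholder for whatever GGUF file you have on disk.

```shell
# Start a local inference server from a GGUF model file.
# -ngl 99 offloads as many layers as possible to the GPU, if one is present.
llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8080 -ngl 99

# Query it over the loopback interface. No traffic leaves the machine.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain this stack trace"}]}'
```

llama-server exposes an OpenAI-compatible API on localhost, which is why any GGUF model it can load works without signing up for anything.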
This means:
- No API key needed — nothing to sign up for, no billing to manage
- Unlimited completions — your quota is your hardware, not a monthly token limit
- Works offline — plane mode, mountain cabin, corporate firewall — doesn't matter
- No latency from round-trips — response time depends on your hardware, not network conditions
What it can do
guIDE integrates with your editor (VS Code extension, with others planned) and provides:
- Inline completions — context-aware code suggestions as you type
- Explain code — highlight any block and get an explanation
- Refactor suggestions — "make this more readable", "extract to function", etc.
- Error analysis — paste a stack trace, get a diagnosis
- Chat interface — general-purpose coding questions
Which models does it support?
Any GGUF-format model that llama.cpp supports. In practice that means:
- Qwen 2.5 Coder (recommended, 7B and 14B variants)
- DeepSeek Coder
- Codestral (GGUF)
- Llama 3.2 / 3.3
- Mistral / Mixtral
- Phi-4
The 7B quantized models run well on a modern CPU. If you have a GPU with enough VRAM to hold the model, responses are dramatically faster.
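How much memory does a quantized model actually need? A rough back-of-envelope sketch (the function and its ~20% overhead factor are my own illustrative assumptions, not guIDE's actual accounting):

```python
def estimate_gguf_ram_gb(params_billion, bits_per_weight, overhead=1.2):
    """Ballpark RAM for a quantized model: the weights themselves,
    plus ~20% for the KV cache and runtime buffers."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model at a Q4-class quant (~4.5 bits/weight) needs roughly 4.7 GB,
# which is why it fits comfortably on a 16 GB laptop.
print(estimate_gguf_ram_gb(7, 4.5))
```

The same arithmetic explains the CPU/GPU split: a 14B model at the same quant roughly doubles the footprint, so whether it fits in VRAM (fast) or spills to system RAM (slower) depends on your card.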
The honest trade-offs
Local inference means you're bounded by your hardware. A 7B model running on a laptop CPU is noticeably slower than GPT-4o or Claude. The code quality ceiling is also lower than the frontier models.
But for the use case of "I need a reliable AI pair programmer that I own completely and can use without any external dependencies" — local models have gotten genuinely good. Qwen 2.5 Coder 14B in particular is impressive for its size.
Try it
guIDE runs on Mac and Windows. Download at graysoft.dev.
If you're tired of your code leaving your machine every time you want a suggestion, give it a try.
Built with: Electron, llama.cpp, VS Code Extension API, GGUF model support