GitHub Copilot costs $10-19/month. Cursor is $20/month. Every AI coding assistant I know of has one thing in common: your code goes to a remote server for inference.
That bothers me more than the cost.
When you paste a function into Copilot to ask for a refactor, that code leaves your machine and goes to Microsoft's servers. For most hobby projects that's fine. For anything with sensitive business logic, proprietary algorithms, or customer data — it's a problem many developers quietly accept.
I built guIDE to be different.
The core promise: your code never leaves your machine
guIDE runs AI inference locally using llama.cpp under the hood. The models run on your CPU or GPU. No API calls. No outbound requests for inference. The code you write and the questions you ask stay on your machine.
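For the curious, the general shape of local inference with llama.cpp looks like the commands below. These are standard llama.cpp invocations, not guIDE's internals; the model filename is a placeholder for whatever GGUF file you have on disk.

```shell
# Start a local inference server from a GGUF model file.
# -ngl 99 offloads as many layers as possible to the GPU, if one is present.
llama-server -m qwen2.5-coder-7b-instruct-q4_k_m.gguf --port 8080 -ngl 99

# Query it over the loopback interface. No traffic leaves the machine.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain this stack trace"}]}'
```

llama-server exposes an OpenAI-compatible API on localhost, which is why any GGUF model it can load works without signing up for anything.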
This means:
- No API key needed — nothing to sign up for, no billing to manage
- Unlimited completions — your quota is your hardware, not a monthly token limit
- Works offline — plane mode, mountain cabin, corporate firewall — doesn't matter
- No latency from round-trips — response time depends on your hardware, not network conditions
What it can do
guIDE integrates with your editor (VS Code extension, with others planned) and provides:
- Inline completions — context-aware code suggestions as you type
- Explain code — highlight any block and get an explanation
- Refactor suggestions — "make this more readable", "extract to function", etc.
- Error analysis — paste a stack trace, get a diagnosis
- Chat interface — general-purpose coding questions
Which models does it support?
Any GGUF-format model that llama.cpp supports. In practice that means:
- Qwen 2.5 Coder (recommended, 7B and 14B variants)
- DeepSeek Coder
- Codestral (GGUF)
- Llama 3.2 / 3.3
- Mistral / Mixtral
- Phi-4
The 7B quantized models run well on a modern CPU. If you have a GPU with enough VRAM to hold the model, responses are dramatically faster.
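How much memory does a quantized model actually need? A rough back-of-envelope sketch (the function and its ~20% overhead factor are my own illustrative assumptions, not guIDE's actual accounting):

```python
def estimate_gguf_ram_gb(params_billion, bits_per_weight, overhead=1.2):
    """Ballpark RAM for a quantized model: the weights themselves,
    plus ~20% for the KV cache and runtime buffers."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model at a Q4-class quant (~4.5 bits/weight) needs roughly 4.7 GB,
# which is why it fits comfortably on a 16 GB laptop.
print(estimate_gguf_ram_gb(7, 4.5))
```

The same arithmetic explains the CPU/GPU split: a 14B model at the same quant roughly doubles the footprint, so whether it fits in VRAM (fast) or spills to system RAM (slower) depends on your card.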
The honest trade-offs
Local inference means you're bounded by your hardware. A 7B model running on a laptop CPU is noticeably slower than GPT-4o or Claude. The code quality ceiling is also lower than the frontier models.
But for the use case of "I need a reliable AI pair programmer that I own completely and can use without any external dependencies" — local models have gotten genuinely good. Qwen 2.5 Coder 14B in particular is impressive for its size.
Try it
guIDE runs on Mac and Windows. Download at graysoft.dev.
If you're tired of your code leaving your machine every time you want a suggestion, give it a try.
Built with: Electron, llama.cpp, VS Code Extension API, GGUF model support