The Problem: Cloud AI is Great, but Privacy is Greater
We all love GitHub Copilot, but let's be honest: in an enterprise environment, privacy isn't just a buzzword—it's a legal requirement. Sending proprietary codebases to cloud servers is a strict no-go for many companies.
I realized that we needed a bridge between the power of LLMs and the security of a local environment. That’s why I built Local LLM Plugin Modern for Visual Studio 2022.
What is it?
It's a powerful, modern, and highly optimized AI assistant extension. It seamlessly integrates local LLMs via Ollama and cloud-based models like OpenAI, Anthropic (Claude), and Google Gemini directly into your coding environment.
Whether you want to run DeepSeek or Llama 3 entirely offline or leverage GPT-4o for heavy reasoning, this extension offers a native-feeling dark theme experience that boosts your productivity without leaving your IDE.
Engineering Highlights (Built for Performance)
Instead of just "making it work," I rebuilt this extension to meet enterprise standards:
- Clean Architecture & MVVM: Separated UI, Core logic, and Infrastructure for maximum maintainability.
- Dependency Injection: Utilizing `Microsoft.Extensions.DependencyInjection` to handle provider factories gracefully.
- Lightning-Fast Text Injection: Uses Visual Studio's native `UndoContext`, so massive code blocks are applied instantly without freezing the editor, and a single Undo (Ctrl+Z) reverts the whole insertion.
- Memory Efficiency: Optimized handling of large text blocks using `StringBuilder` and `StringReader` to keep allocations and GC pressure low.
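To make the provider-factory idea concrete, here is a rough, language-neutral sketch in Python rather than the extension's actual C#. All names, URLs, and default models below are illustrative assumptions, not the extension's real types:

```python
# Hypothetical sketch of a provider factory with "smart defaults".
# In the real extension this role is played by C# factories registered
# with Microsoft.Extensions.DependencyInjection.

class Provider:
    """Minimal stand-in for a chat-provider configuration."""
    def __init__(self, name: str, base_url: str, default_model: str):
        self.name = name
        self.base_url = base_url
        self.default_model = default_model

# Each factory bakes in a sensible API URL and model for its provider.
PROVIDER_FACTORIES = {
    "ollama": lambda: Provider("ollama", "http://localhost:11434", "llama3"),
    "openai": lambda: Provider("openai", "https://api.openai.com/v1", "gpt-4o"),
}

def create_provider(name: str) -> Provider:
    """Resolve a provider by name, failing loudly for unknown ones."""
    try:
        return PROVIDER_FACTORIES[name]()
    except KeyError:
        raise ValueError(f"Unknown provider: {name}") from None
```

Registering the factories behind a single lookup is what makes "switch seamlessly between providers" a one-line change for the caller.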
Features that Boost DX (Developer Experience)
- Multi-Provider AI Support: Switch seamlessly between Ollama, OpenAI, Claude, and Gemini.
- Partial Selection Injection: Select a specific part of the AI's response to inject only that portion into your code, ignoring unnecessary conversational filler.
- Smart Defaults: Automatically configures API URLs and model names based on the selected provider.
- Advanced Text Manipulation Tools: Includes built-in tools to remove duplicates, modify and replicate variables, or batch-erase specific words.
- Native Dark Theme: A sleek interface that perfectly matches Visual Studio 2022's native dark mode.
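As a simplified illustration of what the text-manipulation tools might look like under the hood (Python pseudocode of the concept, not the extension's actual C# implementation):

```python
import re

def remove_duplicate_lines(text: str) -> str:
    """Keep only the first occurrence of each line, preserving order."""
    seen: set[str] = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)

def batch_erase_words(text: str, words: list[str]) -> str:
    """Remove whole-word occurrences of each given word."""
    for word in words:
        text = re.sub(rf"\b{re.escape(word)}\b", "", text)
    # collapse the double spaces left behind by removed words
    return re.sub(r" {2,}", " ", text).strip()
```

The whole-word regex boundary (`\b`) matters here: erasing `var` should not mangle `variable`.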
Keyboard Shortcuts (Power User Friendly)
The UI is entirely optimized for keyboard-only navigation:
- `Ctrl + 1`: Open the AI Assistant (global VS shortcut).
- `Enter`: Send your prompt to the AI.
- `Shift + Enter`: Apply the response directly to your editor and close the window.
- `Esc`: Cancel and close the assistant.
How to Get It?
You don't even need a link. Just open Visual Studio 2022, go to Extensions -> Manage Extensions, and search for:
"Local LLM Plugin Modern"
Open Source & Contributing
This project is fully open source under the MIT License. I believe the best tools are built by the community, for the community.
Check out the repo here: furkiak/visualStudioLocalLLMPlugin
I'd love to hear your thoughts! Are you moving towards local LLMs for coding, or is cloud AI still your go-to? Let's discuss in the comments!
Top comments
The hybrid local/cloud approach is the right call here. I run Llama 3 locally for generating SEO content across 8,000+ stock ticker pages, and the privacy angle was actually secondary to the latency wins — batching thousands of prompts through a local model without hitting rate limits or per-token costs changed the economics of the whole project completely. Curious whether you've benchmarked the streaming token performance between Ollama-served Llama 3 and the cloud providers in your extension? In my experience, local inference on even a mid-range GPU outperforms cloud roundtrip for shorter completions, but falls off a cliff once you need heavy reasoning. That hybrid toggle you built could be really powerful if it auto-detected the complexity of the prompt and routed accordingly.
I haven't published formal benchmarks yet, primarily because local performance is so hardware-dependent (VRAM speed, GPU architecture, etc.). However, in my internal testing, I’ve seen exactly what you described. For 'short-burst' tasks like unit test generation or boilerplate C# properties, the roundtrip to a cloud provider is often the slowest part of the process. Ollama-served models on a decent GPU feel much more 'integrated' in terms of UI responsiveness.
Once the task requires high-level architectural reasoning or complex debugging across multiple files, the cloud models (like GPT or Claude) still hold a significant lead.
Implementing a lightweight 'router' to detect prompt complexity and suggest the best provider could be a game-changer for this extension. It's definitely going on my research roadmap!
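A toy heuristic shows how little such a router might need to start with. The keywords, thresholds, and scoring below are made-up assumptions for illustration, not a benchmarked policy:

```python
# Hypothetical prompt router: cheap local model for short/boilerplate asks,
# cloud model for heavy reasoning. All thresholds here are guesses.

REASONING_HINTS = ("refactor", "architecture", "debug", "explain why", "design")

def route_prompt(prompt: str, context_files: int = 0) -> str:
    """Return 'local' or 'cloud' based on a crude complexity score."""
    p = prompt.lower()
    score = sum(hint in p for hint in REASONING_HINTS)
    score += len(prompt) // 500           # very long prompts lean cloud
    if context_files > 1:                 # multi-file tasks lean cloud
        score += 2
    return "cloud" if score >= 2 else "local"
```

A real router would want to learn these weights from accepted/rejected completions, but even a static heuristic like this could drive a "suggested provider" hint in the UI.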
Thanks for the feedback!