The Problem: Cloud AI is Great, but Privacy is Greater
We all love GitHub Copilot, but let's be honest: in an enterprise environment, privacy isn't just a buzzword—it's a legal requirement. Sending proprietary codebases to cloud servers is a strict no-go for many companies.
I realized that we needed a bridge between the power of LLMs and the security of a local environment. That’s why I built Local LLM Plugin Modern for Visual Studio 2022.
What is it?
It's a powerful, modern, and highly optimized AI assistant extension. It seamlessly integrates local LLMs via Ollama and cloud-based models like OpenAI, Anthropic (Claude), and Google Gemini directly into your coding environment.
Whether you want to run DeepSeek or Llama 3 entirely offline or leverage GPT-4o for heavy reasoning, this extension offers a native-feeling dark theme experience that boosts your productivity without leaving your IDE.
Engineering Highlights (Built for Performance)
Instead of just "making it work," I rebuilt this extension to meet enterprise standards:
- Clean Architecture & MVVM: Separated UI, Core logic, and Infrastructure for maximum maintainability.
- Dependency Injection: Utilizing `Microsoft.Extensions.DependencyInjection` to handle provider factories gracefully.
- Lightning-Fast Text Injection: Uses Visual Studio's native `UndoContext`, so massive code blocks are applied instantly without freezing the editor, and a single Undo (Ctrl+Z) reverts the whole insertion.
- Memory Efficiency: Optimized handling of large text blocks using `StringBuilder` and `StringReader` to keep allocations and GC pressure low.
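To make the provider-factory idea concrete, here is a rough, language-neutral sketch in Python rather than the extension's actual C#. All names, URLs, and default models below are illustrative assumptions, not the extension's real types:

```python
# Hypothetical sketch of a provider factory with "smart defaults".
# In the real extension this role is played by C# factories registered
# with Microsoft.Extensions.DependencyInjection.

class Provider:
    """Minimal stand-in for a chat-provider configuration."""
    def __init__(self, name: str, base_url: str, default_model: str):
        self.name = name
        self.base_url = base_url
        self.default_model = default_model

# Each factory bakes in a sensible API URL and model for its provider.
PROVIDER_FACTORIES = {
    "ollama": lambda: Provider("ollama", "http://localhost:11434", "llama3"),
    "openai": lambda: Provider("openai", "https://api.openai.com/v1", "gpt-4o"),
}

def create_provider(name: str) -> Provider:
    """Resolve a provider by name, failing loudly for unknown ones."""
    try:
        return PROVIDER_FACTORIES[name]()
    except KeyError:
        raise ValueError(f"Unknown provider: {name}") from None
```

Registering the factories behind a single lookup is what makes "switch seamlessly between providers" a one-line change for the caller.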
Features that Boost DX (Developer Experience)
- Multi-Provider AI Support: Switch seamlessly between Ollama, OpenAI, Claude, and Gemini.
- Partial Selection Injection: Select a specific part of the AI's response to inject only that portion into your code, ignoring unnecessary conversational filler.
- Smart Defaults: Automatically configures API URLs and model names based on the selected provider.
- Advanced Text Manipulation Tools: Includes built-in tools to remove duplicates, modify and replicate variables, or batch-erase specific words.
- Native Dark Theme: A sleek interface that perfectly matches Visual Studio 2022's native dark mode.
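As a simplified illustration of what the text-manipulation tools might look like under the hood (Python pseudocode of the concept, not the extension's actual C# implementation):

```python
import re

def remove_duplicate_lines(text: str) -> str:
    """Keep only the first occurrence of each line, preserving order."""
    seen: set[str] = set()
    kept = []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)

def batch_erase_words(text: str, words: list[str]) -> str:
    """Remove whole-word occurrences of each given word."""
    for word in words:
        text = re.sub(rf"\b{re.escape(word)}\b", "", text)
    # collapse the double spaces left behind by removed words
    return re.sub(r" {2,}", " ", text).strip()
```

The whole-word regex boundary (`\b`) matters here: erasing `var` should not mangle `variable`.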
Keyboard Shortcuts (Power User Friendly)
The UI is entirely optimized for keyboard-only navigation:
- `Ctrl + 1`: Open the AI Assistant (global VS shortcut).
- `Enter`: Send your prompt to the AI.
- `Shift + Enter`: Apply the response directly to your editor and close the window.
- `Esc`: Cancel and close the assistant.
How to Get It?
You don't even need a link. Just open Visual Studio 2022, go to Extensions -> Manage Extensions, and search for:
"Local LLM Plugin Modern"
Open Source & Contributing
This project is fully open source under the MIT License. I believe the best tools are built by the community, for the community.
Check out the repo here: furkiak/visualStudioLocalLLMPlugin
I'd love to hear your thoughts! Are you moving towards local LLMs for coding, or is cloud AI still your go-to? Let's discuss in the comments!
Top comments
The hybrid local/cloud approach is the right call here. I run Llama 3 locally for generating SEO content across 8,000+ stock ticker pages, and the privacy angle was actually secondary to the latency wins — batching thousands of prompts through a local model without hitting rate limits or per-token costs changed the economics of the whole project completely. Curious whether you've benchmarked the streaming token performance between Ollama-served Llama 3 and the cloud providers in your extension? In my experience, local inference on even a mid-range GPU outperforms cloud roundtrip for shorter completions, but falls off a cliff once you need heavy reasoning. That hybrid toggle you built could be really powerful if it auto-detected the complexity of the prompt and routed accordingly.
I haven't published formal benchmarks yet, primarily because local performance is so hardware-dependent (VRAM speed, GPU architecture, etc.). However, in my internal testing, I’ve seen exactly what you described. For 'short-burst' tasks like unit test generation or boilerplate C# properties, the roundtrip to a cloud provider is often the slowest part of the process. Ollama-served models on a decent GPU feel much more 'integrated' in terms of UI responsiveness.
Once the task requires high-level architectural reasoning or complex debugging across multiple files, the cloud models (like GPT or Claude) still hold a significant lead.
Implementing a lightweight 'router' to detect prompt complexity and suggest the best provider could be a game-changer for this extension. It's definitely going on my research roadmap!
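A toy heuristic shows how little such a router might need to start with. The keywords, thresholds, and scoring below are made-up assumptions for illustration, not a benchmarked policy:

```python
# Hypothetical prompt router: cheap local model for short/boilerplate asks,
# cloud model for heavy reasoning. All thresholds here are guesses.

REASONING_HINTS = ("refactor", "architecture", "debug", "explain why", "design")

def route_prompt(prompt: str, context_files: int = 0) -> str:
    """Return 'local' or 'cloud' based on a crude complexity score."""
    p = prompt.lower()
    score = sum(hint in p for hint in REASONING_HINTS)
    score += len(prompt) // 500           # very long prompts lean cloud
    if context_files > 1:                 # multi-file tasks lean cloud
        score += 2
    return "cloud" if score >= 2 else "local"
```

A real router would want to learn these weights from accepted/rejected completions, but even a static heuristic like this could drive a "suggested provider" hint in the UI.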
Thanks for the feedback!