Luke Liukonen

Unleashing the Power of Developer AI: A Journey into Hosting a Private LLM/Code Assistant locally

Introduction

In the ever-evolving landscape of software development, the quest for efficient coding tools led me to Twinny, a VSCode extension promising to bring GitHub Copilot-like capabilities to the local environment. Eager to play around with something that offers those capabilities without the cost, I set out to host this private GitHub Copilot alternative on my own machines. I'll take you through the highs and lows, the challenges faced, and the eventual win on this fascinating journey.

As a note: I am not affiliated with the Twinny project in any way.

Discovery

When I first encountered Twinny, the prospect of having GitHub Copilot-like insights right within the confines of my local development environment was, well, a bit mind-blowing. With the promise of streamlined coding tailored to my needs, I had to see what the extension had to offer.

The Hardware Landscape: Two Machines, Two Stories

Machine 1 - Framework Laptop

  • Processor: 11th Gen Intel Core i7-1165G7 @ 2.80GHz
  • Memory: 32GB RAM
  • Storage: 1TB NVMe SSD
  • Graphics: Onboard Graphics
  • OS: Windows 11
  • Containerization: Rancher Desktop over Docker

Machine 2 - Custom-built Desktop

  • Processor: AMD Ryzen 5 5600X 6-Core
  • Memory: 32GB RAM
  • Storage: 1TB SSD
  • Graphics: NVIDIA GeForce RTX 3060 TI
  • OS: Windows 11
  • Containerization: Docker Desktop

The Pros and Cons: Navigating the Local Deployment Terrain

Advantages of Local Deployment:

  1. Cost Efficiency: Running GitHub Copilot-like capabilities without incurring cloud costs was a game-changer.
  2. Privacy Control: Keeping code and data on-premises provided an added layer of security.
  3. Customization: While I'm tied to Ollama as my backend/host for LLMs, it gives me far more than a single model to choose from.

Disadvantages of Local Deployment:

  1. Hardware Requirements: The demand for powerful GPUs for optimal performance proved to be a consideration.
  2. Setup Complexity: While not too bad, keeping an LLM running all the time and managing a client/server model is more involved than simply installing a plugin.
  3. Maintenance Responsibility: Regular updates and maintenance became my responsibility, adding a layer of ownership.
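
On the maintenance point, the upkeep mostly comes down to pulling newer images and model versions. The commands below are a minimal sketch based on the Docker setup described later in this post (container name ollamagpu, codellama:7b-code model); the ollama volume keeps downloaded models across container recreations.

# Pull the latest Ollama image and recreate the container (models persist in the ollama volume)
docker pull ollama/ollama
docker rm -f ollamagpu
docker run --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollamagpu ollama/ollama

# Refresh the model weights inside the container
docker exec -it ollamagpu ollama pull codellama:7b-code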

Machine 1: No luck

While Ollama and the LLM were installed and running, calls to the API from Postman would just spin. Calling directly from within the terminal appeared to work, but was really, really slow. In my IDE, I would just see a spinning wheel where my Twinny icon should appear. My CPU would spike, and that's it.
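
For anyone hitting the same wall, a couple of basic Docker checks can at least show whether the container is pinned to the CPU or never detected a GPU at all. This is a rough diagnostic sketch, using the container name from the install steps later in this post:

# Watch live CPU/memory usage of the Ollama container while a request is in flight
docker stats ollamagpu

# Scan the startup logs for whether a GPU was detected or Ollama fell back to CPU
docker logs ollamagpu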

A Glimpse Into Success on Machine 2

Not surprisingly, it was on my gaming PC, a custom-built powerhouse with an AMD Ryzen processor and NVIDIA GeForce RTX 3060 TI, where Twinny truly came to life. The performance boost, notably attributed to GPU acceleration, turned what initially seemed like a challenge into a success story.

Key Observations:

  • GPU acceleration played a pivotal role in Twinny's optimal performance.
  • Comparisons of Docker Desktop configurations between machines shed light on factors influencing responsiveness.
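
As a sanity check on that first point, you can ask the container itself whether the GPU is visible. This is a rough check rather than a guarantee: it assumes the NVIDIA drivers and container tooling are set up so that the --gpus=all flag actually exposes the card, and it uses the container name (ollamagpu) from the install command later in this post.

# Should list the RTX 3060 TI; while a prompt is being processed, the ollama process
# should also appear in the process table with GPU memory allocated
docker exec -it ollamagpu nvidia-smi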

Installation Instructions: (Windows)

  1. Get the backend running
    • Have a container-based system or WSL. I prefer containers since they are easy to spin up or down.
    • As you saw above, I have Rancher Desktop and Docker Desktop.
    • Run the following command to download and start up your container. The --gpus=all flag is what gives the container access to my NVIDIA GPU; my Framework laptop, however, did not have the same luck. As an aside, on the Framework laptop I did try both with and without this flag.
  docker run --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollamagpu ollama/ollama
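
As a quick aside, it's worth confirming the container actually came up before moving on. The lines below are a hedged sketch: the first is just a status check, and the second is the CPU-only variant of the same command (what I tried on the Framework laptop), with "ollamacpu" as an example name so it doesn't clash with the GPU container.

# Confirm the container is running and port 11434 is mapped
docker ps --filter name=ollamagpu

# CPU-only variant: same command, minus --gpus=all
docker run -v ollama:/root/.ollama -p 11434:11434 --name ollamacpu ollama/ollama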
  • Run the following command to install the LLM needed for Twinny
docker exec -it ollamagpu ollama run codellama:7b-code
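
The first run of that command downloads a few gigabytes of model weights, so it can take a while. A quick way to confirm the model landed is to list what Ollama has stored locally; this assumes the ollamagpu container name from the run command above.

# List the models Ollama has downloaded into the ollama volume
docker exec -it ollamagpu ollama list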
  • I validated things "should work" by running the following curl command
  curl --location 'http://localhost:11434/api/chat' \
--header 'Content-Type: application/json' \
--data '{
  "model": "codellama:7b-code",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
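
If everything is wired up, the endpoint streams its answer back as newline-delimited JSON objects. The shape below is approximate and trimmed for readability, but it gives an idea of what a healthy response looks like:

{"model":"codellama:7b-code","created_at":"...","message":{"role":"assistant","content":"The sky appears blue because..."},"done":false}
{"model":"codellama:7b-code","created_at":"...","message":{"role":"assistant","content":""},"done":true}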
  2. Install the Twinny extension (link)

From there, in the bottom right-hand corner, you should see the Twinny icon. As you start typing, hopefully you'll see IntelliSense kick in with automatic code-generation recommendations.

Conclusion: Unveiling the Potential

In conclusion, self-hosting a private "GitHub Copilot"-like tool locally with Twinny has been a bit of a journey with some definite learnings. The challenges faced on the Framework Laptop were offset by a clear win on the gaming PC, showcasing both the benefits of this approach and a viable alternative to the bigger players in the game. I'm excited about this proof of concept, and I like the idea that soon organizations and individuals will have the capability to run code assistants from within their organization and reduce the fear of third parties getting a hold of their code.
