Luke Liukonen

Unleashing the Power of Developer AI: A Journey into Hosting a Private LLM/Code Assistant locally

Introduction

In the ever-evolving landscape of software development, the quest for efficient coding tools led me to Twinny, a VSCode extension promising to bring GitHub Copilot-like capabilities to the local environment. Eager to play around with something that offers those capabilities without the cost, I set out to host this private GitHub Copilot alternative on my own machines. I'll take you through the highs and lows, the challenges faced, and the eventual win on this fascinating journey.

As a note: I am not affiliated with the Twinny project in any way.

Discovery

When I first encountered Twinny, the prospect of having GitHub Copilot-like insights right within the confines of my local development environment was, well, a bit mind-blowing. With the promise of streamlined coding tailored to my needs, I had to see what the extension had to offer.

The Hardware Landscape: Two Machines, Two Stories

Machine 1 - Framework Laptop

  • Processor: 11th Gen Intel Core i7-1165G7 @ 2.80GHz
  • Memory: 32GB RAM
  • Storage: 1TB NVMe SSD
  • Graphics: Onboard Graphics
  • OS: Windows 11
  • Containerization: Rancher Desktop over Docker

Machine 2 - Custom-built Desktop

  • Processor: AMD Ryzen 5 5600X 6-Core
  • Memory: 32GB RAM
  • Storage: 1TB SSD
  • Graphics: NVIDIA GeForce RTX 3060 TI
  • OS: Windows 11
  • Containerization: Docker Desktop

The Pros and Cons: Navigating the Local Deployment Terrain

Advantages of Local Deployment:

  1. Cost Efficiency: Running GitHub Copilot-like capabilities without incurring cloud costs was a game-changer.
  2. Privacy Control: Keeping code and data on-premises provided an added layer of security.
  3. Customization: While I'm tied to Ollama as my backend/host for LLMs, it gives me far more than a single model to choose from.

Disadvantages of Local Deployment:

  1. Hardware Requirements: The demand for powerful GPUs for optimal performance proved to be a consideration.
  2. Setup Complexity: While not too bad, keeping an LLM running all the time and managing a client/server model is more involved than simply installing a plugin.
  3. Maintenance Responsibility: Regular updates and maintenance became my responsibility, adding a layer of ownership.
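
On the maintenance point, the upkeep mostly comes down to pulling newer images and model versions. The commands below are a minimal sketch based on the Docker setup described later in this post (container name ollamagpu, codellama:7b-code model); the ollama volume keeps downloaded models across container recreations.

# Pull the latest Ollama image and recreate the container (models persist in the ollama volume)
docker pull ollama/ollama
docker rm -f ollamagpu
docker run --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollamagpu ollama/ollama

# Refresh the model weights inside the container
docker exec -it ollamagpu ollama pull codellama:7b-code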

Machine 1: No luck

While Ollama and the LLM were installed and running, calls to the API from Postman would just spin. Calling directly from within the terminal appeared to work, but was really, really slow. In my IDE, I would just see a spinning wheel where my Twinny icon should appear. My CPU would spike, and that's it.
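
For anyone hitting the same wall, a couple of basic Docker checks can at least show whether the container is pinned to the CPU or never detected a GPU at all. This is a rough diagnostic sketch, using the container name from the install steps later in this post:

# Watch live CPU/memory usage of the Ollama container while a request is in flight
docker stats ollamagpu

# Scan the startup logs for whether a GPU was detected or Ollama fell back to CPU
docker logs ollamagpu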

A Glimpse Into Success on Machine 2

Not surprisingly, it was on my gaming PC, a custom-built powerhouse with an AMD Ryzen processor and NVIDIA GeForce RTX 3060 TI, where Twinny truly came to life. The performance boost, notably attributed to GPU acceleration, turned what initially seemed like a challenge into a success story.

Key Observations:

  • GPU acceleration played a pivotal role in Twinny's optimal performance.
  • Comparisons of Docker Desktop configurations between machines shed light on factors influencing responsiveness.
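
As a sanity check on that first point, you can ask the container itself whether the GPU is visible. This is a rough check rather than a guarantee: it assumes the NVIDIA drivers and container tooling are set up so that the --gpus=all flag actually exposes the card, and it uses the container name (ollamagpu) from the install command later in this post.

# Should list the RTX 3060 TI; while a prompt is being processed, the ollama process
# should also appear in the process table with GPU memory allocated
docker exec -it ollamagpu nvidia-smi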

Installation Instructions: (Windows)

  1. Get the backend running
    • Have a container-based system or WSL. I prefer containers since they are easy to spin up or down.
    • As you saw above, I have Rancher Desktop and Docker Desktop.
    • Run the following command to download and start up your container. The --gpus=all flag is what gives the container access to my NVIDIA GPU; my Framework laptop, however, did not have the same luck. As an aside, on the Framework laptop I did try both with and without this flag.
  docker run --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollamagpu ollama/ollama
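
As a quick aside, it's worth confirming the container actually came up before moving on. The lines below are a hedged sketch: the first is just a status check, and the second is the CPU-only variant of the same command (what I tried on the Framework laptop), with "ollamacpu" as an example name so it doesn't clash with the GPU container.

# Confirm the container is running and port 11434 is mapped
docker ps --filter name=ollamagpu

# CPU-only variant: same command, minus --gpus=all
docker run -v ollama:/root/.ollama -p 11434:11434 --name ollamacpu ollama/ollama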
  • Run the following command to install the LLM needed for Twinny
docker exec -it ollamagpu ollama run codellama:7b-code
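
The first run of that command downloads a few gigabytes of model weights, so it can take a while. A quick way to confirm the model landed is to list what Ollama has stored locally; this assumes the ollamagpu container name from the run command above.

# List the models Ollama has downloaded into the ollama volume
docker exec -it ollamagpu ollama list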
  • I validated things "should work" by running the following curl command
  curl --location 'http://localhost:11434/api/chat' \
--header 'Content-Type: application/json' \
--data '{
  "model": "codellama:7b-code",
  "messages": [
    { "role": "user", "content": "why is the sky blue?" }
  ]
}'
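
If everything is wired up, the endpoint streams its answer back as newline-delimited JSON objects. The shape below is approximate and trimmed for readability, but it gives an idea of what a healthy response looks like:

{"model":"codellama:7b-code","created_at":"...","message":{"role":"assistant","content":"The sky appears blue because..."},"done":false}
{"model":"codellama:7b-code","created_at":"...","message":{"role":"assistant","content":""},"done":true}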
  2. Install the Twinny extension (link)

From there, in the bottom right-hand corner, you should see the Twinny icon. As you start typing, hopefully you'll see IntelliSense kick in with automatic code-generation recommendations.

Conclusion: Unveiling the Potential

In conclusion, self-hosting a private "GitHub Copilot"-like tool locally with Twinny has been a bit of a journey with some definite learnings. The challenges faced on the Framework Laptop were offset by a clear win on the gaming PC, showcasing both the benefits of this approach and a viable alternative to the bigger players in the game. I'm excited about this proof of concept, and I like the idea that soon organizations and individuals will have the capability to run code assistants from within their organization and reduce the fear of third parties getting a hold of their code.
