Quick and easy local AI RAG setup with JetBrains IDE integration and browser UI

#ai #webdev #programming #productivity

The hardest question when trying to adopt a new technology or workflow is "How do I even start?" Searching for an answer to that question is now more difficult than ever. Everything AI-related is, guess what, polluted with AI-generated slop, from fake AI-generated YouTube tutorials to AI-generated blogs and search results. Of course, most of it simply doesn't work.

Actual human experience is hard to find, so I'm here to share mine. I'm going to post my setup here, from start to finish, to provide an example of a working system. You are free to adapt each step to your own preferences.

IMPORTANT: All of the software listed here is free, open source and runs locally, because I'm honestly sick of articles and tutorials that have one purpose: to lure you into using their own cloud-hosted, subscription-based tools.

Prerequisites and my hardware

Windows
Docker (optional, for the web UI; can be skipped if you want to use a local Python install instead)

My hardware:

Ryzen 9 5950X
64GB DDR4
RTX 3080 12GB

Maybe listing the hardware configuration is relevant, maybe it's not. I've listed it to show that my 5-year-old workhorse is still pulling its weight, despite being 2-3 generations older than current hardware.

Install Ollama

Website: https://ollama.com/

Open a PowerShell terminal and run:

irm https://ollama.com/install.ps1 | iex

Ollama is a free and open source application for downloading, managing and running LLMs locally. It has cloud features as well, but they are not mandatory.

Download and run your first local LLM model

In our example, we'll use Gemma4, which at the time of writing is the latest open source model from Google that can be run locally. The complete list of models available through Ollama is here: https://ollama.com/search

In your terminal:

ollama run gemma4

This will both download and run Gemma4 locally (be ready for a multi-gigabyte download). Don't worry, the next time you run this command it will use the already downloaded model.

The default Ollama port is 11434. When the model finishes downloading, check that the Ollama server is up by opening http://127.0.0.1:11434 in your browser. You should see the message "Ollama is running".
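If you'd rather do that check from a script, and send the model a first prompt while you're at it, here is a minimal Python sketch using only the standard library. It assumes the default address above and Ollama's /api/generate endpoint with streaming turned off. The function names are my own and are not part of Ollama.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default address used in this article


def is_ollama_running(base_url=OLLAMA_URL, timeout=5):
    """Return True if the root URL answers with the "Ollama is running" banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return "Ollama is running" in resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, etc.: treat the server as not running.
        return False


def build_generate_request(prompt, model="gemma4", base_url=OLLAMA_URL):
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="gemma4", timeout=300):
    """Send one prompt to the local model and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With Ollama running, `is_ollama_running()` should return True, and `print(ask("Say hello in five words."))` should print the model's reply. The generous timeout is there because the first request may need to load the model into memory.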

Local Web UI

To get a local web UI (very similar to ChatGPT) that supports Retrieval Augmented Generation (RAG), workflows and many other features, we'll use Open WebUI (https://github.com/open-webui/open-webui). Although it can be set up using a locally installed Python, I've decided to try out their Docker image instead. Since I have an Nvidia card, I've used their Docker image with Nvidia GPU support.

At the time of writing this article, the exact command is:

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

See Open WebUI's GitHub page or documentation to double-check whether that command is still current.
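For anyone wondering what each part of that command does, here is the same command assembled piece by piece in Python, with a comment per flag. The comments reflect my understanding of the Docker CLI options. The names DOCKER_ARGS and docker_command are made up for this illustration.

```python
# Each entry is one piece of the docker command from this article.
DOCKER_ARGS = [
    "docker", "run",
    "-d",                                  # run in the background (detached)
    "-p", "3000:8080",                     # host port 3000 -> container port 8080
    "--gpus", "all",                       # expose all GPUs to the container
    # Let the container reach services on the host machine (e.g. Ollama):
    "--add-host=host.docker.internal:host-gateway",
    "-v", "open-webui:/app/backend/data",  # named volume, so data survives re-creating the container
    "--name", "open-webui",                # container name
    "--restart", "always",                 # restart the container automatically
    "ghcr.io/open-webui/open-webui:cuda",  # image with the cuda tag
]


def docker_command(args=DOCKER_ARGS):
    """Join the pieces back into the single command line shown above."""
    return " ".join(args)
```

`print(docker_command())` gives back the one-liner. If port 3000 is already taken on your machine, changing the first number in "3000:8080" should be enough, as long as you then open the web UI on that port instead.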

Again, prepare yourself for a multi-gigabyte download. Once the Docker image is downloaded and the container has started, you can access the web UI at http://127.0.0.1:3000 . It will prompt you to create an admin account for your local instance; since it's local, you are free to enter whatever you want.

JetBrains IDE integration (PhpStorm)

We'll use PhpStorm as an example. Open your settings and go to Tools > AI Assistant > Providers & API Keys. In the Third-party AI providers section, select Ollama from the Provider dropdown. Provide the URL of your locally running Ollama instance (the default is http://127.0.0.1:11434) and click on Test Connection.

This will add Ollama and any models available in it to your JetBrains AI Chat window.

To use local Ollama (and the Gemma4 LLM) in other parts of the IDE (code completion, code generation, ...), scroll down to the Model Assignment section of the same settings page and select your local model (in our case, it's Ollama/gemma4:latest).
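If the model you expect doesn't show up in the IDE, it can help to ask Ollama directly which models it has downloaded. Below is a minimal Python sketch, assuming Ollama's /api/tags endpoint on the default address used earlier. The function names are my own, and I'm assuming the IDE shows these same names after the "Ollama/" prefix.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # default address used in this article


def model_names(tags_response):
    """Extract model names (e.g. "gemma4:latest") from a parsed /api/tags response."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_local_models(base_url=OLLAMA_URL, timeout=5):
    """Ask the local Ollama server which models it has downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.loads(resp.read().decode("utf-8")))
```

With Ollama running, `print(list_local_models())` prints the names Ollama knows about. If gemma4 is missing from that list, the download from the earlier step probably didn't finish.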