Sahaj Gupta

Building a Local AI Chatbot with Gemma 4 and Java

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

Recently, I started exploring local AI models because I wanted to understand how modern LLMs actually work beyond cloud APIs. Most tutorials I found online were focused on Python, but since I usually work with Java projects, I wanted to try something different.

So I decided to build a simple local AI chatbot using Gemma 4, Ollama, and Spring Boot.

Honestly, I expected the setup to be complicated, but it turned out to be much simpler than I thought.

In this tutorial, I’ll show exactly how I got everything working on my laptop.

Why I Wanted to Try Local AI

Most AI applications today depend heavily on cloud APIs. While that works well, there are still some drawbacks:

  • Internet dependency
  • API usage limits
  • Extra costs
  • Privacy concerns
  • Occasional latency issues

I wanted to see whether a lightweight local model could still generate useful responses without depending completely on external services.

That’s where Gemma came in.

What is Gemma 4?

Google released Gemma as a family of lightweight, open models designed for developers and researchers.

One thing I liked about Gemma is that it can run locally on consumer hardware without requiring expensive cloud infrastructure.

That makes it useful for:

  • Learning AI development
  • Offline experimentation
  • Personal projects
  • Private applications

Of course, local models are not as powerful as massive cloud-hosted models, but they are surprisingly capable for development and testing.

Tools Used

For this project, I used:

  • Ollama
  • Gemma 4
  • Spring Boot
  • Java 21
  • IntelliJ IDEA

My Laptop Specs

  • 16 GB RAM
  • Intel i5 processor
  • No dedicated GPU

Even without a GPU, the model still worked reasonably well for small prompts.

Step 1: Install Ollama

Ollama makes running local LLMs extremely simple.
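
On Linux, the official one-line install script gets you set up (macOS and Windows have regular installers on ollama.com):

curl -fsSL https://ollama.com/install.sh | sh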

After installing Ollama, open the terminal and verify the installation:

ollama --version

If everything is installed correctly, you should see the installed version.

Step 2: Download and Run Gemma

Now download the Gemma model locally.

ollama run gemma3

The first download took several minutes on my system because the model files are quite large.

After the download completes, the model is ready to use, and it is served through Ollama's local API.
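
If you want to double-check that the download worked, you can list the locally available models:

ollama list

The gemma3 model should show up in the output, along with its size on disk.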

This was honestly the easiest part of the setup.

Step 3: Create the Spring Boot Project

I created a simple Spring Boot application using:

  • Spring Web
  • Lombok

Project structure:

src
 └── main
      └── java
           └── chatbot
                ├── controller
                ├── service
                └── ChatbotApplication.java

Nothing complicated here.
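
For completeness, ChatbotApplication.java is just the standard Spring Boot entry point that Spring Initializr generates:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ChatbotApplication {

    public static void main(String[] args) {
        SpringApplication.run(ChatbotApplication.class, args);
    }
}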


Step 4: Connecting Java with Gemma

What surprised me most was how easy the integration was.

Ollama exposes a local REST API, so Java can directly communicate with the model.

I created the following service class:

import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.web.client.RestTemplate;

@Service
public class ChatService {

    private final RestTemplate restTemplate = new RestTemplate();

    public String askAI(String prompt) {
        String url = "http://localhost:11434/api/generate";

        // Escape characters that would break the hand-built JSON string.
        String safePrompt = prompt
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n");

        // "stream": false tells Ollama to return one complete response
        // instead of a stream of partial chunks.
        String requestBody = """
                {
                  "model": "gemma3",
                  "prompt": "%s",
                  "stream": false
                }
                """.formatted(safePrompt);

        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);

        HttpEntity<String> entity = new HttpEntity<>(requestBody, headers);

        ResponseEntity<String> response =
                restTemplate.postForEntity(url, entity, String.class);

        // The body is Ollama's raw JSON; the generated text is in its "response" field.
        return response.getBody();
    }
}

This sends the prompt to Gemma running locally and returns Ollama's raw JSON response, which carries the generated text in its "response" field.
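
The raw JSON is not very pleasant to read in a browser. If you only want the generated text, you can parse out the "response" field with Jackson, which Spring Web already brings in. Here's a minimal sketch (extractAnswer is just a helper name I made up for this post):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

// Pull just the generated text out of Ollama's JSON envelope.
private String extractAnswer(String rawJson) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    JsonNode root = mapper.readTree(rawJson);
    return root.path("response").asText();
}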

Step 5: Create the Controller

Next, I added a REST endpoint.

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/chat")
public class ChatController {

    @Autowired
    private ChatService chatService;

    // GET /chat?prompt=... forwards the prompt to the local model.
    @GetMapping
    public String chat(@RequestParam String prompt) {
        return chatService.askAI(prompt);
    }
}

Now the chatbot can be accessed directly from the browser:

http://localhost:8080/chat?prompt=Explain+Java+Threads
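
The same endpoint works from the command line too, as long as the prompt is URL-encoded:

curl "http://localhost:8080/chat?prompt=What+is+the+JVM%3F"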

First Problem I Faced

Initially, the API was not responding.

After wasting almost 15 minutes debugging the Java code, I realized Ollama was not running in the background.

Running this command fixed the issue:

ollama serve

After that, everything started working properly.
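
A quick sanity check for this kind of issue is to hit the Ollama server directly:

curl http://localhost:11434/

If the server is up, it should reply with a short "Ollama is running" message.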

Testing the Chatbot

I tested several prompts like:

  • Explain operating system deadlocks in simple words.
  • Write a short Java multithreading example.
  • What is the difference between stack and heap memory?

The responses were actually much better than I expected from a local model.

Short prompts generated responses within a few seconds on my laptop.

Longer prompts were slower, but still usable.

What I Learned During This Project

1. Local AI feels more flexible

Since everything runs on your own machine, experimentation becomes easier.

You can test prompts freely without worrying about API usage limits.

2. Prompt quality matters a lot

Small wording changes can significantly improve responses.

For example:

  • vague prompts → average output
  • specific prompts → much better output
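
For instance, "Explain threads" tends to get a generic overview, while "Explain Java threads to a beginner and include a short code example" gives the model far more to anchor on.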

3. Hardware still matters

The biggest limitation is hardware.

Larger models need:

  • More RAM
  • Better CPU/GPU
  • More storage

Smaller models perform much better on normal laptops.

4. Java integration is easier than expected

Before this project, I assumed AI integration would mostly require Python.

But since Ollama exposes a REST API, integrating with Java applications is actually very straightforward.

Performance on My Laptop

Here’s what I observed:

  • Short prompts: 2–5 seconds
  • Longer prompts: 10–20 seconds
  • RAM usage increased noticeably during generation

Even without a dedicated GPU, the project still worked decently for learning purposes.

Possible Improvements

Some things I want to try next:

  • Add chat history
  • Create a frontend UI
  • Stream responses in real time
  • Store conversations in a database
  • Add PDF summarization
  • Experiment with other Gemma variants

Final Thoughts

Before trying this project, I thought local AI development would be difficult and resource-heavy.

But tools like Ollama make the setup surprisingly beginner-friendly.

Gemma turned out to be a solid starting point for experimenting with local LLMs, especially for developers who want more control and privacy.

This project also changed my perspective on Java AI integration. I expected the process to be much harder, but using a simple REST API made everything manageable.

If you are a Java developer interested in AI, building a small local chatbot is honestly one of the best ways to start experimenting.

Tags

#ai
#java
#machinelearning
#gemma
#tutorial