DEV Community

Tamás Bereczki


Spring AI: How to use Generative AI and apply RAG?

Let’s dive into the world of AI, investigate how Spring AI works, and learn how to use an AI model programmatically to generate content with the RAG method.

Generative AI models are powerful, but their knowledge is limited to the data they were trained on. So, how can we make them intelligent about our own specific documents or data? This is where the Retrieval-Augmented Generation (RAG) pattern comes in. In this article, I’ll guide you step-by-step through building a pet project that does exactly that, using a practical, code-first approach.

If you would like to learn first what Artificial Intelligence means and how it works under the hood, read this article:
https://dev.to/bereczki/beyond-the-buzzwords-how-generative-ai-really-works-bac

RAG technique workflow diagram (source: https://docs.spring.io)

Project idea

A fairly common idea came to mind, a typical pet project at university or at home: create a movie database service (like IMDb). In this case, however, I am going to focus on how to enhance this movie database using AI services.

Project: Customized media suggestion service
Purpose:
Create a system that gives users personalized media suggestions based on the topics they are interested in and the content they have already watched.

Requirements

  • Data: collect content (movie data) and store it in a vector database.
  • User profile: assume there are registered users in the system, and the system collects their feedback on watched movies: which ones they liked and which ones they did not.
  • Applying RAG: when a user logs in or requests new suggestions, the system queries the content they liked and uses it to find other, similar movies with the vector database's similarity search.
  • Generative AI receives these suggestions and summarizes them in a personalized response.
  • Fine-tuning: the generative AI explains why the suggested content is relevant for the user, with a short description of why we think they will like the suggested movie.

Takeaways

  • See how RAG can be used for personalization.
  • Learn how to vectorize content, store it, and run a similarity search.

Spring AI

What is it?
Spring Framework is a mature, well-known framework that Java developers use to build web servers. The Spring ecosystem offers many projects for different kinds of problems. A brand-new one is Spring AI, which provides an abstraction layer that makes working with AI models easy.

For more information, check Spring AI documentation: https://docs.spring.io/spring-ai/reference/getting-started.html

Why did I want to use Spring AI?

  • I usually start new projects with Spring Boot.
  • I would like to get familiar with new AI-related technologies as soon as possible.
  • Spring AI already has a released version, which should have a reasonably stable architecture.

Setup

Maven dependencies:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-rag</artifactId>
</dependency>

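spring-ai-tika-document-reader adds Apache Tika based document readers, and spring-ai-rag contains the RAG building blocks such as RetrievalAugmentationAdvisor. The core Spring AI dependencies are pulled in by the Ollama starter (see below). Versions are omitted because they are managed by the Spring AI BOM (spring-ai-bom) imported in dependencyManagement.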

Ollama server

What is it?
Ollama is a user-friendly, open-source tool designed to simplify running large language models (LLMs) locally on your computer. Ollama enables you to download, run, and interact with these models without relying on cloud-based services.

Some of the most popular models you can find there:

  • deepseek-r1
  • mistral

How can it help me?
Ollama lets you use an LLM without subscribing to any cloud-based model (GPT from OpenAI, Gemini/Vertex AI from Google, etc.), because it downloads models from its central library or from Hugging Face to your local machine. Ollama then runs as a server that exposes an API for working with these models.

Setup (on Linux)
(This can also be Ubuntu running inside WSL on Windows.)

$ curl -fsSL https://ollama.com/install.sh | sh
$ ollama serve

This downloads, installs, and starts the Ollama server.

Spring AI can pull missing models at startup (controlled by the spring.ai.ollama.init.pull-model-strategy property), but this sometimes fails due to a timeout. In that case, pull the model manually with the ollama pull <model> command.
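To see which models are already available locally before starting the application, you can ask the Ollama server directly. Below is a minimal sketch using only the JDK HTTP client. It assumes Ollama's default address http://localhost:11434 and its /api/tags endpoint (which lists local models); the class and method names are hypothetical. Running `ollama list` on the command line gives the same information.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: asks a local Ollama server which models are already pulled.
class OllamaModelCheck {

    // Ollama's default address; change it if the server runs elsewhere.
    static final String OLLAMA_URL = "http://localhost:11434";

    // GET /api/tags returns the locally available models as JSON.
    static HttpRequest tagsRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/tags")).GET().build();
    }

    // Naive substring check on the raw JSON; a real implementation would parse the JSON.
    static boolean mentionsModel(String tagsJson, String model) {
        return tagsJson.contains("\"name\":\"" + model);
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String body = client.send(tagsRequest(OLLAMA_URL), HttpResponse.BodyHandlers.ofString()).body();
        for (String model : new String[] {"mxbai-embed-large", "mistral"}) {
            System.out.println(mentionsModel(body, model)
                    ? model + " is available"
                    : model + " is missing, try: ollama pull " + model);
        }
    }
}
```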

Maven dependencies (the starter already brings in the auto-configuration module, so the second entry is optional):

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-autoconfigure-model-ollama</artifactId>
</dependency>

Vector Database

What is a Vector Database?
A vector database is a specialized database designed to store, manage, and query data represented as numerical vectors. These vectors are mathematical representations of data objects (like text, images, or audio) that capture their semantic meaning or characteristics. Essentially, they allow computers to understand and compare data based on similarity rather than exact matches.

Why is it needed?
Because a vector represents the semantic meaning of a piece of data, in our case a movie, we can search by meaning instead of by exact keywords.

Imagine that there is a movie description in JSON:

{
  "title": "The Godfather", 
  "genre": "Crime", 
  "actors": ["Marlon Brando", "Al Pacino"],
  "plot": "The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son."
}

During the embedding operation, a vector is created from this JSON data and stored in the vector database. Later, when we query with a plot such as “Mafia family”, the similarity search is likely to return “The Godfather”.
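To make “similarity” concrete: vector stores compare embeddings with a metric such as cosine similarity (the metric configured for Elasticsearch with similarity: cosine). The sketch below uses made-up 3-dimensional vectors and hypothetical class names; real embeddings are produced by the embedding model and have far more dimensions.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: the vectors are invented, not produced by a real embedding model.
class CosineSimilarityDemo {

    // cos(a, b) = (a . b) / (|a| * |b|); ranges from -1 to 1, higher means more similar.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the titles ordered from most to least similar to the query vector.
    static List<String> rank(float[] query, Map<String, float[]> movies) {
        return movies.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> cosine(query, e.getValue())).reversed())
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        // Hypothetical axes: [crime/mafia, superhero/action, prison/drama]
        Map<String, float[]> movies = new LinkedHashMap<>();
        movies.put("The Shawshank Redemption", new float[] {0.2f, 0.0f, 0.9f});
        movies.put("The Godfather", new float[] {0.9f, 0.1f, 0.3f});
        movies.put("The Dark Knight", new float[] {0.5f, 0.9f, 0.2f});

        float[] mafiaFamily = {0.95f, 0.05f, 0.2f}; // pretend embedding of "Mafia family"
        // prints [The Godfather, The Dark Knight, The Shawshank Redemption]
        System.out.println(rank(mafiaFamily, movies));
    }
}
```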

Summary:

  • We need a vector database to store each movie together with its vector.
  • It provides the similarity search function.

What Vector Databases are available?
Several database vendors have added vector features to their engines, and Spring AI supports many of them. Here is a partial list; check the Spring AI reference guide for the full list and related information:

  • Apache Cassandra
  • Couchbase
  • Elasticsearch
  • MariaDB
  • MongoDB Atlas
  • OpenSearch
  • Oracle Database
  • Postgres
  • Redis
  • and so on…

I chose Elasticsearch because I have worked with it a lot before and did not want to dive into an unfamiliar database engine right now.

Setup

  1. Start the Elasticsearch server with Docker:
$ docker run -d --name elasticsearch -p 9200:9200 \
 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:9.0.0

This starts the Elasticsearch server and exposes it on port 9200. Security is disabled here, which is only suitable for local development. If you run into any issues, check the official Elasticsearch Docker documentation.

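  2. Configure the Elasticsearch vector store in the Spring application.yaml file (this assumes the Elasticsearch vector store starter, spring-ai-starter-vector-store-elasticsearch, is on the classpath):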
spring:
  elasticsearch:
    uris: http://localhost:9200
  ai:    
    vectorstore:
      elasticsearch:
        initialize-schema: true
        index-name: movies
        dimensions: 1024 # vector dimension which depends on selected embedding model, in case of 'mxbai-embed-large' is 1024
        similarity: cosine
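The dimensions value has to match the length of the vectors the embedding model produces; otherwise storing or searching fails. A small guard like the following (plain Java, hypothetical names, assuming you have the embedding as a float[]) makes such a mismatch fail fast with a readable message:

```java
// Hypothetical guard: fail fast when an embedding does not fit the configured index.
class EmbeddingDimensionCheck {

    // Returns the embedding unchanged if its length equals the configured `dimensions`.
    static float[] requireDimensions(float[] embedding, int configuredDimensions) {
        if (embedding.length != configuredDimensions) {
            throw new IllegalArgumentException("Embedding has " + embedding.length
                    + " dimensions, but the index is configured for " + configuredDimensions
                    + "; check the embedding model and spring.ai.vectorstore.elasticsearch.dimensions");
        }
        return embedding;
    }

    public static void main(String[] args) {
        int configured = 1024;                // value from application.yaml
        float[] embedding = new float[1024];  // stand-in for an mxbai-embed-large vector
        requireDimensions(embedding, configured);
        System.out.println("dimensions match");

        try {
            requireDimensions(new float[768], configured); // vector from a model with another size
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```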

Takeaways:

  • We have a platform to build a solution for the defined project (Spring Framework ✅)
  • We have a tool to work with AI models (Spring AI ✅)
  • We have LLMs to use for embedding and content generation (Ollama ✅)
  • We have a vector database to store movies and perform similarity search (Elasticsearch ✅)

Selecting AI model

Spring AI has default configurations for the embedding and generative (chat) features, which you can override in the application.yaml file.

spring:
  ai:
    ollama:
      embedding:
        options:
          model: mxbai-embed-large
      chat:
        options:
          model: mistral

What is the difference between embedding and chat models?
This question concerns the fundamentals of AI, so let's first answer another one:
What is the difference between embedding and content generation?
Embedding is the procedure in which a document is turned into a vector; it is done by an encoder-type model.
Content generation is done by another type of model, a decoder, which takes an input (the prompt) and produces new text as output.
(This is an oversimplification; if it is not clear, please read the article referenced at the beginning of this post.)

Because of this difference, we can use the same model or different models for vectorization (embedding model) and for generation (chat model).

Using AI Model for what?
The embedding model is used to vectorize the movies, and the chat model is used to generate the text response for the user who requests movie suggestions, similar to well-known chat assistants (GPT, Gemini, etc.).


Solution

Now let’s see what every developer reader is waiting for: the code that puts all of this together.

Note: I know the code quality is not the best! This project is just a proof of concept for learning, so clean code was not a high priority. Thanks for understanding!

#1 Create Test Data

First, we need the following:

  1. Design a data layer that suits our solution
  2. Create movie descriptions (id, title, year, genre, director, actors, plot)
  3. Create users (id, username, name, email, age)
  4. Create movie ratings by users (id, userId, movieId, rating, comment, date)

Create Java models for them:

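import java.util.List;

import com.fasterxml.jackson.annotation.JsonIgnoreProperties;

record Ratings(List<RatedMovie> ratings) {}
@JsonIgnoreProperties(ignoreUnknown = true) // e.g. "id" and "comment" are not mapped
record RatedMovie(String movieId, String userId, int rating, String dateRated) {}
record Movies(List<Movie> movies) {}
@JsonIgnoreProperties(ignoreUnknown = true)
record Movie(String id, String title, int year, String director,
  List<String> genre, List<String> actors, String plot) {} // plot is kept so it ends up in the embedded text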
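Because the embedding is computed from a movie's text, it helps when that text contains the descriptive fields, especially the plot. As an alternative to embedding the raw JSON (not what this project does), a hypothetical helper could build a readable text from the attributes listed above. The record and class below are illustrative, plain Java:

```java
import java.util.List;

// Hypothetical record mirroring the movie attributes from the test data, including the plot.
record Movie(String id, String title, int year, String director,
             List<String> genre, List<String> actors, String plot) {}

class MovieEmbeddingText {

    // Builds the text that would be sent to the embedding model for one movie.
    static String toEmbeddingText(Movie m) {
        return String.join("\n",
                "Title: " + m.title() + " (" + m.year() + ")",
                "Genre: " + String.join(", ", m.genre()),
                "Director: " + m.director(),
                "Actors: " + String.join(", ", m.actors()),
                "Plot: " + m.plot());
    }

    public static void main(String[] args) {
        Movie godfather = new Movie("movie-002", "The Godfather", 1972, "Francis Ford Coppola",
                List.of("Crime", "Drama"), List.of("Marlon Brando", "Al Pacino", "James Caan"),
                "The aging patriarch of an organized crime dynasty transfers control of his "
                        + "clandestine empire to his reluctant son.");
        System.out.println(toEmbeddingText(godfather));
    }
}
```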

To generate the test data, I used another AI tool: Junie, the coding agent in JetBrains IntelliJ IDEA. I asked it to create JSON files in the resources folder for movies, users, and ratings with the defined attributes. Junie created the test data step by step: it checked the model classes to determine the required attributes, asked for permission to write files into the resources folder, and then generated the test data.

Movies (movies.json):

{
  "movies": [
    {
      "id": "movie-001",
      "title": "The Shawshank Redemption",
      "year": 1994,
      "genre": ["Drama"],
      "director": "Frank Darabont",
      "actors": ["Tim Robbins","Morgan Freeman","Bob Gunton"],
      "plot": "Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency."
    },
    {
      "id": "movie-002",
      "title": "The Godfather",
      "year": 1972,
      "genre": ["Crime","Drama"],
      "director": "Francis Ford Coppola",
      "actors": ["Marlon Brando","Al Pacino","James Caan"],
      "plot": "The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son."
    },
    {
      "id": "movie-003",
      "title": "The Dark Knight",
      "year": 2008,
      "genre": ["Action","Crime","Drama"],
      "director": "Christopher Nolan",
      "actors": ["Christian Bale","Heath Ledger","Aaron Eckhart"],
      "plot": "When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, Batman must accept one of the greatest psychological and physical tests of his ability to fight injustice."
    },
  ...
  ]
}

Users:

[
  {
    "id": "user-001",
    "username": "movie_buff_42",
    "name": "John Smith",
    "email": "john.smith@example.com",
    "age": 28,
    "location": "New York, USA"
  },
  {
    "id": "user-002",
    "username": "cinema_lover",
    "name": "Emma Johnson",
    "email": "emma.j@example.com",
    "age": 34,
    "location": "Los Angeles, USA"
  },
  ...
]

Ratings:

{
  "ratings": [
  {
    "id": "rating-001",
    "userId": "user-001",
    "movieId": "movie-001",
    "rating": 5,
    "comment": "Absolutely brilliant film. The performances are outstanding and the story is deeply moving.",
    "dateRated": "2023-01-15"
  },
  {
    "id": "rating-002",
    "userId": "user-001",
    "movieId": "movie-003",
    "rating": 5,
    "comment": "Heath Ledger's Joker is one of the greatest performances in cinema history.",
    "dateRated": "2023-02-03"
  },
  {
    "id": "rating-003",
    "userId": "user-002",
    "movieId": "movie-005",
    "rating": 4,
    "comment": "Mind-bending plot with amazing visuals. Nolan at his best.",
    "dateRated": "2023-01-22"
  },
  ...
  ]
}

#2 Vectorize Movies

Define a TextSplitter bean, which is used during vectorization to split documents into chunks based on token count. TokenTextSplitter is Spring AI's token-based implementation.

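import org.springframework.ai.transformer.splitter.TextSplitter;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RatingAiConfiguration {

    // Splits documents into token-count based chunks (default settings)
    @Bean
    public TextSplitter textSplitter() {
        return new TokenTextSplitter();
    }
}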
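To get a feel for what splitting into chunks means, here is a deliberately simplified sketch: it treats whitespace-separated words as tokens and groups them into fixed-size chunks. This is not how TokenTextSplitter works internally (it counts real tokens with a tokenizer and has more options); the class name and chunk sizes are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for token-based splitting: whitespace-separated words play the role of tokens.
class NaiveTokenSplitter {

    static List<String> split(String text, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        String trimmed = text.strip();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        List<String> tokens = Arrays.asList(trimmed.split("\\s+"));
        List<String> chunks = new ArrayList<>();
        // Take consecutive windows of at most chunkSize tokens.
        for (int start = 0; start < tokens.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, tokens.size());
            chunks.add(String.join(" ", tokens.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        String plot = "Two imprisoned men bond over a number of years, finding solace and "
                + "eventual redemption through acts of common decency.";
        split(plot, 8).forEach(System.out::println);
    }
}
```

With a large chunk size, a short movie description like the ones in movies.json stays in a single chunk, so each movie maps to one stored vector.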

Create a service that adds movie documents to the vector database (MovieSuggestionService.java):

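import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class MovieSuggestionService {
    // Auto-configured ElasticsearchVectorStore
    private final VectorStore vectorStore;

    public MovieSuggestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Embeds the documents with the configured embedding model and stores them with their vectors
    public void add(List<Document> documents) {
        this.vectorStore.add(documents);
    }
}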

VectorStore is a Spring AI interface; the injected implementation here is ElasticsearchVectorStore.

Initialize the vector database with data from the resource files:

  1. read the test movies,
  2. parse them into the Java model (Movies.class),
  3. use Spring AI's JsonReader to create Document objects (the unit stored in the vector database),
  4. extend each document with the movie ID and a randomized popularity index; during similarity search, this metadata can be used to filter documents,
  5. split the documents into token-based chunks,
  6. store the test movies with their vector representation.
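import java.io.IOException;
import java.io.InputStream;
import java.util.List;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.commons.lang3.RandomUtils;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.JsonReader;
import org.springframework.ai.transformer.splitter.TextSplitter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.ByteArrayResource;

@Autowired
MovieSuggestionService movieSuggestionService;

@Autowired
TextSplitter textSplitter;

public void indexTestMovies() throws IOException {
    ObjectMapper objectMapper = new ObjectMapper();

    // 1-2. Read movies.json from the classpath and parse it into the Movies record
    Movies movies;
    try (InputStream in = AiRagApplication.class.getClassLoader().getResourceAsStream("movies.json")) {
        movies = objectMapper.readValue(in, Movies.class);
    }

    for (Movie movie : movies.movies()) {
        // 3. Turn the movie's JSON into Spring AI Document(s)
        JsonReader reader = new JsonReader(new ByteArrayResource(objectMapper.writeValueAsBytes(movie)));
        List<Document> documents = reader.read();
        // 4. Attach metadata usable in filter expressions (upper bound is exclusive: popularity is 1..5)
        documents.forEach(document -> {
            document.getMetadata().put("popularity", RandomUtils.insecure().randomInt(1, 6));
            document.getMetadata().put("movieId", movie.id());
        });
        // 5-6. Split, embed and store: once per movie, not once per document
        movieSuggestionService.add(textSplitter.split(documents));
    }

    logger.info("done");
}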

Here is the movies index in Elasticsearch with the content and the embedding:

Elasticsearch data after indexing document and vector


#3 Implement similarity search for RAG

Extend MovieSuggestionService with a search function, which takes:

  • the user prompt: the content to find similar documents by;
  • a filter expression, if additional filtering on document metadata is needed;
  • a SearchRequestOption, if custom search parameters are needed, such as the similarity threshold for documents or topK, which limits the results to at most K documents.
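import java.util.List;
import java.util.Objects;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.filter.Filter;
import org.springframework.stereotype.Service;

@Service
public class MovieSuggestionService {
    private static final Logger log = LoggerFactory.getLogger(MovieSuggestionService.class);

    public record SearchRequestOption(Double similarityThreshold, Integer topK) {
    }

    // Defaults: drop matches below 0.6 similarity, return at most SearchRequest.DEFAULT_TOP_K documents
    private final SearchRequestOption searchRequestOption =
            new SearchRequestOption(0.6, SearchRequest.DEFAULT_TOP_K);

    // vectorStore field, constructor and add(...) as shown above

    public List<Document> search(String userPromptText, Filter.Expression filterExpression) {
        return search(userPromptText, filterExpression, this.searchRequestOption);
    }

    public List<Document> search(String userPromptText, Filter.Expression filterExpression,
                                 SearchRequestOption searchRequestOption) {
        // No similarityThresholdAll() here: it would reset the threshold to 0.0 and accept everything
        SearchRequest.Builder searchRequestBuilder = SearchRequest.builder()
                .similarityThreshold(searchRequestOption.similarityThreshold())
                .topK(searchRequestOption.topK());
        if (Objects.nonNull(userPromptText) && !userPromptText.isBlank()) {
            searchRequestBuilder.query(userPromptText);
        }
        if (Objects.nonNull(filterExpression)) {
            searchRequestBuilder.filterExpression(filterExpression);
        }
        return search(searchRequestBuilder.build());
    }

    private List<Document> search(SearchRequest searchRequest) {
        log.info("Search request: {}", searchRequest);
        return this.vectorStore.similaritySearch(searchRequest);
    }
}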
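The interplay of the two search options can be illustrated without a vector store. In this plain-Java sketch (hypothetical ScoredDoc type, made-up scores), similarityThreshold removes weak matches and topK caps the number of results, which is what the SearchRequest options describe. Note that SearchRequest.Builder's similarityThresholdAll() sets the threshold to 0.0, so calling it after similarityThreshold(...) accepts every result.

```java
import java.util.Comparator;
import java.util.List;

// In-memory illustration: a real vector store computes the scores itself.
class ThresholdTopKDemo {

    record ScoredDoc(String movieId, double score) {}

    static List<ScoredDoc> select(List<ScoredDoc> candidates, double similarityThreshold, int topK) {
        return candidates.stream()
                .filter(d -> d.score() >= similarityThreshold)                   // drop weak matches
                .sorted(Comparator.comparingDouble(ScoredDoc::score).reversed()) // best match first
                .limit(topK)                                                     // at most K results
                .toList();
    }

    public static void main(String[] args) {
        List<ScoredDoc> candidates = List.of(
                new ScoredDoc("movie-001", 0.90), new ScoredDoc("movie-002", 0.55),
                new ScoredDoc("movie-003", 0.70), new ScoredDoc("movie-004", 0.65),
                new ScoredDoc("movie-005", 0.80));
        select(candidates, 0.6, 3).forEach(System.out::println);
    }
}
```

Here movie-002 falls below the threshold and movie-004 passes it but is cut by topK = 3.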

Once this is done, create SuggestionRestController.java, which will contain the endpoint definition; for now, we only implement the similarity search call in it.

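import java.util.List;
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.JsonReader;
import org.springframework.ai.vectorstore.filter.Filter.Expression;
import org.springframework.ai.vectorstore.filter.Filter.ExpressionType;
import org.springframework.ai.vectorstore.filter.Filter.Key;
import org.springframework.ai.vectorstore.filter.Filter.Value;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.ByteArrayResource;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SuggestionRestController {
    @Autowired
    MovieSuggestionService movieSuggestionService;

    // Uses the rated movie's JSON text as the query; keeps only documents with popularity >= 4
    private List<Document> findSimilarMovies(byte[] referenceMovie) {
        return movieSuggestionService.search(
                new JsonReader(new ByteArrayResource(referenceMovie)).read().get(0).getText(),
                new Expression(ExpressionType.GTE, new Key("popularity"), new Value(4))
        );
    }
}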
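The filter expression above keeps only documents whose popularity metadata is at least 4. Because popularity was assigned randomly during indexing (RandomUtils.insecure().randomInt(1, 6) returns 1 to 5, the upper bound being exclusive), only movies that happened to get 4 or 5 can be suggested. The vector store translates the Filter.Expression into its own native query; the plain-Java sketch below only mirrors the semantics, and its names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Conceptual model of a metadata filter such as "popularity >= 4".
class MetadataFilterDemo {

    // Hypothetical helper: true when the metadata value is a number >= the given bound.
    static Predicate<Map<String, ?>> gte(String key, int value) {
        return metadata -> metadata.get(key) instanceof Number n && n.intValue() >= value;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> documentsMetadata = List.of(
                Map.of("movieId", "movie-001", "popularity", 5),
                Map.of("movieId", "movie-002", "popularity", 2),
                Map.of("movieId", "movie-003", "popularity", 4));

        Predicate<Map<String, ?>> popular = gte("popularity", 4);
        documentsMetadata.stream()
                .filter(popular)
                .map(m -> m.get("movieId"))
                .forEach(System.out::println);
    }
}
```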

#4 Create Movie Suggestion endpoint and configure Chat Client (with RAG)

Extend RatingAiConfiguration with a ChatClient bean. This client has a default system prompt that defines what the ChatClient is for, so the generated content follows it.

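import org.springframework.ai.chat.client.ChatClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RatingAiConfiguration {
    // textSplitter() bean as shown above

    // ChatClient.Builder is auto-configured with the chat model from application.yaml
    @Bean
    public ChatClient movieSuggestAi(ChatClient.Builder builder) {
        return builder.defaultSystem(
                        "You are a chat bot for movie suggestions. Use the provided movies to suggest other " +
                                "ones to watch and write an interesting summary of each movie. You can complement the provided " +
                                "movies with others that are similar to them. You can suggest at most 3 additional movies.")
                .build();
    }
}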
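Conceptually, in a RAG call the chat model receives three things: the default system prompt defined above, the retrieved movie documents as context, and the user's request. The following sketch assembles such a user message in plain Java; the layout and wording are assumptions for illustration, since Spring AI's RetrievalAugmentationAdvisor applies its own prompt template.

```java
import java.util.List;
import java.util.stream.Collectors;

// Conceptual sketch only: the real advisor uses its own template.
class RagPromptSketch {

    // Hypothetical layout: retrieved movie texts are listed as context before the user's request.
    static String augmentUserMessage(String userRequest, List<String> retrievedMovies) {
        String context = retrievedMovies.stream()
                .map(movie -> "- " + movie)
                .collect(Collectors.joining("\n"));
        return "Movies to base the suggestions on:\n" + context + "\n\nRequest: " + userRequest;
    }

    public static void main(String[] args) {
        String system = "You are a chat bot for movie suggestions.";
        String user = augmentUserMessage("Give some movie suggestions to watch.",
                List.of("The Godfather (1972), Crime/Drama", "The Dark Knight (2008), Action/Crime/Drama"));
        System.out.println("SYSTEM:\n" + system + "\n\nUSER:\n" + user);
    }
}
```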

Go back to SuggestionRestController to define and implement the /suggest endpoint.

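import java.io.IOException;
import java.util.List;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.rag.advisor.RetrievalAugmentationAdvisor;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class SuggestionRestController {

  @Qualifier("movieSuggestAi")
  @Autowired
  ChatClient movieSuggestionGenAi;

  @GetMapping("/suggest")
  public String suggestMovies(@RequestParam String userId) throws IOException {
      List<byte[]> ratedMovie = queryUserRatedMovies(userId); // finds rated movies by user (helper not shown)
      return movieSuggestionGenAi
                .prompt()
                .user("Give some movie suggestions to watch.")
                .advisors(movieSuggestionRag(ratedMovie)) // RAG: adds similar movies as context
                .call()     // blocking call; .stream().content() would return a Flux<String> instead
                .content(); // text of the generated answer
  }

  /**
  * Get movie documents which were rated by the user (1-5).
  * Do a similarity search by them to find movies similar to the ones the user liked.
  * findSimilarMovies(...) is defined above.
  */
  private RetrievalAugmentationAdvisor movieSuggestionRag(List<byte[]> ratedMovie) {
      return RetrievalAugmentationAdvisor.builder()
              .documentRetriever(query ->
                      ratedMovie.stream()
                              .flatMap(movieBytes ->
                                      findSimilarMovies(movieBytes).stream()
                              )
                              .limit(3)
                              .toList()
              )
              .build();
  }
}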
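Searching with a rated movie as the query will usually return that same movie as the closest match, since its own vector is the most similar, and two rated movies can retrieve the same neighbor. One possible refinement, sketched here with a hypothetical Candidate type instead of Spring AI Documents, is to drop already-rated movie IDs and duplicates before applying the limit of 3. With real documents, the ID would come from the movieId metadata stored during indexing.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical post-processing of retrieved results, independent of Spring AI.
class SuggestionCandidates {

    record Candidate(String movieId, String text) {}

    static List<Candidate> pick(List<List<Candidate>> resultsPerRatedMovie,
                                Set<String> ratedMovieIds, int limit) {
        Map<String, Candidate> unique = new LinkedHashMap<>(); // keeps first-seen order
        for (List<Candidate> results : resultsPerRatedMovie) {
            for (Candidate c : results) {
                if (!ratedMovieIds.contains(c.movieId())) { // skip movies the user already rated
                    unique.putIfAbsent(c.movieId(), c);     // skip duplicates
                }
            }
        }
        return unique.values().stream().limit(limit).toList();
    }

    public static void main(String[] args) {
        List<List<Candidate>> results = List.of(
                List.of(new Candidate("movie-001", "The Shawshank Redemption"),
                        new Candidate("movie-004", "candidate A")),
                List.of(new Candidate("movie-003", "The Dark Knight"),
                        new Candidate("movie-004", "candidate A")));
        System.out.println(pick(results, Set.of("movie-001", "movie-003"), 3));
    }
}
```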

Aaand that’s it, we have an AI-based movie suggestion solution ready! ✨🎉


Testing suggestion endpoint

After starting the Spring application on localhost (the default port is 8080), you can send requests to the movie suggestion endpoint.

Let’s pick an existing user from the test data: user-001

Send request (in Postman or cURL) to:
http://localhost:8080/suggest?userId=user-001

After a short while, the chat model responds with suggestions based on the movies that ‘user-001’ has already liked.

Postman request/response of movie suggestion endpoint


Thank you for your attention; I hope I managed to share something useful from my experience!
Give it a try yourself, and have a nice day. 😊👋
