Understanding Key Concepts of Large Language Models (LLMs) Before Delving into Spring AI

Before diving into AI with Spring Boot, it's essential to grasp a few fundamental concepts, ideally before opening the Spring AI documentation. In this guide, I'll explain these ideas the way I understand them, and I hope it proves useful to anyone reading.
Let's start with the core technology behind most text-based AI chat tools: LLMs (Large Language Models). According to IBM, "Large Language Models (LLMs) are a category of foundation models trained on vast amounts of data, enabling them to understand and generate natural language and other types of content for a wide range of tasks." So, how do we integrate them into Spring Boot projects using Spring AI? Before answering that, let's cover a few important points about LLMs that will provide a deeper understanding.
The first thing to know is that LLMs have a training data cutoff date: the model is trained on data up to a specific point in time, and on its own it is unaware of anything that happened after that. Several techniques have been developed to work around this limitation, including:

Prompt Stuffing
RAG (Retrieval Augmented Generation)
Function Calling
Fine-Tuning

Before diving into these solutions, let's talk about tokens. In the world of LLMs, tokens are essentially the currency. LLMs process tokens rather than raw words, so every prompt you send is first split into tokens, which may be whole words, parts of words, or punctuation marks. There is a limit, often called the context window, on how many tokens an LLM can handle in a single request, and hosted models are typically billed per token.
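
To make the token limit concrete, here is a minimal sketch in plain Java that estimates whether a prompt fits in a model's limit. It assumes roughly four characters per token, a common rule of thumb for English text; real tokenizers are model-specific, and the class and method names (`TokenEstimator`, `estimateTokens`, `fitsInLimit`) are hypothetical, not part of Spring AI.

```java
class TokenEstimator {

    // Assumption: roughly 4 characters per token for English text.
    // Real tokenizers are model-specific; this is only a budgeting heuristic.
    static final int CHARS_PER_TOKEN = 4;

    /** Rough token estimate for a piece of text, rounded up. */
    static int estimateTokens(String text) {
        return (text.length() + CHARS_PER_TOKEN - 1) / CHARS_PER_TOKEN;
    }

    /** True if the prompt's estimated size fits within the given token limit. */
    static boolean fitsInLimit(String prompt, int tokenLimit) {
        return estimateTokens(prompt) <= tokenLimit;
    }

    public static void main(String[] args) {
        String prompt = "What is the recommended dosage of this medication?";
        System.out.println("Estimated tokens: " + estimateTokens(prompt));
        System.out.println("Fits in a 90,000-token limit: " + fitsInLimit(prompt, 90_000));
    }
}
```

An estimate like this is only good for a quick sanity check. For billing or hard limits you would want the token counts reported by the model provider.
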
Now, let's explore the solutions mentioned earlier:

1. Prompt Stuffing:

This involves adding relevant information to the prompt alongside the user's question. For example, if a user asks about the dosage of a specific medication, you include additional data about that medication in the prompt. This allows the LLM to draw on the provided information when formulating its response, as long as everything still fits within the token limit.
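
As a rough illustration of the idea, the sketch below builds a "stuffed" prompt by placing reference text ahead of the user's question. It is plain Java with no LLM call; the class name `PromptStuffer`, the instruction wording, and the medication "ExampleMed" with its dosage text are all made up for the example.

```java
class PromptStuffer {

    /** Builds one prompt that places the reference material ahead of the question. */
    static String stuff(String context, String question) {
        return String.join("\n",
                "Answer the question using only the context below.",
                "If the context does not contain the answer, say so.",
                "",
                "Context:",
                context,
                "",
                "Question:",
                question);
    }

    public static void main(String[] args) {
        // Hypothetical reference text; in a real application this would come
        // from a trusted source such as a database or a document.
        String context = "ExampleMed: adults take one 10 mg tablet once daily.";
        String question = "What is the adult dosage of ExampleMed?";
        System.out.println(stuff(context, question));
    }
}
```

The resulting string is what would be sent to the model in place of the bare question.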

2. RAG (Retrieval Augmented Generation):

This technique helps overcome the token limit in LLMs. Suppose a user wants to ask a question about a 900-page book, but the LLM accepts at most 90,000 tokens per request. Assuming roughly 300 words per page and about three quarters of a word per token, the book comes to something like 360,000 tokens, so it's impossible to stuff the entire content into the prompt. Enter embedding, a method of converting content into vectors (lists of numbers that capture meaning). The book is split into smaller chunks, each chunk is embedded, and the vectors are stored in a vector database. This type of database, unlike traditional ones, performs similarity searches. When a question is asked, it is embedded the same way, the chunks most similar to it are retrieved from the vector database, and only those chunks are added to the prompt to help the LLM generate an answer.
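
The retrieval step can be sketched in a few lines of plain Java. To keep it self-contained, the example swaps in a toy "embedding" (a word-count vector) and an in-memory list in place of a real embedding model and vector database, so it only illustrates the similarity-search idea; `ToyRetriever` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyRetriever {

    // Stand-in for a vector database: each stored chunk alongside its vector.
    private final List<String> chunks = new ArrayList<>();
    private final List<Map<String, Integer>> vectors = new ArrayList<>();

    /** Toy "embedding": a word-count vector. Real embeddings come from a model. */
    static Map<String, Integer> embed(String text) {
        Map<String, Integer> vector = new HashMap<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                vector.merge(word, 1, Integer::sum);
            }
        }
        return vector;
    }

    /** Cosine similarity between two sparse vectors; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int count : v.values()) {
            sum += count * count;
        }
        return Math.sqrt(sum);
    }

    /** Stores a chunk together with its vector. */
    void add(String chunk) {
        chunks.add(chunk);
        vectors.add(embed(chunk));
    }

    /** Returns up to k stored chunks, most similar to the question first. */
    List<String> topK(String question, int k) {
        Map<String, Integer> q = embed(question);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            order.add(i);
        }
        // Sort indices by descending similarity to the question.
        order.sort(Comparator.comparingDouble((Integer i) -> -cosine(q, vectors.get(i))));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, order.size()); i++) {
            result.add(chunks.get(order.get(i)));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyRetriever retriever = new ToyRetriever();
        retriever.add("The whale surfaced near the ship at dawn");
        retriever.add("The captain studied the maps in his cabin");
        retriever.add("Storms battered the harbor for three days");

        // Only the best-matching chunk would be added to the prompt.
        System.out.println(retriever.topK("Where did the whale surface?", 1));
    }
}
```

Word counts only match exact words, which is why real systems use model-generated embeddings that capture meaning. The flow stays the same: embed the chunks, embed the question, retrieve the closest chunks, and put them in the prompt.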

3. Function Calling:

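In this approach, the LLM is given descriptions of several functions that are available to help answer user queries. When the LLM encounters a question it cannot answer from its training data, such as one that needs live or private data, it does not run anything itself. Instead, it responds with a request to call an appropriate function with specific arguments. The application executes that function and sends the result back to the LLM, which then uses it to answer the question.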
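
The application's side of that exchange can be sketched as a small registry that maps function names to code. This is plain Java with the model's request simulated by two strings; `ToolRegistry`, the `getOrderStatus` function, and its canned reply are hypothetical and stand in for whatever a real framework and backend would provide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ToolRegistry {

    // Functions the application is willing to run, keyed by name.
    private final Map<String, Function<String, String>> tools = new HashMap<>();

    void register(String name, Function<String, String> tool) {
        tools.put(name, tool);
    }

    /** Runs the function the model asked for and returns its result as text. */
    String execute(String toolName, String argument) {
        Function<String, String> tool = tools.get(toolName);
        if (tool == null) {
            return "Error: unknown tool '" + toolName + "'";
        }
        return tool.apply(argument);
    }

    public static void main(String[] args) {
        ToolRegistry registry = new ToolRegistry();
        // Hypothetical function with a canned reply, standing in for a live lookup.
        registry.register("getOrderStatus", orderId -> "Order " + orderId + " has shipped");

        // Simulated model output: the function name and argument it requested.
        String requestedTool = "getOrderStatus";
        String requestedArgument = "A-1001";

        // The application runs the function; the result would then be sent
        // back to the model so it can phrase the final answer.
        String toolResult = registry.execute(requestedTool, requestedArgument);
        System.out.println(toolResult);
    }
}
```

Returning an error string for an unknown name, rather than throwing, is one possible design choice: it lets the message go back to the model like any other function result.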

4. Fine-Tuning:

Fine-tuning involves further training an already pre-trained LLM on additional data for a specific role or use case. This technique is mainly used by data scientists and, as of this writing, is not typically necessary when working on Spring AI projects.

Conclusion

In summary, integrating AI into Spring Boot requires a solid grasp of the foundational concepts behind Large Language Models (LLMs) and their limitations. By understanding how LLMs process tokens, and recognizing their inherent training data cutoff, developers can use techniques like Prompt Stuffing, RAG, Function Calling, and Fine-Tuning to work around those limitations. RAG, built on embeddings, helps when the source material is too large for the token limit, while Function Calling lets an LLM reach beyond its training data. With this knowledge in hand, you'll be better prepared to navigate the complexities of building AI-powered systems within Spring Boot.

DEV Community