Ever wondered what it takes to make an LLM more creative when generating a story versus when doing a technical task? The answer lies in the model's inference parameters: if you know which ones to pick and tune for a given activity, the results will be significantly different.
This tutorial explains how to “tune” the behavior of your Large Language Models (LLMs) using Spring AI.
It is important to clarify a technical distinction: while “fine-tuning” usually refers to retraining a model on new data, many developers use the term loosely to describe inference parameter tuning, adjusting the “knobs” that control how a model generates text at request time. Spring AI makes this easy through the ChatOptions interface.
1. Understanding the “Knobs”
Before diving into code, let’s understand the primary parameters you can control (a short sampling sketch after this list shows how Temperature and Top-P reshape the token distribution).
- Temperature: Controls randomness. A value of 0 makes the model deterministic (it will always pick the most likely word). A value of 1.0+ makes it highly creative and unpredictable.
- Top-P (Nucleus Sampling): The model considers only the tokens whose cumulative probability reaches the value P (e.g., 0.9). It’s more dynamic than Top-K.
- Top-K: The model only considers the top K most likely next words. This “cuts the tail” of low-probability words.
- Frequency Penalty: Discourages the model from repeating the same words or phrases.
- Presence Penalty: Encourages the model to talk about new topics.
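To make these knobs concrete, here is a small, self-contained Java sketch (plain Java, not Spring AI; the logit values are made up for illustration) showing how temperature sharpens or flattens a toy next-token distribution and how Top-P then trims its tail:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration of the sampling "knobs" described above (not Spring AI code).
public class SamplingDemo {

    // Softmax with temperature: low T sharpens the distribution (near-deterministic),
    // high T flattens it (more random and "creative").
    static double[] softmax(double[] logits, double temperature) {
        double[] probs = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    // Top-P (nucleus) filter: keep the smallest set of tokens whose cumulative
    // probability reaches p; everything below that cut is never sampled.
    static List<Integer> topP(double[] probs, double p) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) indices.add(i);
        indices.sort(Comparator.comparingDouble((Integer i) -> probs[i]).reversed());
        List<Integer> kept = new ArrayList<>();
        double cumulative = 0;
        for (int i : indices) {
            kept.add(i);
            cumulative += probs[i];
            if (cumulative >= p) break;
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.5, 0.1};  // made-up scores for 4 candidate tokens
        System.out.println(java.util.Arrays.toString(softmax(logits, 0.2)));  // sharply peaked
        System.out.println(java.util.Arrays.toString(softmax(logits, 1.5)));  // much flatter
        System.out.println(topP(softmax(logits, 1.0), 0.9));  // token indices surviving the nucleus cut
    }
}

Real models do this over tens of thousands of candidate tokens at every step, but the mechanics are the same.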
2. Spring AI Implementation
In Spring AI, you can set these parameters globally in your application.properties or per-request using ChatOptions.
Per-Request Configuration (Recommended)
This approach allows you to use different settings for different parts of your app (e.g., a “Creative Story” endpoint vs. a “Data Extraction” endpoint).
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // High Temperature + Top-P: suited to open-ended, creative generation.
    @GetMapping("/creative-chat")
    public String creativeChat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .options(OpenAiChatOptions.builder()
                        .withTemperature(0.9f)   // High creativity
                        .withTopP(0.9f)          // Diverse vocabulary
                        .withMaxTokens(500)      // Length limit
                        .build())
                .call()
                .content();
    }

    // Low Temperature: suited to factual answers and extraction tasks.
    @GetMapping("/precise-chat")
    public String preciseChat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .options(OpenAiChatOptions.builder()
                        .withTemperature(0.1f)        // Low randomness
                        .withFrequencyPenalty(0.5f)   // Prevent repetition
                        .build())
                .call()
                .content();
    }
}
Global Configuration
If you want a consistent “vibe” across your entire application, use application.properties:
spring.ai.openai.chat.options.temperature=0.7
spring.ai.openai.chat.options.model=gpt-4o
spring.ai.openai.chat.options.top-p=1.0
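If you prefer to keep application-wide defaults in code rather than in properties, the ChatClient builder can also be given default options in a configuration class. This is a minimal sketch; it assumes the same OpenAiChatOptions builder methods used above, and the per-request .options(...) calls shown earlier still override these defaults:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    // Application-wide defaults set programmatically instead of in application.properties.
    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                .defaultOptions(OpenAiChatOptions.builder()
                        .withTemperature(0.7f)
                        .withTopP(1.0f)
                        .build())
                .build();
    }
}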
3. Parameter Recommendations by Use Case
Choosing the right values depends entirely on your goal. Here is a guide for common scenarios, using the same values as the examples in this post (treat them as starting points, not hard rules):

| Use Case | Temperature | Top-P | Penalties | Notes |
| --- | --- | --- | --- | --- |
| Creative writing / storytelling | 0.9 | 0.9 | default | Varied vocabulary, surprising turns |
| General-purpose chat | 0.7 | 1.0 | default | The defaults from the global configuration above |
| Precise Q&A / summarization | 0.1 | default | Frequency 0.5 | Low randomness, avoids repeated phrasing |
| Structured output (JSON, data extraction) | 0 | default | default | Deterministic keys for BeanOutputConverter |
4. Pro-Tips for Tuning
- Don’t tweak both Temperature and Top-P: Most AI labs (like OpenAI) recommend adjusting either Temperature or Top-P, but not both at once, as they can conflict or produce erratic results.
- Use 0 for JSON: If you are using Spring AI’s BeanOutputConverter to get structured data, set Temperature to 0. Hallucinations in JSON keys will break your code (see the sketch after this list).
- Frequency vs. Presence: Use Frequency Penalty if the model gets stuck repeating a specific word. Use Presence Penalty if the model keeps circling back to the same concept.
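To ground the JSON tip above, here is a hedged sketch of a structured-extraction endpoint that could be added to the controller shown earlier. It uses ChatClient's entity() mapping (which relies on an output converter such as BeanOutputConverter) and a hypothetical Invoice record; the Temperature value is the point:

// Hypothetical target type for structured extraction.
record Invoice(String customer, double amount, String currency) {}

@GetMapping("/extract-invoice")
public Invoice extractInvoice(@RequestParam String message) {
    return chatClient.prompt()
            .user(message)
            .options(OpenAiChatOptions.builder()
                    .withTemperature(0.0f)  // deterministic output keeps JSON keys stable
                    .build())
            .call()
            .entity(Invoice.class);         // maps the model's JSON response onto the record
}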
Connect with me at the email below if you need a curated learning path in AI/Gen AI. It does not matter which stream or role you come from; there is a learning path for everyone.
