Ever wondered what it takes to make an LLM more creative when generating a story versus when doing a technical task? The answer lies in the model's inference parameters: if you know which ones to pick and tune for a given activity, the results will be significantly different.
This tutorial explains how to “tune” the behavior of your Large Language Models (LLMs) using Spring AI.
It is important to clarify a technical distinction: while “fine-tuning” usually refers to retraining a model on new data, many developers use the term loosely to describe inference parameter tuning, adjusting the “knobs” that control how a model generates text at request time. Spring AI makes this easy through the ChatOptions interface.
1. Understanding the “Knobs”
Before diving into code, let’s understand the primary parameters you can control (a short sampling sketch after this list shows how Temperature and Top-P reshape the token distribution).
- Temperature: Controls randomness. A value of 0 makes the model deterministic (it will always pick the most likely word). A value of 1.0+ makes it highly creative and unpredictable.
- Top-P (Nucleus Sampling): The model considers only the tokens whose cumulative probability reaches the value P (e.g., 0.9). It’s more dynamic than Top-K.
- Top-K: The model only considers the top K most likely next words. This “cuts the tail” of low-probability words.
- Frequency Penalty: Discourages the model from repeating the same words or phrases.
- Presence Penalty: Encourages the model to talk about new topics.
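To make these knobs concrete, here is a small, self-contained Java sketch (plain Java, not Spring AI; the logit values are made up for illustration) showing how temperature sharpens or flattens a toy next-token distribution and how Top-P then trims its tail:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration of the sampling "knobs" described above (not Spring AI code).
public class SamplingDemo {

    // Softmax with temperature: low T sharpens the distribution (near-deterministic),
    // high T flattens it (more random and "creative").
    static double[] softmax(double[] logits, double temperature) {
        double[] probs = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    // Top-P (nucleus) filter: keep the smallest set of tokens whose cumulative
    // probability reaches p; everything below that cut is never sampled.
    static List<Integer> topP(double[] probs, double p) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) indices.add(i);
        indices.sort(Comparator.comparingDouble((Integer i) -> probs[i]).reversed());
        List<Integer> kept = new ArrayList<>();
        double cumulative = 0;
        for (int i : indices) {
            kept.add(i);
            cumulative += probs[i];
            if (cumulative >= p) break;
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.5, 0.1};  // made-up scores for 4 candidate tokens
        System.out.println(java.util.Arrays.toString(softmax(logits, 0.2)));  // sharply peaked
        System.out.println(java.util.Arrays.toString(softmax(logits, 1.5)));  // much flatter
        System.out.println(topP(softmax(logits, 1.0), 0.9));  // token indices surviving the nucleus cut
    }
}

Real models do this over tens of thousands of candidate tokens at every step, but the mechanics are the same.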
2. Spring AI Implementation
In Spring AI, you can set these parameters globally in your application.properties or per-request using ChatOptions.
Per-Request Configuration (Recommended)
This approach allows you to use different settings for different parts of your app (e.g., a “Creative Story” endpoint vs. a “Data Extraction” endpoint).
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // High Temperature + Top-P: suited to open-ended, creative generation.
    @GetMapping("/creative-chat")
    public String creativeChat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .options(OpenAiChatOptions.builder()
                        .withTemperature(0.9f)   // High creativity
                        .withTopP(0.9f)          // Diverse vocabulary
                        .withMaxTokens(500)      // Length limit
                        .build())
                .call()
                .content();
    }

    // Low Temperature: suited to factual answers and extraction tasks.
    @GetMapping("/precise-chat")
    public String preciseChat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .options(OpenAiChatOptions.builder()
                        .withTemperature(0.1f)        // Low randomness
                        .withFrequencyPenalty(0.5f)   // Prevent repetition
                        .build())
                .call()
                .content();
    }
}
Global Configuration
If you want a consistent “vibe” across your entire application, use application.properties:
spring.ai.openai.chat.options.temperature=0.7
spring.ai.openai.chat.options.model=gpt-4o
spring.ai.openai.chat.options.top-p=1.0
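If you prefer to keep application-wide defaults in code rather than in properties, the ChatClient builder can also be given default options in a configuration class. This is a minimal sketch; it assumes the same OpenAiChatOptions builder methods used above, and the per-request .options(...) calls shown earlier still override these defaults:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    // Application-wide defaults set programmatically instead of in application.properties.
    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                .defaultOptions(OpenAiChatOptions.builder()
                        .withTemperature(0.7f)
                        .withTopP(1.0f)
                        .build())
                .build();
    }
}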
3. Parameter Recommendations by Use Case
Choosing the right values depends entirely on your goal. Here is a guide for common scenarios, using the same values as the examples in this post (treat them as starting points, not hard rules):

| Use Case | Temperature | Top-P | Penalties | Notes |
| --- | --- | --- | --- | --- |
| Creative writing / storytelling | 0.9 | 0.9 | default | Varied vocabulary, surprising turns |
| General-purpose chat | 0.7 | 1.0 | default | The defaults from the global configuration above |
| Precise Q&A / summarization | 0.1 | default | Frequency 0.5 | Low randomness, avoids repeated phrasing |
| Structured output (JSON, data extraction) | 0 | default | default | Deterministic keys for BeanOutputConverter |
4. Pro-Tips for Tuning
- Don’t tweak both Temperature and Top-P: Most AI labs (like OpenAI) recommend adjusting either Temperature or Top-P, but not both at once, as they can conflict or produce erratic results.
- Use 0 for JSON: If you are using Spring AI’s BeanOutputConverter to get structured data, set Temperature to 0. Hallucinations in JSON keys will break your code (see the sketch after this list).
- Frequency vs. Presence: Use Frequency Penalty if the model gets stuck repeating a specific word. Use Presence Penalty if the model keeps circling back to the same concept.
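To ground the JSON tip above, here is a hedged sketch of a structured-extraction endpoint that could be added to the controller shown earlier. It uses ChatClient's entity() mapping (which relies on an output converter such as BeanOutputConverter) and a hypothetical Invoice record; the Temperature value is the point:

// Hypothetical target type for structured extraction.
record Invoice(String customer, double amount, String currency) {}

@GetMapping("/extract-invoice")
public Invoice extractInvoice(@RequestParam String message) {
    return chatClient.prompt()
            .user(message)
            .options(OpenAiChatOptions.builder()
                    .withTemperature(0.0f)  // deterministic output keeps JSON keys stable
                    .build())
            .call()
            .entity(Invoice.class);         // maps the model's JSON response onto the record
}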
Connect with me at the email below if you need a curated learning path in AI/Gen AI. It does not matter which stream or role you come from; there is a learning path for everyone.
