LLM Parameter Fine-Tuning with Spring AI

Ever wondered what it takes to make an LLM more creative when generating a story versus more precise when doing a technical task? The answer lies in the LLM parameters you play with: if you know what to pick and choose for a given activity, the results will be significantly different.

This tutorial explains how to “tune” the behavior of your Large Language Models (LLMs) using Spring AI.

It is important to clarify a technical distinction: "fine-tuning" usually means retraining a model on new data, but most developers use the term to describe Inference Parameter Tuning, i.e. adjusting the "knobs" that control how a model generates text in real time. Spring AI makes this easy through the ChatOptions interface.

1. Understanding the “Knobs”

Before diving into code, let’s understand the primary parameters you can control (a toy sketch after this list shows how they reshape the next-token distribution).

- Temperature: Controls randomness. A value of 0 makes the model (nearly) deterministic, always picking the most likely token, while values of 1.0 and above make it highly creative and unpredictable.
- Top-K: The model considers only the K most likely next tokens. This “cuts the tail” of low-probability words.
- Top-P (Nucleus Sampling): The model samples only from the smallest set of tokens whose cumulative probability reaches P (e.g., 0.9). Because that set grows and shrinks with the shape of the distribution, it’s more dynamic than Top-K.
- Frequency Penalty: Discourages the model from repeating the same words or phrases.
- Presence Penalty: Encourages the model to introduce new topics.
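
To make these concrete, here is a self-contained toy sketch in plain Java. It is not how a real LLM works internally, and the tokens and logits are invented for illustration, but it shows how temperature sharpens or flattens the next-token distribution and how Top-K and Top-P trim its tail:

// Toy illustration only (not a real LLM): how temperature, Top-K and Top-P
// reshape a made-up next-token distribution before sampling.
import java.util.Arrays;

public class SamplingKnobsDemo {

    // Softmax with temperature: lower T sharpens the distribution toward
    // the top token; higher T flattens it toward uniform.
    static double[] softmax(double[] logits, double temperature) {
        double[] probs = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp(logits[i] / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        // Hypothetical candidate tokens, already sorted by logit (descending).
        double[] logits = {2.0, 1.5, 1.0, 0.2, 0.1};

        System.out.println("T=0.2: " + Arrays.toString(softmax(logits, 0.2)));
        System.out.println("T=1.0: " + Arrays.toString(softmax(logits, 1.0)));

        // Top-K = 2 would simply keep the first two tokens and renormalize.
        // Top-P = 0.9 keeps the smallest prefix whose probability sums to 0.9:
        double[] probs = softmax(logits, 1.0);
        double cumulative = 0;
        for (int i = 0; i < probs.length; i++) {
            cumulative += probs[i];
            if (cumulative >= 0.9) {
                System.out.println("Top-P 0.9 keeps the top " + (i + 1) + " tokens");
                break;
            }
        }
    }
}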

2. Spring AI Implementation

In Spring AI, you can set these parameters globally in your application.properties or per-request using ChatOptions.

Per-Request Configuration (Recommended)

This approach allows you to use different settings for different parts of your app (e.g., a “Creative Story” endpoint vs. a “Data Extraction” endpoint).

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping("/creative-chat")
    public String creativeChat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .options(OpenAiChatOptions.builder()
                        .withTemperature(0.9f)      // High creativity
                        .withTopP(0.9f)             // Diverse vocabulary
                        .withMaxTokens(500)         // Length limit
                        .build())
                .call()
                .content();
    }

    @GetMapping("/precise-chat")
    public String preciseChat(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .options(OpenAiChatOptions.builder()
                        .withTemperature(0.1f)      // Low randomness
                        .withFrequencyPenalty(0.5f) // Prevent repetition
                        .build())
                .call()
                .content();
    }
}
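One design note on the code above: OpenAiChatOptions is provider-specific and exposes every knob the OpenAI API supports. Spring AI also ships a portable ChatOptions abstraction (temperature, Top-P, Top-K, max tokens and the penalties) that works across providers; prefer it if you might swap models later, and drop down to the provider-specific class only when you need a provider-only option.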

Global Configuration

If you want a consistent “vibe” across your entire application, use application.properties:

spring.ai.openai.chat.options.temperature=0.7
spring.ai.openai.chat.options.model=gpt-4o
spring.ai.openai.chat.options.top-p=1.0
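
The penalty and length knobs can be set globally too. The keys below are the relaxed-binding forms of the OpenAiChatOptions fields shown earlier; double-check them against your Spring AI version:

spring.ai.openai.chat.options.max-tokens=500
spring.ai.openai.chat.options.frequency-penalty=0.5
spring.ai.openai.chat.options.presence-penalty=0.0

Per-request ChatOptions always override these global defaults, so the two approaches compose cleanly.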

3. Parameter Recommendations by Use Case

Choosing the right values depends entirely on your goal. Here is a guide for common scenarios, using the same values as the examples in this post as starting points:

| Use Case | Temperature | Other Settings |
| --- | --- | --- |
| Creative writing / storytelling | 0.8–0.9 | Top-P 0.9, generous max tokens |
| Precise or technical answers | 0.1 | Frequency Penalty 0.5 |
| Structured output (JSON extraction) | 0 | Pair with BeanOutputConverter |
| General-purpose chat | 0.7 | Top-P 1.0 (the global defaults above) |

4. Pro-Tips for Tuning

  1. Don’t tweak both Temperature and Top-P: Most AI labs (including OpenAI) recommend adjusting either Temperature or Top-P, but not both at once, as they can conflict or produce erratic results.
  2. Use 0 for JSON: If you are using Spring AI’s BeanOutputConverter to get structured data, set Temperature to 0; hallucinated JSON keys will break your code (see the sketch after this list).
  3. Frequency vs. Presence: Use Frequency Penalty if the model gets stuck repeating a specific word. Use Presence Penalty if the model keeps circling back to the same concept.
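
As a sketch of tip 2, here is how a deterministic extraction endpoint could look inside the ChatController from earlier. The Person record and the prompt are invented for illustration; entity() is ChatClient’s fluent way of applying a BeanOutputConverter to the response:

// Hypothetical target type for structured extraction.
record Person(String name, int age) {}

@GetMapping("/extract-person")
public Person extractPerson(@RequestParam String message) {
    return chatClient.prompt()
            .user("Extract the person mentioned in this text: " + message)
            .options(OpenAiChatOptions.builder()
                    .withTemperature(0.0f) // Deterministic output keeps JSON keys stable
                    .build())
            .call()
            .entity(Person.class); // Converts the model's JSON into a Person
}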

Connect with me at the email below if you need a curated learning path for yourself in AI/Gen AI. It does not matter which stream or role you come from; there is a learning path for everyone.

connect@thinkhumble.in
