DEV Community

Solon Framework

Solon 4.0 ChatModel: A Practical Guide to Building LLM-Powered Applications

If you've ever tried integrating a large language model (LLM) into a Java application, you've probably written a lot of boilerplate: HTTP clients, JSON parsing, streaming handling, session management. Solon 4.0's ChatModel abstracts all of that away with a clean, builder-oriented API.

In this guide, I'll walk through building real, working AI features using ChatModel — from a simple chat call to a streaming chatbot with conversation memory.


1. What Is ChatModel?

ChatModel (package org.noear.solon.ai.chat) is a unified LLM client in Solon's AI ecosystem. Instead of writing raw HTTP calls for different model providers, you use a single API that supports:

  • Synchronous calls — one-shot request, full response
  • Streaming calls — reactive streaming via Project Reactor (Flux<ChatResponse>)
  • Tool/Function Calling — let the LLM invoke your Java methods
  • Chat Sessions — automatic conversation memory
  • Multi-modal messages — text, images, audio
  • Dialect adaptation — works with OpenAI, Ollama, Anthropic, Gemini, DashScope, and more

The best part? It uses a dialect pattern: you point it at a supported LLM endpoint, name the dialect (openai is the default), and ChatModel handles that provider's request and response format for you.
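To make the dialect idea concrete, here is a minimal sketch in plain Java. It is not Solon's implementation: the `Dialect` and `DialectRegistry` names are hypothetical, and all the sketch captures is that each provider puts its API key and its reply text in a different place, with openai as the fallback when no dialect is named.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** One provider's wire conventions. Hypothetical type, not part of Solon. */
final class Dialect {
    final String name;         // identifier used in configuration
    final String authHeader;   // header that carries the API key, or null if none is needed
    final String contentPath;  // where the reply text sits in a non-streaming response

    Dialect(String name, String authHeader, String contentPath) {
        this.name = name;
        this.authHeader = authHeader;
        this.contentPath = contentPath;
    }
}

/** Looks a dialect up by name, falling back to "openai" when none is given. */
final class DialectRegistry {
    private static final Map<String, Dialect> DIALECTS = new HashMap<>();

    static {
        register(new Dialect("openai", "Authorization", "choices[0].message.content"));
        register(new Dialect("ollama", null, "message.content"));
        register(new Dialect("anthropic", "x-api-key", "content[0].text"));
    }

    private static void register(Dialect d) {
        DIALECTS.put(d.name, d);
    }

    static Dialect resolve(String provider) {
        String key = (provider == null || provider.isEmpty())
                ? "openai"
                : provider.toLowerCase(Locale.ROOT);
        Dialect d = DIALECTS.get(key);
        if (d == null) {
            throw new IllegalArgumentException("Unknown dialect: " + provider);
        }
        return d;
    }

    private DialectRegistry() {}
}
```

The calling code stays the same whichever entry `resolve` returns, and that is the property the real ChatModel gives you.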


2. Setting Up

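Add the dependency to your pom.xml (no parent POM is needed, since Solon works standalone). The ${solon.version} property has to be defined in your <properties> block, or replaced with a concrete version: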

<dependency>
    <groupId>org.noear</groupId>
    <artifactId>solon-ai</artifactId>
    <version>${solon.version}</version>
</dependency>

This pulls in all built-in dialects (OpenAI, Ollama, Gemini, Anthropic, DashScope).


3. Configuration

3.1 Via YAML (Recommended)

solon.ai.chat:
  demo:
    apiUrl: "http://127.0.0.1:11434/api/chat"   # Full URL, not baseUrl
    provider: "ollama"                           # Dialect identifier
    model: "llama3.2"                            # Model name
    headers:
      x-demo: "demo1"

Then create a @Bean to get a ready-to-use ChatModel:

import org.noear.solon.ai.chat.ChatConfig;
import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.annotation.Bean;
import org.noear.solon.annotation.Configuration;
import org.noear.solon.annotation.Inject;

@Configuration
public class AiConfig {
    @Bean
    public ChatModel chatModel(@Inject("${solon.ai.chat.demo}") ChatConfig config) {
        return ChatModel.of(config).build();
    }
}

3.2 Programmatic Builder

Prefer code over config? Use the builder directly:

import java.time.Duration;

@Bean
public ChatModel chatModel() {
    return ChatModel.of("http://127.0.0.1:11434/api/chat")
            .standard("ollama")      // or .provider("ollama") pre-4.0
            .model("llama3.2")
            .timeout(Duration.ofSeconds(60))
            .build();
}

3.3 Supported Model Providers

The standard (or provider) field selects the dialect:

Standard         | Example apiUrl                                              | Models
-----------------|-------------------------------------------------------------|-------------------------------------
openai (default) | https://api.openai.com/v1/chat/completions                  | GPT, DeepSeek, Qwen, GLM, Kimi, etc.
ollama           | http://127.0.0.1:11434/api/chat                             | Any local Ollama model
anthropic        | https://api.anthropic.com/v1/messages                       | Claude
gemini           | https://generativelanguage.googleapis.com/v1beta/models/... | Gemini
dashscope        | Aliyun DashScope endpoint                                   | Qwen (DashScope native)

4. Synchronous Calls (The Simple Way)

The most basic use case — send a prompt and get a full response:

import java.io.IOException;

import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.annotation.Inject;
import org.noear.solon.annotation.Component;

@Component
public class ChatService {
    @Inject
    ChatModel chatModel;

    public String ask(String question) throws IOException {
        ChatResponse resp = chatModel.prompt(question).call();
        return resp.getMessage().getContent();
    }
}

That's it. The business code is two statements: send the prompt, then read the content from the response.
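For contrast, here is roughly what just the request half looks like when written by hand with the JDK's java.net.http client against Ollama's /api/chat endpoint. This is a sketch under assumptions: the `RawOllamaRequest` class and its methods are hypothetical, the JSON is assembled manually to avoid a library dependency, and sending, response parsing, streaming, and error handling are all left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

final class RawOllamaRequest {
    /** Escapes the characters that must not appear raw inside a JSON string. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become 4-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Non-streaming chat body with a single user message. */
    static String body(String model, String question) {
        return "{\"model\":\"" + jsonEscape(model) + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\"" + jsonEscape(question) + "\"}],"
             + "\"stream\":false}";
    }

    /** Builds the POST request; nothing is sent until an HttpClient executes it. */
    static HttpRequest build(String apiUrl, String model, String question) {
        return HttpRequest.newBuilder(URI.create(apiUrl))
                .timeout(Duration.ofSeconds(60))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(model, question)))
                .build();
    }
}
```

Everything after `build(...)`, including reading `message.content` out of the reply, would still be on you.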


5. Streaming Calls (Real-Time Responses)

For chatbots and assistants, streaming is essential. ChatModel returns a Reactor Flux<ChatResponse>:

import java.io.IOException;
import org.noear.solon.ai.chat.ChatResponse;
import reactor.core.publisher.Flux;

public Flux<String> askStream(String question) throws IOException {
    return chatModel.prompt(question)
            .stream()
            .filter(ChatResponse::hasContent)       // skip empty chunks
            .map(resp -> resp.getMessage().getContent());
}

You can then subscribe to it or, if you're using Solon Web Reactive, return the Flux directly from an SSE endpoint:

import java.io.IOException;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.annotation.Mapping;
import org.noear.solon.web.sse.SseEvent;
import reactor.core.publisher.Flux;

@Mapping("/chat/stream")
public Flux<SseEvent> chatStream(String prompt) throws IOException {
    return chatModel.prompt(prompt)
            .stream()
            .filter(ChatResponse::hasContent)
            .map(resp -> new SseEvent()
                    .data(resp.getMessage().getContent()));
}

On the wire, the provider sends the stream either as standard SSE (text/event-stream) or as newline-delimited JSON (x-ndjson), depending on the provider.
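The difference between the two framings is small but easy to get wrong, so here is a minimal sketch of splitting a raw stream into JSON payloads. `StreamFraming` is a hypothetical helper, not a Solon class, and it assumes one data line per SSE event, which is how chat streaming APIs typically emit chunks.

```java
import java.util.ArrayList;
import java.util.List;

final class StreamFraming {
    /** SSE: payloads ride on "data:" lines; OpenAI-style streams finish with "[DONE]". */
    static List<String> fromSse(String raw) {
        List<String> out = new ArrayList<>();
        for (String line : raw.split("\r?\n")) {
            if (!line.startsWith("data:")) {
                continue; // blank separators, comments, and other SSE fields
            }
            String payload = line.substring(5).trim();
            if (payload.isEmpty() || payload.equals("[DONE]")) {
                continue;
            }
            out.add(payload);
        }
        return out;
    }

    /** NDJSON: every non-empty line is one complete JSON document. */
    static List<String> fromNdjson(String raw) {
        List<String> out = new ArrayList<>();
        for (String line : raw.split("\r?\n")) {
            String payload = line.trim();
            if (!payload.isEmpty()) {
                out.add(payload);
            }
        }
        return out;
    }
}
```

Either way you end up with a list of JSON chunks, and ChatModel hands you the same kind of result as ChatResponse objects.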


6. Conversation Memory with ChatSession

LLMs are stateless. To maintain conversation context, you need to pass history with each request. ChatSession handles this automatically.

6.1 Basic Session Usage

import org.noear.solon.ai.chat.ChatSession;
import org.noear.solon.ai.chat.session.InMemoryChatSession;

ChatSession session = InMemoryChatSession.builder()
        .sessionId("user-123")
        .maxMessages(10)     // keep at most the 10 most recent messages
        .build();

// First turn
ChatResponse resp1 = chatModel.prompt("Hello!")
        .session(session)
        .call();

// Second turn — model remembers context
ChatResponse resp2 = chatModel.prompt("What did I just say?")
        .session(session)
        .call();
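If you're wondering what "conversation memory" amounts to, it's a bounded list of past messages that gets resent with every request. The sketch below shows that idea in plain Java. `WindowedHistory` is a hypothetical class, and whether InMemoryChatSession trims in exactly this way is an assumption on my part.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class WindowedHistory {
    private final int maxMessages;
    private final Deque<String[]> messages = new ArrayDeque<>(); // each entry is {role, content}

    WindowedHistory(int maxMessages) {
        if (maxMessages < 1) {
            throw new IllegalArgumentException("maxMessages must be >= 1");
        }
        this.maxMessages = maxMessages;
    }

    /** Records one message and drops the oldest ones beyond the cap. */
    void add(String role, String content) {
        messages.addLast(new String[] {role, content});
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    /** What would accompany the next request: the retained history, oldest first. */
    List<String> snapshot() {
        List<String> out = new ArrayList<>();
        for (String[] m : messages) {
            out.add(m[0] + ": " + m[1]);
        }
        return out;
    }
}
```

Note that the cap counts messages, not turns: one user message plus one assistant reply uses two slots.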

6.2 Web Chat with Per-User Sessions

In a real web app, you'll want one session per user. Here's a controller that does exactly that:

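import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.ChatSession;
import org.noear.solon.ai.chat.session.InMemoryChatSession;
import org.noear.solon.annotation.Controller;
import org.noear.solon.annotation.Inject;
import org.noear.solon.annotation.Mapping;
import org.noear.solon.web.sse.SseEvent;
import reactor.core.publisher.Flux;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;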

@Controller
public class ChatController {
    @Inject
    ChatModel chatModel;

    final Map<String, ChatSession> sessionMap = new ConcurrentHashMap<>();

    @Mapping("/chat")
    public Flux<SseEvent> chat(String sessionId, String prompt) throws IOException {
        ChatSession session = sessionMap.computeIfAbsent(sessionId,
                k -> InMemoryChatSession.builder().sessionId(k).build());

        return chatModel.prompt(prompt)
                .session(session)
                .options(o -> o.systemPrompt("You are a helpful and friendly assistant."))
                .stream()
                .filter(ChatResponse::hasContent)
                .map(resp -> new SseEvent().data(resp.getMessage().getContent()));
    }
}
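One thing to watch with a plain map of sessions: it only grows. A possible way to cap it is to track when each session was last used and drop the idle ones on a timer. The sketch below is generic Java with hypothetical names (`SessionStore`, `evictIdle`); timestamps are passed in explicitly so the behaviour is easy to test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class SessionStore<S> {
    private static final class Entry<T> {
        final T session;
        volatile long lastUsedMillis;

        Entry(T session, long nowMillis) {
            this.session = session;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final Map<String, Entry<S>> entries = new ConcurrentHashMap<>();

    /** Returns the session for this id, creating it on first use, and marks it as just used. */
    S get(String sessionId, Function<String, S> factory, long nowMillis) {
        Entry<S> e = entries.computeIfAbsent(sessionId,
                id -> new Entry<>(factory.apply(id), nowMillis));
        e.lastUsedMillis = nowMillis;
        return e.session;
    }

    /** Drops sessions idle for longer than maxIdleMillis; the count is approximate under concurrency. */
    int evictIdle(long nowMillis, long maxIdleMillis) {
        int before = entries.size();
        entries.values().removeIf(e -> nowMillis - e.lastUsedMillis > maxIdleMillis);
        return before - entries.size();
    }

    int size() {
        return entries.size();
    }
}
```

In the controller you would call `get(sessionId, factory, System.currentTimeMillis())` in place of `computeIfAbsent`, and run `evictIdle` from a scheduled task.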

6.3 Built-in Session Implementations

Implementation      | Storage     | Use Case
--------------------|-------------|------------------------
InMemoryChatSession | Local Map   | Dev, single-node
FileChatSession     | File system | CLI tools, desktop apps
RedisChatSession    | Redis       | Production, distributed

7. Tuning Requests with ChatOptions

Control model behavior per-request with ChatOptions:

chatModel.prompt("Write a poem about Java")
        .options(o -> o
            .temperature(0.8F)   // float literals compile against a float or a double parameter
            .max_tokens(500)
            .top_p(0.9F)
            .systemPrompt("You are a creative poet."))
        .call();

Common options include:

Method                 | Description
-----------------------|--------------------------------------------------
temperature(val)       | Sampling temperature (0.0–2.0)
max_tokens(val)        | Max output tokens
top_p(val)             | Nucleus sampling
top_k(val)             | Top-K sampling
frequency_penalty(val) | Reduce repetition
presence_penalty(val)  | Encourage new topics
tool_choice(val)       | Force tool use: none, auto, required, or tool name
systemPrompt(val)      | System message for this request
role(val)              | Agent role (v3.9.1+)
instruction(val)       | Agent instruction (v3.9.1+)
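If temperature and top_p feel abstract, the textbook definitions are short enough to write out. The sketch below is illustrative only: it runs on a made-up array of logits, the `SamplingSketch` class is hypothetical, and what each provider does server-side is its own implementation. It also does not model a temperature of 0, which APIs usually treat as greedy decoding.

```java
import java.util.Arrays;

final class SamplingSketch {
    /** Softmax over logits divided by the temperature; lower values sharpen the distribution. */
    static double[] softmax(double[] logits, double temperature) {
        if (temperature <= 0) {
            throw new IllegalArgumentException("temperature must be > 0 in this sketch");
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) {
            max = Math.max(max, l);
        }
        double[] p = new double[logits.length];
        double sum = 0;
        for (int i = 0; i < logits.length; i++) {
            p[i] = Math.exp((logits[i] - max) / temperature); // subtract max for numeric stability
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Nucleus (top_p): keep the most likely tokens until their mass reaches topP, then renormalize. */
    static double[] nucleus(double[] probs, double topP) {
        Integer[] order = new Integer[probs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(probs[b], probs[a])); // most likely first
        double[] kept = new double[probs.length];
        double mass = 0;
        for (int idx : order) {
            kept[idx] = probs[idx];
            mass += probs[idx];
            if (mass >= topP) {
                break;
            }
        }
        for (int i = 0; i < kept.length; i++) {
            kept[i] /= mass;
        }
        return kept;
    }
}
```

With logits {2.0, 1.0, 0.0}, lowering the temperature moves probability toward the first token, and `nucleus(..., 0.9)` removes the least likely token entirely.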

8. Multi-Message Prompts

Sometimes you need more than a simple string. Use Prompt and ChatMessage:

import org.noear.solon.ai.chat.Prompt;
import org.noear.solon.ai.chat.message.ChatMessage;

Prompt prompt = Prompt.of(
    ChatMessage.ofSystem("You translate English to French."),
    ChatMessage.ofUser("Hello, how are you?"),
    ChatMessage.ofAssistant("Bonjour, comment allez-vous?"),
    ChatMessage.ofUser("What is your name?")
);

ChatResponse resp = chatModel.prompt(prompt).call();

9. Putting It All Together: A Practical Example

Let's build a simple knowledge-aware chatbot — the kind of RAG-lite pattern you see in real projects. This example uses ChatMessage.ofUserAugment() to inject context into the prompt:

import org.noear.solon.ai.chat.ChatModel;
import org.noear.solon.ai.chat.ChatResponse;
import org.noear.solon.ai.chat.message.ChatMessage;
import org.noear.solon.annotation.Component;
import org.noear.solon.annotation.Inject;

@Component
public class KnowledgeChatbot {
    @Inject
    ChatModel chatModel;

    public String answer(String question, String referenceContext) throws Exception {
        // Augment the user message with reference context
        ChatMessage augmented = ChatMessage.ofUserAugment(question, referenceContext);

        ChatResponse resp = chatModel.prompt(augmented)
                .options(o -> o
                    .temperature(0.3F)   // float literal; a low value keeps answers more deterministic
                    .systemPrompt("You are a knowledgeable assistant. Answer based on the provided references."))
                .call();

        return resp.getMessage().getContent();
    }
}

This pattern — augment user input with context, then call the model — is the foundation of RAG (Retrieval-Augmented Generation) in Solon AI.
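To show the two steps without any framework, here is a deliberately naive sketch: pick the reference snippet that shares the most words with the question, then glue it onto the user message. Everything in it is hypothetical, including the `RagLiteSketch` name and the "References:" template; the text ofUserAugment actually produces may differ, and a real pipeline would use embeddings instead of word overlap.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class RagLiteSketch {
    /** Lower-cased set of the words in a text. */
    static Set<String> words(String text) {
        Set<String> set = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
        set.remove(""); // split can yield a leading empty token
        return set;
    }

    /** Returns the snippet sharing the most words with the question (stand-in for vector search). */
    static String retrieve(String question, List<String> snippets) {
        Set<String> q = words(question);
        String best = "";
        int bestScore = -1;
        for (String s : snippets) {
            Set<String> overlap = words(s);
            overlap.retainAll(q);
            if (overlap.size() > bestScore) {
                bestScore = overlap.size();
                best = s;
            }
        }
        return best;
    }

    /** Hypothetical augmentation template: the question followed by its reference text. */
    static String augment(String question, String reference) {
        return question + "\n\nReferences:\n" + reference;
    }
}
```

The string returned by `augment` plays the role of the augmented user message, and `retrieve` is the part a real retriever would replace.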


10. What's Next?

ChatModel is just the entry point. Solon AI also offers:

  • Tool Calling — define @ToolMapping methods the LLM can invoke
  • Talent System — reusable capability modules (Skill-like)
  • Agents — ReActAgent and TeamAgent for multi-step reasoning
  • RAG — full pipeline with document loading, splitting, embedding, and retrieval
  • MCP Protocol — connect to MCP servers for external tools

For the full documentation, check out the official Solon AI guide:

👉 https://solon.noear.org/article/918 (Model construction)
👉 https://solon.noear.org/article/920 (API reference)


Have you tried integrating LLMs in Java? What's your biggest pain point? Let me know in the comments — I might cover it in the next post.
