<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nguyen Phuc Hai</title>
    <description>The latest articles on DEV Community by Nguyen Phuc Hai (@nguyen_phuchai_b01cae130).</description>
    <link>https://dev.to/nguyen_phuchai_b01cae130</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1621371%2F68ee1056-2f11-44e3-bc41-7a84c12935a7.jpg</url>
      <title>DEV Community: Nguyen Phuc Hai</title>
      <link>https://dev.to/nguyen_phuchai_b01cae130</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nguyen_phuchai_b01cae130"/>
    <language>en</language>
    <item>
      <title>Building a Desktop AI Chat App for ChatGPT, Claude, Gemini &amp; Ollama</title>
      <dc:creator>Nguyen Phuc Hai</dc:creator>
      <pubDate>Tue, 17 Feb 2026 00:54:28 +0000</pubDate>
      <link>https://dev.to/nguyen_phuchai_b01cae130/building-a-desktop-ai-chat-app-for-chatgpt-claude-gemini-ollama-1abh</link>
      <guid>https://dev.to/nguyen_phuchai_b01cae130/building-a-desktop-ai-chat-app-for-chatgpt-claude-gemini-ollama-1abh</guid>
      <description>&lt;p&gt;Learn how to build an open-source desktop AI chat client that connects multiple AI providers in one application. This technical guide covers Kotlin Compose architecture, streaming responses, RAG implementation, and production patterns.&lt;/p&gt;

&lt;h2&gt;The Problem: Using ChatGPT, Claude, Gemini Means Opening Multiple Apps&lt;/h2&gt;

&lt;p&gt;As developers, we've discovered that different AI models excel at different tasks. &lt;a href="https://askimo.chat/app/openai/" rel="noopener noreferrer"&gt;ChatGPT&lt;/a&gt; is great for general conversation and brainstorming. &lt;a href="https://askimo.chat/app/claude/" rel="noopener noreferrer"&gt;Claude&lt;/a&gt; works well for coding questions and technical analysis. &lt;a href="https://askimo.chat/app/gemini/" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt; handles multimodal tasks with images and documents. &lt;a href="https://askimo.chat/app/ollama/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; gives you free, unlimited access to open-source models without subscription limits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But here's the frustration:&lt;/strong&gt; Each of these requires a different web application, a different account, a different browser tab.&lt;/p&gt;

&lt;p&gt;The modern AI workflow reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT web app&lt;/strong&gt; open for general questions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude web app&lt;/strong&gt; open for coding help&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google AI Studio&lt;/strong&gt; open for multimodal tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama command line&lt;/strong&gt; running locally for experimentation without API costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constant context switching&lt;/strong&gt; between different interfaces, keyboard shortcuts, and UX patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fragmented conversation history&lt;/strong&gt; - your coding discussion with Claude is separate from your general brainstorming in ChatGPT&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No unified search&lt;/strong&gt; - can't search across all your AI conversations in one place&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple subscriptions&lt;/strong&gt; - managing different payment plans and free tier limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What if you could have &lt;strong&gt;one desktop application&lt;/strong&gt; that works with ChatGPT, Claude, Gemini, Ollama, and any other AI provider - letting you choose the best model for each task without switching apps?&lt;/p&gt;

&lt;p&gt;That's why we built &lt;strong&gt;&lt;a href="https://askimo.chat/app/" rel="noopener noreferrer"&gt;Askimo&lt;/a&gt;&lt;/strong&gt; - an open-source desktop AI chat client built with Kotlin and Compose for Desktop. You can &lt;a href="https://askimo.chat/download/" rel="noopener noreferrer"&gt;download it for free&lt;/a&gt; for macOS, Windows, and Linux.&lt;/p&gt;




&lt;h2&gt;Why Desktop? Why Not Another Web App?&lt;/h2&gt;

&lt;p&gt;Before diving into the technical implementation, let's address the elephant in the room: &lt;strong&gt;Why build a desktop app in 2026?&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;Desktop Advantages for AI Chat&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Zero Infrastructure&lt;/strong&gt;: Just download and run. No server to set up, no deployment, no hosting costs. Open the app and start chatting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Persistent State&lt;/strong&gt;: Desktop apps don't lose state when you close a tab. Chat in up to 20 tabs simultaneously - more than enough for any workflow - and they all stay exactly where you left them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;True Privacy&lt;/strong&gt;: Local-first architecture means conversations never leave your machine unless you explicitly send them to an AI provider.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Native Performance&lt;/strong&gt;: No browser overhead. Direct access to system resources for faster rendering and lower memory usage (50-300 MB vs 500+ MB for browser tabs).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Offline Capability&lt;/strong&gt;: Read past conversations, search history, and manage projects - all without internet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;System Integration&lt;/strong&gt;: Deep OS integration for keyboard shortcuts, native notifications, and file system access.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;Why Kotlin + Compose for Desktop?&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modern declarative UI&lt;/strong&gt; with Compose's reactive paradigm&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared code&lt;/strong&gt; between &lt;a href="https://askimo.chat/cli/" rel="noopener noreferrer"&gt;CLI&lt;/a&gt; and desktop modules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coroutines&lt;/strong&gt; for elegant async/concurrent programming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type safety&lt;/strong&gt; that prevents entire classes of runtime errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mature ecosystem&lt;/strong&gt; with LangChain4j for AI integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Strategic advantage: Code reuse for mobile apps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Choosing Kotlin and Compose Multiplatform gives us a significant long-term benefit: &lt;strong&gt;when we expand to mobile (iOS/Android), we can reuse 60-80% of our codebase.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The same business logic that powers the desktop app can power mobile:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session management&lt;/strong&gt; - Same conversation state management across all platforms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI provider integrations&lt;/strong&gt; - OpenAI, Anthropic, Ollama clients work identically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming handling&lt;/strong&gt; - The concurrent stream management we built for desktop works on mobile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database layer&lt;/strong&gt; - SQLite-based storage runs on all platforms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Markdown rendering&lt;/strong&gt; - Custom renderer works on iOS and Android without changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG pipeline&lt;/strong&gt; - Document processing and embedding logic is platform-agnostic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only the UI layer needs platform-specific adaptation - and even there, Compose Multiplatform lets us share UI components with platform-specific tweaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Compare this to the web app path:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web → Mobile means rebuilding everything in Swift/Kotlin or using slower hybrid frameworks&lt;/li&gt;
&lt;li&gt;Desktop Electron → Mobile means completely separate codebases&lt;/li&gt;
&lt;li&gt;Native from the start → Future mobile apps share the same proven, battle-tested core&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And this isn't some experimental tech we're betting on - Compose Multiplatform is already battle-tested in production by companies like JetBrains and Netflix. So when we decide to ship mobile apps, we won't be starting from scratch. All the tricky stuff - session management, streaming handlers, RAG pipelines - will already work. We'll just need to adapt the UI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Desktop first, but with mobile in our back pocket for later.&lt;/p&gt;
&lt;h3&gt;The Trade-offs: More Effort, Better Control&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Let's be honest: building a native desktop app requires significantly more effort than a web app.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you build for the web, the browser gives you useful tools for free:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Markdown rendering&lt;/strong&gt; - Just use a library like &lt;code&gt;marked.js&lt;/code&gt; and let the browser's HTML engine handle it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Syntax highlighting&lt;/strong&gt; - Drop in Prism.js or highlight.js&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Charts and visualizations&lt;/strong&gt; - Chart.js, D3.js, countless options&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File handling&lt;/strong&gt; - Browser APIs abstract the complexity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform rendering&lt;/strong&gt; - Write once, runs everywhere with the same look&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For a native desktop app, we had to build all of this ourselves:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Custom Markdown Rendering&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implemented a CommonMark parser in Kotlin&lt;/li&gt;
&lt;li&gt;Built custom rendering logic for code blocks, tables, lists&lt;/li&gt;
&lt;li&gt;Created syntax highlighting integration for 50+ programming languages&lt;/li&gt;
&lt;li&gt;No browser HTML engine to fall back on&lt;/li&gt;
&lt;/ul&gt;
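&lt;p&gt;To make the "no HTML engine to fall back on" point concrete, here is a toy sketch of the block-level tokenization a native renderer has to do itself - separating fenced code blocks from paragraphs before anything can be laid out. This is illustrative only; Askimo's real CommonMark pipeline handles far more (tables, lists, nesting), and the names here are made up for the example.&lt;/p&gt;

```kotlin
// Toy block tokenizer: the very first step a native markdown renderer
// must perform without a browser. Names and structure are illustrative.
sealed class Block {
    data class Paragraph(val text: String) : Block()
    data class CodeBlock(val language: String, val code: String) : Block()
}

fun tokenizeBlocks(markdown: String): List<Block> {
    val blocks = mutableListOf<Block>()
    val lines = markdown.lines()
    var i = 0
    while (i < lines.size) {
        val line = lines[i]
        when {
            line.startsWith("```") -> {
                // Fenced code block: collect until the closing fence.
                val language = line.removePrefix("```").trim()
                val code = StringBuilder()
                i++
                while (i < lines.size && !lines[i].startsWith("```")) {
                    code.appendLine(lines[i]); i++
                }
                i++ // skip closing fence
                blocks += Block.CodeBlock(language, code.toString().trimEnd())
            }
            line.isBlank() -> i++
            else -> {
                // Paragraph: join consecutive non-blank, non-fence lines.
                val para = StringBuilder()
                while (i < lines.size && lines[i].isNotBlank() && !lines[i].startsWith("```")) {
                    para.append(lines[i].trim()).append(' '); i++
                }
                blocks += Block.Paragraph(para.toString().trim())
            }
        }
    }
    return blocks
}
```

Even this stripped-down version shows why a native renderer is real work: every rule the browser's HTML engine would have handled (inline emphasis, tables, nested lists) has to become explicit parsing and layout code.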

&lt;p&gt;&lt;strong&gt;2. Platform-Specific Challenges&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File system access differs on macOS, Windows, and Linux&lt;/li&gt;
&lt;li&gt;Window management and keyboard shortcuts need OS-specific handling&lt;/li&gt;
&lt;li&gt;Native menus and notifications require platform adapters&lt;/li&gt;
&lt;li&gt;Different packaging systems for each OS (DMG, MSI, DEB/RPM)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Custom UI Components&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built chart rendering using Compose Canvas APIs&lt;/li&gt;
&lt;li&gt;Implemented custom text editors with syntax highlighting&lt;/li&gt;
&lt;li&gt;Created scrollable containers with proper touch/mouse handling&lt;/li&gt;
&lt;li&gt;Designed responsive layouts without CSS flexbox&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Resource Management&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Careful memory management for long-running processes - leaks accumulate in an app that stays open for days&lt;/li&gt;
&lt;li&gt;Thread pool sizing for concurrent AI streams&lt;/li&gt;
&lt;li&gt;Database connection pooling&lt;/li&gt;
&lt;li&gt;No browser tab lifecycle to clean up after us - the JVM's garbage collector helps, but long-lived resources must be released explicitly&lt;/li&gt;
&lt;/ul&gt;
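&lt;p&gt;The thread-pool point can be made concrete with plain JDK executors. This is a simplified sketch, not Askimo's actual code (which uses coroutines internally); the function name and the cap are illustrative:&lt;/p&gt;

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Run a batch of jobs (stand-ins for streaming AI calls) on an explicitly
// bounded pool, instead of letting a browser decide connection limits.
// The cap mirrors the app's tab limit; adjust to taste.
fun runStreams(jobs: List<() -> String>, maxConcurrent: Int = 20): List<String> {
    val pool = Executors.newFixedThreadPool(minOf(maxConcurrent, jobs.size).coerceAtLeast(1))
    try {
        // Submit everything first, then join in order, so jobs overlap.
        return jobs.map { job -> pool.submit<String> { job() } }.map { it.get() }
    } finally {
        pool.shutdown()
        pool.awaitTermination(5, TimeUnit.SECONDS)
    }
}
```

The explicit bound matters: 20 concurrent streams is a deliberate resource budget, not whatever the browser's per-host connection limit happens to allow.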

&lt;p&gt;&lt;strong&gt;So why go through all this extra effort?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We believe the benefits are worth it for this specific use case:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Performance &amp;amp; Resource Efficiency&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;50-300 MB memory usage&lt;/strong&gt; vs 500+ MB for equivalent web apps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1.5-3 second startup&lt;/strong&gt; vs 5-10 seconds for web-based alternatives&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct system access&lt;/strong&gt; - no browser overhead for file operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient rendering&lt;/strong&gt; - only what changed, not full DOM diffing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. User Control &amp;amp; Privacy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complete local storage&lt;/strong&gt; - users truly own their data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No cloud dependencies&lt;/strong&gt; for core functionality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encrypted local database&lt;/strong&gt; - conversations never leave the machine (learn more about &lt;a href="https://askimo.chat/security/" rel="noopener noreferrer"&gt;Askimo's security features&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No telemetry or tracking&lt;/strong&gt; by default - users control everything&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Long-term Strategic Benefits&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local tool integration&lt;/strong&gt; - direct access to file system, terminal, development tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline-first&lt;/strong&gt; - full functionality without internet (except AI API calls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System integration&lt;/strong&gt; - global keyboard shortcuts, menu bar presence, system notifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Future extensibility&lt;/strong&gt; - can integrate with OS-level features (Spotlight search, Quick Look, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Better UX for AI Workflows&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Instant search&lt;/strong&gt; - local SQLite queries are 10-100x faster than cloud-based search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliable state&lt;/strong&gt; - no session timeouts, no lost tabs, no connection drops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tab workflows&lt;/strong&gt; - handle up to 20 concurrent conversations without browser memory bloat (see &lt;a href="https://askimo.chat/docs/desktop/features/" rel="noopener noreferrer"&gt;desktop features&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent experience&lt;/strong&gt; - same UI across all platforms, not dependent on browser quirks&lt;/li&gt;
&lt;/ul&gt;
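&lt;p&gt;The "instant search" claim comes down to querying local storage instead of a remote service. Askimo uses SQLite for this; as a dependency-free illustration of why local lookups are fast, here is a toy inverted index over conversation text (class and method names are hypothetical, not Askimo's API):&lt;/p&gt;

```kotlin
// Toy inverted index: term -> set of conversation ids containing it.
// Illustrates local, in-process search; the real app queries SQLite.
class ConversationIndex {
    private val index = mutableMapOf<String, MutableSet<Int>>()

    fun add(conversationId: Int, text: String) {
        text.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }
            .forEach { term -> index.getOrPut(term) { mutableSetOf() } += conversationId }
    }

    // Conversations containing every term in the query (AND semantics).
    fun search(query: String): Set<Int> =
        query.lowercase().split(Regex("\\W+")).filter { it.isNotBlank() }
            .map { index[it] ?: emptySet<Int>() }
            .reduceOrNull { acc, ids -> acc intersect ids } ?: emptySet()
}
```

Every lookup is an in-memory (or local-disk) operation with no network round trip - which is where the order-of-magnitude latency advantage over cloud search comes from.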

&lt;p&gt;&lt;strong&gt;The web browser is well-suited for content consumption, but for productivity tools handling sensitive data and requiring deep system integration, native apps offer advantages.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For Askimo specifically, the ability to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store thousands of conversations locally with instant search&lt;/li&gt;
&lt;li&gt;Switch AI providers without page reloads or state loss&lt;/li&gt;
&lt;li&gt;Work with multiple AI platforms - from cloud services like ChatGPT, Claude, and Gemini to &lt;a href="https://askimo.chat/docs/desktop/providers/ollama/" rel="noopener noreferrer"&gt;local Ollama models&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Access project files and web content directly for RAG context&lt;/li&gt;
&lt;li&gt;Work offline for reviewing past conversations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...made the extra development effort a worthwhile investment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; If you're building a simple content-focused app, choose web. If you're building a productivity tool that needs privacy, performance, and deep system integration, the native desktop path - despite its challenges - delivers better long-term value for users.&lt;/p&gt;


&lt;h2&gt;Architecture Overview&lt;/h2&gt;

&lt;p&gt;Askimo uses a &lt;strong&gt;provider-agnostic architecture&lt;/strong&gt; that abstracts AI models behind a common interface. Here's the high-level structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────┐
│     Compose Desktop UI Layer            │
│   (ViewModels + Reactive State)         │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│      Session Management Layer           │
│   (up to 20 tabs, LRU cache)            │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│    Provider Abstraction Layer           │
│   ChatModelFactory&amp;lt;T: ProviderSettings&amp;gt; │
└──────────────┬──────────────────────────┘
               │
       ┌───────┴───────┐
       │               │
┌──────▼─────┐   ┌────▼──────┐
│  OpenAI    │   │  Ollama   │  ...
│  Factory   │   │  Factory  │
└────────────┘   └───────────┘
       │               │
┌──────▼───────────────▼──────────────────┐
│         LangChain4j Integration         │
│  (Streaming, Memory, RAG, Tools)        │
└─────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
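&lt;p&gt;The session layer's "up to 20 tabs, LRU cache" behavior can be sketched with an access-ordered &lt;code&gt;LinkedHashMap&lt;/code&gt;. This is a simplified stand-in for the actual session manager - the class name and API are invented for the example:&lt;/p&gt;

```kotlin
// Keeps at most maxSessions live sessions; the least-recently-used
// one is evicted when a new session would exceed the cap.
class SessionCache<S>(private val maxSessions: Int = 20) {
    // accessOrder = true: iteration order tracks recency of access.
    private val sessions = object : LinkedHashMap<String, S>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<String, S>) =
            size > maxSessions
    }

    fun put(id: String, session: S) { sessions[id] = session }
    fun get(id: String): S? = sessions[id]      // marks id as recently used
    fun liveIds(): List<String> = sessions.keys.toList()
}
```

Evicted sessions are not lost - their conversation history lives in the local database, so reopening a tab simply rehydrates it from storage.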






&lt;h2&gt;Core Implementation: Provider Abstraction&lt;/h2&gt;

&lt;p&gt;The heart of Askimo's multi-provider support is the &lt;code&gt;ChatModelFactory&lt;/code&gt; interface. This is how we achieve provider independence. You can see all &lt;a href="https://askimo.chat/docs/desktop/ai-providers/" rel="noopener noreferrer"&gt;supported AI providers&lt;/a&gt; and their configuration in the documentation.&lt;/p&gt;

&lt;h3&gt;1. ChatModelFactory Interface&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;ChatModelFactory&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;T&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ProviderSettings&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// List available models for this provider&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;availableModels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;

    &lt;span class="c1"&gt;// Identify which provider this factory creates&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getProvider&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;ModelProvider&lt;/span&gt;

    &lt;span class="c1"&gt;// Default configuration for this provider&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;defaultSettings&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt;

    &lt;span class="c1"&gt;// Create a chat client instance&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ContentRetriever&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;executionMode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ExecutionMode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;chatMemory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatMemory&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt;

    &lt;span class="c1"&gt;// Create cheap utility client for classification tasks&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;createUtilityClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Design Decisions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Generic type parameter &lt;code&gt;&amp;lt;T: ProviderSettings&amp;gt;&lt;/code&gt;&lt;/strong&gt; - Each factory specifies its own settings type, ensuring type safety at compile time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ContentRetriever for RAG&lt;/strong&gt; - Optional parameter enables Retrieval-Augmented Generation for file/project context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChatMemory injection&lt;/strong&gt; - Conversation history managed externally but injected at creation time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ExecutionMode awareness&lt;/strong&gt; - Different behavior for CLI vs Desktop (e.g., tools disabled in desktop)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Utility client for background tasks&lt;/strong&gt; - &lt;code&gt;createUtilityClient()&lt;/code&gt; returns a cheap, fast model for tasks that don't need the most powerful AI&lt;/li&gt;
&lt;/ul&gt;
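&lt;p&gt;To see what the generic bound buys us, here is a pared-down sketch of how such factories might be registered and looked up at runtime. The interfaces are deliberately simplified versions of the ones above, and the registry itself is hypothetical - it just demonstrates that each factory only ever accepts its own settings type:&lt;/p&gt;

```kotlin
// Pared-down versions of the real interfaces, for illustration only.
interface ProviderSettings { val defaultModel: String }

data class OpenAiSettings(override val defaultModel: String = "gpt-4o") : ProviderSettings
data class OllamaSettings(override val defaultModel: String = "llama3") : ProviderSettings

// The generic bound ties each factory to exactly one settings type,
// so passing OllamaSettings to the OpenAI factory fails at compile time.
interface ChatModelFactory<T : ProviderSettings> {
    fun defaultSettings(): T
    fun describe(settings: T): String
}

object OpenAiFactory : ChatModelFactory<OpenAiSettings> {
    override fun defaultSettings() = OpenAiSettings()
    override fun describe(settings: OpenAiSettings) = "openai/${settings.defaultModel}"
}

object OllamaFactory : ChatModelFactory<OllamaSettings> {
    override fun defaultSettings() = OllamaSettings()
    override fun describe(settings: OllamaSettings) = "ollama/${settings.defaultModel}"
}

// Storage erases the type parameter with an out-projection, but each
// concrete factory still enforces its settings type at the call site.
val factories: Map<String, ChatModelFactory<out ProviderSettings>> =
    mapOf("openai" to OpenAiFactory, "ollama" to OllamaFactory)
```

This is the compile-time safety mentioned above: a whole class of "wrong config for this provider" bugs simply cannot be written.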

&lt;h3&gt;Why createUtilityClient?&lt;/h3&gt;

&lt;p&gt;Many AI workflows involve tasks that don't require expensive, state-of-the-art models. Examples include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory summarization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Condensing old conversation messages into summaries&lt;/li&gt;
&lt;li&gt;A simple task that GPT-3.5-turbo handles just as well as GPT-4&lt;/li&gt;
&lt;li&gt;Running hundreds of times during long conversations&lt;/li&gt;
&lt;li&gt;Using GPT-4 would cost 10-20x more with no quality benefit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Intent classification:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deciding "should we use RAG for this query?" → YES/NO&lt;/li&gt;
&lt;li&gt;Validating "is this a question?" → YES/NO&lt;/li&gt;
&lt;li&gt;Simple binary decisions that don't need advanced reasoning&lt;/li&gt;
&lt;/ul&gt;
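&lt;p&gt;Turning the utility model's free-text reply into one of those binary decisions can be as simple as the sketch below. The prompt wording and function names are illustrative, not Askimo's actual implementation:&lt;/p&gt;

```kotlin
// Parse a small model's reply into a strict boolean, defaulting to
// false (skip RAG) when the reply is anything but a clear YES.
fun parseYesNo(modelReply: String): Boolean =
    when (modelReply.trim().uppercase().takeWhile { it.isLetter() }) {
        "YES" -> true
        else -> false
    }

// classify is a stand-in for a call to the cheap utility client.
fun shouldUseRag(query: String, classify: (String) -> String): Boolean {
    val prompt = "Does answering this require looking up the user's documents? " +
        "Reply YES or NO.\n\nQuery: $query"
    return parseYesNo(classify(prompt))
}
```

Defaulting to "no" on an ambiguous reply is the safe choice here: skipping retrieval degrades the answer slightly, while a spurious retrieval wastes tokens and latency on every query.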

&lt;p&gt;&lt;strong&gt;The trade-off:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud providers (OpenAI, Anthropic, Google):&lt;/strong&gt; Use a cheaper model (e.g., GPT-3.5-turbo costs ~$0.001/1K tokens vs GPT-4's ~$0.03/1K tokens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local providers (Ollama, LM Studio):&lt;/strong&gt; Use the same model (no API costs, so no benefit to switching)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Example: OpenAI implementation&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OpenAiChatModelFactory&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatModelFactory&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;OpenAiSettings&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;createUtilityClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;OpenAiSettings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Use GPT-3.5-turbo for cheap background tasks&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;sessionId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-3.5-turbo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Cheap model for utility tasks&lt;/span&gt;
            &lt;span class="n"&gt;settings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;executionMode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ExecutionMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DESKTOP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;chatMemory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Example: Ollama implementation&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OllamaChatModelFactory&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatModelFactory&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;OllamaSettings&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;createUtilityClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;OllamaSettings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Local models have no API cost, use the same model&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;sessionId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;defaultModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Same model, no cost difference&lt;/span&gt;
            &lt;span class="n"&gt;settings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;executionMode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ExecutionMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DESKTOP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;chatMemory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A user with 100 conversations averaging 200 messages each triggers ~100 summarization calls&lt;/li&gt;
&lt;li&gt;With GPT-4: ~$6-10 in API costs for background tasks&lt;/li&gt;
&lt;li&gt;With GPT-3.5-turbo utility client: ~$0.30-0.50 in API costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20x cost reduction&lt;/strong&gt; for the same functionality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This pattern keeps the AI experience responsive and affordable without compromising the quality of user-facing responses.&lt;/p&gt;

&lt;h3&gt;2. ProviderSettings Interface&lt;/h3&gt;

&lt;p&gt;Each provider has its own settings class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;ProviderSettings&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;defaultModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;

    &lt;span class="c1"&gt;// Human-readable description (masks sensitive data)&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;

    &lt;span class="c1"&gt;// Configurable fields for UI&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getFields&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;SettingField&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;

    &lt;span class="c1"&gt;// Update a field and return new instance (immutable pattern)&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;updateField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fieldName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ProviderSettings&lt;/span&gt;

    &lt;span class="c1"&gt;// Validate settings are ready for use&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;Boolean&lt;/span&gt;

    &lt;span class="c1"&gt;// Help text when validation fails&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getSetupHelpText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messageResolver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
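&lt;p&gt;The immutable &lt;code&gt;updateField&lt;/code&gt; pattern maps naturally onto Kotlin data classes: return a fresh copy instead of mutating in place. A minimal sketch - the class and field names here are invented for the example:&lt;/p&gt;

```kotlin
// Hypothetical settings class showing the copy-on-update pattern.
data class DemoSettings(
    val apiKey: String = "",
    val baseUrl: String = "https://api.example.com/v1",
) {
    // Return a new instance rather than mutating, so any stale reference
    // held elsewhere in the app never observes half-applied settings.
    fun updateField(fieldName: String, value: String): DemoSettings =
        when (fieldName) {
            "apiKey" -> copy(apiKey = value)
            "baseUrl" -> copy(baseUrl = value)
            else -> this // unknown field: no-op, old instance unchanged
        }
}
```

Because every update produces a new value, settings can be passed across coroutines and UI state without locks - readers always see either the old version or the new one, never a torn write.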



&lt;h3&gt;3. Example: OpenAI Provider Implementation&lt;/h3&gt;

&lt;p&gt;Here's how we implement the &lt;a href="https://askimo.chat/docs/desktop/providers/openai/" rel="noopener noreferrer"&gt;OpenAI/ChatGPT provider&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Serializable&lt;/span&gt;
&lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;OpenAiSettings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="py"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;defaultModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="py"&gt;baseUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://api.openai.com/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="py"&gt;presets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Presets&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Presets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Style&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BALANCED&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ProviderSettings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;HasApiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;HasBaseUrl&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;Boolean&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isNotBlank&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getSetupHelpText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messageResolver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"""
            OpenAI requires an API key to use.
            1. Get your API key: https://platform.openai.com/api-keys
            2. Configure it with: :set-param api_key YOUR_KEY
        """&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimIndent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Streaming AI Responses: Managing Multiple Concurrent Conversations
&lt;/h2&gt;

&lt;p&gt;One of Askimo's key features: &lt;strong&gt;handling up to 20 simultaneous AI conversations&lt;/strong&gt;, each with its own streaming response thread.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Challenge
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each active conversation needs a dedicated thread for streaming AI responses&lt;/li&gt;
&lt;li&gt;Memory must be bounded to prevent resource exhaustion&lt;/li&gt;
&lt;li&gt;Inactive sessions should be persisted to disk but unloaded from memory&lt;/li&gt;
&lt;li&gt;Thread-safe state management across concurrent operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Our Approach
&lt;/h3&gt;

&lt;p&gt;We use Kotlin's &lt;strong&gt;StateFlow&lt;/strong&gt; for reactive state management and &lt;strong&gt;Coroutines&lt;/strong&gt; for concurrent streaming:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChatViewModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;_isStreaming&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MutableStateFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;isStreaming&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;StateFlow&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_isStreaming&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asStateFlow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;_messages&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MutableStateFlow&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;(&lt;/span&gt;&lt;span class="nf"&gt;emptyList&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;StateFlow&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;asStateFlow&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;viewModelScope&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;_isStreaming&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;

            &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;streamResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;handleStreamingError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
                    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;appendToLastMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;_isStreaming&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;false&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reactive UI updates&lt;/strong&gt; - Compose automatically recomposes when StateFlow changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thread-safe&lt;/strong&gt; - StateFlow handles concurrent access safely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conflation&lt;/strong&gt; - StateFlow drops intermediate values, so rapid updates never overwhelm the UI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic cleanup&lt;/strong&gt; - Coroutines are cancelled when the ViewModel is disposed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Session Management
&lt;/h3&gt;

&lt;p&gt;We maintain up to 20 active sessions in memory with LRU-style eviction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sessions only created when first accessed (lazy initialization)&lt;/li&gt;
&lt;li&gt;Inactive sessions automatically cleaned up when limit reached&lt;/li&gt;
&lt;li&gt;Active streaming sessions are never evicted&lt;/li&gt;
&lt;li&gt;Mutex-protected state for thread safety&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps memory usage bounded (~50-300 MB total) while supporting real-world workflows.&lt;/p&gt;
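&lt;p&gt;A minimal sketch of this eviction policy (class and method names here are illustrative, not Askimo's actual code; &lt;code&gt;synchronized&lt;/code&gt; stands in for the coroutine Mutex used in practice):&lt;/p&gt;

```kotlin
// Illustrative sketch of a bounded, LRU-style session cache that never
// evicts sessions with an active stream. Names are hypothetical, not
// Askimo's actual classes; `synchronized` stands in for a coroutine Mutex.
class SessionCache<T>(private val maxSessions: Int = 20) {
    // accessOrder = true makes LinkedHashMap iterate least-recently-used first
    private val sessions = LinkedHashMap<String, T>(16, 0.75f, true)
    private val streaming = mutableSetOf<String>()
    private val lock = Any()

    fun getOrCreate(id: String, create: () -> T): T = synchronized(lock) {
        sessions.getOrPut(id) {
            evictIfNeeded()   // make room before inserting a new session
            create()          // lazy initialization on first access
        }
    }

    fun markStreaming(id: String, active: Boolean) = synchronized(lock) {
        if (active) streaming.add(id) else streaming.remove(id)
    }

    fun contains(id: String): Boolean = synchronized(lock) { id in sessions }

    private fun evictIfNeeded() {
        // Evict least-recently-used sessions, skipping active streams
        var toRemove = sessions.size + 1 - maxSessions
        for (id in sessions.keys.filter { it !in streaming }) {
            if (toRemove <= 0) break
            sessions.remove(id)   // its persisted state stays on disk
            toRemove--
        }
    }
}
```

&lt;p&gt;Evicted sessions are reloaded from the database on next access, so the session cap bounds memory without losing history.&lt;/p&gt;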




&lt;h2&gt;
  
  
  Error Recovery: Preserving Partial AI Responses
&lt;/h2&gt;

&lt;p&gt;AI APIs can fail at any moment - network issues, rate limits, timeouts. Most chat apps lose everything when this happens. Askimo preserves partial responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;When an AI streaming call fails:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You've already received 500 words of a 1000-word response&lt;/li&gt;
&lt;li&gt;The API connection drops&lt;/li&gt;
&lt;li&gt;Standard implementations discard everything&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Our Solution: Incremental Persistence
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;suspend&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;handleStreamingError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Throwable&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Get the partial content we've accumulated so far&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;partialContent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getCurrentAccumulatedContent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;partialContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isNotEmpty&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Save what we have&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;partialMessage&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;partialContent&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"\n\n[Response interrupted: ${error.message}]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ASSISTANT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Clock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;isError&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// Replace the temporary streaming message with saved version&lt;/span&gt;
        &lt;span class="nf"&gt;replaceTemporaryMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;partialMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// Persist to database immediately&lt;/span&gt;
        &lt;span class="n"&gt;messageRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;partialMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Notify user with non-intrusive error indicator&lt;/span&gt;
    &lt;span class="n"&gt;eventBus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StreamingErrorEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;User Experience:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Partial responses are &lt;strong&gt;preserved and saved&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;✅ Clear &lt;strong&gt;visual indication&lt;/strong&gt; that response was interrupted&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Resume capability&lt;/strong&gt; - Users can retry from the partial state&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;No data loss&lt;/strong&gt; - Everything is persisted immediately&lt;/li&gt;
&lt;/ul&gt;
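&lt;p&gt;The retry path can reuse the saved partial text by re-prompting; a hedged sketch (the helper name and prompt wording are hypothetical, not Askimo's exact implementation):&lt;/p&gt;

```kotlin
// Hypothetical helper: build a retry prompt that continues from the saved
// partial response instead of regenerating the whole answer.
fun buildResumePrompt(originalQuestion: String, partialAnswer: String): String =
    originalQuestion +
        "\n\nYour previous answer was interrupted. Continue exactly from where " +
        "this partial response stops, without repeating it:\n\n---\n" +
        partialAnswer
```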




&lt;h2&gt;
  
  
  Project-Based Context: RAG for Your Documents
&lt;/h2&gt;

&lt;p&gt;One useful feature we added: &lt;strong&gt;point Askimo at your documents and ask questions about them&lt;/strong&gt;. Whether it's code, PDFs, Microsoft Office files, OpenOffice documents, or web pages, Askimo can index the content and answer questions about it. Learn more about &lt;a href="https://askimo.chat/docs/desktop/rag/" rel="noopener noreferrer"&gt;Askimo's RAG capabilities&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture: Content Retrieval
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// User attaches a project folder or documents&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;project&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Project&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"my-project"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;knowledgeSources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;FileSystemSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/path/to/documents"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;// PDFs, Office docs, text files&lt;/span&gt;
        &lt;span class="nc"&gt;FileSystemSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/path/to/codebase/src"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;// Source code&lt;/span&gt;
        &lt;span class="nc"&gt;UrlSource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"https://docs.example.com"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// Web documentation&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// When user sends a message in this project's session:&lt;/span&gt;
&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;retriever&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createContentRetriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;chatClient&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;factory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;sessionId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-4o"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;settings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openAiSettings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;retriever&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// RAG enabled!&lt;/span&gt;
    &lt;span class="n"&gt;executionMode&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ExecutionMode&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DESKTOP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chatMemory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;conversationMemory&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How RAG Works in Askimo
&lt;/h3&gt;

&lt;p&gt;Askimo supports a wide range of document formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Office Documents&lt;/strong&gt;: Microsoft Word (.docx), Excel (.xlsx), PowerPoint (.pptx)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenOffice&lt;/strong&gt;: Writer (.odt), Calc (.ods), Impress (.odp)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PDFs&lt;/strong&gt;: Extracts text content from PDF files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code&lt;/strong&gt;: All programming languages and text-based formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Pages&lt;/strong&gt;: Crawl and index documentation sites&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The RAG Pipeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ingestion&lt;/strong&gt;: Documents are parsed, chunked, and embedded when the project is created&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query-time retrieval&lt;/strong&gt;: The user's question is embedded and similar chunks are retrieved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context injection&lt;/strong&gt;: Retrieved chunks are added to the prompt automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response&lt;/strong&gt;: AI answers using both conversation history AND document context&lt;/li&gt;
&lt;/ol&gt;
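&lt;p&gt;The chunking in the ingestion step can be sketched in a few lines (sizes in characters; a toy version, not Askimo's actual splitter):&lt;/p&gt;

```kotlin
// Toy version of the ingestion step's chunking: fixed-size windows with
// overlap so neighboring chunks share context. Sizes are in characters.
fun chunk(text: String, size: Int = 500, overlap: Int = 50): List<String> {
    require(size > overlap) { "chunk size must exceed overlap" }
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = minOf(start + size, text.length)
        chunks.add(text.substring(start, end))
        if (end == text.length) break
        start = end - overlap   // step back so the next chunk overlaps this one
    }
    return chunks
}

fun main() {
    // A 1200-char document becomes three chunks: 500 + 500 + 300 chars
    println(chunk("a".repeat(1200)).map { it.length })  // [500, 500, 300]
}
```

&lt;p&gt;The overlap keeps a sentence that straddles a chunk boundary retrievable from either side.&lt;/p&gt;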

&lt;h3&gt;
  
  
  Hybrid Search: JVector + Lucene
&lt;/h3&gt;

&lt;p&gt;We chose a &lt;strong&gt;hybrid content retriever&lt;/strong&gt; that combines two complementary search strategies:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Vector Search (JVector)&lt;/strong&gt; - Semantic similarity&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Finds content that's conceptually related to the query&lt;/li&gt;
&lt;li&gt;Example: Query "error handling" matches "exception management" even without exact words&lt;/li&gt;
&lt;li&gt;Uses embeddings to capture meaning, not just keywords&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Keyword Search (Lucene)&lt;/strong&gt; - Exact term matching&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Finds content with specific terms, names, or identifiers&lt;/li&gt;
&lt;li&gt;Example: Query "UserRepository.findById" finds exact method references&lt;/li&gt;
&lt;li&gt;Critical for code, API names, and technical terms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why hybrid?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Neither approach alone is sufficient:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector-only&lt;/strong&gt;: Misses exact matches (class names, function signatures, specific error codes)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keyword-only&lt;/strong&gt;: Misses semantic relationships (synonyms, paraphrased concepts, related ideas)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hybrid retriever combines both using &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt; - a proven algorithm that merges ranked lists:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HybridContentRetriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;vectorRetriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ContentRetriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;keywordRetriever&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ContentRetriever&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;maxResults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;  &lt;span class="c1"&gt;// Standard RRF constant&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ContentRetriever&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;vectorResults&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorRetriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;keywordResults&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keywordRetriever&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// Merge using Reciprocal Rank Fusion&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;reciprocalRankFusion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorResults&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keywordResults&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxResults&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How RRF works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For each document, calculate a fusion score based on its rank in each list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RRF_score(doc) = Σ 1 / (k + rank_i)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;k = 60&lt;/code&gt; (standard constant that balances the contribution from different retrievers)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rank_i&lt;/code&gt; is the position of the document in retriever i's results (1st = rank 1, 2nd = rank 2, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A document ranked &lt;strong&gt;#1 in vector search&lt;/strong&gt; and &lt;strong&gt;#3 in keyword search&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vector score: 1/(60+1) ≈ 0.0164&lt;/li&gt;
&lt;li&gt;Keyword score: 1/(60+3) ≈ 0.0159&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total RRF score: ≈ 0.0323&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;A document ranked &lt;strong&gt;#1 in both&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vector score: 1/(60+1) ≈ 0.0164&lt;/li&gt;
&lt;li&gt;Keyword score: 1/(60+1) ≈ 0.0164&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total RRF score: ≈ 0.0328&lt;/strong&gt; (nearly the same as above)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;A document ranked &lt;strong&gt;#1 in vector&lt;/strong&gt; but &lt;strong&gt;not found in keyword&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vector score: 1/(60+1) ≈ 0.0164&lt;/li&gt;
&lt;li&gt;Keyword score: 0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total RRF score: ≈ 0.0164&lt;/strong&gt; (lower than documents found in both)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
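&lt;p&gt;The worked scores above drop straight out of a few lines of Kotlin; a self-contained illustrative sketch (document IDs stand in for retrieved content, not Askimo's actual &lt;code&gt;HybridContentRetriever&lt;/code&gt; internals):&lt;/p&gt;

```kotlin
// Reciprocal Rank Fusion over two ranked lists of document IDs.
// score(doc) = sum over retrievers of 1 / (k + rank), rank starting at 1.
fun reciprocalRankFusion(
    vectorResults: List<String>,
    keywordResults: List<String>,
    k: Int = 60,   // standard RRF smoothing constant
): List<String> {
    val scores = mutableMapOf<String, Double>()
    for (results in listOf(vectorResults, keywordResults)) {
        results.forEachIndexed { index, doc ->
            // index is 0-based, so rank = index + 1
            scores.merge(doc, 1.0 / (k + index + 1), Double::plus)
        }
    }
    // Highest fused score first
    return scores.entries.sortedByDescending { it.value }.map { it.key }
}

fun main() {
    val fused = reciprocalRankFusion(
        vectorResults = listOf("docA", "docB", "docC"),
        keywordResults = listOf("docB", "docD"),
    )
    println(fused)  // [docB, docA, docD, docC] - docB ranks high in both lists
}
```

&lt;p&gt;Note how &lt;code&gt;docB&lt;/code&gt;, found by both retrievers, outranks documents that appear in only one list.&lt;/p&gt;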

&lt;p&gt;&lt;strong&gt;Why RRF is better than weighted averaging:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Rank-based, not score-based&lt;/strong&gt;: Different retrievers produce incomparable scores. RRF only cares about relative ranking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robust to failures&lt;/strong&gt;: If one retriever fails, we gracefully fall back to the other.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rewards consensus&lt;/strong&gt;: Documents appearing in both lists naturally get higher scores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Well-researched&lt;/strong&gt;: RRF is a well-studied algorithm from information retrieval research.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A query like "how to fix null pointer" finds both "NullPointerException" (keyword) and "defensive null checks" (semantic)&lt;/li&gt;
&lt;li&gt;A query about "database queries" finds both "SQL" (keyword) and "data access patterns" (semantic)&lt;/li&gt;
&lt;li&gt;More accurate retrieval means better AI answers grounded in your actual documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementation uses LangChain4j's RAG components:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;createContentRetriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Project&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ContentRetriever&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;embeddingStore&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;InMemoryEmbeddingStore&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;TextSegment&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;embeddingModel&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createEmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;// Index all knowledge sources&lt;/span&gt;
    &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;knowledgeSources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;loadDocuments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;segments&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DocumentSplitters&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recursive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;embeddingStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;segments&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingStoreContentRetriever&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;embeddingStore&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddingStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embeddingModel&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;maxResults&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;minScore&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Memory Management: Token-Aware Conversation History
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Token Problem
&lt;/h3&gt;

&lt;p&gt;Here's something many users don't realize: &lt;strong&gt;Every time you send a message to an AI model, the entire conversation history goes with it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you ask ChatGPT or Claude a question, the API call looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You are a helpful assistant"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"What is Python?"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Python is a programming language..."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"How do I install it?"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You can install Python by..."&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Show me a hello world example"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the pattern? &lt;strong&gt;Every previous message is sent again.&lt;/strong&gt; This is how AI models maintain context - they don't actually "remember" your conversation. Each request is stateless, so you must resend the entire history for the model to understand what you're talking about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The consequences:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Token consumption grows quadratically&lt;/strong&gt;: each request resends the whole history, so per-request size grows linearly and the cumulative total billed across a conversation grows quadratically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Message 1: ~100 tokens sent&lt;/li&gt;
&lt;li&gt;Message 10: ~1,000 tokens sent (all previous messages)&lt;/li&gt;
&lt;li&gt;Message 50: ~5,000+ tokens sent&lt;/li&gt;
&lt;li&gt;Message 100: ~10,000+ tokens sent (approaching model limits!)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API costs increase&lt;/strong&gt;: You pay for every token sent, so long conversations become disproportionately expensive&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context limits&lt;/strong&gt;: Most models have token limits (4K-128K depending on the model). Once you hit the limit, you can't continue the conversation without removing history&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance degradation&lt;/strong&gt;: Larger context windows slow down response times&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
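
&lt;p&gt;A quick back-of-the-envelope sketch makes the growth concrete (it assumes a flat ~100 tokens per message, which is our simplification):&lt;/p&gt;

```kotlin
// Illustrative only: assumes a flat ~100 tokens per message to show how
// resending the history makes cumulative usage grow.
fun tokensForRequest(messageCount: Int, tokensPerMessage: Int = 100): Int =
    messageCount * tokensPerMessage  // the whole history is sent each time

fun cumulativeTokens(totalMessages: Int, tokensPerMessage: Int = 100): Int =
    (1..totalMessages).sumOf { tokensForRequest(it, tokensPerMessage) }
```

&lt;p&gt;For a 100-message conversation, roughly half a million tokens end up billed across all requests combined, even though the visible history is only ~10,000 tokens.&lt;/p&gt;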

&lt;p&gt;&lt;strong&gt;Askimo's solution:&lt;/strong&gt; Auto-summarize old messages while keeping recent ones to maintain conversational flow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Token-Aware Memory with Intelligent Summarization
&lt;/h3&gt;

&lt;p&gt;The key insight: &lt;strong&gt;You don't need the entire history, just enough context.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most conversations follow a natural pattern - the most recent exchanges are what matter for understanding the current question. Earlier messages provide background context, but you rarely need word-for-word accuracy from 50 messages ago. What you need is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Recent messages in full&lt;/strong&gt; - The last 50-60% of conversation for immediate context and continuity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical overview&lt;/strong&gt; - A structured summary of earlier messages capturing key facts, decisions, and topics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System instructions preserved&lt;/strong&gt; - Original prompts and setup never discarded&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like a long work meeting - you don't replay the entire two-hour discussion. You recap the key decisions from the first hour, then dive into the details of the most recent conversation.&lt;/p&gt;

&lt;p&gt;Askimo's approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Summarize old messages&lt;/strong&gt; - Condense the oldest 45% into a structured summary with key facts and topics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep recent messages intact&lt;/strong&gt; - Preserve the remaining 55% for immediate conversational context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never touch system messages&lt;/strong&gt; - Instructions are always preserved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run asynchronously&lt;/strong&gt; - Doesn't block user interaction
&lt;/li&gt;
&lt;/ol&gt;
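
&lt;p&gt;The split in steps 1-3 can be sketched as a simple partition (the names below are hypothetical, for illustration only, not Askimo's actual API):&lt;/p&gt;

```kotlin
// Hypothetical sketch of the 45/55 split described above.
data class Msg(val role: String, val content: String)

data class SplitHistory(
    val system: List<Msg>,       // never summarized or dropped
    val toSummarize: List<Msg>,  // oldest 45% of the conversation
    val toKeep: List<Msg>        // newest 55%, kept verbatim
)

fun splitForSummarization(history: List<Msg>): SplitHistory {
    // System messages are set aside first so they are never touched.
    val (system, conversation) = history.partition { it.role == "system" }
    val cutoff = (conversation.size * 0.45).toInt()
    return SplitHistory(system, conversation.take(cutoff), conversation.drop(cutoff))
}
```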

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TokenAwareSummarizingMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;appContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;AppContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;summarizationThreshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Double&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.6&lt;/span&gt;  &lt;span class="c1"&gt;// Trigger at 60% of max tokens&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatMemory&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="c1"&gt;// Maximum tokens: 40% of model's context window (dynamically calculated)&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;
        &lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ModelContextSizeCache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;currentModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.4&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toInt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;persistToDatabase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;totalTokens&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimateTotalTokens&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;threshold&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;maxTokens&lt;/span&gt; &lt;span class="p"&gt;*&lt;/span&gt; &lt;span class="n"&gt;summarizationThreshold&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toInt&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;totalTokens&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="p"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;!&lt;/span&gt;&lt;span class="n"&gt;summarizationInProgress&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;triggerAsyncSummarization&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;// Non-blocking&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildList&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Structured summary as system message (if exists)&lt;/span&gt;
        &lt;span class="n"&gt;structuredSummary&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;let&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Recent conversation messages&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toChatMessage&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Structured summary format&lt;/span&gt;
&lt;span class="nd"&gt;@Serializable&lt;/span&gt;
&lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;ConversationSummary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;keyFacts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;mainTopics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;recentContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
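
&lt;p&gt;The class above calls &lt;code&gt;estimateTotalTokens()&lt;/code&gt;, which isn't shown. A common heuristic for English text (an assumption on our part, not Askimo's exact counting) is roughly four characters per token, plus a few tokens of per-message formatting overhead:&lt;/p&gt;

```kotlin
// Rough token estimate: ~4 characters per token, rounded up.
fun estimateTokens(text: String): Int = (text.length + 3) / 4  // ceil(length / 4)

// Sum over all messages, adding ~4 tokens of per-message overhead
// for role markers and formatting.
fun estimateTotalTokens(messages: List<String>): Int =
    messages.sumOf { estimateTokens(it) } + messages.size * 4
```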



&lt;p&gt;&lt;strong&gt;What this achieves:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before (100 messages, ~15,000 tokens):
[Message 1] [Message 2] ... [Message 98] [Message 99] [Message 100]
❌ Exceeds token limit, API call fails

After summarization (~10,000 tokens):
[Summary of messages 1-45] [Message 46] ... [Message 99] [Message 100]
✅ Under token limit, context preserved, conversation continues smoothly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Implementation details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarizes the oldest &lt;strong&gt;45% of conversation messages&lt;/strong&gt; when token threshold (60% of max) is reached&lt;/li&gt;
&lt;li&gt;System messages (instructions) are &lt;strong&gt;never summarized or removed&lt;/strong&gt; - they're preserved indefinitely&lt;/li&gt;
&lt;li&gt;Runs &lt;strong&gt;asynchronously&lt;/strong&gt; so it doesn't block the user's interaction&lt;/li&gt;
&lt;li&gt;Falls back to &lt;strong&gt;extractive summary&lt;/strong&gt; if AI-powered summarization fails&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world example:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Imagine a 100-message conversation about building a React app that hits the token limit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Messages 1-45&lt;/strong&gt;: Initial planning, architecture decisions, setup questions, debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Messages 46-100&lt;/strong&gt;: Recent implementation and current discussion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Without summarization:&lt;/strong&gt; All 100 messages sent = ~15,000 tokens ❌ Exceeds limit&lt;br&gt;
&lt;strong&gt;With summarization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured summary of messages 1-45: ~800 tokens&lt;/li&gt;
&lt;li&gt;Messages 46-100 (55 messages): ~8,250 tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total: ~9,050 tokens&lt;/strong&gt; (~40% reduction, under the limit ✅)&lt;/li&gt;
&lt;/ul&gt;
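
&lt;p&gt;The arithmetic is easy to verify (all token counts are the estimates from the example above):&lt;/p&gt;

```kotlin
// Reproducing the savings calculation from the React-app example.
val tokensBefore = 15_000                      // all 100 messages verbatim
val summaryTokens = 800                        // structured summary of messages 1-45
val recentTokens = 8_250                       // messages 46-100 verbatim
val tokensAfter = summaryTokens + recentTokens // 9,050 tokens
val reductionPct = 100.0 * (tokensBefore - tokensAfter) / tokensBefore  // ~39.7%
```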

&lt;p&gt;&lt;strong&gt;Why keep the majority of recent messages intact?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The AI needs &lt;strong&gt;immediate context&lt;/strong&gt; to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What you discussed in the last 50+ messages&lt;/li&gt;
&lt;li&gt;The current flow and direction of conversation&lt;/li&gt;
&lt;li&gt;Recent code examples, error messages, or specific questions&lt;/li&gt;
&lt;li&gt;Continuity between related topics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A structured summary with key facts like "User is building a React app with TypeScript, discussed routing and API integration" provides useful background context. But the AI needs the actual recent messages to understand nuanced questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"So should I use try-catch or error boundaries?" (referring to your error handling discussion 10 messages ago)&lt;/li&gt;
&lt;li&gt;"Can you show me the implementation for the second approach?" (referring to two options discussed recently)&lt;/li&gt;
&lt;li&gt;"What was that library you mentioned earlier?" (needs the actual message where the library was named)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The 45/55 split&lt;/strong&gt; strikes the right balance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;45% oldest messages&lt;/strong&gt; → Summarized into key facts and topics (compressed by roughly 90% in the example above)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;55% recent messages&lt;/strong&gt; → Kept verbatim for full conversational context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System messages&lt;/strong&gt; → Always preserved (these are instructions, not conversation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach ensures the AI has both:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Condensed historical context&lt;/strong&gt; - What the conversation has been about overall&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full recent detail&lt;/strong&gt; - The nuanced back-and-forth needed to continue naturally&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;30-50% token reduction&lt;/strong&gt; - Meaningful API cost savings over time&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Unlimited conversations&lt;/strong&gt; - Never hit token limits, chat forever&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Structured summaries&lt;/strong&gt; - AI extracts key facts and topics, not just truncation&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Transparent to users&lt;/strong&gt; - Happens asynchronously in the background&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Robust fallback&lt;/strong&gt; - If AI summarization fails, uses extractive summary&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Dynamic limits&lt;/strong&gt; - Automatically adjusts based on model's context window (40% allocation)&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Smart preservation&lt;/strong&gt; - System messages (instructions) are never removed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No manual intervention&lt;/strong&gt; - Summarization happens transparently when 60% threshold is reached&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost optimization&lt;/strong&gt; - Reducing 30-50% of tokens adds up over hundreds of conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better context quality&lt;/strong&gt; - Structured summaries preserve key facts and topics, removing conversational noise&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence&lt;/strong&gt; - Memory is saved to database, survives app restarts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Async operation&lt;/strong&gt; - Runs in the background with a 60-second timeout, so even a slow summarization never blocks user interaction&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Performance Insights: Managing Multiple AI Platforms in One Desktop App
&lt;/h2&gt;

&lt;p&gt;Building a desktop app that manages multiple concurrent AI conversations taught us important lessons about resource management. Here's what we learned about the performance trade-offs:&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory Usage Patterns
&lt;/h3&gt;

&lt;p&gt;A typical desktop AI chat application's memory footprint consists of:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Base application layer (~50 MB)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JVM runtime overhead&lt;/li&gt;
&lt;li&gt;Compose Desktop UI framework&lt;/li&gt;
&lt;li&gt;Core application state&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Per-session overhead (~2-5 MB each)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each conversation needs its own ViewModel instance&lt;/li&gt;
&lt;li&gt;State management (messages list, streaming state, settings)&lt;/li&gt;
&lt;li&gt;With 20 concurrent sessions: ~40-100 MB additional&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conversation history caching (~5-10 MB per 100 messages)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Messages are kept in memory for active sessions&lt;/li&gt;
&lt;li&gt;Lazy loading from SQLite for inactive sessions&lt;/li&gt;
&lt;li&gt;A power user with 20 tabs × 100 messages each ≈ 100-200 MB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;RAG embedding stores (varies by project size)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small project (500 files): ~50 MB&lt;/li&gt;
&lt;li&gt;Medium project (5,000 files): ~200-500 MB&lt;/li&gt;
&lt;li&gt;Large project (20,000+ files): 1+ GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Total memory range: 50-300 MB&lt;/strong&gt; for typical usage (excluding large RAG projects and AI model memory).&lt;/p&gt;

&lt;h3&gt;
  
  
  Why These Numbers Matter
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Compared to web-based alternatives:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Web apps in browser tabs: 200-500 MB &lt;strong&gt;per tab&lt;/strong&gt; (browser overhead included)&lt;/li&gt;
&lt;li&gt;Our approach: 2-5 MB per session (no browser overhead)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trade-off&lt;/strong&gt;: We had to build custom rendering, but gained 10-100x better per-session memory efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Startup time trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cold start: 1.5-3 seconds (loading JVM + Compose Desktop)&lt;/li&gt;
&lt;li&gt;Web apps: ~1 second initial load, but 3-5 seconds for full interactivity&lt;/li&gt;
&lt;li&gt;Electron alternatives: 5-10 seconds (loading Chromium)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning&lt;/strong&gt;: Desktop app initialization is competitive once you account for full interactivity&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Database Performance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SQLite for local message storage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write latency: &amp;lt;10ms per message (includes indexing)&lt;/li&gt;
&lt;li&gt;Full-text search: &amp;lt;50ms across 10,000+ messages&lt;/li&gt;
&lt;li&gt;No network round-trip delays like cloud-based alternatives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why local-first matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero API latency for message retrieval&lt;/li&gt;
&lt;li&gt;Works fully offline for history browsing&lt;/li&gt;
&lt;li&gt;No sync conflicts or version issues&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Concurrency Limits
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Why we cap at 20 concurrent sessions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each streaming session holds an open HTTP connection&lt;/li&gt;
&lt;li&gt;Memory grows linearly with active sessions&lt;/li&gt;
&lt;li&gt;UI remains responsive up to ~30 tabs, but 20 is a comfortable limit&lt;/li&gt;
&lt;li&gt;Real-world usage: Most users have 3-8 active conversations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The lesson:&lt;/strong&gt; Hard limits prevent resource exhaustion. Better to cap explicitly than let the system degrade unpredictably.&lt;/p&gt;
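
&lt;p&gt;An explicit cap can be as small as a semaphore; the sketch below is our illustration of the idea, not Askimo's actual session manager:&lt;/p&gt;

```kotlin
import java.util.concurrent.Semaphore

// Hard cap on concurrent streaming sessions: acquire a permit to open,
// release it when the session's HTTP connection closes.
class SessionLimiter(maxSessions: Int = 20) {
    private val permits = Semaphore(maxSessions)

    // Returns true if a new streaming session may start, false if the cap is hit.
    fun tryOpen(): Boolean = permits.tryAcquire()

    // Frees a slot when a session ends.
    fun close() = permits.release()
}
```

&lt;p&gt;Failing fast at the cap (returning &lt;code&gt;false&lt;/code&gt; instead of queuing) is what keeps degradation predictable.&lt;/p&gt;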

&lt;h3&gt;
  
  
  Rendering Performance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Compose Desktop maintains 60 FPS&lt;/strong&gt; because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only re-renders changed UI components (reactive architecture)&lt;/li&gt;
&lt;li&gt;Streaming updates are throttled to prevent overwhelming the UI thread&lt;/li&gt;
&lt;li&gt;Message virtualization for long conversation lists&lt;/li&gt;
&lt;/ul&gt;
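
&lt;p&gt;Throttled streaming updates can be sketched without any UI framework - the idea is simply to buffer incoming tokens and only surface the accumulated text at a bounded rate (our illustration, not Compose or Askimo API; the clock is injected so the behavior is testable):&lt;/p&gt;

```kotlin
// Buffers streamed tokens and releases the accumulated text at most
// once per minIntervalMs, so the UI thread isn't redrawn per token.
class UiThrottle(
    private val minIntervalMs: Long = 50,
    private val now: () -> Long = System::currentTimeMillis
) {
    private var lastEmit = -minIntervalMs  // first token renders immediately
    private val pending = StringBuilder()

    // Returns the full accumulated text when a redraw is due, else null.
    fun offer(token: String): String? {
        pending.append(token)
        val t = now()
        if (t - lastEmit < minIntervalMs) return null
        lastEmit = t
        return pending.toString()
    }
}
```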

&lt;p&gt;&lt;strong&gt;Trade-off we made:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom markdown renderer required significant effort&lt;/li&gt;
&lt;li&gt;But we gained full control over rendering performance and caching&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Takeaways for Desktop AI Application Development
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory management is crucial&lt;/strong&gt; - With multiple concurrent sessions, every MB counts. Lazy loading and LRU eviction prevented unbounded growth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Local-first architecture pays off&lt;/strong&gt; - SQLite message storage gives us instant search and offline access without cloud sync complexity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Async everywhere&lt;/strong&gt; - Kotlin coroutines made concurrent streaming manageable. Every blocking operation runs in a background dispatcher.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cap resources explicitly&lt;/strong&gt; - 20 concurrent sessions is a reasonable limit that prevents degradation while supporting real workflows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Desktop overhead is acceptable&lt;/strong&gt; - The 1.5-3s startup time and 50MB base memory are worthwhile for the privacy, performance, and offline benefits.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
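
&lt;p&gt;The "LRU eviction" in the first takeaway can be sketched with the JDK's access-ordered &lt;code&gt;LinkedHashMap&lt;/code&gt; (an assumed approach for illustration, not Askimo's actual cache code):&lt;/p&gt;

```kotlin
// Access-ordered LinkedHashMap evicts the least-recently-used entry
// once the cap is exceeded, bounding per-session memory growth.
class SessionCache<V>(private val maxEntries: Int) :
    LinkedHashMap<String, V>(16, 0.75f, true) {  // accessOrder = true

    override fun removeEldestEntry(eldest: MutableMap.MutableEntry<String, V>): Boolean =
        size > maxEntries  // evict the least-recently-used entry past the cap
}
```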




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Askimo is open source (AGPLv3) and available now:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🌐 &lt;strong&gt;Website&lt;/strong&gt;: &lt;a href="https://askimo.chat" rel="noopener noreferrer"&gt;https://askimo.chat&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/haiphucnguyen/askimo" rel="noopener noreferrer"&gt;github.com/haiphucnguyen/askimo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📥 &lt;strong&gt;Download&lt;/strong&gt;: &lt;a href="https://askimo.chat/download/" rel="noopener noreferrer"&gt;Get installers for macOS, Windows, Linux&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📖 &lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://askimo.chat/docs/" rel="noopener noreferrer"&gt;Complete documentation and setup guides&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Related resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://askimo.chat/docs/desktop/installation/" rel="noopener noreferrer"&gt;Installation guides&lt;/a&gt; for &lt;a href="https://askimo.chat/docs/desktop/installation/macos/" rel="noopener noreferrer"&gt;macOS&lt;/a&gt;, &lt;a href="https://askimo.chat/docs/desktop/installation/windows/" rel="noopener noreferrer"&gt;Windows&lt;/a&gt;, and &lt;a href="https://askimo.chat/docs/desktop/installation/linux/" rel="noopener noreferrer"&gt;Linux&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;We're actively developing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Voice input/output&lt;/strong&gt; - Hands-free conversations with speech-to-text and text-to-speech support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugin system&lt;/strong&gt; - Extensible architecture for custom integrations:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom RAG material sources&lt;/strong&gt; - Integrate with Confluence, Notion, Google Drive, databases, or any data source&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP (Model Context Protocol) integrations&lt;/strong&gt; - Connect AI models to external tools and services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom AI providers&lt;/strong&gt; - Add support for new AI services without modifying core code&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Team features&lt;/strong&gt; - Share prompts, custom directives, and RAG projects across your organization&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Mobile companion app&lt;/strong&gt; - iOS and Android apps using Kotlin Multiplatform to reuse 60-80% of the desktop codebase&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Want to contribute?&lt;/strong&gt; Check out our &lt;a href="https://github.com/haiphucnguyen/askimo/blob/main/CONTRIBUTING.md" rel="noopener noreferrer"&gt;CONTRIBUTING.md&lt;/a&gt; - we welcome PRs for new providers, features, and bug fixes!&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Found this helpful?&lt;/strong&gt; ⭐ &lt;a href="https://github.com/haiphucnguyen/askimo" rel="noopener noreferrer"&gt;Star Askimo on GitHub&lt;/a&gt; and try it for your own AI workflows!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This article showcases production patterns from &lt;a href="https://askimo.chat" rel="noopener noreferrer"&gt;Askimo&lt;/a&gt;, an AGPLv3-licensed desktop AI chat application built with Kotlin and Compose for Desktop. All code examples are simplified from the actual implementation available on GitHub.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>openai</category>
      <category>kotlin</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Inside Askimo: My Daily Journey with an AI CLI</title>
      <dc:creator>Nguyen Phuc Hai</dc:creator>
      <pubDate>Thu, 04 Sep 2025 16:00:00 +0000</pubDate>
      <link>https://dev.to/nguyen_phuchai_b01cae130/inside-askimo-my-daily-journey-with-an-ai-cli-534l</link>
      <guid>https://dev.to/nguyen_phuchai_b01cae130/inside-askimo-my-daily-journey-with-an-ai-cli-534l</guid>
      <description>&lt;h2&gt;
  
  
  Inside Askimo
&lt;/h2&gt;

&lt;p&gt;When I first started tinkering with Askimo, I wasn’t trying to create a big project. I just wanted something simple to make my day easier. I live in the terminal and bounce between AI tools—OpenAI for some things, Ollama locally, Copilot at work. Switching between them felt clunky, and being tied to one vendor didn’t make sense.&lt;/p&gt;

&lt;p&gt;Then it clicked: what if I had one CLI that could talk to all of them, and let me automate the boring parts? Not just cross-platform, but repeatable. I often need to run the same command with different inputs—a set of messages, a list of files, variations of a prompt—pipe data in, script it, and reuse it later. That’s how Askimo began.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Tool I Actually Use Every Day
&lt;/h2&gt;

&lt;p&gt;Askimo isn’t just a side project I work on in my spare time - it has become something I rely on daily. I use it to summarize long documents, generate quick drafts, or even suggest names for functions when I’m stuck. Because it lives in the terminal, it feels natural - just another command, like git or docker.&lt;/p&gt;

&lt;p&gt;I didn’t build Askimo for show. I built it for myself first. But once it became part of my routine, I realized it might be useful for others too.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Askimo Can Do (Right Now)
&lt;/h2&gt;

&lt;p&gt;Even though it’s still early, Askimo already fits neatly into my workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Runs everywhere - Homebrew on macOS/Linux, binaries for Windows, or Docker if I don’t want to install anything.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Feels consistent - the same commands work whether I’m on my laptop or a server.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Local file context - I can ask questions about a file in my project.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple providers - I can switch between OpenAI, Ollama, Gemini, or xAI without leaving the CLI.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These weren’t “features” I brainstormed - they were gaps I ran into while working. Each one exists because I personally needed it.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Journey of Learning
&lt;/h2&gt;

&lt;p&gt;Askimo has also been my way of learning how to apply AI, not just read about it. Building it forced me to experiment: to test prompts, to break things, to see where AI adds value and where it doesn’t.&lt;/p&gt;

&lt;p&gt;I’ve come to realize that AI doesn’t replace my work - it extends it. Sometimes it saves me from tedious repetition. Other times, it pushes me to think differently about automation. Each step of Askimo’s development has been a reflection of how I’m learning to work with AI rather than around it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;If you want to try it out, installation is simple - Homebrew, binaries, or Docker. I keep the instructions here:&lt;br&gt;
&lt;a href="https://haiphucnguyen.github.io/askimo/installation/" rel="noopener noreferrer"&gt;👉 Installation Guide&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;p&gt;I’ve got plenty of ideas for where to take it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Chaining commands into more powerful workflows.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Custom commands - I can turn repeated prompts into shortcuts, so I don’t waste time retyping.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Indexing projects so Askimo understands my real workspace - source code, database schemas/migrations, configuration files, API specs, docs, and even build logs - not just isolated files.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The vision isn’t just a CLI for chat - I want Askimo to grow into a programmable AI environment that feels at home in the terminal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Moving Forward
&lt;/h2&gt;

&lt;p&gt;What excites me most isn’t just the tool itself, but what it represents. Askimo started as a weekend hack, but it’s grown into both a part of my daily workflow and a mirror of my own journey learning to apply AI.&lt;/p&gt;

&lt;p&gt;For me, it’s proof that AI can be practical, lightweight, and personal. And as I keep building, I’ll keep sharing the lessons I learn along the way - because Askimo isn’t just about what AI can do, it’s about how we, as developers, can shape it into something that fits naturally into our work.&lt;/p&gt;

&lt;p&gt;If you try Askimo, I’d love to hear how it fits into your routine.&lt;/p&gt;

&lt;p&gt;I’ve made the project open source because I believe tools like this get better when they’re shaped by a community, not just by one developer’s perspective. If you’re curious, want to contribute, or simply want to star the project to follow its progress, you can find it here:&lt;br&gt;
&lt;a href="https://github.com/haiphucnguyen/askimo" rel="noopener noreferrer"&gt;👉 Askimo on GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cli</category>
      <category>ai</category>
      <category>opensource</category>
      <category>openai</category>
    </item>
    <item>
      <title>Askimo: An Open-Source Command-Line AI Assistant</title>
      <dc:creator>Nguyen Phuc Hai</dc:creator>
      <pubDate>Tue, 19 Aug 2025 16:00:00 +0000</pubDate>
      <link>https://dev.to/nguyen_phuchai_b01cae130/askimo-an-open-source-command-line-ai-assistant-3dnj</link>
      <guid>https://dev.to/nguyen_phuchai_b01cae130/askimo-an-open-source-command-line-ai-assistant-3dnj</guid>
      <description>&lt;p&gt;Over the last two weeks, I’ve been working on a side project called Askimo — a command-line AI assistant that I’ve released under the MIT license.&lt;/p&gt;

&lt;p&gt;It started from a simple need: I use AI a lot in my daily work — OpenAI, Claude, Ollama, Copilot. Many of the tasks are small and repetitive, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Generating release notes from commits&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Summarizing logs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Updating a GitHub owner email, etc.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted a tool that could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Switch between providers quickly&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Automate repetitive tasks in a creative way&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Be customized for my workflow&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Why build another CLI AI tool?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are already some great tools out there. But I decided to build my own for a few reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Learning → I wanted to explore GraalVM for cross-platform native images and get hands-on with LangChain4j to experiment with system messages, tokens, memory, and prompt tuning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Control → Having my own tool means I can set the pace, customize features for my workflow, and extend it however I want.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Openness → Askimo is MIT licensed, with a pluggable design that makes it easy to support both closed APIs and open-source models like Ollama.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;What Askimo can do today&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Streaming chat in the terminal (interactive REPL)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pipe execution: &lt;code&gt;cat logs.txt | askimo "summarize this"&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple AI providers: currently OpenAI and Ollama, with a pluggable design to add more (e.g., Gemini)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Simple web chat page for those who prefer not to use the terminal (though CLI is where automation really shines)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;What’s next?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Askimo is still very early. I’m exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Adding more providers (both open-source and hosted APIs)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Custom functions for repetitive tasks (e.g., release notes, log analysis, ticket triage)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extending the plugin system so the community can add their own commands and providers&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Contributions welcome 🙌&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I built Askimo mainly for myself, but I’d love to see how others use it. Every contribution helps — whether it’s opening issues, suggesting features, or sending a PR.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/haiphucnguyen/askimo" rel="noopener noreferrer"&gt;https://github.com/haiphucnguyen/askimo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re interested in AI at the terminal, or just want to tinker with GraalVM and LangChain4j, give it a try. And if you like the idea, a ⭐️ would mean a lot.&lt;/p&gt;

</description>
      <category>cli</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
