<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Stephan Janssen</title>
    <description>The latest articles on DEV Community by Stephan Janssen (@stephanj).</description>
    <link>https://dev.to/stephanj</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F499111%2Ffe24f5de-2c71-42a9-a88d-8d65aa691f61.jpeg</url>
      <title>DEV Community: Stephan Janssen</title>
      <link>https://dev.to/stephanj</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/stephanj"/>
    <language>en</language>
    <item>
      <title>LLM Inference using 100% Modern Java ☕️🔥</title>
      <dc:creator>Stephan Janssen</dc:creator>
      <pubDate>Mon, 21 Oct 2024 18:37:50 +0000</pubDate>
      <link>https://dev.to/stephanj/llm-inference-using-100-modern-java-30i2</link>
      <guid>https://dev.to/stephanj/llm-inference-using-100-modern-java-30i2</guid>
      <description>&lt;p&gt;In the rapidly evolving world of (Gen)AI, Java developers now have powerful new (LLM Inference) tools at their disposal: &lt;a href="https://github.com/mukel/llama3.java" rel="noopener noreferrer"&gt;Llama3.java&lt;/a&gt; and &lt;a href="https://github.com/tjake/Jlama" rel="noopener noreferrer"&gt;JLama&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These projects bring the capabilities of large language models (LLMs) to the Java ecosystem, offering an exciting opportunity for developers to integrate advanced language processing into their applications.&lt;/p&gt;

&lt;p&gt;Here's an example of Llama3.java providing inference for the &lt;a href="https://plugins.jetbrains.com/plugin/24169-devoxxgenie" rel="noopener noreferrer"&gt;DevoxxGenie&lt;/a&gt; IDEA plugin.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/pDafHplEVPk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  The JLama Project
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/tjake/Jlama" rel="noopener noreferrer"&gt;JLama&lt;/a&gt; (a 100% Java inference engine) is developed by Jake Luciani and supports a whole range of LLM's : &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemma &amp;amp; Gemma 2 Models&lt;/li&gt;
&lt;li&gt;Llama &amp;amp; Llama2 &amp;amp; Llama3 Models&lt;/li&gt;
&lt;li&gt;Mistral &amp;amp; Mixtral Models&lt;/li&gt;
&lt;li&gt;Qwen2 Models&lt;/li&gt;
&lt;li&gt;GPT-2 Models&lt;/li&gt;
&lt;li&gt;BERT Models&lt;/li&gt;
&lt;li&gt;BPE Tokenizers&lt;/li&gt;
&lt;li&gt;WordPiece Tokenizers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's his Devoxx Belgium 2024 presentation with more information and demos.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/p-p_oRjEVow"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;From a feature perspective, this is the most advanced Java implementation currently available. It even supports LLM sharding at the layer and attention-head level 🤩&lt;/p&gt;

&lt;p&gt;Features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paged Attention&lt;/li&gt;
&lt;li&gt;Mixture of Experts&lt;/li&gt;
&lt;li&gt;Tool Calling&lt;/li&gt;
&lt;li&gt;Generate Embeddings&lt;/li&gt;
&lt;li&gt;Classifier Support&lt;/li&gt;
&lt;li&gt;Huggingface SafeTensors model and tokenizer format&lt;/li&gt;
&lt;li&gt;Support for F32, F16, BF16 types&lt;/li&gt;
&lt;li&gt;Support for Q8, Q4 model quantization&lt;/li&gt;
&lt;li&gt;Fast GEMM operations&lt;/li&gt;
&lt;li&gt;Distributed Inference!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/tjake/Jlama" rel="noopener noreferrer"&gt;JLama&lt;/a&gt; requires Java 20 or later and utilises the new Vector API for faster inference.&lt;/p&gt;

&lt;p&gt;You can easily run JLama on your computer; on Apple Silicon, make sure you have an ARM-based SDK.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export JAVA_HOME=/Library/Java/JavaVirtualMachines/liberica-jdk-21.jdk/Contents/Home
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can start the inference service by running JLama with the restapi parameter and the optional auto-download flag.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jlama restapi tjake/TinyLlama-1.1B-Chat-v1.0-Jlama-Q4 --auto-download
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will download the model if you haven't already done so.&lt;/p&gt;
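&lt;p&gt;Once the service is up, you can talk to it over HTTP. Here's a minimal sketch, assuming JLama's default port (8080) and an OpenAI-style chat completions path; check the JLama docs for the exact endpoint of your version:&lt;/p&gt;

```shell
# Hypothetical request to a locally running JLama REST API.
# Port and path are assumptions based on JLama's OpenAI-compatible API.
JLAMA_URL="http://localhost:8080/chat/completions"
PAYLOAD='{"messages": [{"role": "user", "content": "Explain the JVM in one sentence."}]}'
# The curl call is guarded so the sketch is safe to run without a server.
curl -s -X POST "$JLAMA_URL" -H "Content-Type: application/json" -d "$PAYLOAD" || true
```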

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6v17hx0n4rr6hqy8cknl.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6v17hx0n4rr6hqy8cknl.jpeg" alt="Experimental JLama and DevoxxGenie integration" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc1b1lb9vfhkd2g5wxp0.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvc1b1lb9vfhkd2g5wxp0.jpeg" alt="Alina and Alfonso at Devoxx Belgium 2024" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Llama3.java Project
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://github.com/mukel/llama3.java" rel="noopener noreferrer"&gt;Llama3.java&lt;/a&gt; is also a 100% Java implementation developed by Alfonso² Peterssen and inspired by Andrej Karpathy. &lt;/p&gt;

&lt;p&gt;Features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single file, no dependencies&lt;/li&gt;
&lt;li&gt;GGUF format parser&lt;/li&gt;
&lt;li&gt;Llama 3 tokenizer based on minbpe&lt;/li&gt;
&lt;li&gt;Llama 3 inference with Grouped-Query Attention&lt;/li&gt;
&lt;li&gt;Support Llama 3.1 (ad-hoc RoPE scaling) and 3.2 (tie word embeddings)&lt;/li&gt;
&lt;li&gt;Support for Q8_0 and Q4_0 quantizations&lt;/li&gt;
&lt;li&gt;Fast matrix-vector multiplication routines for quantized tensors using Java's Vector API&lt;/li&gt;
&lt;li&gt;Simple CLI with --chat and --instruct modes.&lt;/li&gt;
&lt;li&gt;GraalVM's Native Image support (early-access builds available)&lt;/li&gt;
&lt;li&gt;AOT model pre-loading for instant time-to-first-token&lt;/li&gt;
&lt;/ul&gt;
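&lt;p&gt;The single-file CLI from the list above can be launched along these lines; note that the model file name below is a placeholder and the exact flags may differ between releases:&lt;/p&gt;

```shell
# Illustrative launch of the single-file Llama3.java CLI in chat mode.
# The model file name is a placeholder; flags may vary by release.
MODEL_FILE="Llama-3.2-1B-Instruct-Q4_0.gguf"
# Guarded so the sketch is safe to run without the source file or model.
java --enable-preview --add-modules jdk.incubator.vector \
     Llama3.java --model "$MODEL_FILE" --chat || true
```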

&lt;p&gt;Here's the Devoxx Belgium 2024 presentation by Alfonso and Alina.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/zgAMxC7lzkc"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Llama3.java + (OpenAI) REST API
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/mukel/llama3.java" rel="noopener noreferrer"&gt;Llama3.java&lt;/a&gt; doesn't have any REST interface so I decided to contribute that part ❤️&lt;/p&gt;

&lt;p&gt;I've added a Spring Boot wrapper around the core &lt;a href="https://github.com/mukel/llama3.java" rel="noopener noreferrer"&gt;Llama3.java&lt;/a&gt; library, allowing developers to easily set up and run an OpenAI-compatible REST API for text generation and chat completions. The goal is to use this as the 100% Java inference engine for the IDEA &lt;a href="https://plugins.jetbrains.com/plugin/24169-devoxxgenie" rel="noopener noreferrer"&gt;DevoxxGenie&lt;/a&gt; plugin, allowing local inference using a complete Java solution.&lt;/p&gt;

&lt;p&gt;The code is available on &lt;a href="https://github.com/stephanj/Llama3JavaChatCompletionService" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For the time being, I've copied the Llama3.java source code into my project, but ideally this should be integrated as a Maven dependency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;OpenAI-compatible API: The project implements an API that mimics OpenAI's chat completions endpoint, making it easy to integrate with existing applications.&lt;/li&gt;
&lt;li&gt;Support for GGUF Models: Llama3.java can work with GGUF (GPT-Generated Unified Format) models, which are optimised for efficiency and performance.&lt;/li&gt;
&lt;li&gt;Vector API Utilization: The project leverages Java's incubator Vector API for improved performance on matrix operations.&lt;/li&gt;
&lt;li&gt;Cross-Platform Compatibility: While optimized for Apple Silicon (M1/M2/M3), the project can run on various platforms with the appropriate Java SDK.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;To get started with Llama3.java, follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Setup: Ensure you have a compatible Java SDK installed. For Apple Silicon users, an ARM-compliant SDK is recommended.&lt;/li&gt;
&lt;li&gt;Build: Use Maven to build the project with "mvn clean package".&lt;/li&gt;
&lt;li&gt;Download a Model: Obtain a GGUF model from the Hugging Face model hub and place it in the 'models' directory.&lt;/li&gt;
&lt;li&gt;Configure: Update the application.properties file with your model details and server settings.&lt;/li&gt;
&lt;li&gt;Run: Start the Spring Boot application using the provided Java command.&lt;/li&gt;
&lt;/ol&gt;
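&lt;p&gt;As a command-line sketch, the steps above look roughly like this; the jar name is illustrative, so check the artifact name your build actually produces:&lt;/p&gt;

```shell
# Steps 1-5 above as commands. Build and run are guarded so the sketch
# can be executed safely even where Maven or the jar are unavailable.
MODEL_DIR="models"
mkdir -p "$MODEL_DIR"                      # step 3: a GGUF model goes here
mvn clean package || true                  # step 2: build the project
# step 5: run the Spring Boot app with the incubator Vector API enabled
java --enable-preview --add-modules jdk.incubator.vector \
     -jar target/llama3-chat-completion-service.jar || true
```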

&lt;h2&gt;
  
  
  DevoxxGenie
&lt;/h2&gt;

&lt;p&gt;When the &lt;a href="https://github.com/mukel/llama3.java" rel="noopener noreferrer"&gt;Llama3.java&lt;/a&gt; Spring Boot application is running, you can use &lt;a href="https://plugins.jetbrains.com/plugin/24169-devoxxgenie" rel="noopener noreferrer"&gt;DevoxxGenie&lt;/a&gt; for local inference 🤩&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17rqyqrp7b0vznv1xig3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17rqyqrp7b0vznv1xig3.jpeg" alt="DevoxxGenie" width="800" height="1197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Future Directions
&lt;/h3&gt;

&lt;p&gt;The next step is to move the MatMul bottleneck to the GPU using TornadoVM. Beyond that, the roadmap includes: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Externalise Llama3.java as a Maven dependency (if/when available)&lt;/li&gt;
&lt;li&gt;Add GPU support using TornadoVM &lt;/li&gt;
&lt;li&gt;GraalVM native versions 🍏&lt;/li&gt;
&lt;li&gt;LLM sharding capabilities&lt;/li&gt;
&lt;li&gt;Support for different models: BitNets &amp;amp; Ternary Models&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/mukel/llama3.java" rel="noopener noreferrer"&gt;Llama3.java&lt;/a&gt; and &lt;a href="https://github.com/tjake/Jlama" rel="noopener noreferrer"&gt;JLama&lt;/a&gt; represents a significant step forward in bringing large language model capabilities to the Java ecosystem. By providing an easy-to-use, OpenAI-compatible API and leveraging Java's latest performance features, this project opens up new possibilities for AI-driven applications in Java.&lt;/p&gt;

&lt;p&gt;Whether you're building a chatbot, a content generation tool, or any application that could benefit from advanced language processing, Llama3.java and JLama offer a promising solution. &lt;/p&gt;

&lt;p&gt;As these projects continue to evolve and optimise, they are well worth keeping an eye on for Java developers interested in the cutting edge of AI technology.&lt;/p&gt;

&lt;p&gt;Exciting times for Java Developers! ☕️🔥❤️&lt;/p&gt;

&lt;p&gt;~ Stephan Janssen&lt;/p&gt;

</description>
      <category>java</category>
      <category>llm</category>
      <category>llama3</category>
    </item>
    <item>
      <title>The Power of Full Project Context using LLM's</title>
      <dc:creator>Stephan Janssen</dc:creator>
      <pubDate>Wed, 03 Jul 2024 08:10:25 +0000</pubDate>
      <link>https://dev.to/stephanj/the-power-of-full-project-context-using-llms-463c</link>
      <guid>https://dev.to/stephanj/the-power-of-full-project-context-using-llms-463c</guid>
      <description>&lt;p&gt;I've tried integrating RAG into the DevoxxGenie plugin, but why limit myself to just some parts found through similarity search when I can go all out?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;RAG is so June 2024 😂&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's a mind-blowing secret: most of the latest features in the Devoxx Genie plugin were essentially 'developed' by the latest Claude 3.5 Sonnet large language model using the entire project code base as prompt context 🧠 🤯&lt;/p&gt;

&lt;p&gt;It's like having an expert senior developer guiding the development process, suggesting 100% correct implementations for the following Devoxx Genie features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Allow a streaming response to be stopped&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Keep selected LLM provider after settings page&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Auto complete commands&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add files based on filtered text&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Show file icons in list&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Show plugin version number in settings page with GitHub link&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Support for higher timeout values&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Show progress bar and token usage bar&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I promptly cancelled my OpenAI subscription and gave my credit card details to Anthropic...&lt;/p&gt;

&lt;h2&gt;
  
  
  Full Project Context
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;A Quantum Leap Beyond GitHub Copilot&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Imagine having your entire project at your AI assistant's fingertips. That's now a reality with the latest version of the Devoxx Genie IDEA plugin together with cloud-based models like Claude Sonnet 3.5. &lt;/p&gt;

&lt;p&gt;BTW How long will it take until we can do this with local models?! &lt;/p&gt;

&lt;h2&gt;
  
  
  Add full project to prompt
&lt;/h2&gt;

&lt;p&gt;The latest version of the plugin allows you to add the full project to your prompt; your entire codebase now becomes part of the AI's context. This feature offers a depth of understanding that traditional code completion tools can only dream of.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjapt9yhylw5692gq7xk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjapt9yhylw5692gq7xk.jpg" alt="Full Project Context"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Smart Model Selection and Cost Estimation
&lt;/h2&gt;

&lt;p&gt;The language model dropdown is not just a list anymore, it's your 'compass' for smart model selection 🤩 👇🏼&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;See available context window sizes for each cloud model&lt;/li&gt;
&lt;li&gt;View associated costs upfront&lt;/li&gt;
&lt;li&gt;Make data-driven decisions on which model to use for your project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikg0q81jr411jsqivt2m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikg0q81jr411jsqivt2m.jpg" alt="Smart Model DropDown"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Visualizing Your Context Usage
&lt;/h2&gt;

&lt;p&gt;Leverage the prompt cost calculator for precise budget management: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track token usage with a progress bar&lt;/li&gt;
&lt;li&gt;Get real-time updates on how much of the context window you're using&lt;/li&gt;
&lt;/ul&gt;
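&lt;p&gt;The cost figure behind that bar is simple arithmetic: tokens divided by a million, times the provider's price per million input tokens. A back-of-the-envelope sketch, where the price below is illustrative (check your provider's current pricing):&lt;/p&gt;

```shell
# Estimate the prompt cost shown by the token usage bar.
# PRICE_PER_MTOK is an illustrative USD price per million input tokens.
TOKENS=70000
PRICE_PER_MTOK=3.00
COST=$(awk -v t="$TOKENS" -v p="$PRICE_PER_MTOK" 'BEGIN { printf "%.2f", t / 1000000 * p }')
echo "$COST USD"
```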

&lt;p&gt;Calculate token cost with Claude Sonnet 3.5&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6u8kv9csjhl6hiwj70h.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6u8kv9csjhl6hiwj70h.jpg" alt="Claude Sonnet 3.5"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Calculate cost with Google Gemini 1.5 Flash&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsd05dj2q61eune0drg0d.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsd05dj2q61eune0drg0d.jpg" alt="Gemini 1.5 Flash"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9a0vsxjb7ypyf1zppcb.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9a0vsxjb7ypyf1zppcb.jpg" alt="Project Added"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud Models Overview
&lt;/h2&gt;

&lt;p&gt;Via the plugin settings pages you can see the "Token Cost &amp;amp; Context Window" for all the available cloud models. In an upcoming release you will be able to update this table. I should probably also support the context windows of local models... #PullRequestsAreWelcome &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwgiz7qkz3b8kgdyjvqq.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwgiz7qkz3b8kgdyjvqq.jpg" alt="Cloud Models Overview"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Massive Projects?
&lt;/h2&gt;

&lt;p&gt;"But wait, my project is HUGE!" you might say 😅 Fear not. We've got options:&lt;/p&gt;

&lt;h3&gt;
  
  
  Leverage Gemini's Massive Context:
&lt;/h3&gt;

&lt;p&gt;Gemini's colossal 1 million token window isn't just big, it's massive. We're talking about the capacity to ingest approximately 30,000 lines of code in a single prompt. That's enough to digest many codebases, from the tiniest scripts to some decently large projects. &lt;br&gt;
But if that's not enough, you have more options...&lt;/p&gt;
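&lt;p&gt;That 30,000-line figure implies a generous budget of roughly 33 tokens per line of code:&lt;/p&gt;

```shell
# Sanity check of the figure above: token budget per line of code
# if a 1,000,000-token window holds roughly 30,000 lines.
WINDOW_TOKENS=1000000
LINES_OF_CODE=30000
TOKENS_PER_LINE=$(( WINDOW_TOKENS / LINES_OF_CODE ))
echo "$TOKENS_PER_LINE tokens per line"
```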

&lt;p&gt;BTW Google will be releasing 2M and even 10M token windows in the near future.&lt;/p&gt;

&lt;h3&gt;
  
  
  Smart Filtering:
&lt;/h3&gt;

&lt;p&gt;The new "Copy Project" plugin settings panel lets you&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exclude specific directories &lt;/li&gt;
&lt;li&gt;Filter by file extensions&lt;/li&gt;
&lt;li&gt;Remove JavaDocs to slim down your context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fekwilqkm2lmb659zjl14.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fekwilqkm2lmb659zjl14.jpg" alt="Smart Filtering"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Selective Inclusion
&lt;/h3&gt;

&lt;p&gt;Right-click to add only the most relevant parts of your project to the context and/or clipboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdvikarhvwbov100wmrt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcdvikarhvwbov100wmrt.jpg" alt="Right Click Options"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also copy your project to the clipboard, allowing you to paste your project code into an external chat window. This is a useful technique for sharing and collaborating on code 👍🏼 &lt;/p&gt;

&lt;h2&gt;
  
  
  The Power of Full Context: A Real-World Example
&lt;/h2&gt;

&lt;p&gt;The DevoxxGenie project itself, at about 70K tokens, fits comfortably within most high-end LLM context windows. This allows for incredibly nuanced interactions – we're talking advanced queries and feature requests that leave tools like GitHub Copilot scratching their virtual heads!&lt;/p&gt;
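&lt;p&gt;To put that in perspective: against Claude 3.5 Sonnet's 200K-token context window, a 70K-token project uses only about a third of the available space, leaving plenty of room for the conversation itself:&lt;/p&gt;

```shell
# Fraction of a 200K-token context window used by a 70K-token project.
PROJECT_TOKENS=70000
WINDOW_TOKENS=200000
PCT_USED=$(( PROJECT_TOKENS * 100 / WINDOW_TOKENS ))
echo "$PCT_USED% of the window used"
```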

&lt;h2&gt;
  
  
  Conclusion: Stepping into the Future of Development
&lt;/h2&gt;

&lt;p&gt;With Claude 3.5 Sonnet, Devoxx Genie isn't just another developer tool... it's a glimpse into the future of software engineering. As we eagerly await Claude 3.5 Opus, one thing is clear: we're witnessing a paradigm shift in AI-augmented programming.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Alan Turing, were he here today, might just say we've taken a significant leap towards AGI (for developers with Claude Sonnet 3.5)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Welcome to the cutting edge of AI-assisted development - welcome to DevoxxGenie 🚀&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/devoxx.genie" rel="noopener noreferrer"&gt;X Twitter&lt;/a&gt; - &lt;a href="https://github.com/devoxx/DevoxxGenieIDEAPlugin" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; - &lt;a href="https://plugins.jetbrains.com/plugin/24169-devoxxgenie" rel="noopener noreferrer"&gt;IntelliJ MarketPlace&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devoxxgenie</category>
      <category>claudeai</category>
      <category>idea</category>
      <category>intelli</category>
    </item>
    <item>
      <title>Devoxx Genie Plugin : an Update</title>
      <dc:creator>Stephan Janssen</dc:creator>
      <pubDate>Tue, 28 May 2024 11:32:10 +0000</pubDate>
      <link>https://dev.to/stephanj/devoxx-genie-plugin-an-update-53hg</link>
      <guid>https://dev.to/stephanj/devoxx-genie-plugin-an-update-53hg</guid>
      <description>&lt;p&gt;When I invited Anton Arhipov from JetBrains to present during the Devoxx Belgium 2023 keynote their early Beta AI Assistant, I was eager to learn if they would support local modals, as shown in the screenshot above. &lt;/p&gt;

&lt;p&gt;After seven months without any related news, it seemed unlikely that this would happen. So, I decided to develop my own IDEA plugin to support as many local and even cloud-based LLMs as possible. "DevoxxGenie" was born ❤️&lt;/p&gt;

&lt;p&gt;&lt;a href="https://plugins.jetbrains.com/plugin/24169-devoxxgenie" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwoqwyguem353aswy0thk.png" alt="IntelliJ Marketplace" width="400" height="79"&gt;&lt;/a&gt;&lt;br&gt;
Of course, I conducted a market study and couldn't find any plugins that were fully developed in Java. Even GitHub Copilot, which doesn't allow you to select a local LLM, is primarily developed in Kotlin and native code. More importantly, such plugins are often closed source.&lt;/p&gt;

&lt;p&gt;I had already built up substantial LLM expertise by integrating &lt;a href="https://github.com/langchain4j/langchain4j" rel="noopener noreferrer"&gt;LangChain4J&lt;/a&gt; into the CFP.DEV web app, as well as developing Devoxx Insights (using Python) in early 2023. More recently, I created &lt;a href="https://github.com/stephanj/rag-genie" rel="noopener noreferrer"&gt;RAG Genie&lt;/a&gt;, which allows you to debug your RAG steps using Langchain4J and Spring Boot.&lt;/p&gt;
&lt;h2&gt;
  
  
  Swing Development
&lt;/h2&gt;

&lt;p&gt;I had never developed an IDEA plugin so I started studying some existing plugins to understand how they work. I noticed that some use a local web server, allowing them to more easily output the LLM response in HTML and stream it to the plugin.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foiw9v99tqnibkpa1mnho.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foiw9v99tqnibkpa1mnho.jpeg" alt="Trying to understand how the IDEA plugins work" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I wanted to start with a simple input prompt and focus on using the "good-old" JEditorPane Swing component which does support basic HTML rendering. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbtt1n1dmc1a3ofe5o1o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqbtt1n1dmc1a3ofe5o1o.png" alt="JEditorPane rendering HTML" width="800" height="898"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By asking the LLM to respond in Markdown, I could parse the Markdown so each document node could be rendered to HTML while adding extra styling and UI components. For example, code blocks include an easy-to-use "copy-to-clipboard" button or an "insert code" button (as shown in the screenshot above).&lt;/p&gt;
&lt;h2&gt;
  
  
  Focus on Local LLM's
&lt;/h2&gt;

&lt;p&gt;I focused on supporting &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;, &lt;a href="https://gpt4all.io/index.html" rel="noopener noreferrer"&gt;GPT4All&lt;/a&gt;, and &lt;a href="https://lmstudio.ai/" rel="noopener noreferrer"&gt;LMStudio&lt;/a&gt;, all of which run smoothly on a Mac computer. Many of these tools are user-friendly wrappers around &lt;a href="https://github.com/ggerganov/llama.cpp" rel="noopener noreferrer"&gt;Llama.cpp&lt;/a&gt;, allowing easy model downloads and providing a REST interface to query the available models.&lt;br&gt;
Last week, I also added &lt;a href="https://jan.ai" rel="noopener noreferrer"&gt;"👋🏼 Jan"&lt;/a&gt; support because HuggingFace has endorsed this provider out-of-the-box.&lt;/p&gt;
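&lt;p&gt;All of these providers expose a local REST interface the plugin can query. For example, a direct request to Ollama's generate endpoint (assuming a server on Ollama's default port 11434) looks like this:&lt;/p&gt;

```shell
# Query a local Ollama server directly (default port 11434).
# Guarded with || true so the sketch is safe to run without a server.
OLLAMA_URL="http://localhost:11434/api/generate"
PAYLOAD='{"model": "llama3", "prompt": "Why is the JVM fast?", "stream": false}'
curl -s -X POST "$OLLAMA_URL" -d "$PAYLOAD" || true
```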
&lt;h2&gt;
  
  
  Cloud LLM's, why not?
&lt;/h2&gt;

&lt;p&gt;Because I use ChatGPT on a daily basis and occasionally experiment with Anthropic Claude, I quickly decided to also support LLM cloud providers. A couple of weeks ago, Google released Gemini with API keys for Europe, so I promptly integrated those too. With support for OpenAI, Anthropic, Groq, Mistral, DeepInfra, and Gemini, I believe I have covered all the major players in the field.&lt;br&gt;
Please let me know if I'm missing any!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45tillr4derhzrqmp7db.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45tillr4derhzrqmp7db.png" alt="Snopshot of theDevoxxGenie LLM settings page" width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Configurable Chat Memory
&lt;/h2&gt;

&lt;p&gt;The size of the chat memory can now be configured in v0.1.14 in the Settings page. This makes sense when you use an LLM which has a large context window, for example Gemini with 1M tokens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmaddhg5un5gjv213toak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmaddhg5un5gjv213toak.png" alt="Chat Memory" width="800" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The beauty of chat memory supporting different LLM providers is that with a single prompt, you can ask one model to review some code, then switch to another model to review the previous model's answer 🤩&lt;/p&gt;
&lt;h2&gt;
  
  
  Multi-LLM Collaborative Review
&lt;/h2&gt;

&lt;p&gt;The end result is a "Multi-LLM Collaborative Review" process, leveraging multiple large language models to sequentially review and evaluate each other's responses, facilitating a more comprehensive and nuanced analysis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsazj1dbubvgcch6z3zma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsazj1dbubvgcch6z3zma.png" alt="Multi-LLM Collaborative Review" width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The results are really fascinating. For example, I asked Mistral how I could improve a certain Java class and then had OpenAI (GPT-4o) review Mistral's response! &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnxricmfa6071kq0dw21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnxricmfa6071kq0dw21.png" alt="Mistral 8x7B using Groq" width="800" height="787"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I then switched to OpenAI GPT-4o and asked it to review the Mistral response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1lnvsef014b3lfmqjaa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1lnvsef014b3lfmqjaa.png" alt="GPT-4o using OpenAI" width="800" height="785"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This all results in better code (refactoring) suggestions 🚀 &lt;/p&gt;
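&lt;p&gt;The review chain described above can be sketched in plain Java. Note that &lt;code&gt;LlmClient&lt;/code&gt; and the chaining logic below are illustrative assumptions, not the plugin's actual implementation:&lt;/p&gt;

```java
import java.util.List;

// Hypothetical sketch of a multi-LLM collaborative review chain.
// The LlmClient interface is an illustrative stand-in for whatever
// client abstraction (OpenAI, Mistral, Groq, ...) is actually used.
interface LlmClient {
    String complete(String prompt);
}

class CollaborativeReview {

    // The first model answers the question; each subsequent model
    // reviews and refines the previous answer.
    static String reviewChain(String question, List<LlmClient> models) {
        String answer = models.get(0).complete(question);
        for (LlmClient reviewer : models.subList(1, models.size())) {
            answer = reviewer.complete(
                "Review and improve the following answer to \"" + question + "\":\n" + answer);
        }
        return answer;
    }
}
```

&lt;p&gt;Because each reviewer only sees the question plus the previous answer, models from different providers can be mixed freely in the chain.&lt;/p&gt;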
&lt;h2&gt;
  
  
  Streaming Responses
&lt;/h2&gt;

&lt;p&gt;The latest version of &lt;a href="https://github.com/devoxx/DevoxxGenieIDEAPlugin/" rel="noopener noreferrer"&gt;DevoxxGenie&lt;/a&gt; (v0.1.14) now also supports the option to stream the results directly to the plugin, enhancing real-time interaction and responsiveness. &lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/V8KopHVz8zY"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;It's still a beta feature because I need to find a way to add "Copy to Clipboard" or "Insert into Code" buttons at the start of each code block. I do accept PRs, so if you know how to make this happen, some community ❤️ would be very welcome.&lt;/p&gt;
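&lt;p&gt;Conceptually, streaming boils down to appending tokens to a buffer and repainting the UI on every partial update. A minimal sketch, assuming a callback-based client (the class and method names are illustrative, not the plugin's actual API):&lt;/p&gt;

```java
import java.util.function.Consumer;

// Minimal sketch of streaming token handling: the LLM client is assumed
// to deliver tokens one by one, and each partial result is pushed to the
// UI (e.g. the chat panel) via a callback.
class StreamingResponseHandler {

    private final StringBuilder buffer = new StringBuilder();
    private final Consumer<String> onPartialUpdate;

    StreamingResponseHandler(Consumer<String> onPartialUpdate) {
        this.onPartialUpdate = onPartialUpdate;
    }

    // Called for every token as it arrives from the model.
    void onToken(String token) {
        buffer.append(token);
        onPartialUpdate.accept(buffer.toString()); // e.g. repaint the chat panel
    }

    String fullResponse() {
        return buffer.toString();
    }
}
```

&lt;p&gt;The tricky part mentioned above is detecting code-block boundaries inside this incremental stream so that action buttons can be inserted at the right place.&lt;/p&gt;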

&lt;h2&gt;
  
  
  Program Structure Interface Driven (PSI) Context Prompt
&lt;/h2&gt;

&lt;p&gt;Another new feature I developed for v0.1.14 is support for "smart(er) prompt context" using Program Structure Interface (PSI). PSI is the layer in the IntelliJ Platform responsible for parsing files and creating the syntactic and semantic code model of a project.&lt;/p&gt;

&lt;p&gt;PSI allows me to populate the prompt with more information about a class without the user having to add the extra info themselves. It's similar to the Abstract Syntax Tree (AST) in Java, but PSI has extra knowledge about the project structure, externally used libraries, search features, and much more.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmn7a5wtwmp58d4c2ybf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdmn7a5wtwmp58d4c2ybf.png" alt="AST Settings" width="800" height="209"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a result, the PSIAnalyzerService class (currently Java-focused) can automatically inject more code details into the chat prompt.&lt;/p&gt;
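&lt;p&gt;Outside the IDE, plain reflection can illustrate the idea: gather a class's declared method signatures and prepend them to the user's prompt. This is only an analogy of what PSIAnalyzerService does — the real implementation uses PSI, which knows far more (project structure, libraries, usages) than reflection ever could:&lt;/p&gt;

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.stream.Collectors;

// Reflection-based analogy of PSI-driven context: collect a class's
// declared method signatures and inject them into the chat prompt as
// extra context. Illustrative only; the plugin uses PSI, not reflection.
class PromptContextBuilder {

    static String describeClass(Class<?> clazz) {
        String methods = Arrays.stream(clazz.getDeclaredMethods())
                .map(Method::toString)
                .sorted()
                .collect(Collectors.joining("\n"));
        return "Class " + clazz.getName() + " declares:\n" + methods;
    }

    // Prepend the class description so the LLM sees it before the question.
    static String withContext(String userPrompt, Class<?> clazz) {
        return describeClass(clazz) + "\n\n" + userPrompt;
    }
}
```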

&lt;p&gt;PSI-driven context prompts are really another way to introduce some basic Retrieval Augmented Generation (RAG) into the equation 💪🏻 &lt;/p&gt;

&lt;h2&gt;
  
  
  What's next?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Auto completion??
&lt;/h3&gt;

&lt;p&gt;I'm not a big fan of TAB-based auto completion, where the editor is constantly bombarded with code suggestions that often don't make sense. Because the plugin is LLM-agnostic, it would also be much harder to implement, given the lack of speed and quality when using local LLMs. However, it could make sense to support this with the currently smarter cloud-based LLMs. &lt;/p&gt;

&lt;h3&gt;
  
  
  RAG support?
&lt;/h3&gt;

&lt;p&gt;Embedding your IDEA project files using a RAG service could make sense, but this would probably need to happen outside of the plugin because of the storage and background processing it requires. I've noticed that existing plugins use an external Docker image which includes some kind of REST service. Suggestions are welcome.&lt;/p&gt;

&lt;h3&gt;
  
  
  "JIRA" support?
&lt;/h3&gt;

&lt;p&gt;Wouldn't it be great if you could paste a (JIRA) issue and the plugin figured out how to fix/resolve it? A bit like what Devin was promised to do... &lt;/p&gt;

&lt;h3&gt;
  
  
  Compile &amp;amp; Run Unit tests?
&lt;/h3&gt;

&lt;p&gt;When you ask the plugin to write a unit test, the plugin could also compile the suggested code and even run it (using a REPL?). That would be an interesting R&amp;amp;D exercise IMHO.&lt;/p&gt;
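&lt;p&gt;The JDK actually ships a REPL API that could be a starting point for this: JShell. A minimal sketch of evaluating a generated snippet and inspecting the result (error handling and multi-snippet parsing are left out):&lt;/p&gt;

```java
import jdk.jshell.JShell;
import jdk.jshell.SnippetEvent;
import java.util.List;

// Sketch of the "compile and run" idea using the JDK's built-in JShell
// API: evaluate a code snippet and return its value as a string.
class SnippetRunner {

    static String eval(String code) {
        try (JShell shell = JShell.create()) {
            List<SnippetEvent> events = shell.eval(code);
            // value() is null when the snippet failed to compile or threw.
            return events.get(0).value();
        }
    }
}
```

&lt;p&gt;Running a full LLM-generated unit test would of course also need the project's classpath and a test runner, but JShell shows the core loop is feasible without leaving the JVM.&lt;/p&gt;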

&lt;h3&gt;
  
  
  Introduce Agents
&lt;/h3&gt;

&lt;p&gt;All of the above will most likely result in introducing smart(er) agents that perform some extra LLM magic using shell scripts and/or Docker services... &lt;/p&gt;

&lt;h2&gt;
  
  
  Community Support
&lt;/h2&gt;

&lt;p&gt;As of this writing, the plugin has already been downloaded 1,127 times. The actual number is likely higher because the &lt;a href="https://github.com/devoxx/DevoxxGenieIDEAPlugin/" rel="noopener noreferrer"&gt;Devoxx Genie GitHub project&lt;/a&gt; also publishes plugin builds in the releases, allowing users to manually install them in their IDEA.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfwfnctdqf1vvth52i10.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkfwfnctdqf1vvth52i10.png" alt="IntelliJ Marketplace Downloads" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm hoping the project will gain more traction and that the developer community will step up to help with new features or even bug fixes. This was one of the main reasons for open-sourcing the project.&lt;br&gt;
"We ❤️ Open Source" 😜&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d61mjsnqzdlssvqthgi.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d61mjsnqzdlssvqthgi.jpeg" alt="We Love Open Source" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devoxx</category>
      <category>genai</category>
      <category>openai</category>
      <category>ollama</category>
    </item>
  </channel>
</rss>
