<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shira MAS</title>
    <description>The latest articles on DEV Community by Shira MAS (@shiramalka).</description>
    <link>https://dev.to/shiramalka</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3655802%2Feb41f6a3-39f6-4d74-8b8c-1cb43b9dd484.jpg</url>
      <title>DEV Community: Shira MAS</title>
      <link>https://dev.to/shiramalka</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shiramalka"/>
    <language>en</language>
    <item>
      <title>Creating Personal AI Agents in Multiplayer Games with LoRA Adapters: An Efficient and Memory-Saving Solution</title>
      <dc:creator>Shira MAS</dc:creator>
      <pubDate>Thu, 11 Dec 2025 15:37:19 +0000</pubDate>
      <link>https://dev.to/shiramalka/creating-personal-ai-agents-in-multiplayer-games-with-lora-adapters-an-efficient-and-memory-saving-56ee</link>
      <guid>https://dev.to/shiramalka/creating-personal-ai-agents-in-multiplayer-games-with-lora-adapters-an-efficient-and-memory-saving-56ee</guid>
      <description>&lt;p&gt;Have you ever imagined a multiplayer game where &lt;strong&gt;each player has a personal AI agent&lt;/strong&gt; that continues to chat, respond, and act in their style - even when they are offline?&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;NanoVerse&lt;/strong&gt;, a 2D online open-world game inspired by &lt;strong&gt;Minecraft&lt;/strong&gt;, this is exactly what we built. Using &lt;strong&gt;small language models and LoRA adapters&lt;/strong&gt;, we gave each player a &lt;strong&gt;unique voice, persistent personality, and a dynamic, living experience&lt;/strong&gt; - all running efficiently on a &lt;strong&gt;single server with a standard GPU&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1kgmjc40zw8qbpk5mde6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1kgmjc40zw8qbpk5mde6.png" alt="Pixel art illustration of the NanoVerse game world, showing player avatars interacting with a glowing blue personal AI agent and an AI brain icon, illustrating the concept of persistent AI companions." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Why It Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Games feel more alive when AI agents can &lt;strong&gt;mimic player behavior&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;• Remembering speech patterns&lt;br&gt;
• Responding consistently&lt;br&gt;
• Continuing to contribute to the world even when the player is offline&lt;/p&gt;

&lt;p&gt;This &lt;strong&gt;personalized AI&lt;/strong&gt; creates a &lt;strong&gt;deep and engaging gaming experience&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The challenge? Creating unique &lt;strong&gt;agents for hundreds of players&lt;/strong&gt; without maintaining hundreds of separate models or relying on massive infrastructure. &lt;strong&gt;LoRA provides a precise, lightweight, and scalable solution&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Our Goal&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our goal was to achieve &lt;strong&gt;high performance and low response times&lt;/strong&gt; using a &lt;strong&gt;single server with an RTX 2080 Ti GPU&lt;/strong&gt;, while maintaining:&lt;/p&gt;

&lt;p&gt;• Low memory usage&lt;br&gt;
• High speed&lt;br&gt;
• High-quality player experience&lt;/p&gt;

&lt;p&gt;To achieve this, we redesigned the &lt;strong&gt;model architecture&lt;/strong&gt; and built an &lt;strong&gt;optimized inference pipeline&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Key Concepts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-Tuning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A process in which the model’s existing weights are &lt;strong&gt;carefully updated&lt;/strong&gt; to adapt to a specific dataset (like a player’s style), without losing the model’s &lt;strong&gt;general knowledge&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LoRA (Low-Rank Adaptation)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;fine-tuning technique&lt;/strong&gt; that updates only a small portion of the base model’s parameters, allowing hundreds of &lt;strong&gt;unique adapters&lt;/strong&gt; to coexist without duplicating entire models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Personal Adapter&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;compact module&lt;/strong&gt; that captures the &lt;strong&gt;personality, style, and behavior&lt;/strong&gt; of each player - effectively the player’s &lt;strong&gt;linguistic fingerprint&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DPO (Direct Preference Optimization)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A framework that allows &lt;strong&gt;calibrating the model based on real player preferences&lt;/strong&gt;, improving &lt;strong&gt;alignment&lt;/strong&gt; and &lt;strong&gt;response quality&lt;/strong&gt;.&lt;/p&gt;
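&lt;p&gt;To make the idea concrete, here is a minimal sketch (not the NanoVerse code) of the DPO objective for a single preference pair, in plain Python; the log-probabilities and the &lt;code&gt;beta&lt;/code&gt; value are illustrative:&lt;/p&gt;

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the policy model or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response than the reference model does.
    margin = ((policy_chosen_logp - policy_rejected_logp)
              - (ref_chosen_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)): small when the policy already
    # ranks the chosen response higher, large otherwise.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# A pair the policy already ranks correctly yields a smaller loss
# than the same pair ranked the wrong way round:
good = dpo_loss(-10.0, -20.0, -15.0, -15.0)
bad = dpo_loss(-20.0, -10.0, -15.0, -15.0)
```

&lt;p&gt;In practice the margin is computed per batch by the training framework; the formula itself is all DPO adds on top of standard fine-tuning.&lt;/p&gt;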

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduiyr2f8oqimrarlizg3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fduiyr2f8oqimrarlizg3.png" alt="Architectural diagram showing a central Base AI Model connected to multiple Personal Adapters via LoRA Adapters, demonstrating the memory-efficient approach for managing unique player personalities." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Our Technical Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The solution is built on the &lt;strong&gt;Gemma-3 model family (1B/4B)&lt;/strong&gt;, serving as the &lt;strong&gt;base models&lt;/strong&gt; for all players.&lt;/p&gt;

&lt;p&gt;Each player receives their &lt;strong&gt;own LoRA adapter&lt;/strong&gt;, trained exclusively on their personal dialogue data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Per-Player Data Collection&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Full automation for collecting and organizing each player’s dialogues:&lt;/p&gt;

&lt;p&gt;• Style&lt;br&gt;
• Vocabulary&lt;br&gt;
• Preferences&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendation&lt;/strong&gt;: Use &lt;strong&gt;clean, high-quality data&lt;/strong&gt;, even if limited in size, to avoid inconsistent behavior in agents.&lt;/p&gt;
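&lt;p&gt;As an illustration, each player's dialogue data might be stored one JSON object per line (JSONL), which is easy to append to and to stream during fine-tuning. The field names below are hypothetical, not the actual NanoVerse schema:&lt;/p&gt;

```python
import json

# Hypothetical shape of one per-player training record.
record = {
    "player_id": "player_042",
    "context": "Another player asks for help finding iron.",
    "response": "lol sure, follow me - I know a cave nearby",
}

# One JSON object per line; a player's whole dataset is just
# one such line per collected dialogue turn.
line = json.dumps(record)
parsed = json.loads(line)
```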

&lt;p&gt;&lt;strong&gt;2. LoRA Fine-Tuning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Training &lt;strong&gt;only a minimal subset of the model’s parameters&lt;/strong&gt; for each player allows &lt;strong&gt;hundreds of adapters&lt;/strong&gt; to run efficiently on the &lt;strong&gt;same server&lt;/strong&gt;.&lt;/p&gt;
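&lt;p&gt;A quick back-of-the-envelope calculation shows why this scales: for a single d × d weight matrix, LoRA trains two small low-rank factors instead of the full matrix. The sizes below are illustrative, not Gemma-3's actual dimensions:&lt;/p&gt;

```python
# Parameter count for one d x d projection: full fine-tuning vs. LoRA.
d = 2048      # hidden size (hypothetical)
r = 8         # LoRA rank (hypothetical)

full_params = d * d        # updating the whole weight matrix
lora_params = 2 * d * r    # low-rank factors B (d x r) and A (r x d)

savings = full_params / lora_params  # 128x fewer trainable parameters
```

&lt;p&gt;With these numbers, each adapter stores 32,768 values instead of over 4 million per matrix, which is what makes storing one adapter per player feasible.&lt;/p&gt;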

&lt;p&gt;&lt;strong&gt;3. Real-Time Inference Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Fast Adapter Switching&lt;/strong&gt; with dynamic and intelligent loading&lt;br&gt;
• &lt;strong&gt;KV-Cache Optimization&lt;/strong&gt; supporting dozens of simultaneous chats on the GPU&lt;br&gt;
• &lt;strong&gt;Microservices with Python + FastAPI&lt;/strong&gt; providing a robust API for the game engine and UI&lt;/p&gt;
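&lt;p&gt;Fast adapter switching can be approximated with a simple LRU cache that keeps only the most recently used per-player adapters resident; the sketch below uses a placeholder &lt;code&gt;load_fn&lt;/code&gt; rather than a real GPU loader:&lt;/p&gt;

```python
from collections import OrderedDict

class AdapterCache:
    """Keep the most recently used per-player adapters resident.

    load_fn stands in for whatever actually loads a LoRA adapter
    (e.g. from disk onto the GPU); it is a placeholder here.
    """
    def __init__(self, load_fn, capacity=16):
        self.load_fn = load_fn
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, player_id):
        if player_id in self.cache:
            self.cache.move_to_end(player_id)   # mark as recently used
        else:
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[player_id] = self.load_fn(player_id)
        return self.cache[player_id]

cache = AdapterCache(load_fn=lambda pid: f"adapter-for-{pid}", capacity=2)
cache.get("alice")
cache.get("bob")
cache.get("alice")   # refreshes alice
cache.get("carol")   # evicts bob, the least recently used
```

&lt;p&gt;The real pipeline also has to coordinate eviction with in-flight requests and GPU memory, but the access pattern is the same.&lt;/p&gt;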

&lt;p&gt;&lt;strong&gt;Real-Time Chat System for Players&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fibw3i0bdhiykiqb36re6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fibw3i0bdhiykiqb36re6.png" alt="Flowchart illustrating the Real-Time Inference Pipeline, highlighting the connections between the GPU Inference Engine, Fast Adapter Switching, KV-Cache Optimization, and Microservices built with Python + FastAPI." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The chat is a &lt;strong&gt;central feature of the game&lt;/strong&gt;, enabling &lt;strong&gt;player-to-player communication&lt;/strong&gt; and world interaction.&lt;/p&gt;

&lt;p&gt;It also includes &lt;strong&gt;Like/Dislike buttons&lt;/strong&gt; to collect &lt;strong&gt;preference data&lt;/strong&gt; for future &lt;strong&gt;agent calibration&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;LoRA and Beyond - Infrastructure for DPO&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd7t7wmrjduk37tkxd0g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd7t7wmrjduk37tkxd0g.png" alt="Diagram showing the closed-loop DPO System, where player feedback (Like/Dislike) flows to the DPO System, leading to continuous AI Agent Adjustment and improved alignment." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beyond player personalization, we built a &lt;strong&gt;full DPO infrastructure&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;• Like/Dislike buttons inside the chat&lt;br&gt;
• Collection of real preference data&lt;br&gt;
• Pipeline planning for future agent calibration based on community preferences&lt;/p&gt;

&lt;p&gt;This allows the system to &lt;strong&gt;improve automatically&lt;/strong&gt; based solely on &lt;strong&gt;real usage&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Key Insights for Developers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;LoRA adapters are a game-changer&lt;/strong&gt;: lightweight, efficient, and provide excellent real-time personalization&lt;br&gt;
• &lt;strong&gt;Inference efficiency matters as much as model quality&lt;/strong&gt;: memory management, precise pipelines, and &lt;strong&gt;KV-Cache optimization&lt;/strong&gt; are crucial&lt;br&gt;
• &lt;strong&gt;Automated learning loops pay off&lt;/strong&gt;: nightly adapter updates allow agents to improve &lt;strong&gt;without manual intervention&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Developer Takeaways&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyilgqpfc55whobftiuev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyilgqpfc55whobftiuev.png" alt="Visual summary of the Developer Takeaways: Start Small (icon of robot with LoRA), Invest in Inference (server icon with KV-Cache and Fast Pipelines), and Plan Alignment (DPO cloud icon), providing strategic advice for implementing personalized AI." width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to integrate &lt;strong&gt;personalized AI&lt;/strong&gt; into a game or interactive application while staying &lt;strong&gt;budget-friendly and hardware-efficient&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Start small&lt;/strong&gt; - LoRA on &lt;strong&gt;small models&lt;/strong&gt; is highly effective&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Invest in inference&lt;/strong&gt; - this is what drives &lt;strong&gt;true player experience&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Plan Alignment from day one&lt;/strong&gt; - collecting preference data is key for future improvements&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For &lt;strong&gt;technical details, code examples, or the full system&lt;/strong&gt;, check out our &lt;strong&gt;GitHub&lt;/strong&gt; (&lt;a href="https://github.com/KamaTechOrg/NanoVerse" rel="noopener noreferrer"&gt;https://github.com/KamaTechOrg/NanoVerse&lt;/a&gt;) or visit the &lt;strong&gt;NanoVerse website&lt;/strong&gt; (&lt;a href="https://nanoverse.me/" rel="noopener noreferrer"&gt;https://nanoverse.me/&lt;/a&gt;).&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>llm</category>
      <category>python</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
