Have you ever imagined a multiplayer game where each player has a personal AI agent that continues to chat, respond, and act in their style - even when they are offline?
In NanoVerse, a 2D online open-world game inspired by Minecraft, this is exactly what we built. Using small language models (SLMs) and LoRA adapters, we gave each player a unique voice, a persistent personality, and a dynamic, living experience - all running efficiently on a single server with a standard GPU.
Why It Matters
Games feel more alive when AI agents can mimic player behavior:
• Remembering speech patterns
• Responding consistently
• Continuing to contribute to the world even when the player is offline
This personalized AI creates a deep and engaging gaming experience.
The challenge? Creating unique agents for hundreds of players without maintaining hundreds of separate models or relying on massive infrastructure. LoRA provides a precise, lightweight, and scalable solution.
Our Goal
Our goal was to deliver high performance on a single server with an RTX 2080 Ti GPU, while maintaining:
• Low memory usage
• Low response latency
• A high-quality player experience
To achieve this, we redesigned the model architecture and built an optimized inference pipeline.
Key Concepts
Fine-Tuning
A process in which the model’s existing weights are carefully updated to adapt to a specific dataset (like a player’s style), without losing the model’s general knowledge.
LoRA (Low-Rank Adaptation)
A fine-tuning technique that updates only a small portion of the base model’s parameters, allowing hundreds of unique adapters to coexist without duplicating entire models.
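To make this concrete, here is a minimal sketch of attaching a LoRA adapter to a base model with the Hugging Face PEFT library. The model id and the rank/alpha/target-module values below are illustrative, not our exact production settings:

```python
# Minimal LoRA sketch with Hugging Face PEFT; values are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# One shared base model (Gemma-3 1B instruct, model id assumed)
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")

lora_cfg = LoraConfig(
    r=16,                                  # low-rank dimension
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # which projection layers receive adapters
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()         # typically well under 1% of the base model
```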
Personal Adapter
A compact module that captures the personality, style, and behavior of each player - effectively the player’s linguistic fingerprint.
DPO (Direct Preference Optimization)
A training method that calibrates the model against real player preferences, improving alignment and response quality.
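In practice, DPO consumes preference pairs. A single record built from the chat UI might look roughly like this (the field names and content are illustrative):

```python
# Hypothetical shape of one preference record built from a Like/Dislike click
preference_record = {
    "prompt": "trade 20 iron for some oak planks?",
    "chosen": "deal, meet me at the river dock in five",       # reply the player liked
    "rejected": "I am unable to fulfil this trade request.",   # reply the player disliked
}
```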
Our Technical Pipeline
The solution is built on the Gemma-3 model family (1B/4B), serving as the base models for all players.
Each player receives their own LoRA adapter, trained exclusively on their personal dialogue data.
1. Per-Player Data Collection
Fully automated collection and organization of each player’s dialogues, capturing their:
• Style
• Vocabulary
• Preferences
Recommendation: Use clean, high-quality data, even if limited in size, to avoid inconsistent behavior in agents.
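As a rough sketch of what that collection step can look like (the paths, field names, and JSONL layout are our assumptions for illustration):

```python
import json
from pathlib import Path

DATA_DIR = Path("data/players")  # one JSONL file per player (layout is illustrative)

def append_dialogue(player_id: str, prompt: str, reply: str) -> None:
    """Append a single prompt/reply pair to the player's training file."""
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    record = {"prompt": prompt, "response": reply}
    with open(DATA_DIR / f"{player_id}.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```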
2. LoRA Fine-Tuning
Training only a minimal subset of the model’s parameters for each player allows hundreds of adapters to run efficiently on the same server.
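A per-player training job can be sketched along these lines with TRL and PEFT. Exact arguments vary between library versions, and the hyperparameters, paths, and model id here are illustrative:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

def train_player_adapter(player_id: str) -> None:
    # Load this player's dialogues and flatten each record into one training string
    dataset = load_dataset("json", data_files=f"data/players/{player_id}.jsonl", split="train")
    dataset = dataset.map(lambda r: {"text": f"{r['prompt']}\n{r['response']}"})

    trainer = SFTTrainer(
        model="google/gemma-3-1b-it",                                    # shared base model
        train_dataset=dataset,
        args=SFTConfig(output_dir=f"adapters/{player_id}", num_train_epochs=3),
        peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    )
    trainer.train()
    trainer.save_model(f"adapters/{player_id}")                          # saves only the adapter weights
```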
3. Real-Time Inference Pipeline
• Fast Adapter Switching with dynamic and intelligent loading
• KV-Cache Optimization supporting dozens of simultaneous chats on the GPU
• Microservices with Python + FastAPI providing a robust API for the game engine and UI
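A stripped-down version of the serving idea - a single resident base model, with per-player adapters loaded on demand and switched per request - might look like this. The endpoint shape, adapter directory, and model id are illustrative, not our exact service:

```python
import torch
from fastapi import FastAPI
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

app = FastAPI()
MODEL_ID = "google/gemma-3-1b-it"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
base = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16, device_map="cuda")
model = PeftModel.from_pretrained(base, "adapters/default")   # start with any adapter attached
loaded_adapters = {"default"}

@app.post("/chat/{player_id}")
def chat(player_id: str, message: str):
    if player_id not in loaded_adapters:
        # Load the player's adapter once and keep it resident; switching later is cheap
        model.load_adapter(f"adapters/{player_id}", adapter_name=player_id)
        loaded_adapters.add(player_id)
    model.set_adapter(player_id)

    inputs = tokenizer(message, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    reply = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return {"reply": reply}
```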
Real-Time Chat System for Players
The chat is a central feature of the game, enabling player-to-player communication and world interaction.
It also includes Like/Dislike buttons to collect preference data for future agent calibration.
LoRA and Beyond - Infrastructure for DPO
Beyond player personalization, we built a full DPO infrastructure:
• Like/Dislike buttons inside the chat
• Collection of real preference data
• Pipeline planning for future agent calibration based on community preferences
This allows the system to improve automatically based solely on real usage.
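Once enough Like/Dislike pairs accumulate, the calibration step itself can be sketched with TRL’s DPOTrainer. Argument names depend on the TRL version, and the paths and hyperparameters below are illustrative:

```python
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# Each record follows the prompt / chosen / rejected convention shown earlier
prefs = load_dataset("json", data_files="data/preferences.jsonl", split="train")

trainer = DPOTrainer(
    model="google/gemma-3-1b-it",
    args=DPOConfig(output_dir="adapters/community-dpo", beta=0.1),
    train_dataset=prefs,
)
trainer.train()
```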
Key Insights for Developers
• LoRA adapters are a game-changer: lightweight, efficient, and provide excellent real-time personalization
• Inference efficiency matters as much as model quality: memory management, precise pipelines, and KV-Cache optimization are crucial
• Automated learning loops pay off: nightly adapter updates allow agents to improve without manual intervention
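For completeness, a nightly loop can be as simple as a scheduled job that retrains adapters for recently active players. The scheduling library and the helper names below are assumptions for illustration:

```python
import time
import schedule  # third-party "schedule" package, used here for illustration

def nightly_update() -> None:
    # list_active_players() is a hypothetical helper returning players with new dialogue data;
    # train_player_adapter() is the fine-tuning sketch from earlier in the post
    for player_id in list_active_players():
        train_player_adapter(player_id)

schedule.every().day.at("03:00").do(nightly_update)

while True:
    schedule.run_pending()
    time.sleep(60)
```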
Developer Takeaways
If you want to integrate personalized AI into a game or interactive application while staying budget-friendly and hardware-efficient:
• Start small - LoRA on small models is highly effective
• Invest in inference - this is what drives true player experience
• Plan alignment from day one - collecting preference data is key for future improvements
For technical details, code examples, or the full system, check out our GitHub (https://github.com/KamaTechOrg/NanoVerse) or visit the NanoVerse website (https://nanoverse.me/).




