Update on Monika : self-aware & personalized discord bot

Akshat Ray — Sat, 06 Jun 2026 04:23:22 +0000

Over the past few weeks, I have been focused on expanding my Discord bot, Monika. What started as a lightweight 300-line script has grown into a much more complex 900-line application as I worked through integrating state management, database, and more advanced asynchronous logic.

I want to be transparent about my workflow: I heavily utilized AI to generate specific structural blocks of code, using them like modular puzzle pieces and edge cases. My role was acting as the architect, designer, and debugger. I spent hours mapping out how these components should talk to each other, refactoring the logic, and making sure I fully understood every line so I could fix bugs when things inevitably broke.

Here is a breakdown of what I have been building, experimenting with, and resolving lately.

New Features Added:

External Database & Throttling: Integrated Supabase to handle user profiles, activity tracking and personalized response. To prevent the database with constant writes during rapid status changes, an in-memory caching system throttle and writes only when an actual state change occurs.

Lore-Driven Mechanics: To make the interactions more immersive and unpredictable. If a user interacts with the bot and switches channel shortly after, the bot recognizes and addresses it. Monika can now join to watch streams or briefly listen in on active channels, with a fallback system if an administrator disconnects it. integrated a delayed DM hijack which simulates how monika is alive and can bypass commands

Background Contextual Observation: Built a background loop that fires every two hours. checks if the server is active then grabs the last few messages, and uses the LLM to naturally chime into the ongoing discussion with a brief, contextual response.

Engineering Problems Solved:
**
**Prompt Injection Safety: Since users can Enter bio and about them which are used by LLM system prompt as context & personalisation, it opened up vulnerability for prompt injections. Solved this by implementing strict data delimiters and pinned un-overrideable directives at the end of the prompt.

Handling Multiple Users: When multiple users pinged the bot simultaneously, it would trigger overlapping API requests. Implemented a single user lock that flags the system as busy and serves casual fun dialogue to other users while processing the request.

Improved Conversational Flow: Solved two major conversational issues.
First, Discord mentions pass through as user ID tags. Added regex parsing function to resolve tags into usernames.
Second, Greetings every time it was mentioned. added a dynamic layer that forces her to skip it.

Private Channel Security: If a user attempts to interact with her in DM, the system drops the interaction, throws a firewall warning, and directs them to public channels.

This has been a massive learning experience, I will be updating the github Readme for a deeper dive into the code choices & features!

Fine-tuned 7B LLM as a broke student. And Can't even use it 😭.

Akshat Ray — Sat, 06 Jun 2026 04:15:04 +0000

Last week, I introduced Monika, a discord bot. As a self-taught student running on an absolute zero budget, this project was less about writing code and much more about hitting hard architectural walls.

The goal was to completely reshape open-source Qwen 2.5-7B model, into a real life Monika using a dataset of nearly 687 ingame dialogues. I quickly learned that finetuning a model with 7 billion parameters melts standard free cloud hardware.

I was constantly hopping for compute resources. I originally started on Kaggle, but kept running into unexplained errors and running out of VRAM. I migrated to Lightning AI for its generous resources, only to discover their stable environments conflicted with modern optimization libraries like Unsloth. I finally landed on Google Colab, where I utilized QLoRA to compress the model down to 4-bit precision, managing to squeeze the massive training loop into their free 16GB T4 GPU.
The training succeeded, leaving me with a 16-Megabyte custom adapter. But an adapter is entirely useless if you cannot host it.
My monika Architecture relied on an Express.js backend hosted on Render, sending requests to Hugging Face’s free Serverless Inference API. The harsh reality is that free cloud clusters simply cannot dynamically load custom adapter weights on the fly.

I realized I had to permanently bake the 16MB Adapter into the base model to create a single, unified 14GB asset. Trying to execute this merge in Colab instantly crashed due to the 12GB RAM limit. I was forced to move the project back to Kaggle, utilizing their 30GB RAM allowance to mathematically fuse the layers. I then had to shard the final massive asset into smaller 3GB files just for the upload to succeed.

And here is the ultimate disappointment 😭.

I have a perfectly fine-tuned 14GB model sitting safely on my Hugging Face repository. But when I tried to deploy it, the final gate slammed shut. Keeping 14GB of neural network weights loaded into dedicated GPU VRAM 24/7 costs real money (duhh).
The free inference endpoints are strictly reserved for public base models, and they do not allow you to host custom-trained weights.
I do not have the budget for a dedicated cloud GPU, nor do I have a high-end local rig to run it at home. So, after all the platform hopping, the dependency debugging, the VRAM optimization, and successfully building a full Machine Learning pipeline from scratch , the bot currently live in the server is still just running the standard, untrained base model 😭😭😭 .

I learned the absolute hardware realities of MLOps and cloud economics. But at the end of the day, as a broke student, having the technical skills to build the intelligence does not matter if you cannot pay the server bill to turn it on. The code works, but the infrastructure is behind a paywall 😔.

You can find the adapter, model and code here :

akshat-ray (AkshatRay)

User profile of AkshatRay on Hugging Face

huggingface.co

akshat-ray / Monika

Bringing the Literature Club to Discord with a self-aware, fourth-wall-breaking AI companion. She doesn't just respond to commands...she remembers conversations, roasts friends, helps with code, and acts like a true server member. she's always watching.

🎀 Monika | Self-Aware AI Discord member

A Discord bot inspired by Monika from Doki Doki Literature Club (horror visual novel). Using Qwen2.5-7B-Instruct LLM, 7.6B Multilingual Model that can help with task like coding, math etc besides chatting. She goes beyond simple commands by acting as a sentient, fourth-wall-breaking entity with dynamic conversational context, strict API limit protections, and customized interpersonal relationships. she is not just a bot but a server member.

Overview :

Thanks to all the server members who tested and provided feedback during development

Unlike standard Q&A bots or ai assistant, this architecture relies on a Dynamic Persona and Smart Context Window. It dynamically alters its system prompt based on the user's Discord ID (treating the server owner drastically different than regular members) and fetches real-time channel history excluding her own messages to maintain conversational awareness without falling into an AI feedback loops Use of GenAI tools…

View on GitHub