I recently came across a cool paper about MemOS, an operating system for the memory of large language models (LLMs). In short, the authors pitch it as a breakthrough in AI knowledge management that tackles the main pain points of modern LLMs: forgetfulness, inflexibility, and high update costs.
🔥 Key insights
1. Memory as a system resource
— Today, an LLM's memory is either frozen into static model parameters or held in a short-lived context window (as with RAG). MemOS turns memory into a manageable resource, much like RAM in a computer.
— MemCube is the basic unit of memory: it stores not just the data itself but also metadata such as versions, access rights, and usage frequency. This lets the system “decide” what to keep and what to forget.
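Here's a minimal sketch, in Python, of what such a record could look like. The field names and the `touch()` helper are my own illustration of the idea (a payload plus versioning, access control, and usage tracking), not the paper's actual schema:

```python
# A minimal sketch of a MemCube-like record; illustrative field names only.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MemCube:
    payload: str                    # the memory itself (text, a KV blob, a weight delta, ...)
    memory_type: str = "plaintext"  # "plaintext" | "activation" | "parameter"
    version: int = 1                # supports rollback and provenance tracking
    owner: str = "default"          # access control: who may read or write this cube
    access_count: int = 0           # usage frequency drives keep/compress/forget decisions
    created_at: datetime = field(default_factory=datetime.now)

    def touch(self) -> None:
        """Record one more access; a scheduler can use this to rank cubes."""
        self.access_count += 1
```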
2. Three types of memory in one system
— Plaintext: external knowledge (for example, articles or dialogues).
— Activation: a cache of intermediate model states (the KV-cache), which speeds up inference.
— Parameter: long-term knowledge in the model weights.
— MemOS moves memories between these forms dynamically. For example, frequently used plaintext facts can be distilled into model parameters for efficiency; a toy version of that policy is sketched below.
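To make the migration idea concrete, here is a toy promotion policy built on the `MemCube` sketch above. The three-tier order and the `hot_threshold` value are assumptions for illustration; the real MemOS scheduler is far more involved:

```python
# Toy policy in the spirit of the paper: "hot" memories get promoted toward
# cheaper-to-use representations. Thresholds here are assumptions, not MemOS defaults.
PROMOTION_ORDER = ["plaintext", "activation", "parameter"]

def maybe_promote(cube: MemCube, hot_threshold: int = 50) -> MemCube:
    """Promote a frequently used cube one step: plaintext -> activation -> parameter."""
    idx = PROMOTION_ORDER.index(cube.memory_type)
    if cube.access_count >= hot_threshold and idx < len(PROMOTION_ORDER) - 1:
        cube.memory_type = PROMOTION_ORDER[idx + 1]
        cube.version += 1       # a migration is a new version of the same memory
        cube.access_count = 0   # restart the usage counter in the new tier
    return cube
```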
3. Speeding up work via KV-cache
— Usually, an LLM re-processes the entire long context on every request, which is slow. MemOS caches key data in key-value (KV) format and injects it directly into the model's attention mechanism.
— Result: response time drops by roughly 60-90% with no loss in quality (see the tables in the paper).
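The underlying trick can be demonstrated with a stock Hugging Face model: compute the KV-cache for a long, shared prefix once, then reuse it for every query. This is a minimal sketch of KV reuse in general, not MemOS's own activation-memory code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works for this demo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# 1) Pre-compute the KV-cache for a long, reusable prefix once.
prefix = "Shared system prompt and reference documents ..."
prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
with torch.no_grad():
    out = model(prefix_ids, use_cache=True)
cached_kv = out.past_key_values  # this plays the role of "activation memory"

# 2) Each new query reuses the cache instead of re-encoding the prefix.
query_ids = tokenizer(" User question: what does the document say?",
                      return_tensors="pt").input_ids
with torch.no_grad():
    out = model(query_ids, past_key_values=cached_kv, use_cache=True)
# out.logits covers only the query tokens; the prefix was never re-processed.
```

The longer and more often a prefix is reused, the larger the savings, which is the kind of effect behind the 60-90% figures above.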
4. Memory for real-world tasks
— Multi-turn dialogue: the assistant remembers the user's budget and preferences even 20 messages later.
— Personalization: the model adapts to a user's style and to assigned roles (e.g., “doctor” vs. “manager”).
— Knowledge updates: new data (e.g., revised laws) is added without retraining the entire model.
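As a toy illustration of the multi-turn memory and personalization points, here is a self-contained dialogue-memory loop. `MemoryStore` and its keyword-overlap `recall()` are stand-ins I made up, not the MemOS API; a production system would use embedding search over its plaintext memory:

```python
class MemoryStore:
    """Hypothetical plaintext memory: write facts, recall the most relevant ones."""

    def __init__(self):
        self._facts: list[str] = []

    def write(self, fact: str) -> None:
        self._facts.append(fact)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Real systems would use embedding search; word overlap keeps this self-contained.
        words = set(query.lower().split())
        scored = sorted(self._facts, key=lambda f: -len(words & set(f.lower().split())))
        return scored[:k]


memory = MemoryStore()
memory.write("User's budget for the laptop is $1200.")
memory.write("User prefers short answers in a formal tone.")

# Twenty messages later, relevant facts are injected back into the prompt:
facts = "\n".join(memory.recall("which laptop fits my budget?"))
prompt = f"Known about the user:\n{facts}\n\nUser: which laptop fits my budget?"
```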
5. MemStore — “App Store” for memory
— Experts can publish ready-made knowledge blocks (for example, medical guidelines), and users can install them into their LLMs like apps.
— This opens the way to a decentralized knowledge market, where memory itself becomes a tradable commodity.
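Purely as a thought experiment, installing such a block might look like the snippet below, reusing the `MemoryStore` stand-in from the previous sketch. The `memstore://` URL scheme and `install_memcube()` are hypothetical names; the paper doesn't pin the interface down at this level:

```python
def install_memcube(store_url: str, memory: MemoryStore) -> None:
    """Fetch a published knowledge block and register its facts locally."""
    # Hypothetical: a real implementation would download, verify, and mount
    # a signed MemCube; here we simulate it with canned placeholder content.
    published = {
        "memstore://medical/guidelines-2024": [
            "Placeholder guideline text from the published knowledge block.",
        ],
    }
    for fact in published.get(store_url, []):
        memory.write(fact)

install_memcube("memstore://medical/guidelines-2024", memory)
```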
💡 What does this mean in practice?
— For developers: no more stitching together ad-hoc workarounds like per-task RAG pipelines or fine-tuning; MemOS offers a single API for memory management.
— For business: LLMs can remember clients, update their knowledge without downtime, and scale more cheaply.
— For users: chatbots will stop seeming “stupid” and forgetting what you talked about last week.
Links:
— Code on GitHub
— Paper