Aun Raza

The Evolution of AI Memory: From Context Windows to True Long-Term Memory

Artificial intelligence has come a long way, but one thing has always held it back: memory. Large Language Models (LLMs) are great at short conversations, yet they quickly forget earlier parts of an interaction. This makes them inconsistent, repetitive, and unable to handle tasks that need continuity, like planning projects, writing books, or learning from experience.

1. The Purpose: Bridging the Gap Between Short-Term and Long-Term Understanding

Traditional LLMs operate primarily within a fixed context window. This means they only consider a limited number of tokens (words or sub-words) from the immediate past input when generating a response; anything older is simply truncated away, as sketched after the list below. While effective for short exchanges, this approach struggles with:

  • Inconsistency: Forgetting information from earlier parts of a conversation, leading to contradictory statements.
  • Repetition: Generating redundant information because the model has "forgotten" it previously mentioned it.
  • Lack of Long-Term Planning: Inability to perform tasks requiring long-term memory, such as writing a novel or managing a complex project.
  • Inability to Learn from Experience: Difficulty in retaining and applying knowledge gained from past interactions to improve future performance.
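To make the limitation concrete, here is a minimal sketch of the naive truncation most chat loops perform when history outgrows the window. The 4096-token budget and the truncate_history helper are hypothetical choices for illustration, and the cl100k_base encoding name is an assumption; the point is that everything older than the budget is silently dropped.

import tiktoken

# Hypothetical budget standing in for a model's context window.
MAX_TOKENS = 4096

# Encoding name is an assumption; adjust to match your model.
enc = tiktoken.get_encoding("cl100k_base")

def truncate_history(messages, max_tokens=MAX_TOKENS):
    """Keep only the most recent messages that fit the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):
        n = len(enc.encode(msg))
        if used + n > max_tokens:
            break  # everything older than this point is forgotten
        kept.append(msg)
        used += n
    return list(reversed(kept))

history = ["(many earlier turns)", "User: What did we decide last week?", "AI: ..."]
print(truncate_history(history, max_tokens=50))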

The goal of long-term memory solutions is to address these limitations by enabling AI agents to:

  • Persistently store and retrieve information.
  • Reason about and integrate new information with existing knowledge.
  • Adapt and improve their performance over time based on past experiences.
  • Maintain consistent and coherent interactions across extended periods.

2. Features: Approaches to Long-Term Memory

Different approaches are emerging, each with its strengths:

  • Vector Databases: Store past text as embeddings (vectors) in databases like Chroma or Pinecone. Useful for retrieving relevant info later.
  • Memory Networks: Neural networks with external “memory slots” that can read/write information for more fine-grained recall.
  • Knowledge Graphs: Represent info as entities and relationships, enabling reasoning and connections between ideas.
  • Summarization/Compression: Condense past conversations into shorter summaries that fit within context windows, though some detail may be lost. (Both this approach and knowledge graphs are sketched below.)
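As a toy illustration of the knowledge-graph idea, facts can be stored as (subject, relation, object) triples and queried later. The remember and recall helpers here are hypothetical names used only for this sketch:

# Toy knowledge-graph memory: facts stored as (subject, relation, object) triples.
triples = []

def remember(subject, relation, obj):
    fact = (subject, relation, obj)
    if fact not in triples:
        triples.append(fact)

def recall(subject):
    # Return every (relation, object) pair known about a subject.
    return [(r, o) for s, r, o in triples if s == subject]

remember("Ada", "works_on", "compiler project")
remember("Ada", "prefers", "concise answers")
print(recall("Ada"))  # [('works_on', 'compiler project'), ('prefers', 'concise answers')]

For summarization/compression, LangChain ships a ConversationSummaryMemory that folds each exchange into a rolling summary instead of keeping the raw transcript. A minimal sketch, assuming the same LangChain version and OpenAI API key setup as the example below:

from langchain.llms import OpenAI
from langchain.memory import ConversationSummaryMemory

memory = ConversationSummaryMemory(llm=OpenAI(temperature=0))

# Each saved exchange is condensed into a running summary.
memory.save_context({"input": "I'm planning a three-week trip to Japan in April."},
                    {"output": "Great! I can help you plan the itinerary."})
memory.save_context({"input": "I want to focus on Kyoto and Osaka."},
                    {"output": "Noted: Kyoto and Osaka as the main stops."})

# The summary, not the full transcript, is what gets injected into future prompts.
print(memory.load_memory_variables({}))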

3. Code Example: Implementing Vector Database-Based Long-Term Memory with Langchain and Chroma

This example demonstrates how to implement a simple long-term memory system using Langchain, Chroma, and OpenAI embeddings.

Installation:

pip install langchain chromadb openai tiktoken

Code:

import os
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Set your OpenAI API key (read by both the embeddings and the LLM wrapper)
os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# 1. Load and split the document
loader = TextLoader("data.txt") # Replace data.txt with your text file
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# 2. Create embeddings and store in Chroma
embeddings = OpenAIEmbeddings()
db = Chroma.from_documents(texts, embeddings, persist_directory="chroma_db") # Store in chroma_db directory
db.persist() # Persist the database to disk

# 3. Load the persisted database
db = Chroma(persist_directory="chroma_db", embedding_function=embeddings)

# 4. Create a retrieval QA chain
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "You are a helpful assistant. Answer the question based on the context provided:\n"
        "{context}\nQuestion: {question}\nAnswer:"
    ),
)
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),  # LangChain's wrapper around the OpenAI completion API
    chain_type="stuff",  # "stuff" simply stuffs all retrieved documents into the prompt
    retriever=db.as_retriever(),
    chain_type_kwargs={"prompt": prompt},  # must be a PromptTemplate, not a raw string
)

# 5. Ask questions
query = "What is the main topic of the document?"
result = qa.run(query)
print(f"Question: {query}")
print(f"Answer: {result}")

query = "Who are the key people mentioned in the document?"
result = qa.run(query)
print(f"Question: {query}")
print(f"Answer: {result}")

Explanation:

  1. Load and Split Document: Loads a text file and splits it into smaller chunks using CharacterTextSplitter. This is important for managing the size of the data sent to the embedding model.
  2. Create Embeddings and Store in Chroma: Uses OpenAIEmbeddings to generate vector embeddings for each chunk of text. These embeddings are then stored in a Chroma vector database. persist_directory specifies where the database will be saved on disk.
  3. Load Persisted Database: Loads the previously saved Chroma database. This is crucial for accessing the long-term memory in subsequent interactions.
  4. Create RetrievalQA Chain: Creates a RetrievalQA chain from Langchain, combining the LLM (here, LangChain's OpenAI wrapper) with the vector database to answer questions based on the retrieved information. chain_type="stuff" specifies that all retrieved documents are included in the prompt sent to the LLM, and chain_type_kwargs passes a PromptTemplate that customizes that prompt.
  5. Ask Questions: The qa.run(query) method sends a query to the LLM, retrieves relevant documents from the vector database, and generates an answer based on the retrieved context. To let this memory grow over time, new interactions can be appended to the store, as sketched below.
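As a follow-up sketch (reusing the db object from the example above), new conversation turns can be embedded and persisted as they happen, so later sessions retrieve them like any other document. The example text is hypothetical:

# Append a new interaction to the long-term store; it is embedded on insert.
db.add_texts(["User asked about project deadlines; agreed to ship the draft by Friday."])
db.persist()

# Later (even in a new process), the memory is searchable again.
hits = db.similarity_search("When is the draft due?", k=2)
for doc in hits:
    print(doc.page_content)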

4. Installation: Setting Up the Environment

The code example utilizes several libraries:

  • Langchain: A framework for building applications powered by LLMs.
  • Chroma: An open-source embedding database.
  • OpenAI: For accessing OpenAI's embedding and language models.
  • tiktoken: For tokenizing text.

To install these libraries, use pip:

pip install langchain chromadb openai tiktoken

You will also need an OpenAI API key. Sign up for an account at https://platform.openai.com/ and obtain your API key from the API keys section. Remember to set the OPENAI_API_KEY environment variable.

5. Conclusion: The Future of AI Memory

Giving AI real memory isn't just a technical upgrade; it's a game-changer. Instead of treating every conversation as brand new, future systems will learn, adapt, and stay consistent over time. Techniques like vector databases, memory networks, and knowledge graphs are early steps, but the destination is clear: AI that doesn't just respond, but actually remembers.
