Chinmay Bhosale

Building 100% local AI memory with cognee

If you've explored AI memory frameworks, you've probably encountered Cognee (topoteretes/cognee on GitHub, "Memory for AI Agents in 6 lines of code")—an open-source memory engine that's transforming how AI agents handle information. While there's plenty of content online showing how to use Cognee with OpenAI and other paid models, comprehensive guides for running Cognee entirely locally with Ollama are surprisingly scarce. This post fills that gap by walking you through a complete local setup using Ollama, covering everything from model selection to knowledge graph generation—all without external API dependencies or subscription costs.

What is Cognee?

Cognee is an open-source AI memory engine that transforms how AI agents handle information. Unlike traditional large language models that treat every interaction as a blank slate, Cognee provides persistent, structured memory that allows AI systems to remember, reason, and build upon previous context across sessions.

At its core, Cognee addresses a fundamental limitation of modern AI systems—they forget everything. Ask an LLM a follow-up question, and it acts like you've never spoken before. That's not intelligence; that's mimicry. For production-grade applications requiring contextual continuity, consistency, and personalized responses, this ephemeral nature becomes a critical bottleneck.

Cognee solves this through a memory-first architecture that combines embeddings with graph-based triplet extraction (subject-relation-object) stored in knowledge graphs. When you feed documents into Cognee, it doesn't just store text chunks—it extracts entities, identifies relationships, and links everything into a queryable graph structure that serves as a persistent memory layer. This approach has been reported to achieve 92.5% accuracy compared to traditional RAG's 60%, making it solid enough for decision-making rather than guesswork.

The framework operates through three core operations: .add() to ingest and prepare your data, .cognify() to build the knowledge graph with embeddings, and .search() to query with context using a combination of vector similarity and graph traversal. This hybrid approach enables context-aware recall where entities are represented with granular, context-rich connections rather than flat vector similarities.
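
In code, that pipeline is about as short as the project's tagline suggests. Here's a minimal sketch of the three calls with defaults assumed (the full, Ollama-backed version appears later in this post):

import cognee
import asyncio

async def demo():
    # Ingest raw text into the default dataset
    await cognee.add("Cognee builds knowledge graphs from text.")
    # Extract entities and relationships into a knowledge graph
    await cognee.cognify()
    # Query with hybrid vector similarity + graph traversal
    print(await cognee.search("What does Cognee build?"))

asyncio.run(demo())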

Cognee has gained significant traction in the open-source community with over 7,000 GitHub stars, thousands of library downloads, and adoption across 200-300 projects. Its flexible architecture supports multiple backends—Neo4j, FalkorDB, KuzuDB, and NetworkX for graphs; Redis, Qdrant, and Weaviate for vectors; plus SQLite or Postgres for relational metadata. This poly-store design allows developers to deploy everything from simple chatbots to complex multi-agent systems with memory that scales from gigabytes to terabytes.
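
Switching backends is a configuration change rather than a code change. As a rough illustration (the variable names match the .env file shown later in this post; the values are hypothetical, and each backend also needs its own connection settings, such as URLs and credentials, not shown here):

# Hypothetical backend swap: Neo4j for graphs, Qdrant for vectors, Postgres for metadata
GRAPH_DATABASE_PROVIDER="neo4j"
VECTOR_DB_PROVIDER="qdrant"
DB_PROVIDER="postgres"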

Setting Up Ollama

Ollama provides the easiest way to run large language models locally on your machine. The installation process is straightforward and works across Windows, macOS, and Linux.

Installing Ollama

For Windows and macOS: Visit the official Ollama website at ollama.com and download the installer for your operating system. Run the installer and follow the on-screen prompts—the process takes just a few minutes. The installer automatically starts the Ollama server in the background and sets it to start on system boot.

For Linux: Open your terminal and run the following command:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, verify the installation by running:

ollama --version

The Ollama server runs automatically on http://localhost:11434 after installation. You can verify it's running with:

curl http://localhost:11434

If the server is running, you'll see the response: Ollama is running.

Pulling Required Models

For this tutorial, we'll need two models: one for language processing and one for generating embeddings.

Language Model - gpt-oss:20b: A compact yet capable model that balances performance with resource requirements:

ollama pull gpt-oss:20b

Embedding Model - avr/sfr-embedding-mistral: This model will handle text embeddings for Cognee's vector search capabilities:

ollama pull avr/sfr-embedding-mistral:latest

Both downloads may take a few minutes depending on your internet connection. To verify the models are ready, list all downloaded models:

ollama list

You should see both gpt-oss:20b and avr/sfr-embedding-mistral:latest in the output.
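
Optionally, you can smoke-test the language model from the terminal before wiring it into Cognee (a one-off prompt; the exact reply will vary):

ollama run gpt-oss:20b "Reply with one word if you can read this."

If you get a coherent response, the model is downloaded and the server is serving it correctly.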

Installing Cognee

With Ollama running, install Cognee with Ollama support:

pip install "cognee[ollama]"

The [ollama] extras package includes all dependencies needed for local Ollama integration.

Installing BAML for Structured Output

Cognee uses specialized libraries to extract structured output from LLMs—primarily Instructor and BAML (BoundaryML). While both libraries serve the same purpose of ensuring LLMs return valid, structured data, BAML works more reliably with Ollama.

For our local Ollama setup, install Cognee with BAML support:

pip install "cognee[baml]"
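
If you're setting up from scratch, standard pip extras syntax also lets you install both in one step (equivalent to the two commands above):

pip install "cognee[ollama,baml]"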

Configuring Cognee for Ollama

Create a .env file in your project directory to configure Cognee for local Ollama models:

# Use BAML for structured outputs  
STRUCTURED_OUTPUT_FRAMEWORK="BAML"  

# LLM Configuration 
LLM_API_KEY="ollama"  
LLM_MODEL="gpt-oss:20b"  
LLM_PROVIDER="ollama"  
LLM_ENDPOINT="http://localhost:11434/v1"  

# Embedding Configuration
EMBEDDING_PROVIDER="ollama"  
EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest"  
EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings"  
EMBEDDING_DIMENSIONS=4096  
HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"  

# BAML Configuration
BAML_LLM_PROVIDER="ollama"  
BAML_LLM_MODEL="gpt-oss:20b"  
BAML_LLM_ENDPOINT="http://localhost:11434/v1"  
BAML_LLM_API_KEY="ollama"  

# Database Settings (defaults)
DB_PROVIDER="sqlite"  
VECTOR_DB_PROVIDER="lancedb"  
GRAPH_DATABASE_PROVIDER="kuzu"

Key points: the LLM and embedding providers must be configured separately; EMBEDDING_DIMENSIONS (4096) must match the embedding model's output size; and Cognee automatically loads this .env file when it is imported.
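
Before running Cognee, it's worth confirming that the embedding model really returns 4096-dimensional vectors. Here's a quick sanity check using the requests library (a raw HTTP call to the Ollama endpoint configured above, not a Cognee API):

import requests

# Ask Ollama for an embedding and check its dimensionality
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "avr/sfr-embedding-mistral:latest", "prompt": "hello"},
)
print(len(resp.json()["embedding"]))  # expected: 4096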

Building Your First Knowledge Graph with Cognee

Now that we have everything set up, let's create a simple example that demonstrates Cognee's core functionality. We'll add some text, process it into a knowledge graph, search it, and visualize the results.

The Complete Code

import cognee  
import asyncio  

async def main():  
    # Sample text to process
    text = """
    Quantum computing represents a revolutionary approach to computation 
    that leverages quantum mechanical phenomena like superposition and 
    entanglement. Unlike classical computers that use bits (0 or 1), 
    quantum computers use qubits that can exist in multiple states 
    simultaneously. This allows quantum computers to solve certain 
    problems exponentially faster than classical computers, particularly 
    in areas like cryptography, drug discovery, and optimization problems.
    """

    # Add the text to Cognee
    await cognee.add(text, dataset_name="quantum_computing")  

    # Process the data into a knowledge graph  
    await cognee.cognify(["quantum_computing"])  

    # Search the knowledge graph  
    results = await cognee.search("What is quantum computing")  

    # Visualize the knowledge graph
    html_file = await cognee.visualize_graph("./graph_visualization.html")  

    print(f"✅ Graph visualization saved to: {html_file}") 

    # Display search results  
    print("\n=== Search Results ===")  
    for result in results:  
        print(result)  

if __name__ == '__main__':  
    asyncio.run(main())

Understanding the Code

The code walks through the three core operations we mentioned earlier, plus a visualization step:

Adding Data: The cognee.add() function ingests your text and prepares it for processing. The dataset_name parameter organizes your data into logical collections—in this case, "quantum_computing". You can add multiple documents to the same dataset.
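
For example, inside the same async main() you could grow the dataset with additional documents before cognifying (the sample sentences here are just illustrative):

await cognee.add("Qubits are the basic unit of quantum information.", dataset_name="quantum_computing")
await cognee.add("Shor's algorithm factors integers efficiently on quantum hardware.", dataset_name="quantum_computing")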

Cognifying: The cognee.cognify() function is where the magic happens. It processes your text using the local Ollama models we configured earlier, extracting entities (like "quantum computing," "qubits," "superposition") and their relationships (like "quantum computers use qubits," "qubits enable superposition"). These are structured into a knowledge graph with both vector embeddings and graph connections.
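
Conceptually, the extraction step turns prose into subject-relation-object triplets. The examples below are illustrative only; the actual node and edge labels depend on what the model extracts:

# Illustrative triplets, not literal Cognee output
triplets = [
    ("quantum computers", "use", "qubits"),
    ("qubits", "enable", "superposition"),
    ("quantum computing", "applies to", "cryptography"),
]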

Searching: The cognee.search() function performs hybrid retrieval—combining vector similarity search with graph traversal to find contextually relevant information. Unlike simple keyword matching, it understands the semantic meaning and relationships in your query.

Visualizing: The cognee.visualize_graph() function generates an interactive HTML visualization of your knowledge graph. Simply open the generated graph_visualization.html file in your browser to explore the entities and relationships Cognee extracted from your text.

Running the Code

Save the code to a file (e.g., cognee_demo.py) and run it:

python cognee_demo.py

You'll see Cognee processing your text using the local Ollama models. Once complete, open graph_visualization.html in your browser to see the knowledge graph—nodes represent entities like "quantum computing" and "qubits," while edges show their relationships.

Find out more about cognee at cognee.ai. Cognee is an open-source AI memory engine; try it today to find hidden connections in your data and improve your AI infrastructure.
