If you've explored AI memory frameworks, you've probably encountered Cognee—an open-source memory engine that's changing how AI agents handle information. While there's plenty of content online showing how to use Cognee with OpenAI and other paid models, comprehensive guides for running Cognee entirely locally with Ollama are surprisingly scarce. This blog fills that gap by walking you through a complete local setup with Ollama, covering everything from model selection to knowledge graph generation—all without external API dependencies or subscription costs.
What is Cognee?
Cognee is an open-source AI memory engine that transforms how AI agents handle information. Unlike traditional large language models that treat every interaction as a blank slate, Cognee provides persistent, structured memory that allows AI systems to remember, reason, and build upon previous context across sessions.
At its core, Cognee addresses a fundamental limitation of modern AI systems—they forget everything. Ask an LLM a follow-up question, and it acts like you've never spoken before. That's not intelligence; that's mimicry. For production-grade applications requiring contextual continuity, consistency, and personalized responses, this ephemeral nature becomes a critical bottleneck.
Cognee solves this through a memory-first architecture that combines embeddings with graph-based triplet extraction (subject-relation-object) stored in knowledge graphs. When you feed documents into Cognee, it doesn't just store text chunks—it extracts entities, identifies relationships, and links everything into a queryable graph structure that serves as a persistent memory layer. This approach achieves approximately 90% accuracy compared to traditional RAG's 60%, making it solid enough for decision-making rather than guesswork.
The framework operates through three core operations: .add() to ingest and prepare your data, .cognify() to build the knowledge graph with embeddings, and .search() to query with context using a combination of vector similarity and graph traversal. This hybrid approach enables context-aware recall where entities are represented with granular, context-rich connections rather than flat vector similarities.
Cognee has gained significant traction in the open-source community with over 7,000 GitHub stars, thousands of library downloads, and adoption across 200-300 projects. Its flexible architecture supports multiple backends—Neo4j, FalkorDB, KuzuDB, and NetworkX for graphs; Redis, Qdrant, and Weaviate for vectors; plus SQLite or Postgres for relational metadata. This poly-store design allows developers to deploy everything from simple chatbots to complex multi-agent systems with memory that scales from gigabytes to terabytes.
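As a quick illustration of that flexibility, switching backends is largely a matter of configuration. A hypothetical .env snippet (assuming you have those services running and add their connection settings, such as hosts and credentials, per Cognee's documentation) might look like:
DB_PROVIDER="postgres"
VECTOR_DB_PROVIDER="qdrant"
GRAPH_DATABASE_PROVIDER="neo4j"
For the local setup in this post, we'll stick with the lightweight defaults (SQLite, LanceDB, and KuzuDB).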
Setting Up Ollama
Ollama provides the easiest way to run large language models locally on your machine. The installation process is straightforward and works across Windows, macOS, and Linux.
Installing Ollama
For Windows and macOS: Visit the official Ollama website at ollama.com and download the installer for your operating system. Run the installer and follow the on-screen prompts—the process takes just a few minutes. The installer automatically starts the Ollama server in the background and sets it to start on system boot.
For Linux: Open your terminal and run the following command:
curl -fsSL https://ollama.com/install.sh | sh
Once installed, verify the installation by running:
ollama --version
The Ollama server runs automatically on http://localhost:11434 after installation. You can verify it's running with:
curl http://localhost:11434
If the server is running, you'll see the response: Ollama is running.
Pulling Required Models
For this tutorial, we'll need two models: one for language processing and one for generating embeddings.
Language Model - gpt-oss:20b: A capable open-weight model that balances output quality with local resource requirements:
ollama pull gpt-oss:20b
Embedding Model - avr/sfr-embedding-mistral: This model will handle text embeddings for Cognee's vector search capabilities:
ollama pull avr/sfr-embedding-mistral:latest
Both downloads may take some time depending on your internet connection—these are multi-gigabyte models. To verify the models are ready, list all downloaded models:
ollama list
You should see both gpt-oss:20b and avr/sfr-embedding-mistral:latest in the output.
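Optionally, you can sanity-check the embedding model directly against Ollama's embeddings API, which is the same endpoint Cognee will use later. Here's a minimal Python sketch (assuming the default Ollama port 11434); the reported length should be 4096, the dimension we'll configure for Cognee below:
# Optional sanity check: request an embedding from Ollama and inspect its size.
# Assumes the Ollama server is running on the default port 11434.
import json
import urllib.request

payload = json.dumps({
    "model": "avr/sfr-embedding-mistral:latest",
    "prompt": "hello world",
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/embeddings",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    embedding = json.loads(response.read())["embedding"]

print(f"Embedding length: {len(embedding)}")  # expect 4096 for this model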
Installing Cognee
With Ollama running, install Cognee with Ollama support:
pip install "cognee[ollama]"
The [ollama] extras package includes all dependencies needed for local Ollama integration.
Installing BAML for Structured Output
Cognee uses specialized libraries to extract structured output from LLMs—primarily Instructor and BAML (BoundaryML). While both libraries serve the same purpose of ensuring LLMs return valid, structured data, BAML works more reliably with Ollama.
For our local Ollama setup, install Cognee with BAML support:
pip install "cognee[baml]"
Configuring Cognee for Ollama
Create a .env file in your project directory to configure Cognee for local Ollama models:
# Use BAML for structured outputs
STRUCTURED_OUTPUT_FRAMEWORK="BAML"
# LLM Configuration
LLM_API_KEY="ollama"
LLM_MODEL="gpt-oss:20b"
LLM_PROVIDER="ollama"
LLM_ENDPOINT="http://localhost:11434/v1"
# Embedding Configuration
EMBEDDING_PROVIDER="ollama"
EMBEDDING_MODEL="avr/sfr-embedding-mistral:latest"
EMBEDDING_ENDPOINT="http://localhost:11434/api/embeddings"
EMBEDDING_DIMENSIONS=4096
HUGGINGFACE_TOKENIZER="Salesforce/SFR-Embedding-Mistral"
# BAML Configuration
BAML_LLM_PROVIDER="ollama"
BAML_LLM_MODEL="gpt-oss:20b"
BAML_LLM_ENDPOINT="http://localhost:11434/v1"
BAML_LLM_API_KEY="ollama"
# Database Settings (defaults)
DB_PROVIDER="sqlite"
VECTOR_DB_PROVIDER="lancedb"
GRAPH_DATABASE_PROVIDER="kuzu"
Key Points: The LLM and the embedding provider must be configured separately. The embedding dimensions (4096) must match the model's output size. Cognee automatically loads this file when it is imported.
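If you want to confirm the variables are visible to your process before Cognee reads them, here's a small optional sketch using python-dotenv (an extra dependency; Cognee loads the .env file on its own when imported):
# Optional: print a few values from .env to confirm the configuration is in place.
# python-dotenv is an extra dependency; Cognee itself loads .env when imported.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

for key in ("LLM_PROVIDER", "LLM_MODEL", "EMBEDDING_MODEL", "STRUCTURED_OUTPUT_FRAMEWORK"):
    print(f"{key} = {os.getenv(key)}")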
Building Your First Knowledge Graph with Cognee
Now that we have everything set up, let's create a simple example that demonstrates Cognee's core functionality. We'll add some text, process it into a knowledge graph, search it, and visualize the results.
The Complete Code
import cognee
import asyncio

async def main():
    # Sample text to process
    text = """
    Quantum computing represents a revolutionary approach to computation
    that leverages quantum mechanical phenomena like superposition and
    entanglement. Unlike classical computers that use bits (0 or 1),
    quantum computers use qubits that can exist in multiple states
    simultaneously. This allows quantum computers to solve certain
    problems exponentially faster than classical computers, particularly
    in areas like cryptography, drug discovery, and optimization problems.
    """

    # Add the text to Cognee
    await cognee.add(text, dataset_name="quantum_computing")

    # Process the data into a knowledge graph
    await cognee.cognify(["quantum_computing"])

    # Search the knowledge graph
    results = await cognee.search("What is quantum computing")

    # Visualize the knowledge graph
    html_file = await cognee.visualize_graph("./graph_visualization.html")
    print(f"✅ Graph visualization saved to: {html_file}")

    # Display search results
    print("\n=== Search Results ===")
    for result in results:
        print(result)

if __name__ == '__main__':
    asyncio.run(main())
Understanding the Code
The code walks through the three core Cognee operations we covered earlier, plus a visualization step:
Adding Data: The cognee.add() function ingests your text and prepares it for processing. The dataset_name parameter organizes your data into logical collections—in this case, "quantum_computing". You can add multiple documents to the same dataset.
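For example, a sketch of growing the same dataset over time could look like this (the extra text is purely illustrative, and the calls belong inside the same async function as the main example):
# Illustrative only: add another snippet to the existing "quantum_computing" dataset,
# then re-run cognify so the new content is merged into the knowledge graph.
more_text = "Shor's algorithm lets quantum computers factor large integers efficiently."
await cognee.add(more_text, dataset_name="quantum_computing")
await cognee.cognify(["quantum_computing"])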
Cognifying: The cognee.cognify() function is where the magic happens. It processes your text using the local Ollama models we configured earlier, extracting entities (like "quantum computing," "qubits," "superposition") and their relationships (like "quantum computers use qubits," "qubits enable superposition"). These are structured into a knowledge graph with both vector embeddings and graph connections.
Searching: The cognee.search() function performs hybrid retrieval—combining vector similarity search with graph traversal to find contextually relevant information. Unlike simple keyword matching, it understands the semantic meaning and relationships in your query.
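Recent Cognee releases also let you choose a retrieval mode through a query_type argument. The exact enum values and import path have shifted between versions, so treat the following as a sketch and check the docs for the version you installed:
# Sketch only: query_type names and defaults vary across Cognee releases.
from cognee import SearchType  # import path may differ in some versions

# Ask for raw matching chunks instead of the default graph-aware answer.
chunks = await cognee.search(
    query_text="What is quantum computing",
    query_type=SearchType.CHUNKS,
)
for chunk in chunks:
    print(chunk)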
Visualizing: The cognee.visualize_graph() function generates an interactive HTML visualization of your knowledge graph. Simply open the generated graph_visualization.html file in your browser to explore the entities and relationships Cognee extracted from your text.
Running the Code
Save the code to a file (e.g., cognee_demo.py) and run it:
python cognee_demo.py
You'll see Cognee processing your text using the local Ollama models. Once complete, open graph_visualization.html in your browser to see the knowledge graph—nodes represent entities like "quantum computing" and "qubits," while edges show their relationships.
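One practical note: Cognee's memory persists between runs, so re-running the script adds to the existing graph rather than starting fresh. If you want a clean slate while experimenting, Cognee exposes prune helpers; here's a sketch (verify the exact calls against your installed version):
# Sketch: wipe Cognee's stored data and system state before re-running the demo.
import asyncio
import cognee

async def reset():
    await cognee.prune.prune_data()                 # remove ingested data
    await cognee.prune.prune_system(metadata=True)  # reset system and metadata stores

asyncio.run(reset())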