Choosing the Right LLM for Cognee: Local Ollama Setup

Choosing the best LLM for Cognee means balancing graph-building quality, hallucination rates, and hardware constraints.
Cognee works best with larger, low-hallucination models (32B+) via Ollama, but mid-size options can handle lighter setups.

Key Cognee Requirements

Cognee relies on the LLM for entity extraction, relation inference, and metadata generation. Models under 32B often produce noisy graphs, and high hallucination rates (e.g., 90%+) pollute nodes and edges, degrading retrieval. The official docs recommend deepseek-r1:32b or llama3.3-70b-instruct-q3_K_M paired with Mistral embeddings.
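
To get a feel for how a candidate model handles this kind of task, you can prompt it directly through Ollama before wiring it into Cognee. This is only an informal smoke test with a prompt of my own, not the prompt Cognee uses internally:

# Ask the model to do roughly what Cognee's extraction step needs:
# entities and typed relations as structured output
ollama run deepseek-r1:32b "Extract entities and relations as JSON triples (subject, relation, object) from: 'Cognee builds a knowledge graph from documents using an LLM and an embedding model.'"

If the model answers with clean, parseable triples instead of free-form prose, that's a good early sign for graph construction.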

Model Comparison Table

| Model | Params | Hallucination (SimpleQA/est.) | VRAM (quantized) | Cognee Strengths | Weaknesses |
|---|---|---|---|---|---|
| gpt-oss:20b | 20B | 91.4% | ~16GB | Fast inference, tool-calling | Severe graph noise |
| Qwen3:14b | 14B | ~40-45% | ~12-14GB | Efficient on modest hardware | Limited depth for graphs |
| Devstral Small 2 | 24B | ~8-10% | ~18-20GB | Coding focus, clean entities | Higher VRAM than Qwen3 |
| Llama3.3-70b | 70B | ~30-40% | ~40GB+ | Optimal graph quality | Heavy resource needs |
| Deepseek-r1:32b | 32B | Low (recommended) | ~24-32GB | Best for reasoning/graphs | Slower on consumer GPUs |

Data synthesized from Cognee docs, model cards, and public benchmarks. The hallucination figures may look out of whack, but they are probably not far off...
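
To see what a pulled model actually looks like locally (parameter count, quantization, context length), `ollama show` prints the details. The model name below is just one row from the table above; swap in whichever one you pulled:

# Inspect a local model's architecture, parameters, quantization and context length
ollama show qwen3:14b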

Recommendations by Hardware

  • High-end (32GB+ VRAM): Deepseek-r1:32b or Llama3.3-70b. These yield the cleanest graphs per Cognee guidance.
  • Mid-range (16-24GB VRAM): Devstral Small 2. Low hallucination and coding prowess suit structured memory tasks.
  • Budget (12-16GB VRAM): Qwen3:14b over gpt-oss:20b - avoid 91% hallucination pitfalls.
  • I'm inclined to avoid gpt-oss:20b for Cognee; there are reports that its errors amplify in unfiltered graph construction. That said, its inference speed on my GPU is 2+ times faster... (a quick VRAM check is sketched after this list).
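
If you're unsure which tier your machine falls into, here is a rough sketch for checking available GPU memory (assumes an NVIDIA card with nvidia-smi on the PATH; Apple Silicon and AMD setups need different tools):

# Report total and free VRAM; pick a model whose quantized size fits comfortably
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv

# See which models Ollama currently has loaded and how much memory they take
ollama ps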

Quick Ollama + Cognee Setup

# 1. Pull model (e.g., Devstral)
ollama pull devstral-small-2:24b  # or qwen3:14b, etc.

# 2. Install Cognee
pip install "cognee[ollama]"

# 3. Env vars
export LLM_PROVIDER="ollama"
export LLM_MODEL="devstral-small-2:24b"
export EMBEDDING_PROVIDER="ollama"
export EMBEDDING_MODEL="nomic-embed-text"  # 768 dims
export EMBEDDING_DIMENSIONS=768

# 4. Test graph
cognee add --file "your_data.txt" --name "test_graph"
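
Before pointing Cognee at the model, it's worth confirming that Ollama is actually serving it. A minimal check against the local Ollama API (default port 11434; the model name should match whatever you pulled above):

# List models known to the local Ollama daemon
curl -s http://localhost:11434/api/tags

# One-shot generation to confirm the model loads and responds
curl -s http://localhost:11434/api/generate \
  -d '{"model": "devstral-small-2:24b", "prompt": "Say hello", "stream": false}'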

Match embedding dimensions (e.g., 768, 1024) across the Cognee config and the vector store. Qwen3 embeddings (unproven with Cognee) could work at 1024-4096 dims if your Ollama build supports them.
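
To verify a model's actual output dimensionality before setting EMBEDDING_DIMENSIONS, you can request a single embedding from Ollama and count it (assumes jq is installed; swap in whichever embedding model you use):

# Request one embedding and count its dimensions
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "dimension check"}' \
  | jq '.embedding | length'  # should print 768 for nomic-embed-text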

Prioritize low-hallucination models for production Cognee pipelines—your graphs will thank you.
Test on your hardware and monitor graph coherence.

Embedding models

I didn't think much about this one, but here is a table I put together for future reference.

| Ollama Model | Size (GB) | Embedding Dimensions | Context Length |
|---|---|---|---|
| nomic-embed-text:latest | 0.274 | 768 | 2K |
| jina-embeddings-v2-base-en:latest | 0.274 | 768 | 8K |
| nomic-embed-text-v2-moe | 0.958 | 768 | 512 |
| qwen3-embedding:0.6b | 0.639 | 1024 | 32K |
| qwen3-embedding:4b | 2.5 | 2560 | 32K |
| qwen3-embedding:8b | 4.7 | 4096 | 32K |
| avr/sfr-embedding-mistral:latest | 4.4 | 4096 | 32K |

Useful links
