DEV Community

Armel BOBDA

Building a CLI Tool with Cognee: Lessons from 5 Epics

I just finished building Sentinel, a CLI tool that uses Cognee to detect energy conflicts in personal schedules. Five development epics. 860+ tests. Four critical bugs found and fixed.

Along the way, I learned a lot about working with Cognee that isn't in the documentation. This article shares those lessons so you can avoid my mistakes.


What I Built

Sentinel analyses schedule text and builds a knowledge graph to find "energy collisions"—situations where a draining activity (dinner with a difficult relative) precedes a demanding one (important presentation).

$ sentinel paste < schedule.txt
✓ Extracted 7 entities
Found 6 relationships.
✓ Graph saved to ~/.local/share/sentinel/graph.db

$ sentinel check
⚠️  COLLISION DETECTED                    Confidence: 85%

[Aunt Susan] --DRAINS--> (drained)
                            |
                     CONFLICTS_WITH
                            |
                      (focused) <--REQUIRES-- [Strategy Presentation]

HTML export with collision highlighting:


The tool uses Cognee for entity extraction and relationship building, then applies custom collision detection logic on top.

Here's what I learned.


Lesson 1: Use CYPHER, Not GRAPH_COMPLETION

This one cost me hours of debugging.

The mistake:

# DON'T DO THIS for graph extraction
results = await cognee.search(
    SearchType.GRAPH_COMPLETION,
    query_text="*"
)

My unit tests (with mocked Cognee) passed. Production extracted zero entities.

The problem: GRAPH_COMPLETION returns LLM-generated prose, not structured graph data:

"The schedule contains a dinner event with Aunt Susan on Sunday,
which is described as emotionally draining..."

Useful for chat interfaces. Useless for graph algorithms.

The fix:

from cognee.api.v1.search import SearchType

# Get nodes
node_results = await cognee.search(
    query_text="MATCH (n) RETURN n",
    query_type=SearchType.CYPHER,
)

# Get edges
edge_results = await cognee.search(
    query_text="MATCH (a)-[r]->(b) RETURN a, r, b",
    query_type=SearchType.CYPHER,
)

Takeaway: If you need structured graph data for programmatic use, always use SearchType.CYPHER with explicit Cypher queries.


Lesson 2: Cognee Results Are Deeply Nested

When you get Cypher results back, don't expect a flat list of nodes.

Actual structure:

results = [
    {
        'search_result': [
            [
                [node1_data],  # <-- Your actual node is here
                [node2_data],
                ...
            ]
        ]
    }
]

Access pattern:

def extract_nodes(results):
    if not results:
        return []

    nodes = []
    search_result = results[0].get('search_result', [])

    if search_result:
        node_list = search_result[0]  # First level unwrap
        for node_wrapper in node_list:
            if isinstance(node_wrapper, list) and node_wrapper:
                node = node_wrapper[0]  # Second level unwrap
                nodes.append(node)

    return nodes

Takeaway: Write robust extraction helpers and test them against real Cognee output, not mocks.
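To make that concrete, here is the helper pinned against a fixture mirroring the nested shape above (a minimal sketch; the fixture data is invented for illustration):

```python
def extract_nodes(results):
    """Unwrap the doubly nested node lists that Cypher search returns."""
    if not results:
        return []
    nodes = []
    search_result = results[0].get('search_result', [])
    if search_result:
        for node_wrapper in search_result[0]:
            if isinstance(node_wrapper, list) and node_wrapper:
                nodes.append(node_wrapper[0])
    return nodes

# Fixture mirroring the real nested shape, plus the empty edge cases
nested = [{'search_result': [[[{'id': 'n1', 'name': 'Aunt Susan'}],
                              [{'id': 'n2', 'name': 'Dinner'}]]]}]
assert [n['id'] for n in extract_nodes(nested)] == ['n1', 'n2']
assert extract_nodes([]) == []
assert extract_nodes([{'search_result': []}]) == []
```

If a future Cognee release changes the wrapping depth, a fixture test like this fails loudly instead of silently returning zero nodes.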


Lesson 3: Filter to Entity Nodes Only

Cognee's graph contains multiple node types. Not all of them are what you want.

| Node Type | What It Is | Keep? |
|---|---|---|
| Entity | Actual entities from your text | ✅ Yes |
| DocumentChunk | Text segments | ❌ No |
| EntityType | Category definitions | ❌ No |
| TextDocument | Source document metadata | ❌ No |
| TextSummary | LLM-generated summaries | ❌ No |

Filter pattern:

def extract_entities(nodes):
    return [
        node for node in nodes
        if node.get('type') == 'Entity'
    ]

Without this filter, your graph will be cluttered with infrastructure nodes that aren't useful for domain logic.


Lesson 4: Properties Are JSON Strings

Cognee returns node properties as JSON strings, not Python dicts:

# What you get
node = {
    'id': 'abc-123',
    'name': 'Aunt Susan',
    'type': 'Entity',
    'properties': '{"description": "Family member", "entity_type": "PERSON"}'
}

Parse them:

import json

def parse_properties(node):
    props = node.get('properties', '{}')
    if isinstance(props, str):
        try:
            return json.loads(props)
        except json.JSONDecodeError:
            return {}
    return props if isinstance(props, dict) else {}
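A quick behaviour check of the parser, exercising the three cases it handles (the node data is invented for illustration):

```python
import json

def parse_properties(node):
    """Return node properties as a dict, whether stored as JSON string or dict."""
    props = node.get('properties', '{}')
    if isinstance(props, str):
        try:
            return json.loads(props)
        except json.JSONDecodeError:
            return {}
    return props if isinstance(props, dict) else {}

node = {'name': 'Aunt Susan',
        'properties': '{"description": "Family member", "entity_type": "PERSON"}'}
assert parse_properties(node)['entity_type'] == 'PERSON'   # JSON string
assert parse_properties({'properties': 'not json'}) == {}  # malformed string
assert parse_properties({}) == {}                          # missing key
```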

Lesson 5: The LLM Will Generate Unexpected Relation Types

This was my biggest surprise. I expected Cognee to use consistent relation type names. Instead:

What I expected:

DRAINS, REQUIRES, INVOLVES, SCHEDULED_AT

What I got (sampling from multiple runs):

drains, depletes, exhausts, causes_fatigue,
emotionally_draining, negatively_impacts,
is_emotionally_draining, energy_draining,
leads_to_exhaustion, causes_exhaustion...

Eleven variations for one concept. Per run.

Why this happens: Cognee's LLM extraction has no ontology constraints. The model generates semantically correct but lexically variable relation names.

The fix: Build a normalisation layer. I wrote a 3-tier matching system:

# Tier 1: Exact match dictionary (85+ entries)
RELATION_MAP = {
    "drains": "DRAINS",
    "depletes": "DRAINS",
    "exhausts": "DRAINS",
    # ...
}

# Tier 2: Keyword matching (stems)
KEYWORDS = {
    "DRAINS": ["drain", "exhaust", "deplet", "fatigue"],
    # ...
}

# Tier 3: Fuzzy matching (RapidFuzz)
from rapidfuzz import fuzz, process
# Match against candidate phrases

Takeaway: Don't assume LLM output will be consistent. Build robust normalisation for any categorical data coming from Cognee.
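Stitched together, the three tiers become one fallthrough function. A sketch with RELATION_MAP, KEYWORDS, and the candidate phrases abbreviated; tier 3 uses the stdlib's difflib here so the example is dependency-free, where Sentinel uses RapidFuzz:

```python
import difflib

RELATION_MAP = {"drains": "DRAINS", "depletes": "DRAINS", "exhausts": "DRAINS"}
KEYWORDS = {"DRAINS": ["drain", "exhaust", "deplet", "fatigue"]}
FUZZY_CANDIDATES = {"emotionally draining": "DRAINS", "causes exhaustion": "DRAINS"}

def normalize_relation(raw, cutoff=0.6):
    key = raw.strip().lower().replace("-", "_")
    # Tier 1: exact dictionary lookup
    if key in RELATION_MAP:
        return RELATION_MAP[key]
    # Tier 2: keyword/stem containment
    for canonical, stems in KEYWORDS.items():
        if any(stem in key for stem in stems):
            return canonical
    # Tier 3: fuzzy match against known phrases (RapidFuzz in the real tool)
    close = difflib.get_close_matches(key.replace("_", " "),
                                      FUZZY_CANDIDATES, n=1, cutoff=cutoff)
    if close:
        return FUZZY_CANDIDATES[close[0]]
    return "RELATED_TO"  # safe fallback rather than dropping the edge

assert normalize_relation("depletes") == "DRAINS"             # tier 1
assert normalize_relation("leads_to_exhaustion") == "DRAINS"  # tier 2
assert normalize_relation("mentions") == "RELATED_TO"         # fallback
```

To match the article's actual dependency, swap tier 3 for rapidfuzz's process.extractOne with a WRatio scorer.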

I wrote a full deep-dive on this pattern: Taming LLM Output Chaos: A 3-Tier Normalisation Pattern


Lesson 6: Custom Prompts Change Everything

Cognee's cognify() function accepts a custom_prompt parameter. This was the key to getting domain-specific relationships.

Default behavior:

  • Generic entity extraction
  • Relations like involves, about, scheduled_at
  • No energy-domain relationships (DRAINS, REQUIRES)

With custom prompt:

EXTRACTION_PROMPT = """
You are extracting a PERSONAL ENERGY knowledge graph.

**REQUIRED RELATIONSHIP TYPES** (use ONLY these):
- DRAINS: Activity depletes energy/focus
- REQUIRES: Activity needs energy/focus
- CONFLICTS_WITH: Energy state conflicts with requirement
- SCHEDULED_AT: Activity occurs at time
- INVOLVES: Activity includes person/thing

**COLLISION PATTERN** (create when applicable):
[draining_activity] --DRAINS--> (energy_state) --CONFLICTS_WITH-->
[requiring_activity] --REQUIRES--> (resource)

**EXAMPLE**:
Input: "Sunday: Draining dinner. Monday: Important presentation."
Graph:
- [dinner] --DRAINS--> (emotional_energy)
- (emotional_energy) --CONFLICTS_WITH--> [presentation]
- [presentation] --REQUIRES--> (sharp_focus)
"""

await cognee.cognify(custom_prompt=EXTRACTION_PROMPT)

Results:

  • Before custom prompt: ~20% collision detection rate
  • After custom prompt: ~70% edge type accuracy (still needed normalisation)
  • After prompt + normalisation: 100% collision detection

Takeaway: Don't fight Cognee's defaults. Guide them with domain-specific prompts that include examples and explicit relationship ontologies.


Lesson 7: Node IDs Vary Too (Semantic Consolidation)

Even with good prompts and relation normalisation, I had one more problem:

Run 1: [dinner] --DRAINS--> (emotional_exhaustion)
Run 2: [dinner] --DRAINS--> (low_energy)
Run 3: [dinner] --DRAINS--> (drained_state)

Same concept, different node labels. My BFS collision detection couldn't find paths because it was doing exact string matching on node IDs.

The fix: Semantic node consolidation using RapidFuzz:

from rapidfuzz import fuzz

def group_similar_nodes(nodes, threshold=70):
    groups = []
    for node in nodes:
        merged = False
        for group in groups:
            if fuzz.WRatio(node.label, group[0].label) >= threshold:
                group.append(node)
                merged = True
                break
        if not merged:
            groups.append([node])
    return groups

def consolidate(graph):
    groups = group_similar_nodes(graph.nodes)
    # Pick canonical representative, rewrite edge references
    # ...

Takeaway: LLM variability affects both relation types AND node identity. Handle both.
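For completeness, here is one way the elided consolidation step can look. A sketch under the assumption that nodes are dicts with an 'id' key, edges are (source_id, relation, target_id) tuples, and grouping has already been done by group_similar_nodes:

```python
def consolidate(nodes, edges, groups):
    """Merge each group into its first member and rewrite edges to match."""
    canonical = {}  # original node id -> canonical node id
    for group in groups:
        rep = group[0]  # simplest canonical choice; could prefer shortest label
        for node in group:
            canonical[node['id']] = rep['id']
    kept = [n for n in nodes if canonical[n['id']] == n['id']]
    merged = {(canonical[s], rel, canonical[t]) for s, rel, t in edges}
    merged = {(s, rel, t) for s, rel, t in merged if s != t}  # drop self-loops
    return kept, sorted(merged)

nodes = [{'id': 'a', 'label': 'emotional_exhaustion'},
         {'id': 'b', 'label': 'low_energy'},
         {'id': 'c', 'label': 'dinner'}]
groups = [[nodes[0], nodes[1]], [nodes[2]]]   # a and b judged similar
edges = [('c', 'DRAINS', 'a'), ('c', 'DRAINS', 'b')]

kept, merged = consolidate(nodes, edges, groups)
assert [n['id'] for n in kept] == ['a', 'c']
assert merged == [('c', 'DRAINS', 'a')]       # duplicate edges collapse
```

Deduplicating merged edges as a set matters: once two energy-state nodes collapse into one, their parallel edges would otherwise double-count in BFS.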


Lesson 8: Mocked Tests Will Lie to You

I had 178 tests passing. All green. Two critical bugs in production.

Bug 1: SearchType.GRAPH_COMPLETION returned prose instead of graph data. My mock returned what I expected Cognee to return, not what it actually returns.

Bug 2: Rich console interpreted [node labels] as style markup. My tests didn't render through the actual Rich console.

The fix: Live API tests.

@pytest.mark.live
async def test_real_entity_extraction():
    """Verify actual Cognee behavior."""
    engine = CogneeEngine()
    graph = await engine.ingest("Dinner with Aunt Susan on Sunday")

    assert len(graph.nodes) > 0, "No entities extracted"
    labels = {n.label.lower() for n in graph.nodes}
    assert any("susan" in l for l in labels)

Run them manually before marking stories "done":

# Requires API key
uv run pytest tests/live/ -m live -v

# Skip in CI
uv run pytest -m "not live"

Takeaway: For LLM integrations, unit tests with mocks are necessary but not sufficient. Add live API tests for critical paths.
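To make the skip automatic rather than relying on remembering -m "not live", a conftest.py hook can gate the marker on the presence of a key (a sketch; LLM_API_KEY is an assumed variable name, adjust to your provider):

```python
# conftest.py
import os

import pytest

def live_tests_enabled(env=None):
    """Live tests run only when an API key is configured."""
    env = os.environ if env is None else env
    return bool(env.get("LLM_API_KEY"))

def pytest_collection_modifyitems(config, items):
    """Auto-skip 'live'-marked tests when no key is present."""
    if live_tests_enabled():
        return
    skip_live = pytest.mark.skip(reason="LLM_API_KEY not set; skipping live tests")
    for item in items:
        if "live" in item.keywords:
            item.add_marker(skip_live)
```

With this in place, CI without a key skips (rather than fails) the live suite, and local runs with the key exported exercise the real API.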


Lesson 9: Suppress Cognee's Logging (But Keep a Debug Mode)

Cognee produces verbose output during normal operation. Great for debugging, annoying for users.

Solution: Lazy import with suppression:

from contextlib import redirect_stdout, redirect_stderr
from io import StringIO

def get_engine():
    with redirect_stdout(StringIO()), redirect_stderr(StringIO()):
        import warnings
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            from sentinel.core.engine import CogneeEngine
    return CogneeEngine()

But keep a debug flag:

import click

@click.command()
@click.option('--debug', '-d', is_flag=True)
def main(debug):
    if debug:
        engine = CogneeEngine()  # Normal import, verbose
    else:
        engine = get_engine()  # Suppressed

The Journey: 5 Epics in Numbers

| Metric | Value |
|---|---|
| Development epics | 5 |
| Stories completed | 37 |
| Tests written | 860+ |
| Critical bugs found | 4 |
| Relation type mappings | 85+ |
| Collision detection rate | 15% → 100% |

The architecture that worked:

User Input
    ↓
Cognee Extraction (custom prompt)
    ↓
3-Tier Relation Mapping (exact → keyword → fuzzy)
    ↓
Semantic Node Consolidation (RapidFuzz grouping)
    ↓
BFS Collision Detection
    ↓
Rich Terminal Output

Each layer handles a different source of LLM variability.


Key Takeaways for Cognee Users

  1. Use SearchType.CYPHER for structured graph data, not GRAPH_COMPLETION.

  2. Expect nested results. Write robust extraction helpers.

  3. Filter to Entity nodes. Cognee returns infrastructure nodes too.

  4. Parse JSON properties. They come as strings.

  5. Normalise relation types. The LLM will surprise you.

  6. Use custom prompts. Domain-specific ontologies need explicit guidance.

  7. Consolidate semantically equivalent nodes. IDs vary like relation types.

  8. Add live API tests. Mocks don't catch integration bugs.

  9. Suppress verbose logging. But keep a debug mode.


Built for the Cognee Mini Challenge 2026 - January Edition. Happy building!
