You get paged at 3AM. You fix production half-asleep. By the time the graphs go green, the context is gone.
I built BlackoutOps — a morning-after incident brain — for the WeMakeDevs × Cognee "The Hangover Part AI: Where's My Context?" hackathon. It reconstructs the night you can't remember, tells you whether the team has seen the failure before, learns the confirmed root cause, and — because retention policies are real — provably forgets on command.
This is the honest build story: how I designed every feature around Cognee's four memory verbs — remember, recall, improve, and forget — what each one actually did for the build, and the two war stories the live platform handed me for free.
🔗 Repo: https://github.com/devanshug2307/blackoutops (MIT)
What I Built
The interesting part isn't the app — it's that the app has zero storage of its own. No database, no cache, no local vector index. Every button in the war-room UI is a live call to Cognee, organized around Cognee's memory lifecycle rather than a single search() endpoint.
Project: BlackoutOps — the morning-after incident brain
Built for: WeMakeDevs × Cognee, "The Hangover Part AI" (July 2026)
Stack: FastAPI + a static war-room UI, ~150-line memory layer
Memory: Cognee Cloud (remember / recall / improve / forget)
Storage: none of its own — Cognee is the database
Repo: github.com/devanshug2307/blackoutops (MIT)
(War-room UI below — left: the night's raw fragments; middle: the debrief chat; right: the memory lifecycle, where every card is labeled with the exact Cognee Cloud call it fires.)
The 3AM Hangover
Every on-call engineer knows this specific hangover. At 02:47 a Redis memory alert fires. At 03:15 PagerDuty finds you. For the next forty minutes you are a raccoon in a server room: kubectl logs, redis-cli INFO, a rollback, a config flag you half-remember from a runbook. At 03:56 the dashboards go green and you go back to sleep.
The next morning, the only artifact of that firefight is a green dashboard and a vague sense of dread. What broke? What did you actually run? Why did the rollback work? And the question that costs the most six months later: haven't we seen this exact failure before?
That knowledge normally dies in scrollback, and the org re-diagnoses the same eviction storm from scratch, over and over. The hackathon theme was "Where's My Context?" and this is the most literal lost-context problem I know.
Why a Graph, Not RAG
The artifacts of an incident are fragments in five different dialects: Prometheus alerts, PagerDuty pages, sleepy Slack messages, shell history, deploy logs. The meaning of the night lives in the connections between them, not in any single fragment:
- the deploy at 02:30 caused the eviction storm at 02:45
- the rollout undo at 03:33 targeted that exact deploy
- last night shares a failure pattern with an incident three weeks earlier: different service, same TTL-less-keys-in-shared-Redis mistake
A vector store can hand you "similar text." It cannot tell you that two incidents three weeks apart are the same disease in different clothes. That's a graph traversal. Cognee ingests raw text and builds a knowledge graph where services, people, commands, and causes become connected entities — so I get the connective answers for free. Seeded with two incident nights, my demo graph came out to 98 nodes and 186 edges.
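To make the difference concrete, here is a toy sketch in plain Python. It is not Cognee's graph model, and every node name and edge label is invented for illustration, loosely following the two incidents in this post. The only point is that "have we seen this before?" turns into an intersection over what each incident is connected to.

```python
from collections import defaultdict

# Toy edge list: (source, relation, target). All names are illustrative
# assumptions, not entities extracted by Cognee.
EDGES = [
    ("incident_2026_07_05", "involved", "checkout-service"),
    ("incident_2026_07_05", "hit", "redis-cache"),
    ("incident_2026_07_05", "caused_by", "ttl_less_keys"),
    ("incident_2026_06_14", "involved", "promo-service"),
    ("incident_2026_06_14", "hit", "redis-cache"),
    ("incident_2026_06_14", "caused_by", "ttl_less_keys"),
]


def neighbors(edges):
    """Build an adjacency map: node -> set of directly connected nodes."""
    adj = defaultdict(set)
    for src, _relation, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def shared_entities(edges, a, b):
    """Entities that both incidents touch: a one-hop graph intersection."""
    adj = neighbors(edges)
    return sorted(adj[a] & adj[b])


print(shared_entities(EDGES, "incident_2026_07_05", "incident_2026_06_14"))
# ['redis-cache', 'ttl_less_keys']
```

The two incidents name different services, so their text need not look alike, yet they meet at the same cause node. A real knowledge graph does this over extracted entities and multi-hop paths rather than a hand-written edge list.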
The Four-Verb Lifecycle
Cognee's API is a memory lifecycle, not a database. The moment I mapped features to the four verbs, the architecture fell out of the problem. My whole memory layer is one honest function per verb — about 150 lines total.
| Cognee call | In the code | What it does in BlackoutOps |
|---|---|---|
| remember() | ingest_artifacts() | Turns each raw ops fragment (an alert, a Slack line, my shell history) into permanent graph memory in a per-incident dataset. |
| remember(session_id=…) | store_qa() | Logs every debrief Q&A as a typed QAEntry in session memory, so the app survives a refresh with its context intact. |
| recall() | ask() | Auto-routed search across both incident datasets and session memory, returning graph-completion answers rather than nearest-neighbor text. |
| improve() | file_postmortem() | Feeds the human-confirmed root cause back into the graph and runs an enrichment pass, so the confirmed pattern leads the next answer. |
| forget() | purge() | Dataset-scoped deletion that returns Cognee's own receipt: retention policy as a first-class button. |
One decision underpins all of it: datasets are the unit of meaning. Each incident gets its own dataset (incident_2026_07_05, incident_2026_06_14_redis_storm). That single choice buys isolation, explicit routing for cross-incident questions, and surgical deletion.
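A small sketch of how that naming convention could be generated, so every incident lands in a predictable dataset. The helper names `incident_dataset` and `datasets_for_recall` are hypothetical and are not taken from the repo; only the resulting name format comes from this post.

```python
import re
from datetime import date


def incident_dataset(day: date, slug: str = "") -> str:
    """Build a per-incident dataset name such as incident_2026_07_05."""
    name = f"incident_{day:%Y_%m_%d}"
    # Normalise an optional human label into lowercase snake_case.
    clean = re.sub(r"[^a-z0-9]+", "_", slug.lower()).strip("_")
    return f"{name}_{clean}" if clean else name


def datasets_for_recall(days_and_slugs):
    """Every incident dataset, in order, for a cross-incident question."""
    return [incident_dataset(day, slug) for day, slug in days_and_slugs]


print(incident_dataset(date(2026, 7, 5)))
print(incident_dataset(date(2026, 6, 14), "Redis storm"))
```

Keeping the name derivable from the incident date means routing a cross-incident question is just a matter of listing the names, and deleting one night never touches another.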
remember(): The Night, Ingested
Everything starts by connecting the same cognee package to Cognee Cloud with serve(), then pushing each raw artifact in as permanent graph memory. No parsing, no schema — Cognee does the entity extraction and graph building.
import cognee
# same package, but memory now lives in Cognee Cloud
await cognee.serve(url=TENANT_URL, api_key=API_KEY)
# each fragment -> permanent graph memory, in a per-incident dataset
for artifact in LAST_NIGHT:  # alerts, Slack, shell history, deploy log
    await cognee.remember(artifact, dataset_name="incident_2026_07_05")
There's a second, sneakier remember in the app, and it was the cheapest "wow" in the whole build. Every debrief Q&A gets stored to session memory as a typed QAEntry:
# session memory: one parameter, and the app survives a refresh
await cognee.remember(
    cognee.QAEntry(question=q, answer=a),
    session_id="morning-after-debrief",
)
Refresh the page and the UI's chat history is gone. But ask "what did we conclude earlier?" and the brain recalls it. The UI forgets; Cognee doesn't. Because the entry is typed, I can also attach a typed FeedbackEntry (👍/👎) to the exact answer it judges — human signal stored right next to the memory it grades.
recall(): The Morning Debrief
The debrief chat is one function. Every question is an auto-routed recall() over both incident datasets and session memory. I don't pick the search strategy; I pass every incident dataset and let auto_route=True decide.
entries = await cognee.recall(
    "Have we seen this failure before?",
    datasets=["incident_2026_07_05", "incident_2026_06_14_redis_storm"],
    session_id="morning-after-debrief",
    top_k=12,
    auto_route=True,
)
This is where the demo stops being a chatbot and becomes a memory. Last night's outage started when checkout-service v2.14.1 shipped cache warming that wrote per-session keys with no TTL into the shared redis-cache. Those keys filled maxmemory and triggered an eviction storm, which broke checkout session lookups and cascaded into api-gateway 502s. Three weeks earlier, promo-service had made the same TTL-less mistake during a sale. Ask "have we seen this before?" and the graph connects the two through their shared entities and answers: yes, here's the twin. That's the question a plain vector store fumbles and a knowledge graph nails.
improve(): Memory That Learns
Postmortems usually go to a wiki to be read by no one. Here, filing the postmortem writes the human-confirmed root cause back into graph memory and then attempts Cognee's dedicated enrichment pass. I want to be precise and honest about this one, because it's the verb with the most nuance:
# 1) the confirmed pattern goes back through the full remember pipeline
await cognee.remember(POSTMORTEM, dataset_name="incident_2026_07_05")
# 2) then attempt the dedicated improve() enrichment pass
try:
    await cognee.improve(dataset="incident_2026_07_05")
except Exception as e:
    # honest receipt: on tenants that don't expose improve() yet,
    # the remember pipeline above IS the enrichment we got
    note = f"improve() route not exposed on this tenant ({e})"
The effect a user sees is real: after filing the postmortem, "what should we watch for next time?" answers with the confirmed pattern — TTL-less per-session keys in shared Redis — ranked first, now cross-referenced against the June incident. The memory adapts to human-verified truth. This is the part of the lifecycle almost nobody uses, and it's the part that turns a log pile into an institution's memory. I show the honest receipt in the UI either way, because a build story that hides its rough edges is a worse story.
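One way to shape that "receipt either way" behaviour is a function that always returns a small status dict. This is a sketch under assumptions, not the repo's file_postmortem(): the `remember` and `improve` arguments are injected async callables so the example runs without a tenant, and the receipt fields are hypothetical.

```python
import asyncio


async def file_postmortem(remember, improve, text, dataset):
    """Store the confirmed root cause, then try the enrichment pass.

    `remember` and `improve` are injected async callables (an assumption
    made so this sketch runs offline); in the app they would be SDK calls.
    """
    await remember(text, dataset_name=dataset)
    receipt = {"dataset": dataset, "remembered": True, "improved": False, "note": ""}
    try:
        await improve(dataset=dataset)
        receipt["improved"] = True
    except Exception as exc:  # broad on purpose: any failure becomes a note
        receipt["note"] = f"improve() not available ({exc})"
    return receipt


# Stand-ins for the SDK calls, for demonstration only.
async def fake_remember(text, dataset_name):
    return None


async def fake_improve_missing(dataset):
    raise RuntimeError("404 route not found")


receipt = asyncio.run(
    file_postmortem(
        fake_remember,
        fake_improve_missing,
        "root cause: TTL-less per-session keys in shared Redis",
        "incident_2026_07_05",
    )
)
print(receipt)
```

Returning the same structure on both paths lets the UI render one card and simply show the note when the enrichment pass was skipped.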
forget(): Deletion as a Feature
Incident data is full of internal hostnames and human mistakes, so it has retention policies. BlackoutOps treats deletion as a first-class demo beat, not an apology buried in a settings page:
receipt = await cognee.forget(
    dataset="incident_2026_06_14_redis_storm",
    memory_only=True,  # wipe graph + vector memory, keep the dataset name
)
Purge the June incident, get Cognee's deletion receipt, then ask about it again — and the brain honestly says it has no memory of that night. Watching recall() go from a detailed cross-incident answer to "I have no memory of that" in one click is more convincing than any privacy-policy paragraph. Provable forgetting is rarer, and more valuable, than remembering.
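The before-and-after check can be written down as a tiny routine. The sketch below uses an in-memory FakeMemory class that I made up so it runs anywhere; it only mimics the shape of remember/recall/forget and says nothing about how Cognee stores or deletes data. The receipt fields are likewise hypothetical.

```python
import asyncio


class FakeMemory:
    """In-memory stand-in for a memory backend; not Cognee's implementation."""

    def __init__(self):
        self._data = {}

    async def remember(self, text, dataset_name):
        self._data.setdefault(dataset_name, []).append(text)

    async def recall(self, query, datasets):
        # Naive substring match, enough to exercise the check below.
        return [
            text
            for name in datasets
            for text in self._data.get(name, [])
            if query.lower() in text.lower()
        ]

    async def forget(self, dataset, memory_only=True):
        removed = len(self._data.pop(dataset, []))
        return {"dataset": dataset, "removed": removed}


async def prove_forgotten(memory, dataset, query):
    """Return (hits_before, receipt, hits_after) around a purge."""
    before = await memory.recall(query, datasets=[dataset])
    receipt = await memory.forget(dataset=dataset, memory_only=True)
    after = await memory.recall(query, datasets=[dataset])
    return len(before), receipt, len(after)


async def demo():
    memory = FakeMemory()
    await memory.remember("promo-service wrote TTL-less keys", "incident_2026_06_14_redis_storm")
    return await prove_forgotten(memory, "incident_2026_06_14_redis_storm", "ttl-less")


result = asyncio.run(demo())
print(result)
```

Against a real backend the same three steps apply: recall, forget, recall again, and keep the receipt alongside the two hit counts as the evidence.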
The Tenant That Didn't Exist
Mid-build, every request to my freshly provisioned Cognee Cloud tenant failed with Could not resolve host. The tenant was live — curl --resolve against the right IP returned a healthy heartbeat. The culprit: my Mac's resolver is Cloudflare's 1.1.1.1, which had cached an NXDOMAIN from before the tenant's DNS record existed. Google's 8.8.8.8 already knew the answer; Cloudflare confidently remembered the absence.
The fix became part of the project — a tiny dns_fallback.py that, when the system resolver fails for the tenant host, resolves it over DNS-over-HTTPS and pins the answer by patching socket.getaddrinfo:
def ensure_resolvable(hostname: str) -> str:
    try:
        socket.getaddrinfo(hostname, 443)
        return f"{hostname}: system DNS OK"
    except socket.gaierror:
        pass  # system resolver has a stale NXDOMAIN
    ip = _resolve_via_doh(hostname)  # ask dns.google over HTTPS instead
    if not ip:
        return f"{hostname}: UNRESOLVABLE"
    _pinned[hostname] = ip  # pin it for every client in-process
    socket.getaddrinfo = _patched_getaddrinfo
    return f"{hostname}: pinned to {ip} via DoH fallback"
Then a second layer of the same onion: it still failed under uvicorn, because uvloop does DNS in C and never calls socket.getaddrinfo. One --loop asyncio flag later, everything held. I appreciate the irony: while building an app about stale memory, I was defeated by a stale cache remembering that something didn't exist. Negative memory is still memory. Even DNS needed a forget().
A Bug on a Live Platform
Building on a live cloud during a hackathon is the best fuzzer there is, and I found the next bug the honest way — by breaking my own demo. On this Cloud build, remember() into a dataset name previously deleted with a full forget() fails forever with a 409 RetryError[ProgrammingError]. The name is tombstoned. A memory_only=True forget doesn't have the problem, so that's what BlackoutOps ships with. The minimal repro I filed upstream:
async def cycle(name, memory_only):
    await cognee.remember("v1", dataset_name=name)
    await cognee.forget(dataset=name, memory_only=memory_only)
    try:
        await cognee.remember("v2", dataset_name=name)
        return "re-remember OK"
    except Exception as e:
        return f"re-remember FAILED: {e}"

# A (full forget): re-remember FAILED: 409 RetryError[ProgrammingError]
# B (memory-only forget): re-remember OK
# observed on Cognee Cloud server 1.2.2.dev0 / SDK 1.2.2, 2026-07-05
It went upstream as topoteretes/cognee#3895. This is also why my historical dataset is named incident_2026_06_14_redis_storm and not the cleaner incident_2026_06_14: I'd already tombstoned the tidy name during testing. The scar is in the codebase, and I left it there on purpose.
Four Lessons for Cognee Builders
If you're about to build on Cognee, here's what I wish I'd known at hour zero:
- Design around the lifecycle, not around search. The second I mapped features to the four verbs, the architecture wrote itself — one honest function per verb.
- Datasets are your unit of meaning. Per-incident (or per-customer, per-project) datasets give you routing, isolation, and surgical forget() for free.
- Session memory is the cheapest wow. One parameter, and your app survives a refresh with its context intact. Reach for it before you build custom state.
- Demo the question grep can't answer. Every domain has one cross-cutting question a vector store fumbles. For incidents it's "have we seen this before?" Find yours and make the graph answer it.
Architecture at a Glance
war-room UI (static)                   Cognee Cloud (tenant)
chat . lifecycle . graph               remember  -> knowledge graph
        | HTTP                         recall    -> graph + session
FastAPI (app.py)             ----->    improve   -> pattern enrichment
memory layer (memory.py)     <-----    forget    -> surgical deletion
  . cognee.serve(url, key)             visualize -> force-graph HTML
  . zero local storage
The app keeping no database of its own isn't a limitation I apologized for — it's the entire thesis. The brain is the product, and the brain is Cognee. Reconstruct, ask, learn, forget.
FAQ
What is Cognee's memory lifecycle?
Four verbs instead of one database call: remember() ingests text into a knowledge graph, recall() searches across that graph and session memory, improve() enriches memory with confirmed patterns, and forget() deletes it. Designing around those four is what makes memory feel like memory.
How is a knowledge graph different from a vector database for agent memory?
A vector DB finds semantically similar text. A knowledge graph stores entities (services, people, commands, causes) and their relationships, so it can answer connective questions — like two incidents three weeks apart sharing a root-cause pattern across different services. That traversal is exactly what nearest-neighbor search can't do.
What does session memory do?
Passing session_id writes to a fast session store that also syncs into the graph. Refreshing the browser wipes the UI's chat but the brain still remembers what you concluded. One parameter, big jump in perceived intelligence.
How do you delete data from an agent's memory?
forget(dataset=…) for surgical, dataset-scoped deletion. BlackoutOps also passes memory_only=True, which wipes graph and vector memory (after which recall honestly returns nothing) while keeping the dataset name reusable.
Sources and Links
Built solo in one appropriately sleepless day, with Claude Code (Claude Fable 5) as a pair programmer for scaffolding, debugging, and docs, under human direction — disclosed here per hackathon rules.
- BlackoutOps source (MIT): https://github.com/devanshug2307/blackoutops
- Cognee: https://github.com/topoteretes/cognee
- Cognee docs: https://docs.cognee.ai
- The upstream bug + repro: https://github.com/topoteretes/cognee/issues/3895
- Hackathon: https://www.wemakedevs.org/hackathons/cognee
If you're building on Cognee and want the full memory lifecycle wired end-to-end, the ~150-line memory.py in the repo is the whole map. Seed the night, and every verb is a live call you can watch fire.
