The problem with cosine similarity for alerts
Alerts are short, terse, and full of nouns that look distinctive
but aren't. "checkout-service CrashLoopBackOff after deploy" and
"payments-service CrashLoopBackOff after deploy" have a high cosine
similarity in any reasonable embedding space, and they probably
have the same root cause family — bad config from a recent change.
But "checkout-service CrashLoopBackOff after deploy" and
"checkout-service high latency p99" embed close too, because both
mention checkout-service and both are SEV-2. They are completely
different incidents.
Free-text similarity ranks on the wrong axes. It treats the service
name and the severity prefix as load-bearing tokens because they're
distinctive in the corpus, when the tokens that actually carry the
signal are the error class, the dependency that broke, and the
recent change.
I needed a similarity function that ranked the causal tokens, not
the distinctive ones. Embedding similarity doesn't know which is
which.
The structured fingerprint
Every alert that enters the system gets reduced to a six-field
record I call the AlertFingerprint:
```python
# incident_agent/models.py
from pydantic import BaseModel, ConfigDict


class AlertFingerprint(BaseModel):
    model_config = ConfigDict(frozen=True)

    error_class: str = ""         # crashloopbackoff, oomkilled, http_5xx, ...
    service_role: str = ""        # api_edge, batch_worker, db_primary, ...
    dependency_pattern: str = ""  # configmap, dns, downstream_api, queue, ...
    signal_shape: str = ""        # spike, sustained, sawtooth, regression, ...
    attack_pattern: str = ""      # brute_force, port_scan, suspicious_login, ...
    environment: str = ""         # prod, staging, dev
```
Two design choices in this declaration matter.
The frozen=True config makes the model hashable. That lets me use
the fingerprint itself as a dedupe key for memory retains, which I
get to in a minute.
The empty-string defaults — instead of None — keep the canonical
serialization deterministic. Two alerts that produce the exact same
fingerprint serialize to the exact same string, which means they
hash the same, dedupe the same, and recall the same memories.
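Here is a minimal sketch of what those two choices buy. It uses a stdlib frozen dataclass as a stand-in for the Pydantic model (an assumption, so the snippet runs without dependencies) and trims the record to three fields; the `Fingerprint` name is illustrative only.

```python
from dataclasses import dataclass


# Stand-in for AlertFingerprint: a frozen dataclass is hashable by value.
# Only three of the six fields are shown.
@dataclass(frozen=True)
class Fingerprint:
    error_class: str = ""
    dependency_pattern: str = ""
    environment: str = ""


a = Fingerprint(error_class="crashloopbackoff", environment="prod")
b = Fingerprint(error_class="crashloopbackoff", environment="prod")

# Equal field values give equal objects and equal hashes, so a set
# collapses the two into one entry.
seen = {a, b}
print(len(seen))

# The unset field is "" on both sides, never a mix of None and "".
print(a.dependency_pattern == b.dependency_pattern == "")
```

With `None` defaults, an extractor returning `""` and one returning `None` would both mean "no match" yet produce two different keys.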
How the fields are extracted
The hot path is regex over the alert body, scoped per field. There's
also a cascadeflow-routed cheap-model extraction path for cases
where the regex misses, but the regex is the source of truth for
the "first run with no live API" demo mode.
```python
# incident_agent/fingerprint.py
import re
from typing import Final

# Checked in insertion order: the first pattern that matches wins.
ERROR_CLASS_PATTERNS: Final[dict[str, re.Pattern[str]]] = {
    "crashloopbackoff": re.compile(r"\bcrashloopback\s*off\b", re.I),
    "oomkilled": re.compile(r"\b(oomkilled|out[- ]of[- ]memory)\b", re.I),
    "http_5xx": re.compile(r"\b5\d{2}\b|\b5xx\b", re.I),
    "timeout": re.compile(r"\btime[- ]?out\b", re.I),
    "tls_handshake": re.compile(r"\btls[- ]handshake\b", re.I),
    # ... more
}


def extract_error_class(raw: str) -> str:
    for label, pattern in ERROR_CLASS_PATTERNS.items():
        if pattern.search(raw):
            return label
    return ""  # no match: leave the field empty rather than guess
```
Six small extractors, each independent, each cheap, each falsifiable
with a regex test. The fingerprint is the union of their outputs.
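To make "the union of their outputs" concrete, here is a rough sketch of how independent per-field extractors might be combined. The `FIELD_PATTERNS` table, the `environment` regexes, and the `extract_field` / `extract_all` helpers are my own illustrative assumptions, not the project's code, and only two of the six fields are shown.

```python
import re

# Hypothetical pattern tables for two of the six fields; the labels and
# regexes are illustrative, not the project's real tables.
FIELD_PATTERNS: dict[str, dict[str, re.Pattern[str]]] = {
    "error_class": {
        "crashloopbackoff": re.compile(r"\bcrashloopback\s*off\b", re.I),
        "timeout": re.compile(r"\btime[- ]?out\b", re.I),
    },
    "environment": {
        "prod": re.compile(r"\bprod(uction)?\b", re.I),
        "staging": re.compile(r"\bstaging\b", re.I),
    },
}


def extract_field(raw: str, patterns: dict[str, re.Pattern[str]]) -> str:
    """Return the first label whose pattern matches, or "" for no match."""
    for label, pattern in patterns.items():
        if pattern.search(raw):
            return label
    return ""


def extract_all(raw: str) -> dict[str, str]:
    """Run every field's extractor independently over the same alert body."""
    return {name: extract_field(raw, pats) for name, pats in FIELD_PATTERNS.items()}


fields = extract_all("[prod] checkout-service CrashLoopBackOff after deploy")
print(fields)
```

Because no extractor reads another's output, a bad pattern in one field can't corrupt the rest of the fingerprint.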
I tried doing this with a single LLM call that produced all six
fields in one shot. It worked, but it was slow, costly, and — the
real problem — non-deterministic. Two runs of the same alert
produced two slightly different fingerprints, which broke memory
recall. The regex-first design is boringly stable, which is exactly
what you want for a key in a content-addressed memory store.
The canonical serializer
A fingerprint is only useful as a memory key if two equivalent
fingerprints produce the same string. I serialize them like this:
```python
# incident_agent/fingerprint.py
from .models import AlertFingerprint


def format_fingerprint(fp: AlertFingerprint) -> str:
    return (
        f"err={fp.error_class}|"
        f"role={fp.service_role}|"
        f"dep={fp.dependency_pattern}|"
        f"sig={fp.signal_shape}|"
        f"att={fp.attack_pattern}|"
        f"env={fp.environment}"
    )


def parse_fingerprint(serialized: str) -> AlertFingerprint:
    # Split on the first "=" only, so a value may itself contain "=".
    fields = dict(part.split("=", 1) for part in serialized.split("|"))
    return AlertFingerprint(
        error_class=fields.get("err", ""),
        service_role=fields.get("role", ""),
        dependency_pattern=fields.get("dep", ""),
        signal_shape=fields.get("sig", ""),
        attack_pattern=fields.get("att", ""),
        environment=fields.get("env", ""),
    )
```
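The order is fixed, the separator is fixed, every field is always
present. As long as no field value contains the | separator (which
holds for the fixed labels the regex extractors emit),
parse_fingerprint(format_fingerprint(fp)) == fp for every
fingerprint, which I verify with a property test: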
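```python
# tests/property/test_fingerprint.py
from hypothesis import given, settings

from incident_agent.fingerprint import format_fingerprint, parse_fingerprint
from incident_agent.models import AlertFingerprint

# alert_fingerprint_strategy is a project test helper (definition not shown).


@given(fp=alert_fingerprint_strategy())
@settings(deadline=None, max_examples=200)
def test_fingerprint_round_trip(fp: AlertFingerprint) -> None:
    """Feature: openrecall, Property 1: format/parse round-trip."""
    serialized = format_fingerprint(fp)
    assert parse_fingerprint(serialized) == fp
```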
Hypothesis throws 200 randomly-shaped fingerprints at the round-trip
on every CI run. If anyone changes the format string and forgets to
update the parser, the property fails before the change merges.
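One thing the round trip depends on is that field values never contain the `|` separator. The sketch below is a dependency-free illustration of that constraint, using plain dicts and stdlib `random` in place of the real model and Hypothesis strategy; `fmt`, `parse`, and `SAFE_ALPHABET` are hypothetical names, and I'm assuming the real strategy draws from a similarly separator-free alphabet.

```python
import random
import string

FIELDS = ("err", "role", "dep", "sig", "att", "env")


def fmt(values: dict[str, str]) -> str:
    """Serialize in fixed field order with "|" and "=" as delimiters."""
    return "|".join(f"{key}={values[key]}" for key in FIELDS)


def parse(serialized: str) -> dict[str, str]:
    """Inverse of fmt, valid only when no value contains "|"."""
    return dict(part.split("=", 1) for part in serialized.split("|"))


# Labels drawn from a separator-free alphabet, as extractor outputs are.
SAFE_ALPHABET = string.ascii_lowercase + string.digits + "_"
rng = random.Random(0)
for _ in range(200):
    values = {
        key: "".join(rng.choices(SAFE_ALPHABET, k=rng.randint(0, 12)))
        for key in FIELDS
    }
    assert parse(fmt(values)) == values

# A value containing the separator does not survive the round trip.
leaky = dict.fromkeys(FIELDS, "")
leaky["dep"] = "queue|dns"
try:
    round_trip_ok = parse(fmt(leaky)) == leaky
except ValueError:
    round_trip_ok = False
print(round_trip_ok)
```

If a model-routed extractor can ever return free text, its output would need to be normalized to a fixed label before it reaches the serializer.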
Why this beats embeddings for this domain
The fingerprint approach has three properties cosine similarity
doesn't.
It composes. "All alerts with error_class=crashloopbackoff
AND dependency_pattern=configmap" is one set membership check.
The same query on a vector store is a top-k with a similarity
threshold and a re-rank.
It's debuggable. When an alert recalls the wrong memory, I can
read the fingerprint and see exactly why. With embeddings, I can
read 1,536 floats.
It explains itself. When the cockpit displays "matched memory:
checkout-service CrashLoopBackOff due to bad environment variable —
score 0.95," the analyst can see that err=crashloopbackoff and
dep=configmap match across the two alerts. That's the explanation.
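The first and third properties fit in a few lines. This is a sketch under assumptions: a frozen dataclass with three of the six fields stands in for the model, and `matches` and `shared_fields` are hypothetical helpers, not functions from the repo.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Fingerprint:  # stand-in with three of the six fields
    error_class: str = ""
    dependency_pattern: str = ""
    environment: str = ""


def matches(fp: Fingerprint, **wanted: str) -> bool:
    """True when every requested field has exactly the requested value."""
    return all(getattr(fp, name) == value for name, value in wanted.items())


def shared_fields(a: Fingerprint, b: Fingerprint) -> dict[str, str]:
    """Non-empty fields on which two fingerprints agree: the explanation."""
    return {
        f.name: getattr(a, f.name)
        for f in fields(a)
        if getattr(a, f.name) and getattr(a, f.name) == getattr(b, f.name)
    }


memories = [
    Fingerprint("crashloopbackoff", "configmap", "prod"),
    Fingerprint("crashloopbackoff", "dns", "prod"),
    Fingerprint("timeout", "configmap", "staging"),
]
hits = [
    m for m in memories
    if matches(m, error_class="crashloopbackoff", dependency_pattern="configmap")
]
incoming = Fingerprint("crashloopbackoff", "configmap", "staging")
print(len(hits), shared_fields(incoming, hits[0]))
```

The dict returned by `shared_fields` is the kind of thing an analyst can read at a glance, which a similarity score alone never gives you.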
The trade-off is that the regex extractor will occasionally miss a
new error class until I add the pattern. I accepted that. New error
classes show up at human speed, not machine speed; adding a regex
takes ten minutes; and the cost of a missed match is one extra
strong-model call, not a wrong answer.
The dedupe key trick
Because AlertFingerprint is frozen and serializes to one canonical
string, retaining a memory is idempotent with nothing more than a
set lookup:
```python
# incident_agent/memory.py
# Method on the memory client class (class body elided).
def retain(
    self,
    content: str,
    *,
    fingerprint: AlertFingerprint,
    decision: TriageDecision,
    dead_ends: list[str] | None = None,
    analyst_id: str | None = None,
    business_impact_minutes: int | None = None,
) -> str:
    from .fingerprint import format_fingerprint  # local import: avoid cycle

    key = (format_fingerprint(fingerprint), decision)
    if key in self._retained:
        return "skipped duplicate"
    # ... write to Hindsight Cloud + local mirror
    self._retained.add(key)
    return "retained"
```
The dedupe key is the tuple (serialized_fingerprint, decision).
Re-clicking the "retain" button doesn't double-write; calling
retain twice with the same fingerprint and decision short-circuits
to "skipped duplicate". That makes the analyst flow fearless:
over-clicking is safe.
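The behaviour is easy to check in isolation. Below is a minimal in-memory sketch; `MemoryStub` is a hypothetical stand-in for the real memory client, the fingerprint is passed pre-serialized, and the decision is a plain string rather than a `TriageDecision`.

```python
class MemoryStub:
    """In-memory stand-in for the memory client; no network writes."""

    def __init__(self) -> None:
        self._retained: set[tuple[str, str]] = set()
        self.writes: list[str] = []

    def retain(self, content: str, *, fingerprint_key: str, decision: str) -> str:
        key = (fingerprint_key, decision)
        if key in self._retained:
            return "skipped duplicate"
        self.writes.append(content)  # the real client would write remotely here
        self._retained.add(key)
        return "retained"


store = MemoryStub()
fp_key = "err=crashloopbackoff|role=api_edge|dep=configmap|sig=|att=|env=prod"
first = store.retain("rolled back config", fingerprint_key=fp_key, decision="rollback")
second = store.retain("rolled back config", fingerprint_key=fp_key, decision="rollback")
third = store.retain("escalated instead", fingerprint_key=fp_key, decision="escalate")
print(first, second, third, len(store.writes))
```

Note that a different decision on the same fingerprint is a new key, so it is retained rather than skipped.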
What the fingerprint enables downstream
Once every alert reduces to a fingerprint, three things become easy.
The first is memory recall. I send the canonical fingerprint string
to Hindsight as the recall query and get back prior incidents keyed
on the same DNA. Hindsight Cloud gives me a managed instance with a
clean API; the GitHub repo hosts the SDK. The fingerprint is the
bridge.
The second is the bypass rule. The triage engine I built (which I
won't rehash here — there's a separate write-up on the
counterfactual memory side) uses fingerprint match score as one of
its four required bypass clauses. When the score crosses 0.85 and
the dominant prior decision is consistent, the strong model gets
skipped entirely. The fingerprint is what the threshold is computed
on.
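As a rough illustration of how a threshold could sit on top of the fingerprint, here is a sketch. Only the 0.85 threshold and the "consistent prior decision" clause come from the description above; the `match_score` formula (share of populated fields that agree), the reading of "consistent" as "all prior decisions identical", and both function names are my assumptions, and the other two bypass clauses are not modelled.

```python
from collections import Counter

BYPASS_THRESHOLD = 0.85  # from the text; everything else here is hypothetical


def match_score(a: dict[str, str], b: dict[str, str]) -> float:
    """Share of populated fields on which two fingerprints agree."""
    populated = [k for k in a if a[k] or b.get(k, "")]
    if not populated:
        return 0.0
    agreeing = sum(1 for k in populated if a[k] == b.get(k, ""))
    return agreeing / len(populated)


def can_bypass(score: float, prior_decisions: list[str]) -> bool:
    """Two of the four clauses: score threshold and a consistent prior decision."""
    if score < BYPASS_THRESHOLD or not prior_decisions:
        return False
    _, count = Counter(prior_decisions).most_common(1)[0]
    return count == len(prior_decisions)


incoming = {"err": "crashloopbackoff", "dep": "configmap", "env": "prod"}
prior = {"err": "crashloopbackoff", "dep": "configmap", "env": "prod"}
other = {"err": "crashloopbackoff", "dep": "dns", "env": "prod"}

print(can_bypass(match_score(incoming, prior), ["rollback", "rollback"]))
print(can_bypass(match_score(incoming, other), ["rollback", "rollback"]))
```

Whatever the real scoring function is, the point is that it operates on a handful of named fields, so a borderline score can be traced to the exact field that disagreed.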
The third is the queue ordering. When 100 alerts arrive at once,
sorting by fingerprint groups duplicates together, and the cockpit
renders them as a table where the analyst can see "I have eight
copies of err=crashloopbackoff|dep=configmap and the proposal is
the same for all of them." That visual collapse is impossible
without a structured key.
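The grouping itself needs nothing beyond the standard library. A minimal sketch, assuming the queue already holds canonical fingerprint strings (the sample values are made up):

```python
from collections import Counter

queue = [
    "err=crashloopbackoff|role=api_edge|dep=configmap|sig=|att=|env=prod",
    "err=timeout|role=batch_worker|dep=queue|sig=spike|att=|env=prod",
    "err=crashloopbackoff|role=api_edge|dep=configmap|sig=|att=|env=prod",
    "err=crashloopbackoff|role=api_edge|dep=configmap|sig=|att=|env=prod",
]

# Identical strings mean identical fingerprints, so counting is the grouping.
rows = Counter(queue).most_common()
for fingerprint, copies in rows:
    print(f"{copies:>3}  {fingerprint}")
```

Each row of `rows` is one line in the collapsed table: a fingerprint and how many alerts share it.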
What I'd do differently next time
I overthought the cascadeflow-routed extraction path. I built it
expecting to need an LLM fallback for fields the regex couldn't
catch, and in practice the regex path handles 95% of the corpus.
The cheap-model extraction stayed in the codebase as the
fingerprint_with_trace route — it's there for the rare alert
that's truly novel — but on the live demo path, it almost never
fires. cascadeflow's provider routing docs make the fallback
trivial to wire when you do need it; I just didn't need it as often
as I expected.
If I were starting over, I'd ship the regex-only extractor first,
prove the bypass works, and only add the model-routed path when a
specific class of misses justified it. Optionality has a cost.
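One way to keep that optionality cheap is to make the fallback an optional argument that the regex path never touches. This is a sketch of the idea, not the project's `fingerprint_with_trace` route: the `Fallback` signature and `fake_model` are hypothetical, and no real model or cascadeflow call is made.

```python
import re
from typing import Callable, Optional

ERROR_PATTERNS = {
    "oomkilled": re.compile(r"\b(oomkilled|out[- ]of[- ]memory)\b", re.I),
    "timeout": re.compile(r"\btime[- ]?out\b", re.I),
}

# Hypothetical fallback signature: takes the raw alert, returns a label or "".
Fallback = Callable[[str], str]


def extract_error_class(raw: str, fallback: Optional[Fallback] = None) -> str:
    """Regex first; consult the fallback only when every pattern misses."""
    for label, pattern in ERROR_PATTERNS.items():
        if pattern.search(raw):
            return label
    return fallback(raw) if fallback is not None else ""


calls: list[str] = []


def fake_model(raw: str) -> str:
    """Stands in for a model-routed extractor; records each invocation."""
    calls.append(raw)
    return "disk_pressure"


print(extract_error_class("worker OOMKilled on node-7", fake_model))
print(extract_error_class("kubelet reports DiskPressure", fake_model))
print(extract_error_class("kubelet reports DiskPressure"))
```

Shipping with `fallback=None` gives the regex-only behaviour, and the `calls` list shows the model path fires only on a miss.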
The takeaway
When the matching axis is causal — error class, dependency, change
shape — a structured fingerprint beats embedding similarity. The
fingerprint becomes the join key for memory, the threshold input
for routing, the dedupe key for retains, and the visual identity
for queue rendering. One small Pydantic model is doing four jobs
at once.
The whole thing is six fields, six regex extractors, and a frozen
config. The code is on GitHub:
https://github.com/Dawn-Fighter/openrecall
The Hindsight memory layer is what gives the fingerprint somewhere
to live; Vectorize's overview of agent memory is a good place to
start if you want to understand why content-addressed memory
matters more than top-k recall for systems that need to reproduce
the same answer twice.
