<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chethas Dileep</title>
    <description>The latest articles on DEV Community by Chethas Dileep (@edneam).</description>
    <link>https://dev.to/edneam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940398%2F642bbc14-54c9-433c-851b-043d8a80bff6.jpeg</url>
      <title>DEV Community: Chethas Dileep</title>
      <link>https://dev.to/edneam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/edneam"/>
    <language>en</language>
    <item>
      <title>Alert DNA: how I stopped matching incidents on free text</title>
      <dc:creator>Chethas Dileep</dc:creator>
      <pubDate>Tue, 19 May 2026 16:05:08 +0000</pubDate>
      <link>https://dev.to/edneam/alert-dna-how-i-stopped-matching-incidents-on-free-text-4kpo</link>
      <guid>https://dev.to/edneam/alert-dna-how-i-stopped-matching-incidents-on-free-text-4kpo</guid>
      <description>&lt;h2&gt;
  
  
  The problem with cosine similarity for alerts
&lt;/h2&gt;

&lt;p&gt;Alerts are short, terse, and full of nouns that look distinctive
but aren't. "checkout-service CrashLoopBackOff after deploy" and
"payments-service CrashLoopBackOff after deploy" have a high cosine
similarity in any reasonable embedding space, and they probably
have the same root-cause family: bad config from a recent change.&lt;/p&gt;

&lt;p&gt;But "checkout-service CrashLoopBackOff after deploy" and&lt;br&gt;
"checkout-service high latency p99" embed close too, because both&lt;br&gt;
mention checkout-service and both are SEV-2. They are completely&lt;br&gt;
different incidents.&lt;/p&gt;

&lt;p&gt;Free-text similarity weights the wrong axes. It treats the service
name and the severity prefix as load-bearing tokens because they're
distinctive in the corpus, when the tokens that actually carry the
cause are the error class, the dependency that broke, and the recent
change.&lt;/p&gt;

&lt;p&gt;I needed a similarity function that ranked the causal tokens, not
the distinctive ones. Embedding similarity doesn't know which is
which.&lt;/p&gt;

&lt;h2&gt;
  
  
  The structured fingerprint
&lt;/h2&gt;

&lt;p&gt;Every alert that enters the system gets reduced to a six-field
record I call the AlertFingerprint:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# incident_agent/models.py
from pydantic import BaseModel, ConfigDict


class AlertFingerprint(BaseModel):
    model_config = ConfigDict(frozen=True)

    error_class: str = ""         # crashloopbackoff, oomkilled, http_5xx, ...
    service_role: str = ""        # api_edge, batch_worker, db_primary, ...
    dependency_pattern: str = ""  # configmap, dns, downstream_api, queue, ...
    signal_shape: str = ""        # spike, sustained, sawtooth, regression, ...
    attack_pattern: str = ""      # brute_force, port_scan, suspicious_login, ...
    environment: str = ""         # prod, staging, dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Two design choices in this declaration matter.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;frozen=True&lt;/code&gt; config makes the model immutable and hashable. That lets me use
the fingerprint itself as a dedupe key for memory retains, which I
get to in a minute.&lt;/p&gt;

&lt;p&gt;The empty-string defaults (instead of None) keep the canonical
serialization deterministic. With None, the f-string serializer would
emit the literal text None, which parses back as the string "None"
rather than None, so the round trip would break. Two alerts that produce
the exact same fingerprint serialize to the exact same string, which means
they hash the same, dedupe the same, and recall the same memories.&lt;/p&gt;
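&lt;p&gt;A minimal sketch of both design choices, using the standard library's &lt;code&gt;dataclasses.dataclass(frozen=True)&lt;/code&gt; as a stand-in for the Pydantic model so the snippet runs without Pydantic installed (an assumption for illustration; a frozen dataclass also derives its hash from field values and rejects mutation). The &lt;code&gt;serialize&lt;/code&gt; helper is hypothetical and mirrors the canonical format the post uses for &lt;code&gt;format_fingerprint&lt;/code&gt;.&lt;/p&gt;

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Fingerprint:
    # Stand-in for AlertFingerprint: frozen, so __hash__ is derived from the fields.
    error_class: str = ""
    service_role: str = ""
    dependency_pattern: str = ""
    signal_shape: str = ""
    attack_pattern: str = ""
    environment: str = ""


def serialize(fp: Fingerprint) -> str:
    # Hypothetical helper in the same key=value|... shape as format_fingerprint.
    return (
        f"err={fp.error_class}|role={fp.service_role}|dep={fp.dependency_pattern}|"
        f"sig={fp.signal_shape}|att={fp.attack_pattern}|env={fp.environment}"
    )


a = Fingerprint(error_class="crashloopbackoff", dependency_pattern="configmap",
                environment="prod")
b = Fingerprint(error_class="crashloopbackoff", dependency_pattern="configmap",
                environment="prod")

# Equal field values give equal hashes, so the set keeps a single copy.
unique = {a, b}
print(len(unique))  # 1

# Unset fields serialize as empty strings, never as the text "None".
print(serialize(a))  # err=crashloopbackoff|role=|dep=configmap|sig=|att=|env=prod

# Mutation is rejected, so a key that is already stored can never drift.
try:
    a.environment = "staging"
except FrozenInstanceError:
    print("immutable")
```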

&lt;h2&gt;
  
  
  How the fields are extracted
&lt;/h2&gt;

&lt;p&gt;The hot path is regex over the alert body, scoped per field. There's
also a cheap-model extraction path, routed through cascadeflow, for cases
where the regex misses, but the regex is the source of truth for
the "first run with no live API" demo mode.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# incident_agent/fingerprint.py
import re
from typing import Final

# Checked in insertion order; the first pattern that matches wins.
ERROR_CLASS_PATTERNS: Final[dict[str, re.Pattern[str]]] = {
    "crashloopbackoff": re.compile(r"\bcrashloopback\s*off\b", re.I),
    "oomkilled": re.compile(r"\b(oomkilled|out[- ]of[- ]memory)\b", re.I),
    "http_5xx": re.compile(r"\b5\d{2}\b|\b5xx\b", re.I),
    "timeout": re.compile(r"\btime[- ]?out\b", re.I),
    "tls_handshake": re.compile(r"\btls[- ]handshake\b", re.I),
    # ... more
}


def extract_error_class(raw: str) -&amp;gt; str:
    for label, pattern in ERROR_CLASS_PATTERNS.items():
        if pattern.search(raw):
            return label
    return ""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Six small extractors, one per field: each independent, each cheap, each
falsifiable with a regex test. The fingerprint is the union of their outputs.&lt;/p&gt;
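&lt;p&gt;A minimal sketch of how independent per-field extractors can be unioned into one fingerprint. Only the error-class patterns come from the post; the &lt;code&gt;DEPENDENCY_PATTERNS&lt;/code&gt; and &lt;code&gt;ENVIRONMENT_PATTERNS&lt;/code&gt; tables and the &lt;code&gt;first_match&lt;/code&gt; and &lt;code&gt;extract_fingerprint&lt;/code&gt; helpers are hypothetical, only three of the six fields are shown, and the result is a plain dict to keep the sketch dependency-free.&lt;/p&gt;

```python
import re

# First match wins: dicts iterate in insertion order, so order encodes priority.
ERROR_CLASS_PATTERNS = {
    "crashloopbackoff": re.compile(r"\bcrashloopback\s*off\b", re.I),
    "oomkilled": re.compile(r"\b(oomkilled|out[- ]of[- ]memory)\b", re.I),
    "timeout": re.compile(r"\btime[- ]?out\b", re.I),
}

# Hypothetical tables for two other fields, for illustration only.
DEPENDENCY_PATTERNS = {
    "configmap": re.compile(r"\bconfig\s*map\b", re.I),
    "dns": re.compile(r"\bdns\b|\bnxdomain\b", re.I),
}
ENVIRONMENT_PATTERNS = {
    "prod": re.compile(r"\bprod(uction)?\b", re.I),
    "staging": re.compile(r"\bstag(e|ing)\b", re.I),
}


def first_match(raw: str, table: dict[str, re.Pattern[str]]) -> str:
    # Same shape as extract_error_class: return the first label whose pattern hits.
    for label, pattern in table.items():
        if pattern.search(raw):
            return label
    return ""


def extract_fingerprint(raw: str) -> dict[str, str]:
    # Each field is extracted independently; the fingerprint is the union.
    return {
        "error_class": first_match(raw, ERROR_CLASS_PATTERNS),
        "dependency_pattern": first_match(raw, DEPENDENCY_PATTERNS),
        "environment": first_match(raw, ENVIRONMENT_PATTERNS),
    }


print(extract_fingerprint("[prod] checkout-service CrashLoopBackOff: missing key in ConfigMap"))
# {'error_class': 'crashloopbackoff', 'dependency_pattern': 'configmap', 'environment': 'prod'}
```

&lt;p&gt;Because each extractor only ever sees the raw text and returns one label or an empty string, a miss in one field never corrupts another, and each table can be unit-tested on its own.&lt;/p&gt;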

&lt;p&gt;I tried doing this with a single LLM call that produced all six
fields in one shot. It worked, but it was slow, costly, and (the
real problem) non-deterministic. Two runs on the same alert
produced two slightly different fingerprints, which broke memory
recall. The regex-first design is boringly stable, which is exactly
what you want for a key in a content-addressed memory store.&lt;/p&gt;
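&lt;p&gt;To make "content-addressed" concrete, here is a sketch that assumes the memory key is a SHA-256 digest of the canonical fingerprint string. The hashing scheme is an assumption for illustration, not something the post specifies. A deterministic extractor always lands on the same address; a single drifting field, the failure mode of the one-shot LLM extraction, lands somewhere else and misses every prior memory.&lt;/p&gt;

```python
import hashlib


def content_address(canonical: str) -> str:
    # Hypothetical key scheme: hash the canonical fingerprint string.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


run_1 = "err=crashloopbackoff|role=api_edge|dep=configmap|sig=spike|att=|env=prod"
run_2 = "err=crashloopbackoff|role=api_edge|dep=configmap|sig=spike|att=|env=prod"
# Same alert, but a non-deterministic extractor picked a different signal_shape.
drifted = "err=crashloopbackoff|role=api_edge|dep=configmap|sig=sustained|att=|env=prod"

print(content_address(run_1) == content_address(run_2))    # True: same memory slot
print(content_address(run_1) == content_address(drifted))  # False: recall misses
```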

&lt;h2&gt;
  
  
  The canonical serializer
&lt;/h2&gt;

&lt;p&gt;A fingerprint is only useful as a memory key if two equivalent
fingerprints produce the same string. I serialize them like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# incident_agent/fingerprint.py
from .models import AlertFingerprint


def format_fingerprint(fp: AlertFingerprint) -&amp;gt; str:
    # Fixed field order, fixed separators, every field always present.
    return (
        f"err={fp.error_class}|"
        f"role={fp.service_role}|"
        f"dep={fp.dependency_pattern}|"
        f"sig={fp.signal_shape}|"
        f"att={fp.attack_pattern}|"
        f"env={fp.environment}"
    )


def parse_fingerprint(serialized: str) -&amp;gt; AlertFingerprint:
    # Split on "|" into key=value parts, then on the first "=" of each part.
    fields = dict(part.split("=", 1) for part in serialized.split("|"))
    return AlertFingerprint(
        error_class=fields.get("err", ""),
        service_role=fields.get("role", ""),
        dependency_pattern=fields.get("dep", ""),
        signal_shape=fields.get("sig", ""),
        attack_pattern=fields.get("att", ""),
        environment=fields.get("env", ""),
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The order is fixed, the separator is fixed, and every field is always
present. As long as no field value contains the &lt;code&gt;|&lt;/code&gt; separator,
&lt;code&gt;parse(format(fp)) == fp&lt;/code&gt; holds for every fingerprint, which I
verify with a property test:&lt;/p&gt;

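&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# tests/property/test_fingerprint.py
from hypothesis import given, settings

from incident_agent.fingerprint import format_fingerprint, parse_fingerprint
from incident_agent.models import AlertFingerprint
# alert_fingerprint_strategy() is defined elsewhere in the suite (not shown).

@given(fp=alert_fingerprint_strategy())
@settings(deadline=None, max_examples=200)
def test_fingerprint_round_trip(fp: AlertFingerprint) -&amp;gt; None:
    """Feature: openrecall, Property 1: format/parse round-trip."""
    serialized = format_fingerprint(fp)
    assert parse_fingerprint(serialized) == fp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;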

&lt;p&gt;Hypothesis throws up to 200 randomly generated fingerprints at the
round trip on every CI run. If anyone changes the format string and forgets
to update the parser, the property fails before the change merges.&lt;/p&gt;
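&lt;p&gt;For readers without Hypothesis handy, here is a rough stdlib-only analogue of the same property, with a plain frozen dataclass standing in for the Pydantic model and a seeded &lt;code&gt;random.Random&lt;/code&gt; standing in for the Hypothesis strategy (both assumptions for illustration; &lt;code&gt;format_fp&lt;/code&gt;, &lt;code&gt;parse_fp&lt;/code&gt; and &lt;code&gt;random_fp&lt;/code&gt; are hypothetical names). It also shows the one precondition the round trip depends on: the generator must keep the &lt;code&gt;|&lt;/code&gt; separator out of field values, because a value containing it splits into a fragment with no &lt;code&gt;=&lt;/code&gt;, and &lt;code&gt;dict()&lt;/code&gt; raises &lt;code&gt;ValueError&lt;/code&gt;.&lt;/p&gt;

```python
import random
import string
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Fingerprint:
    error_class: str = ""
    service_role: str = ""
    dependency_pattern: str = ""
    signal_shape: str = ""
    attack_pattern: str = ""
    environment: str = ""


# Short keys in the same fixed order as format_fingerprint.
KEYS = {"error_class": "err", "service_role": "role", "dependency_pattern": "dep",
        "signal_shape": "sig", "attack_pattern": "att", "environment": "env"}


def format_fp(fp: Fingerprint) -> str:
    return "|".join(f"{short}={getattr(fp, name)}" for name, short in KEYS.items())


def parse_fp(serialized: str) -> Fingerprint:
    pairs = dict(part.split("=", 1) for part in serialized.split("|"))
    return Fingerprint(**{name: pairs.get(short, "") for name, short in KEYS.items()})


# The alphabet deliberately excludes "|"; "=" is fine because split("=", 1) tolerates it.
ALPHABET = string.ascii_lowercase + "_="


def random_fp(rng: random.Random) -> Fingerprint:
    # Length 0 is allowed so empty (unset) fields are exercised too.
    return Fingerprint(**{
        f.name: "".join(rng.choices(ALPHABET, k=rng.randint(0, 12)))
        for f in fields(Fingerprint)
    })


rng = random.Random(0)  # seeded so any failure is reproducible
for _ in range(200):
    fp = random_fp(rng)
    assert parse_fp(format_fp(fp)) == fp

# A value containing the separator breaks the property.
try:
    parse_fp(format_fp(Fingerprint(error_class="a|b")))
except ValueError:
    print("separator in a value breaks the round trip")
```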

&lt;h2&gt;
  
  
  Why this beats embeddings for this domain
&lt;/h2&gt;

&lt;p&gt;The fingerprint approach has three properties cosine similarity
doesn't.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It composes.&lt;/em&gt; "All alerts with error_class=crashloopbackoff
AND dependency_pattern=configmap" is one set membership check.
The same query on a vector store is a top-k with a similarity
threshold and a re-rank.&lt;/p&gt;
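&lt;p&gt;A minimal sketch of what that composition can look like, assuming an in-memory inverted index from &lt;code&gt;(field, value)&lt;/code&gt; pairs to incident IDs. The index, the &lt;code&gt;query&lt;/code&gt; helper and the incident IDs are all hypothetical. A conjunctive query is just a set intersection of posting sets, with no threshold or re-rank involved.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical stored incidents: id -> fingerprint fields.
INCIDENTS = {
    "INC-101": {"error_class": "crashloopbackoff", "dependency_pattern": "configmap", "environment": "prod"},
    "INC-102": {"error_class": "crashloopbackoff", "dependency_pattern": "dns", "environment": "prod"},
    "INC-103": {"error_class": "http_5xx", "dependency_pattern": "configmap", "environment": "staging"},
    "INC-104": {"error_class": "crashloopbackoff", "dependency_pattern": "configmap", "environment": "staging"},
}

# Inverted index: (field, value) -> ids of incidents carrying that value.
index: dict[tuple[str, str], set[str]] = defaultdict(set)
for incident_id, fp in INCIDENTS.items():
    for field_name, value in fp.items():
        index[(field_name, value)].add(incident_id)


def query(**conditions: str) -> set[str]:
    # AND of exact field matches is the intersection of the posting sets.
    postings = [index.get((name, value), set()) for name, value in conditions.items()]
    return set.intersection(*postings) if postings else set(INCIDENTS)


print(sorted(query(error_class="crashloopbackoff", dependency_pattern="configmap")))
# ['INC-101', 'INC-104']
```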

&lt;p&gt;&lt;em&gt;It's debuggable.&lt;/em&gt; When an alert recalls the wrong memory, I can
read the fingerprint and see exactly why. With embeddings, I can
read 1,536 floats.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It explains itself.&lt;/em&gt; When the cockpit displays "matched memory:
checkout-service CrashLoopBackOff due to bad environment variable
(score 0.95)," the analyst can see that err=crashloopbackoff and
dep=configmap match across the two alerts. That's the explanation.&lt;/p&gt;
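&lt;p&gt;The post doesn't show how the match score is computed, so the following is a hypothetical scoring function, included only to illustrate the point: a weighted sum over fields that are set on both sides and agree. The weights are invented for this sketch. Whatever the real formula is, the explanation falls out of the same field-by-field comparison that produces the score.&lt;/p&gt;

```python
# Invented weights for illustration only; causal fields count more than context fields.
WEIGHTS = {
    "error_class": 0.35,
    "dependency_pattern": 0.30,
    "signal_shape": 0.15,
    "service_role": 0.10,
    "environment": 0.10,
}


def explain_match(a: dict[str, str], b: dict[str, str]) -> tuple[float, list[str]]:
    # A field only counts when both sides set it and the values agree.
    matched = [f for f in WEIGHTS if a.get(f) and a.get(f) == b.get(f)]
    score = round(sum(WEIGHTS[f] for f in matched), 2)
    return score, [f"{f}={a[f]}" for f in matched]


incoming = {"error_class": "crashloopbackoff", "dependency_pattern": "configmap",
            "service_role": "api_edge", "environment": "prod", "signal_shape": "spike"}
memory = {"error_class": "crashloopbackoff", "dependency_pattern": "configmap",
          "service_role": "api_edge", "environment": "prod", "signal_shape": "sustained"}

score, reasons = explain_match(incoming, memory)
print(score, reasons)  # 0.85 plus the four agreeing fields; signal_shape differs
```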

&lt;p&gt;The trade-off is that the regex extractor will occasionally miss a
new error class until I add the pattern. I accepted that. New error
classes show up at human speed, not machine speed; adding a regex
takes ten minutes; and the cost of a missed match is one extra
strong-model call, not a wrong answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The dedupe key trick
&lt;/h2&gt;

&lt;p&gt;Because AlertFingerprint is frozen and serializes to exactly one
canonical string, making retains idempotent takes a single set lookup:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# incident_agent/memory.py
# Method on the memory client; self._retained is a set of keys already written.
def retain(
    self,
    content: str,
    *,
    fingerprint: AlertFingerprint,
    decision: TriageDecision,
    dead_ends: list[str] | None = None,
    analyst_id: str | None = None,
    business_impact_minutes: int | None = None,
) -&amp;gt; str:
    from .fingerprint import format_fingerprint  # local import: avoid cycle

    key = (format_fingerprint(fingerprint), decision)
    if key in self._retained:
        return "skipped duplicate"
    # ... write to Hindsight Cloud + local mirror
    self._retained.add(key)  # record the key only after the write
    return "retained"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The dedupe key is the tuple (serialized_fingerprint, decision).
Re-clicking the "retain" button doesn't double-write: calling
retain twice with the same fingerprint and decision short-circuits
and returns "skipped duplicate". That makes the analyst flow fearless;
over-clicking is safe.&lt;/p&gt;
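&lt;p&gt;A self-contained sketch of the same idempotency pattern. The in-memory &lt;code&gt;MemoryStore&lt;/code&gt; (standing in for the Hindsight Cloud write) and the &lt;code&gt;TriageDecision&lt;/code&gt; members are hypothetical, since the post shows neither, and the fingerprint is passed already serialized to keep the sketch short. Only the key shape, (serialized fingerprint, decision), is taken from the code above.&lt;/p&gt;

```python
from enum import Enum


class TriageDecision(Enum):
    # Hypothetical members for illustration.
    ESCALATE = "escalate"
    AUTO_RESOLVE = "auto_resolve"


class MemoryStore:
    def __init__(self) -> None:
        self._retained: set[tuple[str, TriageDecision]] = set()
        self.writes: list[str] = []  # stands in for the remote write

    def retain(self, content: str, *, fingerprint: str, decision: TriageDecision) -> str:
        key = (fingerprint, decision)
        if key in self._retained:
            return "skipped duplicate"
        self.writes.append(content)
        self._retained.add(key)  # recorded only after the write succeeds
        return "retained"


store = MemoryStore()
fp = "err=crashloopbackoff|role=api_edge|dep=configmap|sig=spike|att=|env=prod"
print(store.retain("rolled back bad ConfigMap", fingerprint=fp, decision=TriageDecision.ESCALATE))
print(store.retain("rolled back bad ConfigMap", fingerprint=fp, decision=TriageDecision.ESCALATE))
print(store.retain("auto-resolved", fingerprint=fp, decision=TriageDecision.AUTO_RESOLVE))
print(len(store.writes))
```

&lt;p&gt;This prints "retained", "skipped duplicate", "retained", and 2: the same fingerprint with a different decision is a different memory. Adding the key only after the write means a write that raises can simply be retried; adding it first would turn a transient failure into a permanently skipped memory.&lt;/p&gt;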

&lt;h2&gt;
  
  
  What the fingerprint enables downstream
&lt;/h2&gt;

&lt;p&gt;Once every alert reduces to a fingerprint, three things become easy.&lt;/p&gt;

&lt;p&gt;The first is memory recall. I send the canonical fingerprint string
to &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt; as the recall query
and get back prior incidents keyed on the same DNA. Hindsight Cloud
gives me a managed instance with a clean API; the
&lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;GitHub repo&lt;/a&gt; hosts the
SDK. The fingerprint is the bridge.&lt;/p&gt;

&lt;p&gt;The second is the bypass rule. The triage engine I built (which I
won't rehash here; there's a separate write-up on the
counterfactual memory side) uses the fingerprint match score as one of
its four required bypass clauses. When the score reaches 0.85 and
the dominant prior decision is consistent, the strong model is
skipped entirely. The fingerprint is the input that score is
computed from.&lt;/p&gt;
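&lt;p&gt;A minimal sketch of the two bypass clauses named above, with the other two clauses folded into a single boolean because the post doesn't describe them. The 0.85 threshold is from the post; reading "crosses" as greater-than-or-equal, defining "consistent" as the most common prior decision covering at least 80% of matched memories, and the decision labels are all assumptions of this sketch.&lt;/p&gt;

```python
from collections import Counter

SCORE_THRESHOLD = 0.85  # from the post
MIN_AGREEMENT = 0.8     # hypothetical definition of a "consistent" prior decision


def dominant_decision(prior_decisions: list[str]) -> tuple[str, float]:
    # Most common prior decision and the share of priors that agree with it.
    if not prior_decisions:
        return "", 0.0
    decision, count = Counter(prior_decisions).most_common(1)[0]
    return decision, count / len(prior_decisions)


def should_bypass(match_score: float, prior_decisions: list[str], other_clauses_ok: bool) -> bool:
    # Every clause must hold; the two undescribed ones arrive as other_clauses_ok.
    _, agreement = dominant_decision(prior_decisions)
    return match_score >= SCORE_THRESHOLD and agreement >= MIN_AGREEMENT and other_clauses_ok


print(should_bypass(0.91, ["rollback"] * 4 + ["escalate"], True))  # True
print(should_bypass(0.91, ["rollback", "escalate"], True))         # False: priors disagree
print(should_bypass(0.80, ["rollback"] * 5, True))                 # False: score too low
```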

&lt;p&gt;The third is the queue ordering. When 100 alerts arrive at once,
sorting by fingerprint groups duplicates together, and the cockpit
renders them as a table where the analyst can see "I have eight
copies of err=crashloopbackoff|dep=configmap and the proposal is
the same for all of them." That visual collapse is impossible
without a structured key.&lt;/p&gt;
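&lt;p&gt;A minimal sketch of that collapse using &lt;code&gt;collections.Counter&lt;/code&gt; over canonical fingerprint strings; the alert burst and its fingerprints are made up for illustration. The grouping only works because equal fingerprints serialize to identical strings.&lt;/p&gt;

```python
from collections import Counter

CRASH = "err=crashloopbackoff|role=api_edge|dep=configmap|sig=spike|att=|env=prod"
LATENCY = "err=timeout|role=api_edge|dep=downstream_api|sig=sustained|att=|env=prod"
LOGIN = "err=|role=api_edge|dep=|sig=spike|att=suspicious_login|env=prod"

# Hypothetical burst: (alert id, canonical fingerprint string).
burst = [(f"alert-{i}", CRASH) for i in range(8)]
burst += [(f"alert-{i}", LATENCY) for i in range(8, 11)]
burst.append(("alert-11", LOGIN))

# One row per distinct fingerprint, largest group first.
for fingerprint, copies in Counter(fp for _, fp in burst).most_common():
    print(f"{copies:>3} x {fingerprint}")
```

&lt;p&gt;Twelve alerts render as three rows, with the eight CrashLoopBackOff copies on top.&lt;/p&gt;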

&lt;h2&gt;
  
  
  What I'd do differently next time
&lt;/h2&gt;

&lt;p&gt;I overthought the cascadeflow-routed extraction path. I built it
expecting to need an LLM fallback for fields the regex couldn't
catch, and in practice the regex path handles 95% of the corpus.
The cheap-model extraction stayed in the codebase as the
&lt;code&gt;fingerprint_with_trace&lt;/code&gt; route (it's there for the rare alert
that's truly novel), but on the live demo path it almost never
fires. cascadeflow's
&lt;a href="https://docs.cascadeflow.ai/" rel="noopener noreferrer"&gt;provider routing docs&lt;/a&gt; make the
fallback trivial to wire when you do need it; I just didn't need it
as often as I expected.&lt;/p&gt;

&lt;p&gt;If I were starting over, I'd ship the regex-only extractor first,
prove the bypass works, and only add the model-routed path when a
specific class of misses justified it. Optionality has a cost.&lt;/p&gt;
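&lt;p&gt;A sketch of the regex-first shape under stated assumptions: &lt;code&gt;fingerprint_with_fallback&lt;/code&gt;, &lt;code&gt;regex_extract&lt;/code&gt; and &lt;code&gt;model_extract&lt;/code&gt; are hypothetical names, and the fallback is asked only for fields the regex left empty. The model call is injected as a plain callable rather than wired to cascadeflow, so nothing here reflects cascadeflow's actual API. The returned trace records which path produced each field.&lt;/p&gt;

```python
from __future__ import annotations

from typing import Callable

Fields = dict[str, str]


def fingerprint_with_fallback(
    raw: str,
    regex_extract: Callable[[str], Fields],
    model_extract: Callable[[str, list[str]], Fields] | None = None,
) -> tuple[Fields, dict[str, str]]:
    fields = dict(regex_extract(raw))  # copy so the caller's dict is never mutated
    trace = {name: "regex" for name, value in fields.items() if value}
    missing = [name for name, value in fields.items() if not value]
    # Regex-only mode, or nothing left to fill: never touch the model.
    if model_extract is None or not missing:
        return fields, trace
    for name, value in model_extract(raw, missing).items():
        if name in missing and value:  # the model may only fill gaps
            fields[name] = value
            trace[name] = "model"
    return fields, trace


def fake_regex(raw: str) -> Fields:
    return {"error_class": "oomkilled" if "OOMKilled" in raw else "", "environment": ""}


def fake_model(raw: str, wanted: list[str]) -> Fields:
    # Stand-in for a cheap-model call; answers only for the fields it was asked about.
    return {name: "prod" for name in wanted if name == "environment"}


print(fingerprint_with_fallback("pod OOMKilled in prod-eu", fake_regex))
# ({'error_class': 'oomkilled', 'environment': ''}, {'error_class': 'regex'})
print(fingerprint_with_fallback("pod OOMKilled in prod-eu", fake_regex, fake_model))
# ({'error_class': 'oomkilled', 'environment': 'prod'}, {'error_class': 'regex', 'environment': 'model'})
```

&lt;p&gt;Shipping with &lt;code&gt;model_extract=None&lt;/code&gt; is the regex-only first step; the fallback can be passed in later without changing callers.&lt;/p&gt;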

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;When the matching axis is causal (error class, dependency, change
shape), a structured fingerprint beats embedding similarity. The
fingerprint becomes the join key for memory, the threshold input
for routing, the dedupe key for retains, and the visual identity
for queue rendering. One small Pydantic model is doing four jobs
at once.&lt;/p&gt;

&lt;p&gt;The whole thing is six fields, six regex extractors, and a frozen
config. The code is at
&lt;a href="https://github.com/Dawn-Fighter/openrecall" rel="noopener noreferrer"&gt;https://github.com/Dawn-Fighter/openrecall&lt;/a&gt;.
The Hindsight memory layer is what
gives the fingerprint somewhere to live;
&lt;a href="https://vectorize.io/what-is-agent-memory" rel="noopener noreferrer"&gt;Vectorize's overview of agent memory&lt;/a&gt;
is a good place to start if you want to understand why content-addressed
memory matters more than top-k recall for systems that
need to reproduce the same answer twice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo448ricri2obue40stb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdo448ricri2obue40stb.png" alt=" " width="799" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>productivity</category>
      <category>cybersecurity</category>
    </item>
  </channel>
</rss>
