<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mario Gutierrez</title>
    <description>The latest articles on DEV Community by Mario Gutierrez (@terrizoaguimor).</description>
    <link>https://dev.to/terrizoaguimor</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3880854%2F56d8a0a4-d491-4869-8f35-908d1b3d2b95.jpeg</url>
      <title>DEV Community: Mario Gutierrez</title>
      <link>https://dev.to/terrizoaguimor</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/terrizoaguimor"/>
    <language>en</language>
    <item>
      <title>Six failures, a 32-minute TPU lie, and the moment a language model ignored my prompt on purpose</title>
      <dc:creator>Mario Gutierrez</dc:creator>
      <pubDate>Wed, 03 Jun 2026 17:28:05 +0000</pubDate>
      <link>https://dev.to/terrizoaguimor/six-failures-a-32-minute-tpu-lie-and-the-moment-a-language-model-ignored-my-prompt-on-purpose-oa8</link>
      <guid>https://dev.to/terrizoaguimor/six-failures-a-32-minute-tpu-lie-and-the-moment-a-language-model-ignored-my-prompt-on-purpose-oa8</guid>
      <description>&lt;p&gt;Every language model you've ever used is a &lt;strong&gt;single-channel machine&lt;/strong&gt;: text goes in, text comes out, and the prompt is the &lt;em&gt;only&lt;/em&gt; force acting on the network. Our entire toolbox for evaluating LLMs quietly assumes that.&lt;/p&gt;

&lt;p&gt;I couldn't stop poking at a different question: &lt;strong&gt;what if a model also had a sense of its own internal state&lt;/strong&gt; — not what's in the prompt, but what &lt;em&gt;it&lt;/em&gt; is carrying — and that sense shaped how it answered, at every layer?&lt;/p&gt;

&lt;p&gt;Six signals. I call them &lt;strong&gt;proprioceptive channels&lt;/strong&gt; — by analogy to how your body knows where your arm is without looking. Not perception of the world; perception of &lt;em&gt;itself&lt;/em&gt;. What's salient in memory right now. Its mood. The time. Its caution posture. Which "self" is active. Which thread of the conversation it's on. I called the architecture &lt;strong&gt;MARS&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is the honest story of trying to prove it works — including the part where I failed six times in a row, the day my TPU lied to me for half an hour, and the test result that made me put the coffee down.&lt;/p&gt;

&lt;p&gt;Repo + everything below: 👉 &lt;strong&gt;&lt;a href="https://github.com/terrizoaguimor/tinymars" rel="noopener noreferrer"&gt;github.com/terrizoaguimor/tinymars&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Act 1: six iterations, six negatives
&lt;/h2&gt;

&lt;p&gt;For six rounds, I trained the channels to carry &lt;em&gt;content&lt;/em&gt; — a fact, a persona — and asked an LLM to judge whether the output reflected it. Six times, basically nothing. Flat. The kind of clean, repeated negative that whispers &lt;em&gt;"your idea is wrong, friend."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I almost shelved it. What saved it wasn't optimism — it was getting specific about &lt;em&gt;why&lt;/em&gt; it failed. The bug wasn't the architecture. &lt;strong&gt;It was me.&lt;/strong&gt; I was asking the channel to do the prompt's job: cram a whole fact into a pooled vector and have the model decode it back out. That's close to impossible, and it's not what the design is &lt;em&gt;for&lt;/em&gt;. (The LLM judges, unreliable on tone, hid the mistake under noise.)&lt;/p&gt;

&lt;p&gt;The fix was almost embarrassing in hindsight: &lt;strong&gt;put the content in the prompt&lt;/strong&gt; (like RAG), and let the channel carry only &lt;strong&gt;state&lt;/strong&gt; — &lt;em&gt;which&lt;/em&gt; memory is salient, &lt;em&gt;what&lt;/em&gt; posture to take. Re-ran with objective metrics and &lt;strong&gt;no LLM judges&lt;/strong&gt;. Six capabilities, all positive. The one that had been flattest (identity) gave the &lt;em&gt;biggest&lt;/em&gt; signal once I stopped testing it wrong.&lt;/p&gt;

&lt;p&gt;Lesson tattooed on me now: &lt;strong&gt;a flat negative often means a broken experiment, not a broken idea.&lt;/strong&gt; Measure the thing you actually built.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act 2: "wait — this test can't even see what I care about"
&lt;/h2&gt;

&lt;p&gt;Even with 6/6, something nagged. The standard test asks: &lt;em&gt;does the channel say X better than a text prompt would?&lt;/em&gt; And honestly — for a simple instruction, a good instruction-tuned model follows text near-perfectly. So text is a strong baseline, and on that axis the channel ties it.&lt;/p&gt;

&lt;p&gt;But that's the &lt;strong&gt;wrong axis&lt;/strong&gt;. It's measuring a river current with a speedometer. The real claim isn't "the channel is a nicer way to write a prompt." It's that internal state is a &lt;strong&gt;second, perpendicular force&lt;/strong&gt; — one that can &lt;em&gt;override&lt;/em&gt; the text.&lt;/p&gt;

&lt;p&gt;So I built the test a single-channel model &lt;strong&gt;literally cannot be given&lt;/strong&gt;: set the channel to say one thing, write the prompt to say the &lt;strong&gt;opposite&lt;/strong&gt;, and measure who wins.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;capability&lt;/th&gt;
&lt;th&gt;times the internal state beat the contradicting prompt&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;affect&lt;/td&gt;
&lt;td&gt;85/86 (98%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;time&lt;/td&gt;
&lt;td&gt;80/80 (100%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ethics&lt;/td&gt;
&lt;td&gt;99/99 (100%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;264/265 (99.6%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The model followed its &lt;strong&gt;internal state over the explicit text, 264 times out of 265.&lt;/strong&gt; And the part that made me stare: &lt;strong&gt;I never trained it to do that.&lt;/strong&gt; It was only ever trained to &lt;em&gt;read&lt;/em&gt; the channel. That the channel would &lt;em&gt;win a fight with the prompt&lt;/em&gt; was never an objective — it emerged. I'd pre-registered the result that would've killed the claim (text wins) before running it. It didn't happen.&lt;/p&gt;

&lt;p&gt;That's the moment the project stopped feeling like "fine-tuning with extra steps" and started feeling like a new &lt;em&gt;kind&lt;/em&gt; of control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act 3: the day the TPU lied to me
&lt;/h2&gt;

&lt;p&gt;dev.to is for builders, so here's the trench story. I spun up a fresh Cloud TPU, kicked off the eval, and watched the log sit there. And sit. &lt;strong&gt;Thirty-two minutes&lt;/strong&gt;, one CPU core pinned at 94%, &lt;em&gt;zero&lt;/em&gt; output. Not crashed — "running." Lying to my face.&lt;/p&gt;

&lt;p&gt;The culprit: a fresh install pulled &lt;code&gt;transformers 5.x&lt;/code&gt;, which fires &lt;code&gt;torch.compile&lt;/code&gt;/inductor inside the forward pass — pathological under &lt;code&gt;torch_xla&lt;/code&gt;. The model wasn't generating; it was stuck compiling on CPU forever. Then the VM got so saturated &lt;strong&gt;SSH wouldn't even let me in to kill it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The fix was a maze: downgrade to 4.x? — that version can't even &lt;em&gt;load&lt;/em&gt; the tokenizer (different bug). The actual answer was &lt;code&gt;transformers 5.x&lt;/code&gt; &lt;strong&gt;with &lt;code&gt;TORCHDYNAMO_DISABLE=1&lt;/code&gt;&lt;/strong&gt; — verified with a tiny "does it generate at all" smoke test before trusting it again. When SSH was locked out, a stop/start (not delete — never delete) freed the box. If you train Gemma-class models on TPU: write that env var on your hand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act 4: the honest part (this is why you can trust the 264)
&lt;/h2&gt;

&lt;p&gt;If a research project reports &lt;em&gt;everything&lt;/em&gt; as a win, be suspicious — somebody leaned on the scale. So here's the stuff that didn't go my way, on the record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It's an adapter on a frozen base, at smoke scale&lt;/strong&gt; (~186M trained params on a frozen Gemma E2B). Not a from-scratch architecture &lt;strong&gt;yet&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;I predicted wrong, repeatedly.&lt;/strong&gt; I bet the "register" channels would just tie text. Two of them (time, ethics) the channel &lt;em&gt;won&lt;/em&gt; decisively. My calibration was off — in the architecture's favor, but off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One whole test came back null, and I kept it.&lt;/strong&gt; I expected the channel to &lt;em&gt;persist&lt;/em&gt; better than text across long context. At the scale I can test (the model trains at 256 tokens), there was nothing to measure — the text didn't decay either. Untested, &lt;strong&gt;not refuted&lt;/strong&gt;. It's in the report as a flat line, because hiding it would make the 264 a lie too.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Act 5: what's next
&lt;/h2&gt;

&lt;p&gt;What I proved is a &lt;strong&gt;functional&lt;/strong&gt; category: internal state as a control force, measured under adversarial conflict. To make it an &lt;strong&gt;architectural&lt;/strong&gt; category, the channels have to be there &lt;strong&gt;from layer 1, trained from scratch&lt;/strong&gt; — not bolted onto a model that already knew how to talk.&lt;/p&gt;

&lt;p&gt;That experiment is specified. The trick (an epiphany I'm still smiling about): I don't reinvent the language part — I stand on a proven small-LM recipe (&lt;a href="https://github.com/karpathy/nanochat" rel="noopener noreferrer"&gt;nanochat&lt;/a&gt;) for "learn to speak," so the &lt;strong&gt;channels are the only new variable&lt;/strong&gt;. Param-matched control. A stop-loss, written down in advance: three iterations with no signal vs baseline and I downgrade the claim, in public.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you want to poke holes
&lt;/h2&gt;

&lt;p&gt;Everything's open — code, the technical report, every per-iteration metric, the objective scorers, and the pre-registration of the conflict test:&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/terrizoaguimor/tinymars" rel="noopener noreferrer"&gt;github.com/terrizoaguimor/tinymars&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Archived + citable: &lt;strong&gt;DOI &lt;a href="https://doi.org/10.5281/zenodo.20531347" rel="noopener noreferrer"&gt;10.5281/zenodo.20531347&lt;/a&gt;&lt;/strong&gt; · GPL-3.0 (code) / CC BY-SA 4.0 (docs).&lt;/p&gt;

&lt;p&gt;If you work on representation steering, control-theoretic alignment, or you just respect an honest negative result — come break it. The conflict test is the one I'd attack first. I'd genuinely love to be wrong in an interesting way.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why I'm building Hyphae: provenance over prediction (and the 3-line baseline that tied it)</title>
      <dc:creator>Mario Gutierrez</dc:creator>
      <pubDate>Fri, 29 May 2026 14:55:20 +0000</pubDate>
      <link>https://dev.to/terrizoaguimor/why-im-building-hyphae-provenance-over-prediction-and-the-3-line-baseline-that-tied-it-2e32</link>
      <guid>https://dev.to/terrizoaguimor/why-im-building-hyphae-provenance-over-prediction-and-the-3-line-baseline-that-tied-it-2e32</guid>
      <description>&lt;p&gt;A few months ago I set out to build a cognitive substrate without a large language model in the answering path. I had a thesis I liked, a Rust workspace, and a lot of conviction.&lt;/p&gt;

&lt;p&gt;Then I wrote a three-line baseline that tied it on every metric I cared about.&lt;/p&gt;

&lt;p&gt;This is the story of why that was the best thing that happened to the project — and why I'm still building it, just pointed at a sharper target.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem I actually care about
&lt;/h2&gt;

&lt;p&gt;When a language model answers a grounded question, it &lt;em&gt;paraphrases&lt;/em&gt; its sources. That paraphrase is fluent, often correct, and — this is the part that bothers me — &lt;strong&gt;impossible to bind back to its source byte-for-byte&lt;/strong&gt;. You can cite a document. You cannot prove, after the fact, that the words in the answer are the words that were stored, unaltered, at a known position.&lt;/p&gt;

&lt;p&gt;For a chatbot that doesn't matter. For anything that has to be &lt;em&gt;audited&lt;/em&gt; — a compliance trail, a medical or legal memory, an agent acting on your behalf over months — it matters a lot. "Trust me, I read the docs" is not a property you can verify.&lt;/p&gt;

&lt;p&gt;So I started building &lt;strong&gt;Hyphae&lt;/strong&gt;: a substrate that answers by emitting &lt;strong&gt;byte-identical quotations&lt;/strong&gt; of stored memory fragments, over a SHA-256 hash-chained journal, with no LLM in the cognition path. Rust, CPU-only, a single binary.&lt;/p&gt;

&lt;p&gt;The shape of it is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Every stored fragment is appended verbatim to a hash chain.&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;head&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;journal&lt;/span&gt;&lt;span class="nf"&gt;.append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"memory_op"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fragment&lt;/span&gt;&lt;span class="nf"&gt;.bytes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// An answer span is a byte-identical quotation of a stored fragment.&lt;/span&gt;
&lt;span class="c1"&gt;// Tamper with any historical entry and the recomputed chain breaks&lt;/span&gt;
&lt;span class="c1"&gt;// at the next link — verify() localises exactly where.&lt;/span&gt;
&lt;span class="n"&gt;journal&lt;/span&gt;&lt;span class="nf"&gt;.verify&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing here is cryptographically novel. Hash-chained logs are old and well understood — Haber &amp;amp; Stornetta in 1991, Merkle before that, Certificate Transparency, git. I want to be honest about that up front, because the interesting part isn't the chain.&lt;/p&gt;

&lt;h2&gt;
  
  
  The day a three-line baseline tied me
&lt;/h2&gt;

&lt;p&gt;I wanted to show Hyphae was &lt;em&gt;better&lt;/em&gt; than an LLM+RAG pipeline at grounded answering. So I built the comparison properly: a real retriever, reranking, six models across three retrieval modes, two corpora, twelve metrics.&lt;/p&gt;

&lt;p&gt;Then a reviewer asked the obvious question: &lt;em&gt;what does a trivial baseline score?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So I wrote &lt;code&gt;echo&lt;/code&gt; — a few lines that just print the retrieved fragment back. It tied Hyphae on every correctness and grounding metric. So did &lt;code&gt;echo + journal&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;That stings for about a day. Then it becomes the whole point.&lt;/p&gt;

&lt;p&gt;The measured correctness and grounding were never properties of &lt;em&gt;my system&lt;/em&gt;. They are properties of &lt;strong&gt;verbatim quotation&lt;/strong&gt; itself. If you emit a stored span unchanged, of course it's "grounded" — it &lt;em&gt;is&lt;/em&gt; the source. Hyphae's seventeen subsystems weren't what made the answers auditable. The verbatim-emission-over-a-journal layer was. And that layer is &lt;strong&gt;addable to any extractive retrieval system&lt;/strong&gt; — it isn't Hyphae-specific at all.&lt;/p&gt;

&lt;p&gt;So I stopped claiming Hyphae was a better brain and started claiming something narrower and, I think, truer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Verifiable provenance is a property you can add to grounded retrieval. A paraphrase destroys byte-level bindability to its source; a verbatim quotation preserves it, and a hash chain makes that binding independently auditable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The contribution isn't the hash chain. It's the &lt;em&gt;observation&lt;/em&gt;, and the &lt;em&gt;measurement&lt;/em&gt; of it against eighteen LLM configurations and a tamper-detection benchmark.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing the gaps, honestly
&lt;/h2&gt;

&lt;p&gt;Once you claim "tamper-evident," people who know what they're doing immediately ask where it breaks. Good. The threat model is the product.&lt;/p&gt;

&lt;p&gt;A bare hash chain catches a store-only attacker who edits a record in place. It does &lt;strong&gt;not&lt;/strong&gt; catch a &lt;em&gt;chain-aware&lt;/em&gt; attacker who recomputes every hash forward and rewrites the head — because the head lives in the same store. So I anchor the head with an Ed25519 signature held outside the store (the attacker can't re-sign). That closes it.&lt;/p&gt;

&lt;p&gt;But a single signature pins &lt;em&gt;a&lt;/em&gt; valid head, not &lt;em&gt;the latest&lt;/em&gt; one. Every head the journal ever had was, at its time, legitimately signed. An attacker can roll back to an earlier state and replay its genuine-but-stale anchor — and a lone signature check accepts it. So the heads get published to an &lt;strong&gt;append-only, hash-chained ledger&lt;/strong&gt;, and an auditor checks the current head against the ledger's &lt;em&gt;tail&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="c1"&gt;// A single signature pins *a* valid head.&lt;/span&gt;
&lt;span class="c1"&gt;// An append-only ledger pins *the latest* one.&lt;/span&gt;
&lt;span class="nf"&gt;verify_fresh_head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;current_head&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;ledger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;verifying_key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// rollback rejected&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the pattern from Certificate Transparency and git, applied to memory provenance: the value isn't the chain, it's publishing the head to a monotonic log that third parties can compare.&lt;/p&gt;

&lt;p&gt;And I keep a column for what I &lt;em&gt;haven't&lt;/em&gt; closed: a store that withholds later ledger entries is only caught once an auditor gets the true tail from an external witness (a timestamp authority, a gossiped tree head). That's deployment work, and I'd rather write it down than pretend it's solved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is going
&lt;/h2&gt;

&lt;p&gt;The direction got clearer the moment the echo baseline humbled me. I'm not building a better answer engine. I'm building &lt;strong&gt;provenance as a first-class, measurable property of grounded AI&lt;/strong&gt;, in the open. Concretely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A provenance benchmark.&lt;/strong&gt; Correctness benchmarks compare RAG systems on answer quality. There's no standard way to compare &lt;em&gt;verifiable-generation&lt;/em&gt; systems on whether tampering is detectable and localisable. So I built one: a tampering taxonomy, an adversary-capability matrix, and a scoring protocol any system can plug into. That's the axis I think actually matters for AI you have to trust over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provenance as an addable layer.&lt;/strong&gt; The realizer-independence is the feature, not a caveat. The goal is for any extractive retriever to be able to adopt the layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External witnessing and key rotation&lt;/strong&gt; to harden the ledger for real deployments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The bigger picture&lt;/strong&gt; — this is one piece of &lt;a href="https://celiums.ai" rel="noopener noreferrer"&gt;Celiums&lt;/a&gt;, where the bet is that &lt;em&gt;memory&lt;/em&gt; is the foundation for AI agents you can actually audit, not an afterthought bolted onto a model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  It's all open
&lt;/h2&gt;

&lt;p&gt;The substrate, the LLM+RAG comparator, every result envelope, the tamper-detection experiment, the provenance benchmark, and the full preprint are public. Code is Apache-2.0; the docs, corpora, and preprint are CC-BY-4.0.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Code:&lt;/strong&gt; &lt;a href="https://github.com/terrizoaguimor/hyphae-v2" rel="noopener noreferrer"&gt;https://github.com/terrizoaguimor/hyphae-v2&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preprint (Zenodo DOI):&lt;/strong&gt; &lt;a href="https://doi.org/10.5281/zenodo.20436643" rel="noopener noreferrer"&gt;https://doi.org/10.5281/zenodo.20436643&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm a solo, self-taught founder building this in public, which means the dead ends are public too — the echo baseline being the best example. If you work on retrieval, tamper-evident logs, or grounded generation, I'd genuinely like to hear where you think this breaks. The threat model only gets better when someone smarter than me attacks it.&lt;/p&gt;

</description>
      <category>rust</category>
      <category>ai</category>
      <category>opensource</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I Have ADHD and I Keep Losing Context — So I Taught My AI to Remember for Me</title>
      <dc:creator>Mario Gutierrez</dc:creator>
      <pubDate>Fri, 17 Apr 2026 13:38:29 +0000</pubDate>
      <link>https://dev.to/terrizoaguimor/i-have-adhd-and-i-keep-losing-context-so-i-taught-my-ai-to-remember-for-me-afl</link>
      <guid>https://dev.to/terrizoaguimor/i-have-adhd-and-i-keep-losing-context-so-i-taught-my-ai-to-remember-for-me-afl</guid>
      <description>&lt;p&gt;I'm going to be honest about something that most people in tech don't talk about openly.&lt;/p&gt;

&lt;p&gt;I have ADHD. I'm 40. I'm a self-taught developer with no CS degree. And I lose context constantly.&lt;/p&gt;

&lt;p&gt;Not in the "oh I forgot where I put my keys" way. In the "I just spent 3 hours deep in a codebase, got interrupted by a Slack message, and now I genuinely cannot remember what I was doing or why" way.&lt;/p&gt;

&lt;p&gt;If you have ADHD, you know exactly what I'm talking about. If you don't — imagine your browser crashing and losing 47 tabs. That panic? That's every Tuesday for me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI Paradox for ADHD Brains
&lt;/h2&gt;

&lt;p&gt;Here's the thing nobody warned me about: &lt;strong&gt;AI coding tools made my ADHD worse.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not better. Worse.&lt;/p&gt;

&lt;p&gt;Why? Because they're stateless. Every session is a blank slate. And for someone who already struggles with working memory, having to rebuild context from scratch every time I open Claude Code or ChatGPT is genuinely exhausting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Me at 9am: "I'm working on a Next.js app with Supabase auth, 
           JWT refresh tokens, the user table has these columns..."

Me at 2pm: "I'm working on a Next.js app with Supabase auth,
           JWT refresh tokens, the user table has these columns..."

Me at 9pm: "I'm working on a Next.js app with Supabase auth..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I type the same context paragraph 5-8 times a day. Every. Single. Day.&lt;/p&gt;

&lt;p&gt;For a neurotypical developer, this is annoying. For me, each repetition drains the limited executive function I have. By the third time, I'm frustrated. By the fifth, I'm done for the day.&lt;/p&gt;

&lt;p&gt;The tool that's supposed to help me think is making me spend my mental energy on &lt;em&gt;remembering what to tell it&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Breaking Point
&lt;/h2&gt;

&lt;p&gt;It happened on a Thursday at midnight. I was debugging a race condition. I'd been at it for two hours. Claude gave me a suggestion I'd already tried — because it didn't know I'd already tried it. Because it doesn't remember anything.&lt;/p&gt;

&lt;p&gt;I typed (and I'm not proud of this):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I ALREADY TRIED THAT. WE DISCUSSED THIS 20 MINUTES AGO IN THE PREVIOUS SESSION."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then I realized: I was yelling at a machine for not having memory. And the machine was right to not remember — it was designed that way. Every AI tool is designed that way.&lt;/p&gt;

&lt;p&gt;That's when something clicked. Not the bug (that took another hour). Something else:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem isn't AI intelligence. It's AI amnesia.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Actually Needed
&lt;/h2&gt;

&lt;p&gt;I sat down and wrote a list of what would actually help my ADHD brain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Don't make me repeat context.&lt;/strong&gt; Ever. If I told you once, you should know it forever.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remember what worked AND what didn't.&lt;/strong&gt; So you don't suggest the thing I already tried and rejected.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Know when I'm frustrated.&lt;/strong&gt; If I've been debugging the same thing for 2 hours at midnight, maybe don't give me a 500-word explanation. Just give me the fix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Carry context across projects.&lt;/strong&gt; My coding style doesn't change when I switch repos. My preferences don't reset.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forget things naturally.&lt;/strong&gt; Not everything is worth remembering forever. That random CSS hack from 6 months ago? Let it fade.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I looked for tools that did this. I found vector databases pretending to be memory. I found RAG pipelines that retrieve documents but don't understand context. I found "memory" features that are just conversation logs with a search bar.&lt;/p&gt;

&lt;p&gt;Nothing actually modeled how memory works. How &lt;em&gt;my&lt;/em&gt; brain works (when it works).&lt;/p&gt;

&lt;h2&gt;
  
  
  So I Started Building
&lt;/h2&gt;

&lt;p&gt;I'm not going to pitch you anything. But I will tell you what I learned, because maybe it's useful to someone else.&lt;/p&gt;

&lt;p&gt;I started reading neuroscience papers. Real ones. Not Medium articles — actual research on how human memory works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Ebbinghaus forgetting curve&lt;/strong&gt; — memories decay exponentially unless reinforced. Your brain doesn't delete things; they just fade. This is the opposite of how databases work (store forever, delete explicitly).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The PAD emotional model&lt;/strong&gt; — Pleasure, Arousal, Dominance. Psychologists use three dimensions to map any emotional state. Turns out, emotions are not decorations on memory — they're part of the retrieval mechanism. You remember emotional events better. Your brain literally prioritizes them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Circadian rhythms affect cognition&lt;/strong&gt; — you're not the same developer at 9am and 11pm. Your ability to focus, recall, and make decisions fluctuates. Any system that treats you the same at all times is ignoring biology.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I started modeling these concepts in code. Not as a product — as a personal tool. Something that would sit between me and my AI tools and just... remember things for me.&lt;/p&gt;

&lt;p&gt;The first version was ugly. 200 lines of JavaScript and a JSON file. But it worked. I told it my preferences once, and the next day, they were still there. I told it about a debugging approach that failed, and it remembered that too.&lt;/p&gt;

&lt;p&gt;For the first time in years, I felt like my tools were adapting to me instead of the other way around.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned About Memory (and ADHD)
&lt;/h2&gt;

&lt;p&gt;Building this taught me things about my own brain:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Context is not content.&lt;/strong&gt; There's a difference between "what was said" and "what matters." Most AI memory solutions store conversations. But what I need is &lt;em&gt;context&lt;/em&gt; — the decisions, the preferences, the patterns. Not the 400 lines of chat where we discussed them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Forgetting is a feature, not a bug.&lt;/strong&gt; My ADHD brain forgets things constantly, and yes, it's frustrating. But it also means I don't carry stale assumptions. I approach problems fresh. The best memory system isn't one that remembers everything — it's one that remembers the right things and lets the rest fade.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Emotions are metadata.&lt;/strong&gt; When I'm frustrated with a solution, that frustration is information. It means "this approach has a problem, even if the code technically works." Any memory system that strips emotional context is throwing away signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. My ADHD is a design constraint, not a disability.&lt;/strong&gt; When I design for my own limitations — short working memory, need for external structure, sensitivity to context switches — I end up building things that work better for everyone. Turns out, neurotypical developers also hate re-explaining context to their AI tools. They're just more patient about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;Here's what I think about late at night:&lt;/p&gt;

&lt;p&gt;Every major AI company is racing to build bigger models, longer context windows, better reasoning. And that's great. But none of them are building memory.&lt;/p&gt;

&lt;p&gt;Not real memory. Not the kind that persists, decays, carries emotion, and adapts to you over time.&lt;/p&gt;

&lt;p&gt;They're building incredibly smart assistants with permanent amnesia. And we've all just... accepted that?&lt;/p&gt;

&lt;p&gt;I didn't accept it. I couldn't. My brain wouldn't let me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where I Am Now
&lt;/h2&gt;

&lt;p&gt;I'm a solo founder. Venezuelan, living in Medellín, Colombia. Zero employees. I run the company with AI agents (which is a whole other post).&lt;/p&gt;

&lt;p&gt;I've been working on this problem for months. Some days I feel like I'm building something important. Other days I feel like I'm a guy with ADHD who got hyperfocused on a niche problem and can't stop.&lt;/p&gt;

&lt;p&gt;Both are probably true.&lt;/p&gt;

&lt;p&gt;If you're a developer with ADHD — or honestly, any developer who's tired of being the memory system for your AI tools — I'd love to hear how you deal with it. What workarounds have you found? What do you wish your tools remembered?&lt;/p&gt;

&lt;p&gt;And if you're building AI tools: please, for the love of everything, add persistent memory. Your ADHD users will thank you. Your neurotypical users will thank you too, they just won't know why.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is my first post here. I'm Mario. I'll probably write about building things alone, neurodivergent entrepreneurship, and why AI tools should be designed for how brains actually work — not how we wish they worked.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>adhd</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
