<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vladimir Panov</title>
    <description>The latest articles on DEV Community by Vladimir Panov (@vladimirpanov).</description>
    <link>https://dev.to/vladimirpanov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2086565%2F39e7b115-cc46-411e-8e3f-73f95738cf19.jpg</url>
      <title>DEV Community: Vladimir Panov</title>
      <link>https://dev.to/vladimirpanov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vladimirpanov"/>
    <language>en</language>
    <item>
      <title>Notes: Memory, Context, and Large Language Models (LLMs)</title>
      <dc:creator>Vladimir Panov</dc:creator>
      <pubDate>Wed, 01 Jul 2026 05:24:54 +0000</pubDate>
      <link>https://dev.to/vladimirpanov/notes-memory-context-and-large-language-models-llms-4m67</link>
      <guid>https://dev.to/vladimirpanov/notes-memory-context-and-large-language-models-llms-4m67</guid>
      <description>&lt;p&gt;Notes following a discussion on how memory works in language models - and how it could be improved: ranging from the common issue of "context window" exhaustion to node architecture and entity linking.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The illusion of an infinite chat.
&lt;/h2&gt;

&lt;p&gt;No model possesses a truly infinite context; the window size is always finite. The illusion of a continuous dialogue is maintained through information compression and selection mechanisms. Specific approaches include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infini-attention (from Google) - a compressed long-term memory mechanism built on top of standard attention; it reportedly maintains performance quality even when exceeding the million-token threshold.&lt;/li&gt;
&lt;li&gt;StreamingLLM - keeps several "anchor" tokens at the start of the sequence and combines them with a sliding window over the most recent tokens.&lt;/li&gt;
&lt;li&gt;MemGPT/Letta - a system resembling OS virtual memory that incorporates three tiers: core memory (always within the context window), archival memory (in vector storage), and recall memory (the full history in a database).&lt;/li&gt;
&lt;li&gt;Mem0 - instead of summarizing everything indiscriminately, it selectively stores only significant facts, reportedly reducing token volume by 80–90%.&lt;/li&gt;
&lt;li&gt;EM-LLM - segments history not mechanically, but based on a "surprise" metric - an approach that appears to mirror the workings of human memory.&lt;/li&gt;
&lt;/ul&gt;
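
&lt;p&gt;As a rough illustration of the StreamingLLM idea described above, the sketch below computes which token positions survive once the cache is full: a few leading "anchor" positions plus a sliding window of the most recent ones. The function name &lt;code&gt;retainedPositions&lt;/code&gt; and the sizes used are assumptions made for this example, not values from the paper, and a real implementation evicts key/value cache entries rather than position numbers.&lt;/p&gt;

```typescript
// Sketch of an "anchor tokens + sliding window" retention rule.
// anchorCount and windowSize are illustrative assumptions.
function retainedPositions(totalTokens: number, anchorCount: number, windowSize: number): number[] {
  const kept: number[] = [];
  for (let pos = 0; pos !== totalTokens; pos++) {
    const isAnchor = anchorCount > pos;                 // one of the first tokens
    const isRecent = pos >= totalTokens - windowSize;   // inside the sliding window
    if (isAnchor || isRecent) kept.push(pos);
  }
  return kept;
}

// 10 tokens seen so far, 2 anchors, window of 3 recent tokens.
console.log(retainedPositions(10, 2, 3)); // [ 0, 1, 7, 8, 9 ]
```

&lt;p&gt;Everything between the anchors and the window is dropped, which is why the cache size stays constant no matter how long the dialogue runs.&lt;/p&gt;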

&lt;h2&gt;
  
  
  2. Human memory as a working model of limitation.
&lt;/h2&gt;

&lt;p&gt;Human working memory holds only a small number of items at once (the classic figure being Miller’s “magic number” of 7±2, which Cowan later revised to approximately 4). Yet, it is precisely this limitation that compels the brain to constantly compress information: what is stored in memory is not a recording, but a reconstruction. As early as the 1930s, Bartlett demonstrated that remembering is the recreation of an image based on cognitive schemas rather than the playback of a tape recording; this is why memory is prone to distortion, yet it is also what allows it to serve as a tool for thinking rather than merely an archive. Forgetting is not a data "leak" but an active process: during sleep, the hippocampus "replays" events and gradually transfers them to the neocortex in a generalized form (consolidation). This limitation is not a glitch but a factor that forces the system to generalize information rather than simply copying it.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Language, encoding, and the "cost" of memorization.
&lt;/h2&gt;

&lt;p&gt;The brain doesn't store every word form individually. Instead, it performs morphological analysis (isolating the root and the rule), ensuring that a complex morphological system doesn't create "search noise" at the semantic level. However, this poses a problem for the tokenizers used in LLMs: for instance, the o200k_base tokenizer yields an average of 1.96 tokens per word for Russian, compared to 1.16 for English. Consequently, processing the same amount of information in Russian is approximately 70% more expensive, as the rich inflectional system causes BPE token frequencies to be "diluted" across dozens of forms of the same word. Notably, a comparative study of languages (Coupé et al., 2019) demonstrated that the rate of spoken information transmission tends toward a universal value - approximately 39 bits per second. Languages with high syllabic information density require a slower speaking pace; thus, there is no "free lunch" in this regard.&lt;/p&gt;
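
&lt;p&gt;The "approximately 70%" figure follows from the two averages quoted above. The sketch below redoes the arithmetic, assuming that cost scales linearly with token count and that the same content takes a comparable number of words in both languages (a simplification, since word counts differ between languages).&lt;/p&gt;

```typescript
// Average tokens per word, as quoted in the text for the o200k_base tokenizer.
const tokensPerWord = { russian: 1.96, english: 1.16 };

// Relative cost of Russian versus English under a linear per-token price.
const ratio = tokensPerWord.russian / tokensPerWord.english;
const extraPercent = Math.round((ratio - 1) * 100);

console.log(extraPercent); // 69
```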

&lt;h2&gt;
  
  
  4. Thought and language are separate systems.
&lt;/h2&gt;

&lt;p&gt;Using fMRI, Evelina Fedorenko (MIT) demonstrates that the brain's language network remains virtually inactive during the performance of mathematical and logical tasks, musical activities, or contemplation of others' intentions. Patients with severe global aphasia lose their language abilities yet retain the capacity to perform arithmetic, play chess, and reason about the world around them. A word is not a vessel for a thought but rather a label "attached" to it only after the thought has been formed. In bilinguals, this is corroborated by the Revised Hierarchical Model (Kroll and Stewart): there is a single conceptual node linked to distinct lexical units for each language. This explains the example involving the words "girl" and "девочка" (Russian for 'girl'): it is an image that surfaces in memory, not the word itself; the word is selected only subsequently.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. A new "language" of memory: writing and reading around the text.
&lt;/h2&gt;

&lt;p&gt;Three approaches directly address the question of exactly how writing to and reading from memory occurs - distinct from the prompting process itself.&lt;/p&gt;

&lt;p&gt;The continuous approach: "gist tokens" (Mu et al., 2023) - the model is trained to compress the prompt into a set of vectors. This is not human-readable text but data read directly by the model (achieving compression ratios of up to 26x). Similar methods are employed by AutoCompressors and ICAE, utilizing summary vectors or memory slots. The Titans model (Google, January 2025) takes this further: long-term memory is implemented as a separate neural module (an MLP) that continues learning during inference, with "memorization" decisions based on a gradient-based "surprise signal" - a mechanism the authors motivate by analogy with human memory, where unexpected events are remembered more readily.&lt;/p&gt;

&lt;p&gt;The conceptual approach: Meta’s Large Concept Models (December 2024) - the model operates not on tokens but on "concepts": embeddings of entire sentences within the SONAR space, which is independent of both language and modality. A sentence in Russian, French, or Hindi yields a virtually identical representation; reasoning occurs at the concept level and is rendered into a specific language only at the output stage.&lt;/p&gt;

&lt;p&gt;The symbolic approach: graph-based memory (Zep/Graphiti, Mem0’s graph mode) - during writing, the LLM autonomously extracts entities and relationships, while reading involves graph traversal rather than vector search. Although less efficient in terms of data compression, this method allows humans to read and edit the memory content.&lt;/p&gt;

&lt;p&gt;The trade-off for using approaches based on continuous representations lies in their interpretability: they cannot simply be "opened and read". Individual facts within them cannot be corrected in isolation, and, as a rule, they are tightly coupled to a specific model.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. The uniqueness of interior spaces and the problem of coordination.
&lt;/h2&gt;

&lt;p&gt;If two models form their internal representations independently, there is no guarantee that these representations will align - a phenomenon that has already been studied empirically. In the game &lt;em&gt;Hanabi&lt;/em&gt;, agents that play brilliantly alongside the partner with whom they trained perform extremely poorly when paired with an independently trained agent. During joint training, they develop arbitrary conventions - or "handshakes" - unlinked to the objective values of the cards; this phenomenon is known as the zero-shot coordination problem. The "other-play" method enables agents to avoid such idiosyncratic traits to ensure compatibility, though this comes at the cost of reduced flexibility and uniqueness.&lt;/p&gt;

&lt;p&gt;There is also an opposing view: the Platonic Representation Hypothesis (Huh, Isola et al., ICML 2024) posits that as models scale up and the tasks they perform become more diverse, their internal representations don't diverge; instead, they converge toward a unified statistical model of reality - since they are all modeling the same world. Furthermore, a technique utilizing relative representations (Moschella et al., 2023) demonstrates that even formally incompatible latent spaces can be aligned without retraining. This is achieved by comparing the degree of similarity to a fixed set of reference points (anchors) rather than comparing absolute coordinates. In other words - the task becomes one of recognizing different orientations of the same underlying geometric structure, rather than performing a word-for-word translation.&lt;/p&gt;
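
&lt;p&gt;The anchor idea can be shown on a toy case. In the sketch below, "model B" is simply "model A" rotated by 90 degrees, so absolute coordinates disagree while cosine similarities to the shared anchors do not. The vectors, the rotation, and the helper names are assumptions chosen for illustration; real latent spaces differ by more than a rotation, and the paper's claim is empirical rather than exact.&lt;/p&gt;

```typescript
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

function cosine(a: Vec, b: Vec): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Relative representation: the similarities of x to a fixed list of anchors.
function relativeRepresentation(x: Vec, anchors: Vec[]): Vec {
  return anchors.map((anchor) => cosine(x, anchor));
}

// Toy stand-in for an independently trained model: the same space rotated by 90 degrees.
function rotate90(v: Vec): Vec {
  return [-v[1], v[0]];
}

const anchorsA: Vec[] = [[1, 0], [0, 1], [1, 1]];
const sampleA: Vec = [2, 1];
const anchorsB = anchorsA.map(rotate90);
const sampleB = rotate90(sampleA);

const relA = relativeRepresentation(sampleA, anchorsA);
const relB = relativeRepresentation(sampleB, anchorsB);

// Largest coordinate-wise difference between the two relative representations.
const maxGap = relA.reduce((gap, a, i) => Math.max(gap, Math.abs(a - relB[i])), 0);
console.log(sampleA, sampleB); // absolute coordinates differ
console.log(maxGap);           // close to 0: relative coordinates agree
```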

&lt;h2&gt;
  
  
  7. Subjective, objective, and the nature of "glitches".
&lt;/h2&gt;

&lt;p&gt;A category such as "cat" is not an objective entity but a culturally shaped interpretation. Eleanor Rosch’s prototype theory demonstrates that concepts are organized around a "fuzzy" prototype rather than having rigid boundaries. Individual deviations (such as a fear of cats) are not "bugs" but the result of the system tuning itself based on personal experience.&lt;/p&gt;

&lt;p&gt;The way we explain someone else's deviation (e.g., "a cat must have scared him in childhood") reflects the work of what Gazzaniga called the "left-brain interpreter" - a specialized brain module that constantly generates plausible explanations for behavior, even without access to the true cause (recall the classic split-brain experiment involving images of a chicken claw and a snowy landscape). This is a normal mode of brain function, not a malfunction.&lt;/p&gt;

&lt;p&gt;An analogy can be drawn to Large Language Models (LLMs) based on the OpenAI paper "Why Language Models Hallucinate" (Kalai, Nachum, Vempala, Zhang; September 2025). The paper formally proves that any model capable of generalizing beyond its training data will inevitably either hallucinate or "collapse" by refusing to provide diverse answers - there is no other way. The situation is exacerbated by the fact that training and evaluation processes reward confident guesses rather than an honest "I don't know." An important caveat: this strips the term "glitch" of its moral baggage but doesn't erase the distinction between subjective categories (where there are no wrong answers) and facts (where reality is singular, and confidence calibration matters regardless of the mechanism).&lt;/p&gt;

&lt;h2&gt;
  
  
  8. How this actually gets built: RAG today.
&lt;/h2&gt;

&lt;p&gt;The standard pipeline as of 2025-2026 looks like this: data chunks (typically 512-1024 tokens) are fed into a vector store that supports similarity search (using HNSW indices). Since pure vector search is unreliable when dealing with specific facts and names, a hybrid approach has become the standard: combining embeddings with the BM25 algorithm and merging results via Reciprocal Rank Fusion (RRF). Next, a cross-encoder (a re-ranking model) steps in to select the most relevant candidates from a pool of fifty (boosting accuracy by approximately 17%). The major trend of the past year has been "agentic RAG": instead of the system pre-determining what information to retrieve, the LLM itself formulates the query, evaluates whether the retrieved data is sufficient, and repeats the search if necessary.&lt;/p&gt;
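
&lt;p&gt;The fusion step mentioned above is small enough to sketch. Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over every ranking it appears in, with ranks starting at 1. The constant k = 60 is a commonly used default and is an assumption here, as are the document ids and the function name.&lt;/p&gt;

```typescript
// Reciprocal Rank Fusion over any number of ranked id lists.
// k = 60 is a widely used default, taken here as an assumption.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores: { [docId: string]: number } = {};
  rankings.forEach((ranking) => {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // ranks start at 1
      scores[docId] = (scores[docId] || 0) + 1 / (k + rank);
    });
  });
  // Highest fused score first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

// Hypothetical results from the two retrievers.
const vectorHits = ["doc-a", "doc-b", "doc-c"];
const bm25Hits = ["doc-c", "doc-a", "doc-d"];

console.log(reciprocalRankFusion([vectorHits, bm25Hits]));
// [ 'doc-a', 'doc-c', 'doc-b', 'doc-d' ]
```

&lt;p&gt;Because only ranks are used, the raw BM25 and cosine scores never need to be put on a common scale, which is the main reason this fusion rule is popular for hybrid search.&lt;/p&gt;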

&lt;p&gt;Anthropic has already implemented this solution in a real-world setting - it is not merely a concept: Claude.ai summarizes chat history daily, extracting key facts that are carried over into new conversations (with each project maintaining its own isolated memory). At the API level, a memory management tool is available: the model autonomously creates, reads, and updates files in persistent storage that survives across sessions, loading only the necessary data as needed without overloading the context window.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. An entity graph built on the fly.
&lt;/h2&gt;

&lt;p&gt;The Zep/Graphiti architecture embodies a working prototype of the model where "a node is created upon the first interaction, while edges accumulate through repeated interactions". It comprises three levels: episodes (raw observations), entities (extracted entities, deduplicated across sessions), and communities (clusters of related entities). Upon receiving new data, the LLM determines whether the information refers to an existing node (an update or expansion) or a new one (creation) - a decision made not based on a rigid distance threshold, but through contextual assessment. Each edge is annotated with bitemporal metadata - information indicating when the fact was true and when the agent acquired the data. Without this, the node would eventually accumulate contradictory edges and become useless.&lt;/p&gt;
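
&lt;p&gt;To make the bitemporal idea concrete, here is a minimal sketch of an edge that records both when a fact held and when the agent learned it. The &lt;code&gt;FactEdge&lt;/code&gt; shape and its field names are hypothetical and are not Graphiti's actual schema; dates are ISO strings so that plain string comparison orders them correctly.&lt;/p&gt;

```typescript
// Hypothetical edge shape, not the real Graphiti schema.
interface FactEdge {
  subject: string;
  predicate: string;
  object: string;
  validFrom: string;      // when the fact became true
  validTo: string | null; // when it stopped being true (null = still true)
  recordedAt: string;     // when the agent acquired the data
}

function isValidAt(edge: FactEdge, date: string): boolean {
  if (edge.validFrom > date) return false; // not yet true
  if (edge.validTo === null) return true;  // still true
  return edge.validTo > date;              // true until validTo
}

function factsValidAt(edges: FactEdge[], date: string): FactEdge[] {
  return edges.filter((edge) => isValidAt(edge, date));
}

const edges: FactEdge[] = [
  { subject: "Alice", predicate: "works_at", object: "Acme",
    validFrom: "2023-01-01", validTo: "2024-06-01", recordedAt: "2023-02-10" },
  { subject: "Alice", predicate: "works_at", object: "Globex",
    validFrom: "2024-06-01", validTo: null, recordedAt: "2024-07-15" },
];

const employerIn2024 = factsValidAt(edges, "2024-01-15").map((e) => e.object);
const employerIn2025 = factsValidAt(edges, "2025-01-01").map((e) => e.object);
console.log(employerIn2024, employerIn2025); // [ 'Acme' ] [ 'Globex' ]
```

&lt;p&gt;Both edges stay in the graph without contradicting each other, because each one is scoped to its own validity interval.&lt;/p&gt;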

&lt;p&gt;The linking problem itself represents the classic stability-plasticity dilemma - a challenge around which Carpenter and Grossberg built an entire architecture in the 1980s: Adaptive Resonance Theory. This theory employs a vigilance parameter (ρ) that dictates how closely a new stimulus must match an existing node to trigger an update rather than the creation of a new node. A setting that is too loose leads to the merging of distinct entities and node distortion, while one that is too strict causes a single entity to spawn fragmented duplicates. There is no universally correct threshold.&lt;/p&gt;
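
&lt;p&gt;The effect of the vigilance parameter can be sketched with a heavily simplified, ART1-style rule on binary feature vectors: the match score is the share of the input's active features that the node also has, and the node is updated only if that share reaches ρ. This omits the choice function, search, and reset cycle of the full architecture; the function names and the toy patterns are assumptions for illustration, and inputs are assumed to have at least one active feature.&lt;/p&gt;

```typescript
type Pattern = number[]; // binary features, each 0 or 1

// Element-wise AND of two binary patterns.
function overlap(x: Pattern, w: Pattern): Pattern {
  return x.map((xi, i) => Math.min(xi, w[i]));
}

function activeCount(p: Pattern): number {
  return p.reduce((sum, v) => sum + v, 0);
}

// Updates the best-matching node if it passes the vigilance test,
// otherwise creates a new node. Returns the index of the node used.
function linkOrCreate(nodes: Pattern[], x: Pattern, vigilance: number): number {
  let best = -1;
  let bestMatch = 0;
  for (let i = 0; i !== nodes.length; i++) {
    const match = activeCount(overlap(x, nodes[i])) / activeCount(x);
    if (match > bestMatch) { best = i; bestMatch = match; }
  }
  if (best !== -1) {
    if (bestMatch >= vigilance) {
      nodes[best] = overlap(x, nodes[best]); // refine the existing node
      return best;
    }
  }
  nodes.push(x.slice()); // no node is close enough: create a new one
  return nodes.length - 1;
}

// The same two inputs under a loose and a strict vigilance setting.
const loose: Pattern[] = [];
linkOrCreate(loose, [1, 1, 0, 0], 0.6);
const looseIndex = linkOrCreate(loose, [1, 1, 1, 0], 0.6);

const strict: Pattern[] = [];
linkOrCreate(strict, [1, 1, 0, 0], 0.9);
const strictIndex = linkOrCreate(strict, [1, 1, 1, 0], 0.9);

console.log(looseIndex, strictIndex); // 0 1
```

&lt;p&gt;The second input shares two of its three active features with the first node (a match of about 0.67), so ρ = 0.6 merges it and ρ = 0.9 spawns a duplicate.&lt;/p&gt;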

&lt;p&gt;A biological parallel to this three-stage model is the Complementary Learning Systems theory (McClelland, McNaughton, &amp;amp; O'Reilly, 1995): the hippocampus rapidly encodes each episode individually (employing pattern separation to preserve detail), whereas the neocortex slowly integrates these episodes into a generalized semantic node through repeated replay during sleep.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. A node as a response, not an image.
&lt;/h2&gt;

&lt;p&gt;A key shift lies in the fact that a node is defined not by the stimulus itself, but by the convergent pattern of response to it; this approach is characteristic of functionalism in the philosophy of mind and is supported by direct empirical evidence. Examples include "concept cells" - such as the so-called "Jennifer Aniston neuron" (Quiroga et al., &lt;em&gt;Nature&lt;/em&gt;, 2005) - neurons in the human medial temporal lobe that activate upon perceiving a specific individual, regardless of the information channel: whether through various photographs, a written name, or even a name spoken by a computer-generated voice. A similar phenomenon was artificially replicated in the CLIP model (Goh et al., OpenAI, 2021), where a "Spider-Man neuron" responds to a photograph of a spider, the word "spider," or an image of the Spider-Man costume; the authors explicitly state that they were inspired by Quiroga’s discovery.&lt;/p&gt;

&lt;p&gt;The "signal flow → weighted network → node key" mechanism closely resembles the VQ-VAE architecture (Van den Oord et al., 2017): an encoder transforms a signal into a vector, which is matched to the nearest element in a learnable discrete codebook, yielding a discrete index as output, while the codebook itself continues to learn during operation. A historical precedent for the concept of "space reorganization during sleep" is the wake-sleep algorithm (Hinton, Dayan, Frey, Neal, 1995), which incorporates a distinct "sleep" phase: the network generates its own data (in a top-down direction) to adjust weights offline.&lt;/p&gt;
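
&lt;p&gt;The lookup at the heart of that mechanism is a nearest-neighbour search over the codebook. The sketch below shows only this quantization step, with a hand-written toy codebook and hypothetical function names; codebook learning and the encoder itself are left out.&lt;/p&gt;

```typescript
function squaredDistance(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + (ai - b[i]) * (ai - b[i]), 0);
}

// Returns the index of the codebook entry closest to the encoder output z.
function nearestCodeIndex(codebook: number[][], z: number[]): number {
  let bestIndex = -1;
  let bestDistance = Infinity;
  for (let i = 0; i !== codebook.length; i++) {
    const d = squaredDistance(codebook[i], z);
    if (bestDistance > d) { bestIndex = i; bestDistance = d; }
  }
  return bestIndex;
}

// Toy codebook with three entries; the encoder output is made up.
const codebook = [[0, 0], [1, 1], [4, 0]];
const encoderOutput = [0.9, 1.2];

console.log(nearestCodeIndex(codebook, encoderOutput)); // 1
```

&lt;p&gt;The returned index plays the role of the "node key": many slightly different signals collapse onto the same discrete entry.&lt;/p&gt;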

&lt;p&gt;Individual elements of this picture already exist and have been validated. The open engineering question lies not in any single one of them, but in how to combine them into a unified, continuously functioning structure, rather than a system fixed at the pre-training stage.&lt;/p&gt;




&lt;p&gt;Key sources&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2310.08560" rel="noopener noreferrer"&gt;MemGPT: Towards LLMs as Operating Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2501.13956" rel="noopener noreferrer"&gt;Zep: A Temporal Knowledge Graph Architecture for Agent Memory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2501.00663" rel="noopener noreferrer"&gt;Titans: Learning to Memorize at Test Time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.meta.com/research/publications/large-concept-models-language-modeling-in-a-sentence-representation-space/" rel="noopener noreferrer"&gt;Large Concept Models: Language Modeling in a Sentence Representation Space (Meta)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2304.08467" rel="noopener noreferrer"&gt;Learning to Compress Prompts with Gist Tokens&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2405.07987" rel="noopener noreferrer"&gt;The Platonic Representation Hypothesis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2209.15430" rel="noopener noreferrer"&gt;Relative representations enable zero-shot latent space communication&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2003.02979" rel="noopener noreferrer"&gt;"Other-Play" for Zero-Shot Coordination&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/pdf/2509.04664" rel="noopener noreferrer"&gt;Why Language Models Hallucinate (OpenAI)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Left-brain_interpreter" rel="noopener noreferrer"&gt;Left-brain interpreter/confabulation (Gazzaniga)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nature.com/articles/nature03687" rel="noopener noreferrer"&gt;Invariant visual representation by single neurons in the human brain (Quiroga et al.)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://distill.pub/2021/multimodal-neurons/" rel="noopener noreferrer"&gt;Multimodal neurons in artificial neural networks (OpenAI/Distill)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC5124075/" rel="noopener noreferrer"&gt;Complementary learning systems within the hippocampus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://scholarpedia.org/article/Adaptive_resonance_theory" rel="noopener noreferrer"&gt;Adaptive resonance theory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pubmed.ncbi.nlm.nih.gov/27096882/" rel="noopener noreferrer"&gt;Language and thought are not the same thing (Fedorenko)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://kathane.substack.com/p/not-speaking-english-to-chatgpt-costs" rel="noopener noreferrer"&gt;Not speaking English to ChatGPT costs you millions of tokens&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>memory</category>
    </item>
    <item>
      <title>AI-Native Software Delivery</title>
      <dc:creator>Vladimir Panov</dc:creator>
      <pubDate>Sun, 10 May 2026 04:43:15 +0000</pubDate>
      <link>https://dev.to/vladimirpanov/ai-native-software-delivery-30f6</link>
      <guid>https://dev.to/vladimirpanov/ai-native-software-delivery-30f6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A process for shipping software when AI generates most of the implementation and humans own intent, validation, and behavioral guarantees.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a rewrite of an earlier article that introduced the idea. The thinking has matured into a working, clone-and-go process toolkit:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/DominicTylor/ai-software-process" rel="noopener noreferrer"&gt;github.com/DominicTylor/ai-software-process&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What follows is the methodology in its current shape — what changed since the first article, why those changes mattered, and what the resulting process looks like in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem this process exists to solve
&lt;/h2&gt;

&lt;p&gt;AI now generates a meaningful share of production code, and that share keeps growing. Engineers spend less time writing code and more time reviewing what was generated, often less deeply than we like to admit. The question stops being "is this code well-written?" and becomes "does this code do what we said it should do?" — and the systems most engineering teams use today were designed for a world where humans wrote everything by hand.&lt;/p&gt;

&lt;p&gt;A few pains, all interlocking:&lt;/p&gt;

&lt;p&gt;Story trackers describe intent in plain English in one place; the code that implements it lives somewhere else; the tests that verify it live somewhere else again. Each drifts at its own rate. A Jira ticket that was current six months ago describes behavior that the code no longer has — and nobody knows when it stopped being accurate.&lt;/p&gt;

&lt;p&gt;BDD frameworks tried to bridge this. They duplicated scenarios in Gherkin (&lt;code&gt;.feature&lt;/code&gt; files) alongside the executable code. Two artifacts that say the same thing must be kept in sync, and they never are. The feature file becomes documentation; the code becomes the truth; the gap between them is where bugs live.&lt;/p&gt;

&lt;p&gt;Decision history fragments across chat threads, old tickets, deleted documents, and the heads of senior engineers. When a new contributor asks "why did we drop password authentication?", the answer is gone. The current code shows what we do; nothing shows why.&lt;/p&gt;

&lt;p&gt;Tests check internal state — Redis keys, database rows, in-memory variables — instead of customer-observable behavior. They pass while the user-visible flow is broken, because they were written to be easy rather than honest.&lt;/p&gt;

&lt;p&gt;And underneath all of this: when AI generates the implementation, none of the safeguards a team built around human authorship apply the same way. Code review catches less because reviewers skim more. Linters and type checks don't catch semantic drift from the spec. The acceptance criteria — if they exist — are in a ticket nobody reads.&lt;/p&gt;

&lt;p&gt;The standard practices were designed for a different world. The reshape is not a productivity tweak; it is a structural change in where trust lives.&lt;/p&gt;




&lt;h2&gt;
  
  
  What changes
&lt;/h2&gt;

&lt;p&gt;Implementation-centric engineering optimized for the people writing code. Result-centric engineering optimizes for the property the product must hold. The shift is concrete:&lt;/p&gt;

&lt;p&gt;Humans increasingly own intent, validation, and behavioral guarantees. AI increasingly generates the code that satisfies them. The center of human attention moves from "did I write this well?" to "have I described what must hold, and is it verified?"&lt;/p&gt;

&lt;p&gt;This is not a productivity argument. The argument is that as AI-generated code becomes the default, the only durable form of trust is &lt;strong&gt;executable behavioral validation&lt;/strong&gt;: a machine-checkable through-line from customer intent to verified behavior. Everything in the process below serves that through-line.&lt;/p&gt;




&lt;h2&gt;
  
  
  Stories are folders, not tickets
&lt;/h2&gt;

&lt;p&gt;The unit of work is not a row in a tracker. It is a folder in git.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stories/auth/user-signup/
  user-spec.md           ← what the user wants, who they are, what must hold
  e2e/
    signup-via-github.spec.ts
    signup-via-magic-link.spec.ts
  perf/                  (optional)
  security/              (optional)
  a11y/                  (optional)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user-spec describes intent, personas, high-level user goals, functional constraints, architect tech notes, quality-gate notes, scenario references, and which platform invariants the Story is subject to. What it deliberately does not contain: step-by-step scenarios in prose (those are in the commented tests), changelogs (in git), resolved-question annotations (in git), future-state plans (elsewhere), implementation choices (in tech specs in code repos).&lt;/p&gt;

&lt;p&gt;A Story always describes &lt;strong&gt;current behavior&lt;/strong&gt; — not history, not roadmap, not the journey of how it got here. If the system behaves this way right now, it goes in the spec. Otherwise, it goes elsewhere. That single rule eliminates a class of drift that older spec formats accept by default.&lt;/p&gt;




&lt;h2&gt;
  
  
  A scenario, concretely
&lt;/h2&gt;

&lt;p&gt;Acceptance criteria live as commented executable tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;User signs up via GitHub for the first time&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// # User opens the signup page&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;opensSignupPage&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// # User sees three auth options with GitHub marked as recommended&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seesAuthOptions&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;recommended&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;github&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// # User clicks "Continue with GitHub"&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clicksContinueWithGitHub&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// # System completes OAuth and lands the user on an empty workspace dashboard&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;expectsDashboardWithEmptyWorkspace&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Owner writes the comments first. An AI helper, or the Implementer, fills in executable code under each comment without removing it. The comment stays as documentation; the code stays as verification; they live in the same file for the life of the test.&lt;/p&gt;

&lt;p&gt;Three properties fall out of this shape with no extra effort:&lt;/p&gt;

&lt;p&gt;The Owner reads only the comments to verify acceptance criteria. The Quality Gate Specialist reads comments and code together to verify they agree. Drift between described behavior and verified behavior is &lt;strong&gt;physically impossible&lt;/strong&gt; — they share a file and appear in the same diff. There is no second artifact to keep synchronized.&lt;/p&gt;

&lt;p&gt;A scenario that has only comments and no code under them is wrapped in &lt;code&gt;test.todo()&lt;/code&gt;. It appears in the test runner as TODO. The Story's observable state — what's drafted, what's verified, what's still pending — is whatever the test runner reports. Not a status field; the report.&lt;/p&gt;




&lt;h2&gt;
  
  
  Frameworks as bilateral contracts
&lt;/h2&gt;

&lt;p&gt;Notice the test above. No selectors. No &lt;code&gt;data-testid&lt;/code&gt; strings. No sleeps. No mocks. The verbs — &lt;code&gt;user.opensSignupPage()&lt;/code&gt;, &lt;code&gt;user.clicksContinueWithGitHub()&lt;/code&gt; — come from a framework owned by the Quality Gate Specialist.&lt;/p&gt;

&lt;p&gt;The framework's PageObjects hold one end of a &lt;strong&gt;bilateral contract&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// frameworks/e2e/page-objects/login-page.ts&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LoginPage&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;entersEmail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[data-testid="login-email"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The test-id &lt;code&gt;"login-email"&lt;/code&gt; lives in this file, in plain code. Code-perimeter implementation reads this file to know what the UI must expose. There is no separate test-id registry; the framework's PageObjects &lt;strong&gt;are&lt;/strong&gt; the registry.&lt;/p&gt;

&lt;p&gt;When the Story uses &lt;code&gt;user.entersLoginEmail(...)&lt;/code&gt;, the framework declares that the UI must render &lt;code&gt;data-testid="login-email"&lt;/code&gt; on the login form's email input. The code-perimeter team reads the framework to know what to build. The team building the test reads the framework to know what verbs are available.&lt;/p&gt;

&lt;p&gt;Same pattern for other frameworks. A &lt;code&gt;probe.scanTable('account', { where: 'password IS NOT NULL' })&lt;/code&gt; call in the security framework rests on a database probe helper that names the table and column explicitly; code that creates the schema reads that helper to know what names are expected.&lt;/p&gt;

&lt;p&gt;This is the through-line from intent to verified code: scenarios consume framework verbs, framework PageObjects declare identifiers, code-perimeter implementation honors those identifiers. The contract is not implicit. It is readable code on both sides.&lt;/p&gt;




&lt;h2&gt;
  
  
  Constitution: platform-wide invariants
&lt;/h2&gt;

&lt;p&gt;Stories describe features. Some rules apply across all features — they are invariants of the platform, not properties of any one Story. Those live in a separate &lt;code&gt;constitution.md&lt;/code&gt; document at the repository root.&lt;/p&gt;

&lt;p&gt;A constitution is short, prose, self-contained:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No user password is ever stored, transmitted, or accepted by any code path.&lt;/p&gt;

&lt;p&gt;The SMTP capture service never originates outgoing connections on ports 25, 465, or 587.&lt;/p&gt;

&lt;p&gt;No request authenticated by tenant A can read or modify data belonging to tenant B. Cross-tenant access attempts return 404 or 403, never 500 and never partial data.&lt;/p&gt;

&lt;p&gt;All persistent data is encrypted at rest under platform-managed keys.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The constitution declares; it does not enumerate enforcement. Each rule has an &lt;strong&gt;owning Story&lt;/strong&gt; under &lt;code&gt;stories/&lt;/code&gt; — same shape as any other Story, but with an attacker or system-probe persona, scenarios that attempt to violate the rule, and assertions that the violation is refused.&lt;/p&gt;

&lt;p&gt;A Story may reference a constitution rule it depends on (&lt;code&gt;enforces: no-passwords&lt;/code&gt; in frontmatter) for traceability. The constitution itself does not reference back. It does not need to know how each rule is verified or by which Story.&lt;/p&gt;
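
&lt;p&gt;The traceability link can be checked from the Story side alone. The sketch below assumes each Story's frontmatter has already been parsed into an &lt;code&gt;enforces&lt;/code&gt; list; the &lt;code&gt;StoryFrontmatter&lt;/code&gt; shape and the &lt;code&gt;rulesWithoutStory&lt;/code&gt; helper are hypothetical names, not part of the process itself.&lt;/p&gt;

```typescript
// Hypothetical shape of a Story's parsed frontmatter.
interface StoryFrontmatter {
  path: string; // Story folder under stories/
  enforces: string[]; // constitution rule ids named in frontmatter
}

// Return the rule ids that no Story declares in its `enforces` list.
function rulesWithoutStory(
  ruleIds: string[],
  stories: StoryFrontmatter[],
): string[] {
  return ruleIds.filter(
    (id) => !stories.some((story) => story.enforces.includes(id)),
  );
}
```

&lt;p&gt;The constitution stays unaware of its Stories in this arrangement: the rule ids are passed in, and the lookup runs over the Stories only.&lt;/p&gt;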

&lt;p&gt;The Architect holds final word over &lt;code&gt;constitution.md&lt;/code&gt;. When a Story's design would conflict with a rule, the Architect-review skill blocks the PR. If the team genuinely wants to change the rule, that goes through a separate constitution PR — explicit, reviewed, recorded.&lt;/p&gt;




&lt;h2&gt;
  
  
  Decision history in commit messages
&lt;/h2&gt;

&lt;p&gt;Every behavioral change in the system is captured in a structured git commit message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;behavior: drop password auth from auth flow

Why: minimize attack surface; password storage adds risk for a developer-tool
audience that is comfortable with OAuth and magic links.
Considered: keep with bcrypt, move to passkeys only, drop entirely.
Chose: drop entirely; OAuth + magic link cover all signup and login paths.
Affects: stories/auth/user-signup/, constitution.md §3.2.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;code&gt;commit-msg&lt;/code&gt; git hook enforces the &lt;code&gt;Why / Considered / Chose / Affects&lt;/code&gt; shape on every commit whose subject starts with &lt;code&gt;behavior:&lt;/code&gt;. Non-behavioral commits (chore, fix, docs) are not gated. The hook is local; a matching CI check protects main against commits that bypassed the hook.&lt;/p&gt;
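
&lt;p&gt;The check itself is small. The following is a sketch of the validation such a hook could perform, not the repository's actual hook: it assumes each section label starts its own line, and &lt;code&gt;missingSections&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```typescript
// Section labels required on every "behavior:" commit.
const REQUIRED_SECTIONS = ["Why:", "Considered:", "Chose:", "Affects:"];

// Return the required section labels that are absent from the message.
// Assumption: each label appears at the start of a line.
function missingSections(message: string): string[] {
  const lines = message.split("\n");
  const subject = lines[0] ?? "";
  // Non-behavioral commits (chore, fix, docs) are not gated.
  if (!subject.startsWith("behavior:")) return [];
  return REQUIRED_SECTIONS.filter(
    (section) => !lines.some((line) => line.startsWith(section)),
  );
}
```

&lt;p&gt;A hook script would read the message file git passes as its first argument, call a function like this, and exit non-zero when the returned list is not empty. The CI check can reuse the same function over the commits in a PR.&lt;/p&gt;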

&lt;p&gt;Significant decisions are tagged for direct addressability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git tag decision/no-password-auth &amp;lt;sha&amp;gt;
git tag &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="s2"&gt;"decision/*"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To answer "why did we decide X?", run &lt;code&gt;git log --grep="X"&lt;/code&gt;, follow any &lt;code&gt;decision/*&lt;/code&gt; tags, and read the structured sections of the relevant commit. A &lt;code&gt;/decision-search&lt;/code&gt; skill makes this queryable in natural language, but the storage layer is just git.&lt;/p&gt;

&lt;p&gt;This rule has one specific consequence: &lt;strong&gt;no artifact in the repository contains a changelog&lt;/strong&gt;. No "Resolved on 2026-05-11" annotations inside specs. No version-history blocks. Specs describe current behavior. The history of why the behavior is what it is lives in commit messages, addressable through tags.&lt;/p&gt;




&lt;h2&gt;
  
  
  State is observed, not declared
&lt;/h2&gt;

&lt;p&gt;A Story is not a thing that has a &lt;code&gt;status: "approved"&lt;/code&gt; field. The tracked unit is a &lt;strong&gt;change vector&lt;/strong&gt; — a &lt;code&gt;(branch, PR)&lt;/code&gt; pair against the master repository — and its state is whatever git and the forge say it is.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Branch&lt;/th&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;th&gt;Vector state&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Doesn't exist (or merged into main)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;Doesn't exist, or is live&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exists&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Private work in progress&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exists&lt;/td&gt;
&lt;td&gt;Draft / Open (any review state)&lt;/td&gt;
&lt;td&gt;In review or iteration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exists&lt;/td&gt;
&lt;td&gt;Open, all approvals + CI green&lt;/td&gt;
&lt;td&gt;Ready to merge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Merged&lt;/td&gt;
&lt;td&gt;Merged&lt;/td&gt;
&lt;td&gt;Live&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;There is no status field anywhere. No external state store. &lt;code&gt;git branch -a&lt;/code&gt; plus the forge's PR list answers "who's working on what, what's in review, what's blocked, what just shipped" — without a separate tool.&lt;/p&gt;
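
&lt;p&gt;Read as a function, the table maps observations to a state and stores nothing. The sketch below assumes the branch and PR facts have already been fetched from git and the forge; the &lt;code&gt;VectorObservation&lt;/code&gt; shape and its field names are hypothetical.&lt;/p&gt;

```typescript
type PrState = "none" | "draft" | "open" | "merged";

// Facts observed from git and the forge; nothing here is stored as status.
interface VectorObservation {
  branchExists: boolean;
  pr: PrState;
  approvalsComplete: boolean; // all required approvals are present
  ciGreen: boolean;
}

type VectorState =
  | "absent or live"
  | "private work in progress"
  | "in review or iteration"
  | "ready to merge"
  | "live";

// Derive the vector state row by row, in the order the table lists them.
function vectorState(o: VectorObservation): VectorState {
  if (o.pr === "merged") return "live";
  if (!o.branchExists) return "absent or live";
  if (o.pr === "none") return "private work in progress";
  const ready = [o.pr === "open", o.approvalsComplete, o.ciGreen].every(Boolean);
  return ready ? "ready to merge" : "in review or iteration";
}
```

&lt;p&gt;Because the state is recomputed from observations on every call, it cannot drift from what git and the forge report.&lt;/p&gt;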

&lt;p&gt;Parallel work follows from this naturally. Multiple branches mean multiple vectors in flight. Two vectors touching the same area are a coordination signal — usually a sign that two people are attempting the same change without realizing it. Tooling on top of the process (a dashboard, a query) can surface this; the process itself does not block it.&lt;/p&gt;
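
&lt;p&gt;One way such tooling might surface the signal, as a sketch only: it assumes each vector's changed paths are already known (for example from a diff against main) and that an "area" is the Story folder, taken here as the first three path segments. &lt;code&gt;ChangeVector&lt;/code&gt;, &lt;code&gt;areaOf&lt;/code&gt;, and &lt;code&gt;sharedAreas&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
interface ChangeVector {
  branch: string;
  changedPaths: string[]; // e.g. collected from a diff against main
}

// Assumption: an area is the Story folder, i.e. the first three path
// segments (stories/auth/user-signup). Other layouts need another rule.
function areaOf(path: string): string {
  return path.split("/").slice(0, 3).join("/");
}

// Map each area to the branches touching it, keeping only shared areas.
function sharedAreas(vectors: ChangeVector[]): { [area: string]: string[] } {
  const byArea: { [area: string]: string[] } = {};
  for (const vector of vectors) {
    const areas = vector.changedPaths
      .map(areaOf)
      .filter((area, index, all) => all.indexOf(area) === index); // dedupe
    for (const area of areas) {
      byArea[area] = [...(byArea[area] ?? []), vector.branch];
    }
  }
  const shared: { [area: string]: string[] } = {};
  for (const area of Object.keys(byArea)) {
    if (byArea[area].length > 1) shared[area] = byArea[area];
  }
  return shared;
}
```

&lt;p&gt;The result is a report, not a gate: it lists the areas with more than one branch in flight and leaves the coordination to the people involved.&lt;/p&gt;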




&lt;h2&gt;
  
  
  Master and code perimeters: asymmetric awareness
&lt;/h2&gt;

&lt;p&gt;A non-trivial project usually has two physical territories: the &lt;strong&gt;master repository&lt;/strong&gt; where Stories, frameworks, and constitution live; and one or more &lt;strong&gt;code repositories&lt;/strong&gt; where implementation, tech specs, and code review live. In small projects these can coexist in one physical repo; the boundary remains conceptual.&lt;/p&gt;

&lt;p&gt;Awareness flows in one direction only.&lt;/p&gt;

&lt;p&gt;The master perimeter never queries, inspects, or coordinates with code repositories. Its references to them are descriptive ("this Story affects services A and B"), not operational. A skill running in the master perimeter never opens a code repository's pull request list, never reads code-side tech specs to decide what to do, never coordinates code-side merges.&lt;/p&gt;

&lt;p&gt;The code perimeter, in contrast, reads the master perimeter as its source of truth. To produce a tech spec or implementation, an engineering agent in a code repository reads the corresponding Story's user-spec, the architect tech notes inside it, the quality gates, the constitution rules it must respect, and the framework's PageObjects to know what identifiers to expose.&lt;/p&gt;

&lt;p&gt;This asymmetry keeps responsibilities clean. The master perimeter never has to know about deployment topology, CI runners, or how many code repositories the company has. The code perimeter never has to argue with product about what a feature should do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Roles as areas of final word
&lt;/h2&gt;

&lt;p&gt;The process names six roles: Owner, Architect, Quality Gate Specialist, Implementer, Code Owner, UI/UX Specialist. These are &lt;strong&gt;responsibilities, not positions&lt;/strong&gt;. In a one-person project, one human holds all six and switches modes consciously. In a fifteen-person team, they tend toward distinct people. The role structure stays the same regardless of headcount.&lt;/p&gt;

&lt;p&gt;Stories are written &lt;strong&gt;collaboratively&lt;/strong&gt;. Anyone with something to contribute writes into the Story — Owner sets intent, Architect adds tech notes, Quality Gate Specialist refines scenarios, UI/UX Specialist adds accessibility constraints. A Story does not "belong" to a role.&lt;/p&gt;

&lt;p&gt;What roles hold is &lt;strong&gt;areas of final word&lt;/strong&gt; — domains where, when a decision is contested, that role's approval is required to ship. These are enforced through CODEOWNERS rules on path patterns, not through social convention. Owner over &lt;code&gt;stories/**/user-spec.md&lt;/code&gt;, Architect over &lt;code&gt;constitution.md&lt;/code&gt;, Quality Gate Specialist over &lt;code&gt;frameworks/**&lt;/code&gt; and scenario folders, and so on.&lt;/p&gt;

&lt;p&gt;A second pattern governs how non-final-word roles still carry real weight: &lt;strong&gt;mandatory review with required engagement&lt;/strong&gt;. Horizontal roles — Architect, Quality Gate Specialist, UI/UX Specialist — review every relevant Story automatically. Their comments are blocking. The Owner is free to overrule their advice in the Owner's own product domain, but only by &lt;strong&gt;explicit acknowledgment&lt;/strong&gt; — a written statement that the risk is read, accepted, and carried. Silent dismissal is not allowed.&lt;/p&gt;

&lt;p&gt;When a horizontal role believes an Owner's acknowledgment underestimates a systemic risk, the available path is to escalate by opening a constitution PR. The discussion moves from Story scope (Owner's call) to platform scope (Architect's call). The right argument resolves at the right altitude.&lt;/p&gt;




&lt;h2&gt;
  
  
  Skills and sub-agents
&lt;/h2&gt;

&lt;p&gt;Skills are AI-assisted helpers — focused operations a contributor invokes like commands: &lt;code&gt;/spec-brainstorm&lt;/code&gt;, &lt;code&gt;/architect-review&lt;/code&gt;, &lt;code&gt;/scenario-implement&lt;/code&gt;, &lt;code&gt;/decision-search&lt;/code&gt;. Sub-agents are role-specialists with deeper expertise in a single domain (&lt;code&gt;spec-spec&lt;/code&gt; for product consistency, &lt;code&gt;architect-spec&lt;/code&gt; for constitution and system invariants, &lt;code&gt;quality-spec&lt;/code&gt; for frameworks and coverage, &lt;code&gt;ui-ux-spec&lt;/code&gt; for visible-state completeness, &lt;code&gt;decision-historian&lt;/code&gt; for git history). Skills invoke sub-agents when judgment in a domain is required.&lt;/p&gt;

&lt;p&gt;The operational principle is: &lt;strong&gt;skills are soft assistance inside a branch, hard gates on the pull request&lt;/strong&gt;. While work is private, every skill is invokable and ignorable. An author can use them or not. Once a PR is opened, the horizontal-role review skills run automatically and their comments are blocking. Resolution takes one of two forms — fix the concern, or write an explicit acknowledgment of the risk. There is no third option.&lt;/p&gt;
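
&lt;p&gt;The two-outcome rule can be written as a predicate over review comments. This is a sketch under assumed data shapes, not the toolkit's implementation: &lt;code&gt;ReviewComment&lt;/code&gt;, &lt;code&gt;isResolved&lt;/code&gt;, and &lt;code&gt;blockingComments&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
type Resolution = "fixed" | "acknowledged" | "unresolved";

// Hypothetical shape of a blocking comment left by a horizontal-role review.
interface ReviewComment {
  role: string; // e.g. "Architect"
  concern: string;
  resolution: Resolution;
  acknowledgment?: string; // written statement, needed when acknowledged
}

// A comment stops blocking only when the concern is fixed, or when the
// risk is acknowledged with a non-empty written statement.
function isResolved(comment: ReviewComment): boolean {
  if (comment.resolution === "fixed") return true;
  if (comment.resolution === "acknowledged") {
    return (comment.acknowledgment ?? "").trim().length > 0;
  }
  return false;
}

// Comments that still block the pull request.
function blockingComments(comments: ReviewComment[]): ReviewComment[] {
  return comments.filter((comment) => !isResolved(comment));
}
```

&lt;p&gt;An acknowledgment with no text counts as unresolved here, which is how the sketch models the ban on silent dismissal.&lt;/p&gt;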

&lt;p&gt;This separation matters. A contributor who prefers to write everything by hand is not punished. A contributor who relies heavily on AI is not given a shortcut around the gates. The path is different; the destination is the same.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this process is not
&lt;/h2&gt;

&lt;p&gt;To prevent drift toward familiar models:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not Jira-as-spec.&lt;/strong&gt; A Story is not a ticket. Stories live in git and always describe current behavior. There is no parallel ticket queue describing the same work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not BDD with Gherkin.&lt;/strong&gt; Scenarios are first-class TypeScript inside test files, not a parallel &lt;code&gt;.feature&lt;/code&gt; layer that has to be kept in sync. One source of truth.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not test-after.&lt;/strong&gt; Acceptance criteria are written before or alongside implementation, as commented executable tests. Code is generated to satisfy them, not the other way around.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not waterfall, despite the milestone names.&lt;/strong&gt; Stages cycle. Implementation can expose gaps in scenarios; review can redirect back to spec. The process recognizes legitimate return points instead of pretending the flow is linear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not vendor-locked.&lt;/strong&gt; The methodology shape is independent of any specific AI tool. The current incarnation uses Claude Code; the same shape works with any AI environment given equivalent primitives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not headcount-prescriptive.&lt;/strong&gt; Roles are responsibilities. One person can hold all six. A team of fifteen can split them across people. The process does not require any specific organizational chart.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;The process lives at &lt;strong&gt;&lt;a href="https://github.com/DominicTylor/ai-software-process" rel="noopener noreferrer"&gt;github.com/DominicTylor/ai-software-process&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The repository is a clone-and-go Claude Code toolkit. &lt;code&gt;process.md&lt;/code&gt; is the full canon. &lt;code&gt;constitution.md&lt;/code&gt; is a working template adopters rewrite in place. &lt;code&gt;templates/story/&lt;/code&gt; holds the per-Story scaffolds skills consume when creating new artifacts. &lt;code&gt;.claude/skills/&lt;/code&gt; and &lt;code&gt;.claude/agents/&lt;/code&gt; ship a complete master-perimeter toolkit (twelve skills, five sub-agents) plus a code-perimeter starter pack drawn from a real TypeScript/Node monorepo. &lt;code&gt;.githooks/commit-msg&lt;/code&gt; enforces the structured commit format locally; &lt;code&gt;.github/workflows/&lt;/code&gt; enforces it (and spec validation, and ai-review) on the PR.&lt;/p&gt;

&lt;p&gt;Adopt by forking, rewriting the constitution and high-level project description in place, and piloting a Story end-to-end. The repository is released under the MIT No Attribution license: no permission needed, no obligation to credit, no friction. Attribution back is appreciated but never required.&lt;/p&gt;

&lt;p&gt;If anything here resonated, the repository is where it goes from idea to practice.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>softwareengineering</category>
      <category>software</category>
    </item>
  </channel>
</rss>
