<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Brent Fowler</title>
    <description>The latest articles on DEV Community by Brent Fowler (@brentf_io).</description>
    <link>https://dev.to/brentf_io</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3945011%2Fccac4295-478c-4e45-a188-9297bdee41ed.jpg</url>
      <title>DEV Community: Brent Fowler</title>
      <link>https://dev.to/brentf_io</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/brentf_io"/>
    <language>en</language>
    <item>
      <title>The AI Context Efficiency Experiment: Why Architecture Beat Context Size</title>
      <dc:creator>Brent Fowler</dc:creator>
      <pubDate>Mon, 01 Jun 2026 12:34:32 +0000</pubDate>
      <link>https://dev.to/brentf_io/the-ai-context-efficiency-experiment-why-architecture-beat-context-size-17ae</link>
      <guid>https://dev.to/brentf_io/the-ai-context-efficiency-experiment-why-architecture-beat-context-size-17ae</guid>
      <description>&lt;p&gt;Figure 1: The experiment's central turn: context, compaction, locality, governance, recovery, and conclusion.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Question Was About Context
&lt;/h2&gt;

&lt;p&gt;I thought I was running a context experiment.&lt;/p&gt;

&lt;p&gt;I wasn't.&lt;/p&gt;

&lt;p&gt;I just didn't know it yet.&lt;/p&gt;

&lt;p&gt;The experiment started with a question that seemed straightforward.&lt;/p&gt;

&lt;p&gt;It turned out to be an incomplete question.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;What actually drives AI-assisted software development efficiency?&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That is the question most engineers eventually ask after using AI coding tools for serious work. At first, the obvious variables all live inside the session. How much context is available? How quickly does it burn? What happens after compaction? Does the agent remember enough to keep working? Can a long-running session survive handoffs, summaries, and interruptions?&lt;/p&gt;

&lt;p&gt;Those were the first things worth watching. The work was happening in two real repositories, not in a synthetic benchmark. Financial Portfolio Agent (&lt;code&gt;FPA&lt;/code&gt;) and Market Intelligence Lab (&lt;code&gt;MIL&lt;/code&gt;) had different responsibilities, real validation requirements, and established repository boundaries. FPA handled household execution realism, portfolio intelligence, survivability positioning, and deterministic financial decision support. MIL handled systemic context, macro intelligence, evidence quality, and source lineage governance.&lt;/p&gt;

&lt;p&gt;The early mental model was simple: context is fuel. If the session burns too much of it, efficiency drops. If compaction preserves enough state, efficiency survives.&lt;/p&gt;

&lt;p&gt;That model was not wrong. It was just incomplete.&lt;/p&gt;

&lt;p&gt;The experiment eventually produced a recovered dataset with &lt;code&gt;47&lt;/code&gt; master observations: &lt;code&gt;14&lt;/code&gt; context observations, &lt;code&gt;13&lt;/code&gt; compaction events, &lt;code&gt;6&lt;/code&gt; feature batch observations, &lt;code&gt;7&lt;/code&gt; model observations, and &lt;code&gt;12&lt;/code&gt; recovery observations. That gave the story an evidence base early. It was not just a feeling that the work had changed shape; there was enough telemetry to trace where the explanation started to move.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4b1k31k3ilz5f0rt2qzo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4b1k31k3ilz5f0rt2qzo.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 2: Recovered observation counts across context, compaction, feature batch, model, and recovery logs.&lt;/p&gt;

&lt;p&gt;By the end of the experiment, the strongest preserved signal was not context size. It was architectural locality. The surprising result was not that compaction worked. The surprising result was that the system became easier for AI to work in when the work had a clear architectural home.&lt;/p&gt;

&lt;p&gt;The experiment started as a context study. It ended as an architecture study.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Clue Was The Burn Pattern
&lt;/h2&gt;

&lt;p&gt;The first useful evidence was numerical. Context values were preserved across the experiment, and later reconstructed into a dataset.&lt;/p&gt;

&lt;p&gt;FPA had recovered context observations including &lt;code&gt;26K&lt;/code&gt;, &lt;code&gt;66K&lt;/code&gt;, &lt;code&gt;100K&lt;/code&gt;, &lt;code&gt;133K&lt;/code&gt;, &lt;code&gt;161K&lt;/code&gt;, &lt;code&gt;194K&lt;/code&gt;, and &lt;code&gt;219K&lt;/code&gt;. MIL had recovered observations including &lt;code&gt;38K&lt;/code&gt;, &lt;code&gt;50K&lt;/code&gt;, &lt;code&gt;125K&lt;/code&gt;, &lt;code&gt;153K&lt;/code&gt;, &lt;code&gt;166K&lt;/code&gt;, &lt;code&gt;179K&lt;/code&gt;, &lt;code&gt;192K&lt;/code&gt;, &lt;code&gt;205K&lt;/code&gt;, &lt;code&gt;217K&lt;/code&gt;, &lt;code&gt;228K&lt;/code&gt;, and &lt;code&gt;240K&lt;/code&gt;. MIL also preserved a cleaner Generation 2 rebuild sequence that ran from &lt;code&gt;42.0K&lt;/code&gt; through &lt;code&gt;205.0K&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;At a glance, those numbers look like the whole story. The context grows. The session gets heavy. Compaction resets it. Work continues.&lt;/p&gt;

&lt;p&gt;But the data had to be handled carefully. The recovered FPA values were observed markers, not a complete time series. Some MIL values were ordered, but not all of them were. Exact timestamps were not preserved for every point. The dataset correctly keeps unknown values as &lt;code&gt;UNKNOWN&lt;/code&gt; instead of pretending that a partial transcript is a complete measurement system.&lt;/p&gt;
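
&lt;p&gt;One way to keep that boundary inside the data itself is to make missing values explicit. The sketch below is illustrative, not the experiment's actual schema: the &lt;code&gt;ContextObservation&lt;/code&gt; class, its field names, and the sequence position are assumptions, and only the two context values come from the recovered record.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass
from typing import Optional

UNKNOWN = "UNKNOWN"


@dataclass(frozen=True)
class ContextObservation:
    """One recovered context reading (hypothetical schema)."""
    repo: str
    context_k: float                  # context size as recorded, in K
    sequence: Optional[int] = None    # position in an ordered series, if preserved
    timestamp: Optional[str] = None   # exact time, if preserved

    def as_row(self):
        """Render a row, writing UNKNOWN for anything that was not preserved."""
        def show(value):
            return UNKNOWN if value is None else value

        return {
            "repo": self.repo,
            "context_k": self.context_k,
            "sequence": show(self.sequence),
            "timestamp": show(self.timestamp),
        }


observations = [
    ContextObservation("FPA", 219.0),              # observed marker, order not preserved
    ContextObservation("MIL", 42.0, sequence=1),   # assumed start of the ordered Generation 2 rebuild
]

for observation in observations:
    print(observation.as_row())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An unordered FPA marker and an ordered MIL value can then sit in the same table without either one claiming more than the transcript preserved.&lt;/p&gt;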

&lt;p&gt;That distinction matters. Clean charts can lie when the source data is incomplete. The right move was to preserve the observations as observations: enough to tell a technical story, not enough to claim a complete benchmark.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4aqzqi67s1wufd0apsf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs4aqzqi67s1wufd0apsf.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 3: Recovered context observations. The MIL Generation 2 sequence is ordered; FPA values are preserved as observed markers.&lt;/p&gt;

&lt;p&gt;The duration data had the same character. The transcript preserved observed work times from &lt;code&gt;1m 00s&lt;/code&gt; through &lt;code&gt;6m 27s&lt;/code&gt;, but not a full model-to-duration map for every feature. Again, that made the data useful telemetry, not a controlled comparison.&lt;/p&gt;

&lt;h2&gt;
  
  
  Then The Dataset Had To Be Rebuilt
&lt;/h2&gt;

&lt;p&gt;One of the most important editorial discoveries came before the main technical one: the archive had preserved findings, but not all of the observations behind them.&lt;/p&gt;

&lt;p&gt;That forced a reconstruction pass. The transcript became a primary source. The goal was not to summarize it. The goal was to recover telemetry: context values, compaction events, model observations, feature batch observations, duration observations, and recovery events.&lt;/p&gt;

&lt;p&gt;The scorecard above is the result of that pass. It moved the work from memory to a defensible dataset.&lt;/p&gt;

&lt;p&gt;These are recovered observations, not complete experiment totals. That boundary is the reason the dataset is useful: it says what was preserved and what was not.&lt;/p&gt;

&lt;p&gt;Some of the most valuable values were transcript-only. FPA preserved a compaction transition from &lt;code&gt;219K&lt;/code&gt; to &lt;code&gt;26.7K&lt;/code&gt;. MIL preserved one from &lt;code&gt;240K&lt;/code&gt; to &lt;code&gt;38K&lt;/code&gt;. MIL also preserved a Generation 2 rebuild table from &lt;code&gt;42.0K&lt;/code&gt; through &lt;code&gt;205.0K&lt;/code&gt;. The archive preserved the broader finding that compaction supported continuity, but the transcript gave the story numbers.&lt;/p&gt;

&lt;p&gt;That reconstruction changed the tone of the experiment. It made the next turn harder to dismiss.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compaction Worked
&lt;/h2&gt;

&lt;p&gt;Compaction passed an important test.&lt;/p&gt;

&lt;p&gt;FPA moved from &lt;code&gt;219K&lt;/code&gt; to &lt;code&gt;26.7K&lt;/code&gt;. MIL moved from &lt;code&gt;240K&lt;/code&gt; to &lt;code&gt;38K&lt;/code&gt;. MIL also preserved a related compaction shrink observation of &lt;code&gt;239K -&amp;gt; 42K&lt;/code&gt;, followed by a rebuild span of &lt;code&gt;42K -&amp;gt; 205K&lt;/code&gt;.&lt;/p&gt;
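
&lt;p&gt;The size of those resets is simple arithmetic on the recovered values. A minimal Python sketch, assuming the &lt;code&gt;K&lt;/code&gt; figures are thousands of tokens (the record preserves the values, not a unit definition); the function and label names are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Recovered compaction transitions as (before, after), in K as recorded.
transitions = {
    "FPA": (219.0, 26.7),
    "MIL": (240.0, 38.0),
    "MIL (related shrink)": (239.0, 42.0),
}


def retained_fraction(before_k, after_k):
    """Share of the pre-compaction context still present afterwards."""
    return after_k / before_k


for label, (before_k, after_k) in transitions.items():
    kept = retained_fraction(before_k, after_k)
    print(f"{label}: {before_k:g}K to {after_k:g}K, kept {kept:.1%}, shed {1 - kept:.1%}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On those values, each compaction shed roughly 82 to 88 percent of the visible context.&lt;/p&gt;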

&lt;p&gt;That is a real continuity story. A long-running AI development process can shed a large amount of context and still keep going if enough operational state lives outside the chat. Repository artifacts, bootstrap notes, governance records, validation commands, and durable documentation all matter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4lvqs8eyo93k0atjl34.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4lvqs8eyo93k0atjl34.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 5: Compaction visibly reset context size in both repositories, while exact compaction counts remain unknown.&lt;/p&gt;

&lt;p&gt;But this is where the experiment started to turn.&lt;/p&gt;

&lt;p&gt;If compaction were the whole explanation, then the main lesson would be operational: summarize better, compact cleanly, restart carefully. Those are good practices, but they did not explain the strongest throughput signal in the archive.&lt;/p&gt;

&lt;p&gt;Compaction helped the work survive. The preserved record does not show it as the dominant throughput driver.&lt;/p&gt;

&lt;p&gt;That distinction prevented the wrong lesson. The experiment was not saying, “The bigger the context window, the faster the work.” It was not saying, “Compaction is the answer.” Compaction created room to continue, but it did not explain why some work kept feeling cheaper to understand.&lt;/p&gt;

&lt;p&gt;The work stayed efficient when it stayed local.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Turn: Locality Became The Signal
&lt;/h2&gt;

&lt;p&gt;The word &lt;code&gt;locality&lt;/code&gt; became the center of the experiment because it explained what context size alone could not.&lt;/p&gt;

&lt;p&gt;The turning point was subtle. There was no single feature where the experiment suddenly announced a new theory. Instead, the same pattern kept showing up in the work. Adjacent features were easier to complete when they stayed near previous features. Review was easier when the files, tests, and concepts were already part of the active working set. Validation stayed narrower when the feature did not cross into portfolio logic, report contracts, or runtime behavior.&lt;/p&gt;

&lt;p&gt;That pattern was more interesting than the raw context number. A session at a high context count could still move well if the next task belonged to the same architectural neighborhood. A freshly compacted session could still struggle if the next task required reloading too many unrelated domains. The amount of context available mattered, but the shape of the work mattered more.&lt;/p&gt;

&lt;p&gt;When work stayed inside a coherent architectural area, the AI agent had less to rediscover. The relevant files were close together. The concepts were reused. The tests were predictable. The risk surface was bounded. The reviewer did not have to reload the entire system to understand whether the change belonged.&lt;/p&gt;

&lt;p&gt;In FPA, the clearest example was Pipeline Metadata. Repeated work clustered around a small set of files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;app/pipeline/registry.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;app/pipeline/planner.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;app/pipeline/runner.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tests/test_pipeline_registry.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tests/test_pipeline_planner.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tests/test_pipeline_runner.py&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That cluster supported features around discoverability, ownership visibility, dependency visibility, output visibility, lineage visibility, category inspection, and boundary inspection. Those features were useful, but they did not require report redesign, CSV schema changes, portfolio logic changes, or runtime execution redesign.&lt;/p&gt;

&lt;p&gt;This is the architectural lesson: an AI agent is not only spending context on code. It is spending context on uncertainty.&lt;/p&gt;

&lt;p&gt;When ownership is unclear, context burn rises. When the validation path is unknown, context burn rises. When a feature crosses multiple conceptual boundaries, context burn rises. When the work sits inside a mature locality cluster, the agent can reuse the same mental map.&lt;/p&gt;

&lt;p&gt;The archive does not prove a numeric locality multiplier. It does not show that locality reduced token burn by a specific percentage. The claim is more careful: within the preserved experiment record, architectural locality appeared to be the dominant efficiency multiplier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flonxdjr4mb5alka7ean1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flonxdjr4mb5alka7ean1.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 6: The context question turned into an architectural locality finding.&lt;/p&gt;

&lt;p&gt;That was the point where the experiment stopped being mostly about context windows.&lt;/p&gt;

&lt;p&gt;It became about codebase shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bounded Contexts Made The Work Cheaper To Think About
&lt;/h2&gt;

&lt;p&gt;Once locality was visible, the next question was obvious: what made some areas local enough to keep producing leverage?&lt;/p&gt;

&lt;p&gt;The answer was bounded contexts.&lt;/p&gt;

&lt;p&gt;Pipeline Metadata was the clearest FPA case. It had coherent ownership, deterministic read-only query behavior, stable adjacency across feature batches, and a clear separation from pipeline execution. It made the system easier to inspect without redesigning how the system ran.&lt;/p&gt;

&lt;p&gt;That difference is subtle but important. A bounded context is not just a directory. It is a place where related questions can be answered without pulling the whole system into view.&lt;/p&gt;

&lt;p&gt;Pipeline Metadata answered questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What owns this output?&lt;/li&gt;
&lt;li&gt;What depends on this category?&lt;/li&gt;
&lt;li&gt;Which modules participate in this pipeline area?&lt;/li&gt;
&lt;li&gt;Which outputs are shared?&lt;/li&gt;
&lt;li&gt;What downstream categories are affected?&lt;/li&gt;
&lt;/ul&gt;
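
&lt;p&gt;Questions like these can be answered by read-only lookups over a small registry. The sketch below is hypothetical and is not the contents of &lt;code&gt;app/pipeline/registry.py&lt;/code&gt;: the module, category, and output names are invented for illustration, and the real registry may be shaped differently.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical registry entries; every name here is invented for illustration.
REGISTRY = {
    "ingest_prices": {"category": "ingest", "depends_on": [], "outputs": ["prices.csv"]},
    "score_positions": {"category": "analysis", "depends_on": ["ingest"], "outputs": ["scores.csv"]},
    "build_report": {"category": "reporting", "depends_on": ["analysis"], "outputs": ["scores.csv", "report.md"]},
}


def owners_of(output):
    """Which modules produce this output?"""
    return sorted(name for name, meta in REGISTRY.items() if output in meta["outputs"])


def dependents_of(category):
    """Which modules depend directly on this category?"""
    return sorted(name for name, meta in REGISTRY.items() if category in meta["depends_on"])


def shared_outputs():
    """Outputs that more than one module produces."""
    counts = {}
    for meta in REGISTRY.values():
        for output in meta["outputs"]:
            counts[output] = counts.get(output, 0) + 1
    # Every count is at least 1, so "not equal to 1" means "more than one".
    return sorted(output for output, count in counts.items() if count != 1)


print(owners_of("scores.csv"))
print(dependents_of("ingest"))
print(shared_outputs())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;None of these functions executes or changes the pipeline. They only read metadata, which is what keeps the inspection work separate from runtime behavior.&lt;/p&gt;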

&lt;p&gt;Each new inspection feature made later inspection features easier. The work compounded because it stayed near its own prior work.&lt;/p&gt;

&lt;p&gt;The experiment record also identifies Export Quality Hardening as a bounded context in MIL. The FPA archive does not contain enough MIL internal evidence to analyze that workstream at the same level of detail, so that boundary needs to stay explicit. The experiment record can say Export Quality Hardening emerged as a bounded context. FPA-local evidence cannot prove MIL internal structure.&lt;/p&gt;

&lt;p&gt;The bigger lesson is that bounded contexts changed the economics of AI-assisted development. The work became easier to start, easier to validate, easier to review, and easier to resume after interruption.&lt;/p&gt;

&lt;p&gt;That is a different kind of efficiency than raw speed. It is structural efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Governance Was Not Paperwork
&lt;/h2&gt;

&lt;p&gt;Once the work had a shape, the repository needed a way to preserve that shape.&lt;/p&gt;

&lt;p&gt;That is where governance entered the story.&lt;/p&gt;

&lt;p&gt;FPA formalized Feature Tracking Governance. Features were no longer just completed and forgotten. They were classified by category, complexity, burn, leverage, locality, architectural impact, and reassessment value. Work was organized into batches of four completed features before reassessment.&lt;/p&gt;
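
&lt;p&gt;As a rough illustration, a tracking record and the batch-of-four rule might look like the sketch below. The &lt;code&gt;FeatureRecord&lt;/code&gt; class and &lt;code&gt;split_batches&lt;/code&gt; helper are assumptions rather than the repository's actual governance format, and every rating is left as a placeholder instead of a recorded value.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass

BATCH_SIZE = 4  # completed features per batch before reassessment


@dataclass
class FeatureRecord:
    """One completed feature, classified along the tracked dimensions."""
    name: str
    category: str = "unrated"
    complexity: str = "unrated"
    burn: str = "unrated"
    leverage: str = "unrated"
    locality: str = "unrated"
    architectural_impact: str = "unrated"
    reassessment_value: str = "unrated"


def split_batches(records, size=BATCH_SIZE):
    """Return (full batches ready for reassessment, features still pending)."""
    full_count = len(records) // size
    full = [records[i * size:(i + 1) * size] for i in range(full_count)]
    pending = records[full_count * size:]
    return full, pending


# Names echo the Pipeline Metadata features; ratings stay as placeholders.
completed = [
    FeatureRecord(name)
    for name in [
        "ownership visibility",
        "dependency visibility",
        "output visibility",
        "lineage visibility",
        "category inspection",
        "boundary inspection",
    ]
]

ready, pending = split_batches(completed)
print(len(ready), "batch ready for reassessment;", len(pending), "features pending")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;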

&lt;p&gt;That sounds procedural until a long-running AI session is involved. Then it becomes memory infrastructure.&lt;/p&gt;

&lt;p&gt;The governance artifacts made the experiment less dependent on conversational memory. They gave future sessions a way to recover what had happened, why it mattered, and where the next work should stay localized. They also helped preserve the difference between confirmed findings, strong evidence, and open questions.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;smcpp&lt;/code&gt; lifecycle served a similar role. Stage, commit, push, PR, merge, and prune became a named operational convention. In ordinary solo development, that might look like hygiene. In this experiment, it became part of survivability.&lt;/p&gt;

&lt;p&gt;The confirmed finding was that governance and discoverability improvements compounded future leverage. That is exactly what happened when Pipeline Metadata kept accumulating inspection surfaces. The system became more legible as work progressed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogkuphrzue86h4vmeo3j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogkuphrzue86h4vmeo3j.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 7: The evidence tiers are part of the result. Confirmed findings, strong evidence, and open questions stay separate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model Result Was Interesting For A Different Reason
&lt;/h2&gt;

&lt;p&gt;The experiment observed &lt;code&gt;GPT-5.5 Reasoning Medium&lt;/code&gt; and &lt;code&gt;GPT-5.4 Reasoning Low&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The tempting conclusion would be to turn that into a model horse race. That would be the wrong reading.&lt;/p&gt;

&lt;p&gt;The preserved record does not prove that &lt;code&gt;GPT-5.4 Low&lt;/code&gt; was more efficient than &lt;code&gt;GPT-5.5 Medium&lt;/code&gt;. That specific claim was explicitly rejected as unproven. There was no benchmark-grade dataset with complete token usage, latency, defect rate, review burden, and quality measurements across both models.&lt;/p&gt;

&lt;p&gt;The useful observation is narrower and more architectural: lower reasoning settings performed better than expected inside mature bounded contexts.&lt;/p&gt;

&lt;p&gt;That is worth paying attention to. If a repository is shaped well enough, the model may need less reasoning power to make useful progress. Not because the task is trivial, but because the environment has fewer unresolved questions. Ownership is clearer. Tests are closer. Patterns are established. The blast radius is smaller.&lt;/p&gt;

&lt;p&gt;This does not prove model equivalence. It suggests a better research question:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;What repository conditions make lower reasoning settings viable?&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That question is more useful to engineering leaders than a generic model ranking. It points back to architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Then The Repository Broke
&lt;/h2&gt;

&lt;p&gt;The experiment also included a power-loss Git corruption event during active development.&lt;/p&gt;

&lt;p&gt;By then, the experiment had already moved from context to architecture and governance. The outage tested whether that shift mattered under pressure.&lt;/p&gt;

&lt;p&gt;The recovery narrative records non-destructive repair, branch and ref recovery, working-tree preservation, a validation rerun, PR merge, and cleanup. The MIL recovery status records a healthy repository, the expected branch, an aligned remote, a clean working tree, passing validation, and a restored API surface. The broader experiment record preserves successful recovery of both repositories.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g1oqqu0sg37xs28drdc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0g1oqqu0sg37xs28drdc.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 8: The power-loss event turned process discipline into a real recovery test.&lt;/p&gt;

&lt;p&gt;Governance is easiest to dismiss before something goes wrong. After corruption, the value of recoverable state, validation discipline, branch hygiene, and bounded work becomes much less theoretical.&lt;/p&gt;

&lt;p&gt;The archive does not preserve exact elapsed recovery time. It also does not prove which governance element mattered most. Branch policy, validation policy, bootstrap artifacts, lifecycle discipline, and bounded-context locality may all have contributed.&lt;/p&gt;

&lt;p&gt;The confirmed finding is narrower: strong process discipline improved disaster recovery, and recovery success depended heavily on repository governance and validation discipline.&lt;/p&gt;

&lt;p&gt;That is enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  What The Experiment Actually Supports
&lt;/h2&gt;

&lt;p&gt;The most important thing about this experiment is not the most dramatic story. It is the evidence discipline.&lt;/p&gt;

&lt;p&gt;Confirmed findings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Architectural locality appeared to be the dominant efficiency multiplier.&lt;/li&gt;
&lt;li&gt;Governance and discoverability improvements compounded future leverage.&lt;/li&gt;
&lt;li&gt;Feature Tracking Governance became necessary.&lt;/li&gt;
&lt;li&gt;Pipeline Metadata emerged as a bounded context.&lt;/li&gt;
&lt;li&gt;Strong process discipline improved disaster recovery.&lt;/li&gt;
&lt;li&gt;Recovery success depended heavily on repository governance and validation discipline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Strong evidence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low-to-medium burn work is more likely inside mature locality clusters.&lt;/li&gt;
&lt;li&gt;Repeated Pipeline Metadata features produced compounding leverage.&lt;/li&gt;
&lt;li&gt;Compaction was more valuable for continuity than throughput.&lt;/li&gt;
&lt;li&gt;Lower reasoning settings became more viable inside mature bounded contexts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Open questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How much gain came from locality versus familiarity?&lt;/li&gt;
&lt;li&gt;What exact differences existed between &lt;code&gt;GPT-5.5 Reasoning Medium&lt;/code&gt; and &lt;code&gt;GPT-5.4 Reasoning Low&lt;/code&gt;?&lt;/li&gt;
&lt;li&gt;When should Pipeline Metadata be extracted?&lt;/li&gt;
&lt;li&gt;How structurally similar was MIL Export Quality Hardening to FPA Pipeline Metadata?&lt;/li&gt;
&lt;li&gt;What exact context-window sizes, timestamps, and model mappings were not preserved?&lt;/li&gt;
&lt;li&gt;Which governance element mattered most during recovery?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftd4hv69uzzpefkzs589i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftd4hv69uzzpefkzs589i.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Figure 9: The final claims are strongest when the article keeps evidence tiers visible.&lt;/p&gt;

&lt;p&gt;This separation keeps the story honest. It lets the confirmed findings stay strong because they are not carrying unsupported claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lesson For Engineering Leaders
&lt;/h2&gt;

&lt;p&gt;The practical lesson is not “buy more context.” It is not “compact harder.” It is not “use the biggest model for everything.”&lt;/p&gt;

&lt;p&gt;The lesson is that AI-assisted development efficiency depends heavily on the shape of the system around the model.&lt;/p&gt;

&lt;p&gt;A repository that is local, discoverable, governed, and validated gives the agent less ambiguity to resolve. It turns memory into artifacts. It makes compaction survivable because operational truth is not trapped in the chat. It makes lower reasoning settings more plausible because the problem surface is better bounded. It makes recovery more realistic because the work has structure outside the session.&lt;/p&gt;

&lt;p&gt;For senior engineers and architects, the implication is direct: if you want better AI-assisted development, do not only tune prompts. Tune the architecture.&lt;/p&gt;

&lt;p&gt;Create bounded contexts. Keep ownership visible. Make dependencies inspectable. Preserve validation paths. Record feature batches. Separate confirmed findings from strong evidence. Treat governance as part of the engineering system, not as administrative residue.&lt;/p&gt;

&lt;p&gt;The most interesting outcome wasn't discovering that context mattered.&lt;/p&gt;

&lt;p&gt;Most engineers already suspected that.&lt;/p&gt;

&lt;p&gt;The interesting outcome was discovering that architecture appeared to matter more.&lt;/p&gt;

&lt;p&gt;The repositories that became easier to understand, validate, inspect, and recover also became easier for AI to work within.&lt;/p&gt;

&lt;p&gt;That observation may ultimately be more valuable than any individual context-window measurement.&lt;/p&gt;

&lt;p&gt;The experiment began with context growth, context burn, compaction, and survivability. Those were real concerns. They still matter.&lt;/p&gt;

&lt;p&gt;But the deeper result was that context efficiency appeared to be downstream of architectural clarity.&lt;/p&gt;

&lt;p&gt;The experiment started as a context study.&lt;/p&gt;

&lt;p&gt;It ended as an architecture study.&lt;/p&gt;




&lt;p&gt;If you're experimenting with AI-assisted development workflows, I'd be interested in hearing what you've observed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Has context size, model choice, architecture, governance, or something else had the biggest impact on your results?
&lt;/h2&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>productivity</category>
      <category>devops</category>
    </item>
    <item>
      <title>Building a Practical Home Lab Starter Kit for Network Engineers</title>
      <dc:creator>Brent Fowler</dc:creator>
      <pubDate>Fri, 22 May 2026 03:24:00 +0000</pubDate>
      <link>https://dev.to/brentf_io/building-a-practical-home-lab-starter-kit-for-network-engineers-5f43</link>
      <guid>https://dev.to/brentf_io/building-a-practical-home-lab-starter-kit-for-network-engineers-5f43</guid>
      <description>&lt;p&gt;A lot of network engineers learn their best lessons in home labs, especially the lessons that do not fit neatly into certification tracks or production change windows. &lt;/p&gt;

&lt;p&gt;They are also where things can get messy quickly.&lt;/p&gt;

&lt;p&gt;One folder has topology notes. Another has Ansible experiments. A diagram lives somewhere else.&lt;/p&gt;

&lt;p&gt;Remote access was configured once and then forgotten. Screenshots include details that should not be shared publicly.&lt;/p&gt;

&lt;p&gt;The lab works, but it is hard to rebuild, explain, or safely publish.&lt;/p&gt;

&lt;p&gt;I built the Practical Home Lab Starter Kit to make that problem smaller and more repeatable.&lt;/p&gt;

&lt;p&gt;The repo is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://brentf.io/labs" rel="noopener noreferrer"&gt;Practical Home Lab Starter Kit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is not a production network design or a claim that there is one right way to build a lab.&lt;/p&gt;

&lt;p&gt;It is a practical starting point for learning, documenting, validating, and sharing a Linux-based network engineering lab with fewer loose ends.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Built It
&lt;/h2&gt;

&lt;p&gt;I am a network engineer focused on Linux infrastructure, automation, operational workflows, and continuous learning.&lt;/p&gt;

&lt;p&gt;A lot of my best learning happens when I build something, break it in a controlled way, document what happened, and make the next pass cleaner.&lt;/p&gt;

&lt;p&gt;That is the mindset behind this project.&lt;/p&gt;

&lt;p&gt;I enjoy exploring new tools and workflows, but I also want the result to be understandable later. A lab should help you learn today without becoming a mystery system six months from now.&lt;/p&gt;

&lt;p&gt;This starter kit came from a simple observation: many people want to learn network automation, Linux, and lab security, but the first barrier is not always the technology itself.&lt;/p&gt;

&lt;p&gt;Sometimes the barrier is structure.&lt;/p&gt;

&lt;p&gt;Questions come up early:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What should the Linux host look like?&lt;/li&gt;
&lt;li&gt;Where should the topology be documented?&lt;/li&gt;
&lt;li&gt;How should GNS3, management access, and Ansible fit together?&lt;/li&gt;
&lt;li&gt;What should be validated before changing anything?&lt;/li&gt;
&lt;li&gt;What is safe to show publicly?&lt;/li&gt;
&lt;li&gt;How do I keep the lab useful without turning it into a fragile one-off setup?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal of this repo is to give those questions a starting framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem It Solves
&lt;/h2&gt;

&lt;p&gt;A useful home lab should be more than a place where commands happen.&lt;/p&gt;

&lt;p&gt;It should help you practice habits that transfer into real engineering work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;documenting the system before it grows too large to explain&lt;/li&gt;
&lt;li&gt;separating management access from lab experimentation&lt;/li&gt;
&lt;li&gt;using Linux as a stable operations base&lt;/li&gt;
&lt;li&gt;validating before automating&lt;/li&gt;
&lt;li&gt;treating Ansible as a repeatable workflow tool, not just a configuration hammer&lt;/li&gt;
&lt;li&gt;keeping public examples sanitized&lt;/li&gt;
&lt;li&gt;making diagrams and checklists part of the build process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the value I want this project to provide to the broader community.&lt;/p&gt;

&lt;p&gt;Whether someone is new to network engineering, learning Linux administration, experimenting with GNS3, or trying to get more comfortable with Ansible, the repo should offer a clear path.&lt;/p&gt;

&lt;p&gt;It should not assume a large budget or a production environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  What The Repo Includes
&lt;/h2&gt;

&lt;p&gt;The starter kit includes a public foundation for a small Linux-based network engineering lab, grouped around a few practical areas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lab foundation: Linux host setup guidance, GNS3 setup notes, and example topology documentation&lt;/li&gt;
&lt;li&gt;Security baseline: remote access guidance, SSH hardening notes, and UFW firewall examples&lt;/li&gt;
&lt;li&gt;Automation workflow: sanitized Ansible inventory examples and read-only validation playbooks&lt;/li&gt;
&lt;li&gt;Documentation assets: Mermaid diagrams, technical diagram references, and screenshot/video workflow notes&lt;/li&gt;
&lt;li&gt;Publishing guardrails: local validation scripts plus publication and redaction checklists&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The intent is to keep the repo useful even before someone has built every part of the lab.&lt;/p&gt;

&lt;p&gt;You can read through the architecture, copy the sanitized templates, adapt the checklists, and use the validation approach in your own environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture At A High Level
&lt;/h2&gt;

&lt;p&gt;The reference architecture is intentionally small:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Remote admin workstation
  |
  | SSH over trusted local network or private access path
  v
Linux lab host
  |-- GNS3 server or GNS3 support role
  |-- Ansible control workflow
  |-- UFW firewall baseline
  |-- SSH administration
  |
  +-- Private management network
        |-- virtual router
        |-- virtual switch
        |-- additional lab nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Linux host is the anchor. GNS3 provides the network devices. Ansible gives you a repeatable way to validate and inspect the lab. Remote access is treated as something to design carefully, not something to bolt on casually.&lt;/p&gt;

&lt;p&gt;The idea is to keep the operational workflow understandable before scaling the topology. One virtual router, one virtual switch, one management network, and a few read-only Ansible checks can teach a lot.&lt;/p&gt;

&lt;p&gt;The first version is small on purpose: early on, understandable beats complex and repeatable beats large. After that works, you can expand with more vendors, routing protocols, backup workflows, monitoring, or security tooling.&lt;/p&gt;

&lt;p&gt;Once the baseline is clear, scaling the lab becomes a deliberate engineering choice instead of a pile of accidental dependencies.&lt;/p&gt;
&lt;h2&gt;
  
  
  Visual References
&lt;/h2&gt;

&lt;p&gt;The README includes a simple overview image for the project:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5avr7a7q6yb9b9k0l0f9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5avr7a7q6yb9b9k0l0f9.png" alt="Practical Home Lab Starter Kit overview" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repo also includes sanitized technical diagram references:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lab topology layout&lt;/li&gt;
&lt;li&gt;remote access flow&lt;/li&gt;
&lt;li&gt;Ansible control workflow&lt;/li&gt;
&lt;li&gt;validation and documentation flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is also a local validation screenshot in the repo that shows the basic guardrail workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpv8pk7507ccccl07k6ft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpv8pk7507ccccl07k6ft.png" alt="Local validation screenshot" width="572" height="155"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Beginner-Friendly Build Path
&lt;/h2&gt;

&lt;p&gt;If you are newer to this kind of lab, I would not start by trying to automate everything.&lt;/p&gt;

&lt;p&gt;Security should not be an afterthought.&lt;/p&gt;

&lt;p&gt;Even in a home lab, remote access, firewall policy, user access, and public screenshots should be considered early.&lt;/p&gt;

&lt;p&gt;I would start here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build or choose a Linux lab host.&lt;/li&gt;
&lt;li&gt;Document the host role, network layout, and intended access model.&lt;/li&gt;
&lt;li&gt;Apply a basic security baseline before exposing or expanding services:

&lt;ul&gt;
&lt;li&gt;update the host&lt;/li&gt;
&lt;li&gt;review local users&lt;/li&gt;
&lt;li&gt;configure SSH intentionally&lt;/li&gt;
&lt;li&gt;define initial UFW or firewall rules&lt;/li&gt;
&lt;li&gt;avoid broad remote access&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Install and test GNS3 with a small local topology.&lt;/li&gt;
&lt;li&gt;Confirm management reachability manually.&lt;/li&gt;
&lt;li&gt;Add a sanitized Ansible inventory.&lt;/li&gt;
&lt;li&gt;Run read-only Ansible checks.&lt;/li&gt;
&lt;li&gt;Capture diagrams and screenshots only after reviewing them for private details.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That sequence keeps the lab understandable.&lt;/p&gt;

&lt;p&gt;It also helps avoid a common failure mode: troubleshooting Linux, GNS3, SSH, firewall rules, inventory files, credentials, network reachability, and automation logic all at the same time.&lt;/p&gt;
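
&lt;p&gt;As a rough illustration of step 3, the sketch below prints a candidate UFW baseline for review rather than applying it. The &lt;code&gt;ADMIN_NET&lt;/code&gt; variable and the &lt;code&gt;ufw_baseline&lt;/code&gt; function are hypothetical names, not part of the repo, and the subnet is a placeholder standing in for whatever trusted network you actually administer from.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a candidate UFW baseline so it can be reviewed first.
# ADMIN_NET is an assumed placeholder for the trusted admin network.
ADMIN_NET="${ADMIN_NET:-10.10.10.0/24}"

ufw_baseline() {
  # Deny unsolicited inbound traffic, allow outbound traffic from the host.
  echo "sudo ufw default deny incoming"
  echo "sudo ufw default allow outgoing"
  # Permit SSH only from the trusted admin network.
  echo "sudo ufw allow from ${ADMIN_NET} to any port 22 proto tcp"
  # Turn the firewall on, then show the resulting rule set.
  echo "sudo ufw enable"
  echo "sudo ufw status verbose"
}

ufw_baseline
```

&lt;p&gt;Review the printed commands, adjust the subnet and port, and run them by hand only once they match the access model you documented in step 2.&lt;/p&gt;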
&lt;h2&gt;
  
  
  Validation Before Automation
&lt;/h2&gt;

&lt;p&gt;One of the strongest habits I want this repo to reinforce is validation-first work.&lt;/p&gt;

&lt;p&gt;Before publishing changes or sharing examples, run the repo's basic checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./scripts/validate.sh
./scripts/redaction-check.sh
for f in scripts/&lt;span class="k"&gt;*&lt;/span&gt;.sh; do bash &lt;span class="nt"&gt;-n&lt;/span&gt; "$f"; done
git diff &lt;span class="nt"&gt;--check&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These checks are intentionally lightweight.&lt;/p&gt;

&lt;p&gt;They do not replace human review, but they create a repeatable baseline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;required files exist&lt;/li&gt;
&lt;li&gt;key documentation terms are present&lt;/li&gt;
&lt;li&gt;shell scripts parse correctly&lt;/li&gt;
&lt;li&gt;obvious sensitive patterns are flagged&lt;/li&gt;
&lt;li&gt;whitespace issues are caught before commit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a public learning repo, that kind of guardrail matters.&lt;/p&gt;

&lt;p&gt;It also makes the project easier for other people to trust, review, and adapt.&lt;/p&gt;
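
&lt;p&gt;To make the "obvious sensitive patterns are flagged" item concrete, here is a minimal sketch of that kind of check. It is not the contents of the repo's &lt;code&gt;redaction-check.sh&lt;/code&gt;; the function name and the pattern list are assumptions chosen for illustration, and a real list would need tuning for your own files.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Illustrative redaction sketch (hypothetical, not the repo's actual script).
# Flags a few obvious sensitive patterns in a directory tree.

scan_for_sensitive() {
  local target="$1"
  # Assumed starter patterns: private key headers and inline credentials.
  local pattern='BEGIN [A-Z ]*PRIVATE KEY|(password|passwd|secret|token|api[_-]?key)[[:space:]]*[:=]'
  # -r recurse, -n show line numbers, -i ignore case, -E extended regex.
  if grep -rniE "$pattern" "$target"; then
    echo "redaction check: possible sensitive content found"
    return 1
  fi
  echo "redaction check: no obvious sensitive patterns"
  return 0
}
```

&lt;p&gt;Calling &lt;code&gt;scan_for_sensitive .&lt;/code&gt; from the repo root would print any matching lines with file names and line numbers and return a non-zero status, which makes it easy to run before every publish. It will produce false positives, so it supports human review rather than replacing it.&lt;/p&gt;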

&lt;h2&gt;
  
  
  Security And Sanitization Notes
&lt;/h2&gt;

&lt;p&gt;The project is designed to stay sanitized.&lt;/p&gt;

&lt;p&gt;Public examples should not include secrets, tokens, private keys, pre-shared keys, real usernames or hostnames, public IP addresses, account data, or private environment details.&lt;/p&gt;

&lt;p&gt;The examples use placeholder values like these:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lab-host
lab-r1
lab-sw1
labadmin
10.10.10.0/24
lab.example
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This makes the repo easier to share and discuss.&lt;/p&gt;

&lt;p&gt;It also encourages a habit that matters outside of home labs: separate useful technical explanation from private operational detail.&lt;/p&gt;
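
&lt;p&gt;One way to back up the placeholder convention is to list any IPv4-looking strings that fall outside the documented placeholder range. This is a sketch under assumptions: &lt;code&gt;find_unexpected_ips&lt;/code&gt; is a hypothetical helper, and it treats only &lt;code&gt;10.10.10.x&lt;/code&gt; as expected, matching the &lt;code&gt;10.10.10.0/24&lt;/code&gt; example above.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical): list IPv4-looking strings that are not
# in the 10.10.10.x placeholder range used by the examples.

find_unexpected_ips() {
  local target="$1"
  # -r recurse, -h hide file names, -o print only the matching text.
  grep -rhoE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$target" \
    | grep -vE '^10\.10\.10\.[0-9]{1,3}$' \
    | sort -u
}
```

&lt;p&gt;An empty result means every address in the tree sits inside the placeholder range. Anything printed deserves a manual look before publishing; dotted version strings such as 1.2.3.4 would also be listed, so treat the output as a prompt for review rather than a verdict.&lt;/p&gt;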

&lt;h2&gt;
  
  
  Who This Might Help
&lt;/h2&gt;

&lt;p&gt;I built this with network engineers in mind, but I think it can help a wider group:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;students building their first serious lab&lt;/li&gt;
&lt;li&gt;help desk or systems engineers moving toward networking&lt;/li&gt;
&lt;li&gt;network engineers learning Linux and automation&lt;/li&gt;
&lt;li&gt;Linux admins who want to understand network lab workflows&lt;/li&gt;
&lt;li&gt;security learners who need a controlled place to test tools&lt;/li&gt;
&lt;li&gt;anyone trying to document a home lab without exposing private details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The common thread is not job title.&lt;/p&gt;

&lt;p&gt;It is the desire to build something practical, repeatable, and safe to explain.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Is Not
&lt;/h2&gt;

&lt;p&gt;This is not a production blueprint.&lt;/p&gt;

&lt;p&gt;It is not a full enterprise lab.&lt;/p&gt;

&lt;p&gt;It is not a promise that one set of tools fits every environment.&lt;/p&gt;

&lt;p&gt;It is a starter kit: opinionated enough to be useful, but small enough to adapt.&lt;/p&gt;

&lt;p&gt;If you want to explore the project, the repo is available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://brentf.io/labs" rel="noopener noreferrer"&gt;Practical Home Lab Starter Kit&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feedback is welcome. I would especially like to hear from people who are building or improving their own labs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What would make a starter kit like this more useful?&lt;/li&gt;
&lt;li&gt;Which parts of home lab documentation are hardest to keep current?&lt;/li&gt;
&lt;li&gt;When you build a lab, do you start with diagrams, checklists, scripts, or hands-on testing?&lt;/li&gt;
&lt;li&gt;What security baseline do you apply before enabling remote access?&lt;/li&gt;
&lt;li&gt;What would help someone learning Linux, GNS3, Ansible, or remote access for the first time?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My goal is for this to become a practical community resource: useful for beginners, still relevant for working engineers, and careful about security from the start.&lt;/p&gt;

&lt;p&gt;If you have built something similar, I would be interested in what worked, what did not, and what you wish you had documented earlier.&lt;/p&gt;

</description>
      <category>networkengineering</category>
      <category>linux</category>
      <category>ansible</category>
      <category>gns3</category>
    </item>
  </channel>
</rss>
