<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Adedoyinsola Ogungbesan</title>
    <description>The latest articles on DEV Community by Adedoyinsola Ogungbesan (@surfiniaburger).</description>
    <link>https://dev.to/surfiniaburger</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1113528%2F20cecb12-ee73-45cf-b163-19dd2e828e3e.jpeg</url>
      <title>DEV Community: Adedoyinsola Ogungbesan</title>
      <link>https://dev.to/surfiniaburger</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/surfiniaburger"/>
    <language>en</language>
    <item>
      <title>Our Comeback Story</title>
      <dc:creator>Adedoyinsola Ogungbesan</dc:creator>
      <pubDate>Sun, 07 Jun 2026 13:40:35 +0000</pubDate>
      <link>https://dev.to/surfiniaburger/our-comeback-story-19ab</link>
      <guid>https://dev.to/surfiniaburger/our-comeback-story-19ab</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-05-21"&gt;GitHub Finish-Up-A-Thon Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;This project began life as an AgentBeats platform template for the AgentX-AgentBeats competition (initial commits in Nov 2025). At the time, it was a straightforward pro/con debater and judge wired to local models (Ollama). Over the last several months, I reworked the harness into &lt;code&gt;silver-one&lt;/code&gt;: a reproducible, auditable pipeline for generating evidence-grounded code-security reasoning data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby1d82v0kp8kqg7cwum1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fby1d82v0kp8kqg7cwum1.png" alt="Initial rest point" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Key ideas implemented:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;BARRED (Boundary Adversarial Reasoning for Reproducible Evaluation and Dataset generation) — an agent debate + verifier architecture that records every changing input for replay and audit.&lt;/li&gt;
&lt;li&gt;Deterministic replay: &lt;code&gt;LLMCassette&lt;/code&gt;/&lt;code&gt;ReplayManager&lt;/code&gt; to record prompts, responses, and run state so outputs are reproducible and debuggable.&lt;/li&gt;
&lt;li&gt;Offline B-gate: &lt;code&gt;offline_b_gate.py&lt;/code&gt; computes quality gates (structural completeness, anchor grounding, predicate aboutness, verifier parse/pass) and produces &lt;code&gt;artifacts/metrics/*.json&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Telemetry and token accounting: per-stage and per-model token totals so we can optimize cost vs quality.&lt;/li&gt;
&lt;/ul&gt;
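
&lt;p&gt;To make the record/replay idea concrete, here is a minimal sketch. It is not the repo's &lt;code&gt;LLMCassette&lt;/code&gt; or &lt;code&gt;ReplayManager&lt;/code&gt;: the &lt;code&gt;Cassette&lt;/code&gt; class, the hash key, and the &lt;code&gt;fake_llm&lt;/code&gt; stand-in are all assumptions made for illustration.&lt;/p&gt;

```python
import hashlib
import json
import tempfile
from pathlib import Path


class Cassette:
    """Hypothetical record/replay store; not the repo's LLMCassette."""

    def __init__(self, path, mode="record"):
        self.path = Path(path)
        self.mode = mode  # "record" calls the model, "replay" only reads the tape
        self.tape = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def key(model, prompt, seed):
        # Hash every input that can change the output, in a stable order.
        payload = json.dumps({"model": model, "prompt": prompt, "seed": seed}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, model, prompt, seed, llm):
        k = self.key(model, prompt, seed)
        if self.mode == "replay":
            # A KeyError here means the replayed run diverged from the recording.
            return self.tape[k]["response"]
        response = llm(model, prompt)
        self.tape[k] = {"model": model, "prompt": prompt, "seed": seed, "response": response}
        self.path.write_text(json.dumps(self.tape, indent=2, sort_keys=True))
        return response


def fake_llm(model, prompt):
    # Stand-in for a real model call so the sketch runs offline.
    return f"[{model}] reviewed: {prompt}"


def no_network(model, prompt):
    raise RuntimeError("replay mode must not call the model")


with tempfile.TemporaryDirectory() as tmp:
    tape_path = Path(tmp) / "cassette.json"
    recorded = Cassette(tape_path, mode="record").call("demo-model", "is the length checked?", 42, fake_llm)
    replayed = Cassette(tape_path, mode="replay").call("demo-model", "is the length checked?", 42, no_network)

print(recorded == replayed)
```

&lt;p&gt;In this sketch, replay never touches the model: any change to the prompt, model name, or seed produces a different key, so a diverging run fails loudly instead of silently producing new output.&lt;/p&gt;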

&lt;p&gt;Why this matters: synthetic security corpora are easy to generate but hard to trust. &lt;code&gt;silver-one&lt;/code&gt; treats generation settings, verifier outcomes, rejected attempts, and run checkpoints as first-class artifacts — enabling training data to be audited, replayed, and improved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7p5vnfkypvm86ibt68i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd7p5vnfkypvm86ibt68i.png" alt="Current Workflow" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The repo includes scripts and instructions to run the BARRED stack locally (see &lt;code&gt;scenarios/debate/start_stack.sh&lt;/code&gt; and &lt;code&gt;scenarios/debate/run_batch.py&lt;/code&gt;). Example B-gate computation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# start judge + participants&lt;/span&gt;
./scenarios/debate/start_stack.sh

&lt;span class="c"&gt;# run a clocked batch and export attempts + corpus&lt;/span&gt;
uv run python scenarios/debate/run_batch.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--run-id&lt;/span&gt; pilot-v1-calibrated-d &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--seed&lt;/span&gt; 42 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--mode&lt;/span&gt; record &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--clock-now&lt;/span&gt; 2026-06-07T14:12:00Z &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--seeds&lt;/span&gt; scenarios/debate/cve_seeds_test.jsonl &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; training_corpus_calibrated_d.jsonl &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--attempts-out&lt;/span&gt; artifacts/attempts/pilot-v1-calibrated-d.jsonl


&lt;span class="c"&gt;# compute B-gate metrics&lt;/span&gt;
./scripts/run_b_gate.sh &lt;span class="se"&gt;\&lt;/span&gt;
  training_corpus_calibrated_d.jsonl &lt;span class="se"&gt;\&lt;/span&gt;
  artifacts/attempts/pilot-v1-calibrated-d.jsonl &lt;span class="se"&gt;\&lt;/span&gt;
  artifacts/metrics/b_gate-pilot-v1-calibrated-d.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
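
&lt;p&gt;For a rough picture of what a gate script could compute from an attempts file, here is a small sketch. The field names (&lt;code&gt;accepted&lt;/code&gt;, &lt;code&gt;verifier_called&lt;/code&gt;, &lt;code&gt;verifier_passed&lt;/code&gt;) and the sample rows are assumptions, not the actual schema or logic of &lt;code&gt;offline_b_gate.py&lt;/code&gt;.&lt;/p&gt;

```python
import json


def gate_metrics(attempt_lines):
    """Aggregate yield and verifier rates from JSONL attempt rows (hypothetical schema)."""
    attempts = [json.loads(line) for line in attempt_lines if line.strip()]
    total = len(attempts)
    accepted = sum(1 for a in attempts if a.get("accepted"))
    called = sum(1 for a in attempts if a.get("verifier_called"))
    passed = sum(1 for a in attempts if a.get("verifier_called") and a.get("verifier_passed"))

    def rate(numerator, denominator):
        # Guard against empty runs so the report never divides by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "attempts": total,
        "accepted": accepted,
        "yield": rate(accepted, total),
        "verifier_called_rate": rate(called, total),
        "verifier_pass_rate": rate(passed, called),
    }


# Made-up rows standing in for an attempts JSONL file.
sample = [
    '{"accepted": true, "verifier_called": true, "verifier_passed": true}',
    '{"accepted": true, "verifier_called": true, "verifier_passed": true}',
    '{"accepted": false, "verifier_called": true, "verifier_passed": false}',
    '{"accepted": false, "verifier_called": false}',
]
metrics = gate_metrics(sample)
print(json.dumps(metrics, indent=2))
```

&lt;p&gt;Counting rejected attempts alongside accepted rows is what makes the yield figure meaningful; the pass rate is taken over verifier calls only, so skipped attempts do not dilute it.&lt;/p&gt;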



&lt;p&gt;The harness is CLI-first and instrumented.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8puz7nazo1dzycebk950.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8puz7nazo1dzycebk950.png" alt="cli view 1" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1lo72diock7rvdb4x6x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1lo72diock7rvdb4x6x.png" alt="cli view 2" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Comeback Story
&lt;/h2&gt;

&lt;p&gt;Where it started: the project was a small AgentBeats demo wired to Ollama models (Nov 2025). Over time, the repo grew into a research harness as I discovered that simple debate + label workflows produced many ungrounded labels.&lt;/p&gt;

&lt;p&gt;What I changed and finished up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hardened structured-output parsing and repair to avoid malformed JSON rows.&lt;/li&gt;
&lt;li&gt;Added deterministic replay (&lt;code&gt;ReplayManager&lt;/code&gt;) so the same prompts and responses can be replayed and audited.&lt;/li&gt;
&lt;li&gt;Implemented a verifier agent and wired it into judge gating; verifier reports are now used to improve grounding and reduce hallucinated mechanism claims.&lt;/li&gt;
&lt;li&gt;Built &lt;code&gt;offline_b_gate.py&lt;/code&gt; to compute reproducible quality metrics and surface failure modes (anchor normalization, mechanism grounding, predicate aboutness).&lt;/li&gt;
&lt;li&gt;Added telemetry (per-stage tokens, per-model usage) so we can optimize generator context size without losing auditability.&lt;/li&gt;
&lt;/ul&gt;
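
&lt;p&gt;As one illustration of structured-output repair, the sketch below pulls the first valid JSON object out of a chatty model response. The helper name &lt;code&gt;parse_model_json&lt;/code&gt; and the sample strings are hypothetical; the repo's &lt;code&gt;structured_output.py&lt;/code&gt; may work quite differently.&lt;/p&gt;

```python
import json


def parse_model_json(text):
    """Return the first JSON object embedded in a model response, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next candidate.
            start = text.find("{", start + 1)
    return None


chatty = 'Sure! Here is the row: {"label": "vulnerable", "anchor": "strcpy"} Let me know if it helps.'
broken = "no json here {broken"

row = parse_model_json(chatty)
missing = parse_model_json(broken)
print(row, missing)
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; instead of raising lets the caller log the attempt as rejected rather than writing a malformed row to the corpus.&lt;/p&gt;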

&lt;p&gt;Result: the harness now produces a small, high-fidelity corpus (examples: &lt;code&gt;training_corpus_calibrated_a.jsonl&lt;/code&gt;…&lt;code&gt;_d.jsonl&lt;/code&gt;) with accompanying &lt;code&gt;artifacts/metrics/b_gate-pilot-v1-calibrated-*.json&lt;/code&gt; files showing gate pass results and detailed telemetry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inspiration &amp;amp; Context
&lt;/h2&gt;

&lt;p&gt;The redesign was strongly inspired by the Google DeepMind AGI hackathon and the Metacognitive research conversations there: we wanted to observe how models behave under adversarial conditions and make those behaviours auditable. I also leaned on practical tutorials and engineering approaches (e.g., Daily Dose of Data Science, Dave Farley / Modern Software Engineering) to make the system deterministic and "test-easy by design" — small, replayable units, strong structured-output parsing, and clear telemetry.&lt;/p&gt;

&lt;p&gt;This work is directly connected to the Metacognitive Coding Safety Benchmark (MCSB), our submission to the Google DeepMind AGI hackathon. MCSB's multi-tier structure (pilot/core/adversarial) and its focus on directional confidence updates shaped our tiered experiment design and the offline B-gate evaluation.&lt;/p&gt;

&lt;h3&gt;
  
  
  A note on cassettes — why the metaphor matters
&lt;/h3&gt;

&lt;p&gt;Peter Quill's mixtape in Guardians of the Galaxy is a tiny, precious archive that preserves memory, identity, and the reason a song feels like "the best." Our &lt;code&gt;LLMCassette&lt;/code&gt; serves a similar purpose for &lt;code&gt;silver-one&lt;/code&gt;: it preserves the prompt, model responses, and run state so the project's decisions remain auditable, replayable, and emotionally intelligible. Treating generation traces as a cassette makes the system kinder to debugging and kinder to future researchers — you can rewind to the exact moment a label was created, listen to the context, and understand why a model thought a particular answer was "the best." &lt;/p&gt;

&lt;h2&gt;
  
  
  Kaggle Task &amp;amp; Repo
&lt;/h2&gt;

&lt;p&gt;We published a companion Kaggle benchmark task for predicate-quality evaluation that lets you test multiple models locally and run remote benchmark jobs from the CLI. Task link:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.kaggle.com/benchmarks/tasks/surfiniaburger/silver-one-predicate-quality" rel="noopener noreferrer"&gt;https://www.kaggle.com/benchmarks/tasks/surfiniaburger/silver-one-predicate-quality&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To push and run the task from the repository (example from &lt;code&gt;kaggle_notebooks/silver_one_predicate_quality_task.py&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kaggle b t push silver-one-predicate-quality &lt;span class="nt"&gt;-f&lt;/span&gt; kaggle_notebooks/silver_one_predicate_quality_task.py &lt;span class="nt"&gt;--wait&lt;/span&gt;
kaggle b t run  silver-one-predicate-quality &lt;span class="nt"&gt;-m&lt;/span&gt; gemini-3.5-flash &lt;span class="nt"&gt;--wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repository (source): &lt;a href="https://github.com/surfiniaburger/silver-one" rel="noopener noreferrer"&gt;https://github.com/surfiniaburger/silver-one&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Experience with GitHub Copilot
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot (and iterative local editing) helped speed up refactors and suggested tests and doc improvements during the finish-up. I used the suggestions as a productivity assist rather than an authoritative source; every structural tweak was followed by running the deterministic smoke path and validating metric artifacts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Files I relied on to finish this up
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;README.md&lt;/code&gt; — project summary and quick start&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pilot_report.md&lt;/code&gt; — pilot run analysis, verifier-era notes, and telemetry interpretation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;scenarios/debate/*&lt;/code&gt; — implementation of BARRED, judge, verifier, data generator, batch runner&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;src/agentbeats/replay.py&lt;/code&gt; and &lt;code&gt;src/agentbeats/structured_output.py&lt;/code&gt; — deterministic replay and structured-output repair&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;artifacts/metrics/b_gate-pilot-v1-calibrated-*.json&lt;/code&gt; — final calibrated metrics (used to summarize improvements)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where to go from here
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Improve anchor normalization to increase accepted yield without raising predicate failures.&lt;/li&gt;
&lt;li&gt;Add CI guardrails to limit &lt;code&gt;generator_boundary&lt;/code&gt; prompt sizes and prevent runaway token use.&lt;/li&gt;
&lt;li&gt;Expand verifier capabilities toward bit-level data-flow tracing for stronger mechanism grounding.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References &amp;amp; Related Docs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate&lt;/strong&gt; — Arnon Mazza*, Elad Levi* (Plurai Inc.), Preprint Jan 21, 2026. Role: scenario specification and debate-based synthetic-data generation algorithm; served as the blueprint for the BARRED scenario, gating rules, and the offline B-gate implementation used in this repo. (*equal contribution)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pioneer Agent: Continual Improvement of Small Language Models in Production&lt;/strong&gt; — Dhruv Atreja, Julia White, Nikhil Nayak, Kelton Zhang, Henrijs Princis, George Hurn-Maloney, Ash Lewis, Urchade Zaratiana (Fastino Labs), arXiv:2604.09791, Apr 10, 2026. Role: engineering systems paper that inspired telemetry-driven adaptation loops used in our evaluation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Figures — Key Metrics Visuals
&lt;/h2&gt;

&lt;p&gt;The following figures were generated from &lt;code&gt;artifacts/metrics&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fytpyh8wqbezivdoxqbuj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fytpyh8wqbezivdoxqbuj.png" alt="Attempts vs Accepted" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attempts vs Accepted rows by run — shows yield and how many attempts were required per accepted corpus row.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F44jelms42uj61q77pffk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F44jelms42uj61q77pffk.png" alt="Predicate &amp;amp; B2 Fail Rates" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Predicate-fail and B2 strict-fail by run — highlights quality improvements and failure modes across runs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyoxc9ar8p4xdql0aqdgz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyoxc9ar8p4xdql0aqdgz.png" alt="Verifier Rates" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifier called rate and verifier pass rate by run — shows verifier coverage and effectiveness.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nf2x79hfnjc1dopioqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6nf2x79hfnjc1dopioqf.png" alt="Cost vs Quality" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokens per accepted row vs predicate-fail (point size = accepted rows) — visualizes the cost/quality tradeoff.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft59wgzgct2dwos8batrn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft59wgzgct2dwos8batrn.png" alt="Stage Token Breakdown" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stacked token usage by stage (&lt;code&gt;generator_boundary&lt;/code&gt;, &lt;code&gt;generator_refine&lt;/code&gt;, &lt;code&gt;judge_adjudication&lt;/code&gt;, &lt;code&gt;verifier_audit&lt;/code&gt;) — identifies where tokens are spent.&lt;/li&gt;
&lt;/ul&gt;
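
&lt;p&gt;The quantities behind the cost/quality and stage-breakdown figures can be sketched as simple arithmetic. The stage names come from the figure above; the dictionary layout and every number in &lt;code&gt;run&lt;/code&gt; are made up for illustration and are not results from the pilot runs.&lt;/p&gt;

```python
def cost_quality(run):
    """Summarize token cost per accepted row and predicate-fail rate for one run."""
    total_tokens = sum(run["stage_tokens"].values())
    accepted = run["accepted_rows"]
    attempts = run["attempts"]
    top_stage = max(run["stage_tokens"], key=run["stage_tokens"].get)
    return {
        "total_tokens": total_tokens,
        "tokens_per_accepted_row": total_tokens / accepted if accepted else float("inf"),
        "predicate_fail_rate": run["predicate_fails"] / attempts if attempts else 0.0,
        "top_stage": top_stage,
        "top_stage_share": run["stage_tokens"][top_stage] / total_tokens if total_tokens else 0.0,
    }


# Illustrative values only; not taken from artifacts/metrics.
run = {
    "attempts": 20,
    "accepted_rows": 8,
    "predicate_fails": 5,
    "stage_tokens": {
        "generator_boundary": 12000,
        "generator_refine": 4000,
        "judge_adjudication": 3000,
        "verifier_audit": 1000,
    },
}
summary = cost_quality(run)
print(summary)
```

&lt;p&gt;Dividing by accepted rows rather than attempts charges the cost of rejected attempts to the rows that survive, which is the number that matters when comparing runs.&lt;/p&gt;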

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9pla98tt9j8nr1xssg8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9pla98tt9j8nr1xssg8.png" alt="Model Token Share" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Per-model token share by run — shows which models dominate token costs.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Thanks for reading this far. Keep an eye on &lt;a href="https://www.in-varia.com" rel="noopener noreferrer"&gt;In-vari&lt;/a&gt; for more updates.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
    </item>
    <item>
      <title>Google I/O 2026 Wasn’t About AI Models — It Was About Infrastructure</title>
      <dc:creator>Adedoyinsola Ogungbesan</dc:creator>
      <pubDate>Sun, 24 May 2026 03:02:49 +0000</pubDate>
      <link>https://dev.to/surfiniaburger/google-io-2026-wasnt-about-ai-models-it-was-about-infrastructure-53me</link>
      <guid>https://dev.to/surfiniaburger/google-io-2026-wasnt-about-ai-models-it-was-about-infrastructure-53me</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I appointed myself AGI Police a few days back, on account of the number of times I've found myself sipping an AI slop milkshake.&lt;/p&gt;

&lt;p&gt;As a reformed individual (with regulated slop intake), I keep listening to Yann LeCun's insights through several notable podcasts. He constantly reminds us that “LLMs in general cannot predict the consequences of their actions.”&lt;/p&gt;

&lt;p&gt;Google I/O 2026 made an impact, with lots of exciting announcements. Training across the largest clusters in the world. Over 7x more tokens processed every month. Bigger infrastructure. Faster inference. More intelligence delivered instantly to billions of people.&lt;/p&gt;

&lt;p&gt;But somewhere in the middle of all the demos and applause, I found myself thinking less about the models and more about the machinery underneath them.&lt;/p&gt;

&lt;p&gt;So I asked Google AI Mode to help calculate the energy consumption behind large-scale token processing and compare it to something human-sized.&lt;/p&gt;

&lt;p&gt;Here is what we found out in less than 30 seconds of processing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzifhpcnay4jq408skp3o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzifhpcnay4jq408skp3o.png" alt="We could power nearly 3 million light bulbs continuously, 24 hours a day, for an entire year" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We could power nearly 3 million light bulbs continuously, 24 hours a day, for an entire year. Let that sink in.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What struck me wasn’t just the raw number itself. It was the inversion of intuition.&lt;/p&gt;

&lt;p&gt;AI feels weightless.&lt;/p&gt;

&lt;p&gt;You type words into a chat box and receive intelligence back in seconds. No smoke. No factory floor. No visible machinery. Just text appearing instantly on a glowing rectangle in your hand.&lt;/p&gt;

&lt;p&gt;But underneath that interface sits an industrial system consuming electricity, water, cooling infrastructure, and global semiconductor supply chains at unprecedented scale.&lt;/p&gt;

&lt;p&gt;Before I looked away, Gemini made another suggestion that made me even more curious. It suggested comparing the energy consumption with Nvidia infrastructure and also estimating the amount of water required to cool the servers powering the inference workloads.&lt;/p&gt;

&lt;p&gt;I indulged.&lt;/p&gt;

&lt;p&gt;And in less than 5 seconds (which means I was exaggerating when I said 30 seconds earlier), this happened:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fyqyrdwl1imnkk4yj7b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6fyqyrdwl1imnkk4yj7b.png" alt="457 million litres matches the total annual water footprint of roughly 1,200 average households." width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;457 million litres matches the total annual water footprint of roughly 1,200 average households.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgso87ds8re2kt4b34hgg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgso87ds8re2kt4b34hgg.png" alt="Comparison between Nvidia power consumption and TPU power consumption" width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At that point, the conversation stopped feeling like a fun experiment and started feeling like a glimpse into the physical economics of intelligence itself.&lt;/p&gt;

&lt;p&gt;The real takeaway from Google I/O wasn’t simply that models are getting smarter.&lt;/p&gt;

&lt;p&gt;It was that intelligence is becoming infrastructure.&lt;/p&gt;

&lt;p&gt;Every prompt now has a physical cost attached to it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Electricity generation&lt;/li&gt;
&lt;li&gt;Water cooling systems&lt;/li&gt;
&lt;li&gt;Data center expansion&lt;/li&gt;
&lt;li&gt;Semiconductor fabrication&lt;/li&gt;
&lt;li&gt;TPU supply chains&lt;/li&gt;
&lt;li&gt;Thermal management at planetary scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the strange part is that users rarely see any of it.&lt;/p&gt;

&lt;p&gt;What fascinated me most was how Gemini framed the answers. Instead of treating the numbers like an alarming revelation, it immediately contextualized them against broader industry infrastructure. The response was technically useful, but it also revealed something subtle about AI systems: they do not merely answer questions; they shape how scale is emotionally interpreted.&lt;/p&gt;

&lt;p&gt;The answer, although accurate, still felt slightly biased — almost like the system instinctively softened the psychological impact of the numbers by normalizing them within the broader AI race.&lt;/p&gt;

&lt;p&gt;And honestly, I understand why.&lt;/p&gt;

&lt;p&gt;Because the benefit of knowledge being delivered at our fingertips is genuinely incredible.&lt;/p&gt;

&lt;p&gt;A student can learn quantum mechanics from a village with weak infrastructure. A founder can prototype an idea in hours instead of months. A developer can debug systems faster than ever before. The productivity gains are real.&lt;/p&gt;

&lt;p&gt;But so is the cost.&lt;/p&gt;

&lt;p&gt;For years, software scaled mostly through abstraction. AI may be the first mainstream computing paradigm where scaling intelligence also means scaling physical consumption in the real world.&lt;/p&gt;

&lt;p&gt;That may ultimately become the defining tradeoff of this era.&lt;/p&gt;

&lt;p&gt;The question after Google I/O is no longer just:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How intelligent can these systems become?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But also:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“What will it cost to sustain them?”&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
    </item>
    <item>
      <title>The First Modular AI Chain</title>
      <dc:creator>Adedoyinsola Ogungbesan</dc:creator>
      <pubDate>Mon, 04 Mar 2024 00:16:00 +0000</pubDate>
      <link>https://dev.to/surfiniaburger/the-first-modular-ai-chain-jge</link>
      <guid>https://dev.to/surfiniaburger/the-first-modular-ai-chain-jge</guid>
      <description>&lt;p&gt;In the vast expanse of the deep blue sea, where mysteries abound and the unknown beckons, lies a realm ripe for exploration and discovery. Much like intrepid explorers charting uncharted waters, 0G Labs ventures into the depths of innovation, pushing the boundaries of what's possible in the realm of decentralized technology. Join us as we embark on a journey through the depths, guided by 0G's pioneering spirit and modular technology.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Plunging into the Abyss: Understanding 0G's Modular Technology&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Beneath the surface lies a world of complexity and possibility, much like the intricate framework of 0G's modular technology. Here, developers and content creators are equipped with the tools to navigate the depths of decentralized AI applications. Picture it as a vessel, sturdy yet adaptable, allowing for seamless integration of AI components into the ever-shifting currents of innovation.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Discovering Hidden Treasures: Unleashing Creativity with 0G's Modular Canvas&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Just as the ocean conceals untold treasures within its depths, so too does 0G's modular canvas hold the promise of boundless creativity. Here, developers and content creators are invited to explore uncharted territories, bringing their boldest ideas to life in a realm where imagination knows no bounds. With an intuitive interface akin to a captain's chart and a vast library of modular components as diverse as the ocean's inhabitants, the canvas becomes a playground for innovation.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Navigating the Undercurrents: Building Decentralized AI Applications&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the depths of the ocean, currents ebb and flow, shaping the landscape in unexpected ways. Similarly, within the realm of decentralized AI applications, 0G's modular technology provides a framework that adapts to the ever-changing tides of innovation. Developers navigate these undercurrents with ease, leveraging the flexibility and scalability of 0G's platform to bring their creations to life.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Forging Alliances in the Deep: Fostering Collaboration and Community&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Just as explorers band together to conquer the challenges of the deep, so too does the 0G community come together to tackle the complexities of decentralized technology. Here, developers, artists, and visionaries unite, sharing insights, collaborating on projects, and supporting one another on their journey. Through open dialogue, shared resources, and a spirit of camaraderie, the community becomes a beacon of light in the darkness, guiding explorers through the depths.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Emerging into the Light: Shaping the Future of Innovation&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As explorers resurface from the depths, they bring with them newfound knowledge and discoveries that shape the course of history. Similarly, with 0G's modular technology, developers emerge from the depths of innovation, armed with groundbreaking creations that push the boundaries of what's possible. Together, we chart a course towards a brighter future, where the depths of creativity and the vast expanse of technology converge in harmony.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Conclusion: Charting a Course for Tomorrow's Innovators&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the boundless depths of the deep blue sea, and within the realm of decentralized technology, the journey is as important as the destination. With 0G Labs as our compass, we navigate uncharted waters, guided by a spirit of exploration and a commitment to innovation. Join us as we embark on this journey into the unknown, shaping the future of technology one discovery at a time.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Navigating the Decentralized Cosmos: A Journey with 0G's Modular Starship</title>
      <dc:creator>Adedoyinsola Ogungbesan</dc:creator>
      <pubDate>Sun, 03 Mar 2024 23:57:19 +0000</pubDate>
      <link>https://dev.to/surfiniaburger/navigating-the-decentralized-cosmos-a-journey-with-0gs-modular-starship-162</link>
      <guid>https://dev.to/surfiniaburger/navigating-the-decentralized-cosmos-a-journey-with-0gs-modular-starship-162</guid>
      <description>&lt;p&gt;Embark on a journey like no other with 0G Labs' modular starship, the gateway to the decentralized cosmos. In this blog post, we'll take you on a voyage through space and time, exploring the vast expanse of possibilities that await those who dare to dream and innovate with 0G's modular technology.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Launching into the Unknown: Exploring New Frontiers&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Just as explorers of old set sail for uncharted territories, so too do developers and content creators embark on a journey into the unknown with 0G's modular starship. With its modular architecture and interoperable design, the starship serves as a vessel for exploration, enabling users to navigate the decentralized cosmos with ease and discover new worlds of possibility.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Charting Your Course: Customizing Your Journey&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;No two journeys are alike, and with 0G's modular starship, users have the freedom to chart their own course through the decentralized cosmos. Whether you're exploring the depths of AI-powered virtual reality or charting a course through the blockchain galaxy, the modular starship puts the power of customization in your hands, allowing you to tailor your journey to your unique vision and objectives.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Forging Alliances: Collaborating Across the Cosmos&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As you traverse the decentralized cosmos, you'll encounter fellow explorers, innovators, and visionaries who share your passion for discovery and innovation. With 0G's modular starship, forging alliances and collaborating across the cosmos has never been easier. Whether you're joining forces to tackle a common challenge or pooling resources to explore new frontiers, the modular starship provides a platform for collaboration and cooperation that transcends boundaries and borders.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Pushing the Boundaries: Innovation Without Limits&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Innovation knows no bounds, and with 0G's modular starship, the sky's the limit. Whether you're pushing the boundaries of AI, blockchain, or virtual reality, the starship provides a platform for innovation without limits, empowering users to dream big, think boldly, and explore new horizons. With 0G's modular technology as your guide, the possibilities are limitless, and the journey is yours to define.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Conclusion: Beyond the Stars&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As we journey through the decentralized cosmos with 0G's modular starship, we're reminded that the true essence of exploration lies not in the destination, but in the journey itself. With its modular architecture, interoperable design, and boundless potential, the starship serves as a beacon of innovation, guiding us toward new frontiers of possibility and paving the way for a brighter, more decentralized future for all.  &lt;a href="https://0g.ai/" rel="noopener noreferrer"&gt;https://0g.ai/&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>ML unification is all about family</title>
      <dc:creator>Adedoyinsola Ogungbesan</dc:creator>
      <pubDate>Wed, 05 Jul 2023 01:52:03 +0000</pubDate>
      <link>https://dev.to/surfiniaburger/ml-unification-is-all-about-family-53j8</link>
      <guid>https://dev.to/surfiniaburger/ml-unification-is-all-about-family-53j8</guid>
      <description>&lt;p&gt;In a never-ending series of twists and turns, we haven't seen the last of Dom and his ever-growing family.&lt;/p&gt;

&lt;p&gt;Machine learning, like the Fast and Furious franchise, keeps getting more fascinating by the day.&lt;/p&gt;

&lt;p&gt;ML unification, according to OpenAI's ChatGPT, refers to the convergence and integration of various machine learning (ML) techniques, methodologies, and frameworks into a unified framework or ecosystem. It aims to create a standardized and cohesive approach to ML development, deployment, and management.&lt;/p&gt;

&lt;p&gt;The need for ML unification arises from the growing complexity and diversity of ML models, algorithms, and tools. ML practitioners often work with different frameworks and libraries for specific tasks, such as deep learning, reinforcement learning, or natural language processing. This fragmented landscape can lead to inefficiencies, interoperability challenges, and duplication of efforts.&lt;/p&gt;

&lt;p&gt;ML unification addresses these challenges by providing a unified platform combining multiple ML techniques, frameworks, and tools. It involves creating common standards, interfaces, and protocols that enable seamless integration and collaboration across different ML domains.&lt;/p&gt;

&lt;p&gt;Dominic Toretto is to Fast and Furious as Ivy is to ML unification.&lt;/p&gt;

&lt;p&gt;Ivy is both an ML transpiler and a framework, currently supporting JAX, TensorFlow, PyTorch and NumPy.&lt;/p&gt;

&lt;p&gt;According to Kapa.ai, Ivy unifies all ML frameworks, enabling you not only to write code that can be used with any of these frameworks as the backend but also to convert any function, model, or library written in any of them to your preferred framework. This makes it broadly applicable to a wide range of applications, from cutting-edge deep learning to more conventional machine learning, general numerical computing, and data analytics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TWIST&lt;/strong&gt;&lt;br&gt;
Dan Fu of Stanford University talked about FlashAttention at the Ivy paper reading group.&lt;/p&gt;

&lt;p&gt;According to the FlashAttention paper, published on arXiv in 2022, FlashAttention is an IO-aware exact attention algorithm that uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM.&lt;/p&gt;
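
&lt;p&gt;To make the tiling idea concrete, here is a minimal pure-Python sketch of the numerical trick only: keys and values are visited one block at a time while a running maximum, running sum, and running output are kept per query, so the full row of scores is never stored. It is not the paper's GPU kernel; the function names, the block size, and the list-of-lists layout are assumptions made for illustration.&lt;/p&gt;

```python
import math


def naive_attention(Q, K, V):
    """Reference version: builds the full row of scores for every query."""
    d = len(Q[0])
    dv = len(V[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(dv)])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Same result, but keys/values are processed block by block.

    Only a running max, a running sum, and a running output vector are
    kept per query, which is the idea that lets a real kernel keep its
    working set in fast on-chip memory.
    """
    d = len(Q[0])
    dv = len(V[0])
    out = []
    for q in Q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * dv
        for start in range(0, len(K), block_size):
            k_blk = K[start:start + block_size]
            v_blk = V[start:start + block_size]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                      for k in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale what was accumulated under the old max.
            # On the first block this is exp(-inf), which is 0.0.
            scale = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * scale + sum(weights)
            acc = [a * scale + sum(w * v[j] for w, v in zip(weights, v_blk))
                   for j, a in enumerate(acc)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

&lt;p&gt;Because the rescaling is exact, both functions agree up to floating-point rounding for any block size, which is what "exact attention" means here: tiling changes how memory is accessed, not the answer.&lt;/p&gt;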

&lt;p&gt;Transformer models are essential building blocks for natural language processing and image classification. They have grown larger and deeper, but equipping them with a longer context remains difficult because the self-attention module at their core has time and memory complexity quadratic in sequence length. In other words, attention gets slower and more memory-hungry as sequences get longer.&lt;/p&gt;
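
&lt;p&gt;A rough back-of-the-envelope sketch shows what "quadratic in sequence length" means for memory. It assumes the full score matrix is stored at 2 bytes per entry (half precision) for a single attention head and a single sequence; the function name is hypothetical and real models multiply this by heads and batch size.&lt;/p&gt;

```python
def attention_matrix_bytes(seq_len, bytes_per_element=2):
    """Bytes needed to store one seq_len x seq_len score matrix."""
    return seq_len * seq_len * bytes_per_element


for n in (512, 1024, 4096, 16384, 65536):
    mib = attention_matrix_bytes(n) / 2**20
    print(f"seq_len={n}: {mib:g} MiB")

# prints:
# seq_len=512: 0.5 MiB
# seq_len=1024: 2 MiB
# seq_len=4096: 32 MiB
# seq_len=16384: 512 MiB
# seq_len=65536: 8192 MiB
```

&lt;p&gt;Doubling the sequence length quadruples the size of the matrix, which is why avoiding storing it in full matters so much at 16K or 64K tokens.&lt;/p&gt;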

&lt;p&gt;FlashAttention trains Transformers faster than existing baselines: 15% end-to-end wall-clock speedup on BERT-large (seq. length 512) compared to the MLPerf 1.1 training speed record, 3× speedup on GPT-2 (seq. length 1K), and 2.4× speedup on long-range arena (seq. length 1K-4K).&lt;/p&gt;

&lt;p&gt;FlashAttention and block-sparse FlashAttention enable longer context in Transformers, yielding higher quality models (0.7 better perplexities on GPT-2 and 6.4 points of lift on long-document classification) and entirely new capabilities: the first Transformers to achieve better-than-chance performance on the Path-X challenge (seq. length 16K, 61.4% accuracy) and Path-256 (seq. length 64K, 63.1% accuracy).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EXIT&lt;/strong&gt;&lt;br&gt;
With more and more major advances being reported every day in the vast world of machine learning, accelerating your AI with one line of code is here to stay.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Additional Reading&lt;/strong&gt;&lt;br&gt;
unify.ai &lt;/p&gt;

&lt;p&gt;Reference(s)&lt;br&gt;
Dao, T., Fu, D. Y., Ermon, S., Rudra, A., &amp;amp; Ré, C. (2022). FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. arXiv. https://arxiv.org/abs/2205.14135&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>machinelearning</category>
      <category>ivy</category>
      <category>github</category>
    </item>
  </channel>
</rss>
