DEV Community: Aashita

The SSE Fragmentation Catastrophe That Took Down CareerPilot AI

Aashita — Wed, 15 Jul 2026 09:51:27 +0000

This is a submission for DEV's Summer Bug Smash: Smash Stories powered by Sentry.

It was 11:14 PM.

My friend DM'd me on Twitter: "Your app just hung for 30 seconds, spun indefinitely, and then completely died."

I opened the browser DevTools console pointed at production and saw it — the screen flooded in red:

GET https://careerpilot-ai.run.app/api/analyze-career
net::ERR_INCOMPLETE_CHUNKED_ENCODING 200 (OK)

The Server-Sent Events stream powering CareerPilot AI was systematically collapsing on Google Cloud Run. And I had no idea why.

Locally on localhost:3000, the agentic pipeline was a masterpiece. The multi-stage reasoning logs streamed gracefully — Step 1 flowed into Step 6, the final structured JSON payload arrived within seconds, the UI lit up with a complete personalized career roadmap. Beautiful.

But once deployed behind Google's Front End (GFE) proxy, the pipeline was a graveyard of broken sockets.

The Architecture Under Fire

CareerPilot AI runs a six-stage agentic pipeline on every career analysis request. Instead of firing a single long-running prompt to Gemini and making the user stare at a blank screen for 20+ seconds, we designed a Server-Sent Events logging stream to broadcast real-time reasoning steps directly to the browser — giving the interface the feel of a live, active mentor thinking out loud.

Once the final stage (Self-Evaluation & Constraint Validation) completed, the backend constructed a massive, nested 15KB JSON payload containing the personalized roadmap: skill weightings, role benchmarks, resource links, and a 30-day milestone calendar.

Here was the delivery mechanism — and the landmine hiding inside it:

// server.ts — The vulnerable streaming channel
app.get("/api/analyze-career", async (req, res) => {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  // Stream intermediate reasoning logs per step
  for (let step = 1; step <= 6; step++) {
    const log = await executeAgentStep(step);
    res.write(`data: ${JSON.stringify({ step, text: log })}\n\n`);
  }

  // The fatal moment: sending a 15KB structured JSON payload
  res.write(`data: ${JSON.stringify({ finalResult })}\n\n`);
  res.end();
});

Everything looked fine. Until it wasn't.

The First Fix That Made Everything Worse

Our first theory was reasonable: the Express response buffer or the compression middleware was throttling chunks and causing a timeout. The fix seemed obvious: force-flush every payload as soon as it was written.

We imported the compression middleware and started manually calling res.flush() after every single log line:

// The "intuitive" (but catastrophic) mistake
res.write(`data: ${JSON.stringify({ step, text: log })}\n\n`);
res.flush(); // Force-flushed chunks into arbitrary TCP packet boundaries!

It made things infinitely worse.

Instead of hanging for 30 seconds and then failing, the client now crashed immediately on almost every single attempt. By aggressively forcing flushes on every minor string, we triggered something far uglier: violent TCP packet fragmentation.

The large final JSON payload was being split mid-string across arbitrary network packet boundaries. Our client-side reader, which naively assumed that each incoming SSE chunk was a complete, fully formed JSON object, was receiving this:

// Chunk 1 — Mutilated
{"finalResult":{"skills":[{"name":"Python","category":"foun

// Chunk 2 — The rest, arriving milliseconds later
dational","matched":true}],"weeklyPlan":[]}}

JSON.parse() threw a SyntaxError: Unexpected end of JSON input on Chunk 1. The entire React UI state machine shattered. The app froze. The user saw a broken loading screen.
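To make the failure concrete, here is a minimal sketch that replays the two chunks shown above. The `tryParse` helper is a hypothetical name used only for this illustration; it is not taken from the CareerPilot codebase.

```typescript
// Hypothetical helper: returns the parsed value, or null if the text is not valid JSON.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// The two fragments from the example above.
const chunk1 = '{"finalResult":{"skills":[{"name":"Python","category":"foun';
const chunk2 = 'dational","matched":true}],"weeklyPlan":[]}}';

// Parsing the first fragment on its own fails.
const parsedAlone = tryParse(chunk1);

// The concatenation of both fragments is valid JSON.
const parsedJoined = tryParse(chunk1 + chunk2) as {
  finalResult: { skills: { name: string; category: string; matched: boolean }[] };
} | null;
```

The payload itself was never corrupt. The only problem was parsing it before all of it had arrived.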

We had fixed a timeout by turning it into an instant crash. Brilliant.

The Epiphany: We Were Fighting Two Separate Infrastructure Layers

That's when it clicked — we weren't fighting one bug. We were fighting two completely separate infrastructure forces compounding each other in production:

1. Proxy Buffering
Google Cloud Run's GFE proxy holds small SSE packets in a buffer waiting for more data before flushing downstream. Unless explicitly instructed otherwise, it optimizes throughput by batching chunks — not passing them through immediately. This explained the initial 30-second hang: the proxy was collecting all six of our log lines and sitting on them before sending anything to the client.

2. TCP Stream Fragmentation
An SSE stream is not a guaranteed series of atomic messages; it is a continuous byte stream. When sending a 15KB payload, the transport layer is fully entitled to split it across multiple TCP packets, and it will. The frontend must treat incoming data as a byte accumulator, not a ready-to-parse event queue.

We had fixed neither. Our flush attempt had only worsened the second while accidentally bypassing the first.

The Real Fix: A Dual-Sided Overhaul

The solution required changes on both sides of the wire simultaneously.

Server Fix: Bypass Proxy Buffering at the Header Level

We added X-Accel-Buffering: no, the response header that tells a buffering reverse proxy to pass bytes through immediately. It began as an Nginx convention, and in our deployment it had the same effect behind GFE:

// server.ts — Patched SSE initialization
res.writeHead(200, {
  "Content-Type": "text/event-stream",
  "Cache-Control": "no-cache, no-transform", // 'no-transform' tells intermediaries not to alter (e.g. compress) the body
  "Connection": "keep-alive",
  "X-Accel-Buffering": "no"                  // Explicit bypass for Nginx/GFE
});

We also disabled global Gzip compression specifically for the /api/analyze-career route, ensuring the compression layer wasn't holding bytes in its own internal buffer before the proxy even saw them.

Client Fix: Throw Away `onmessage`, Build a Byte Accumulator

We abandoned the naive EventSource.onmessage listener entirely. Instead, we wrote a low-level stream reader that accumulates incoming bytes in a staging buffer and only processes complete SSE frames — identified by double-newline (\n\n) boundaries — never attempting to parse a fragment:

// src/App.tsx — Boundary-safe chunk reassembly
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n\n");

  // Keep the last, potentially incomplete fragment staged in the buffer
  buffer = lines.pop() || "";

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const jsonStr = line.replace("data: ", "").trim();
    try {
      const parsed = JSON.parse(jsonStr);
      // Safe: only complete, boundary-verified JSON ever reaches state
    } catch (e) {
      // A frame delimited by \n\n is complete, so a parse failure here means malformed data, not a partial chunk
      console.warn("Malformed SSE frame skipped:", e);
    }
  }
}

The key insight: lines.pop() pulls the trailing incomplete fragment out of the processing loop and holds it in buffer until the next network read brings the rest of the bytes. No more mid-payload parsing. No more crashes.
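The same idea can be pulled out into pure functions that are easy to unit-test without a network. This is a sketch under assumptions, not the code in src/App.tsx: `splitFrames` and `parseDataFrame` are hypothetical names, and it assumes frames end in "\n\n" and carry a single `data: ` line, as the server in this post emits them.

```typescript
interface SplitResult {
  frames: string[]; // complete SSE frames, without the blank-line delimiter
  rest: string;     // trailing partial frame to carry into the next read
}

// Append newly decoded text to the staged buffer and split off complete frames.
function splitFrames(buffer: string, incoming: string): SplitResult {
  const parts = (buffer + incoming).split("\n\n");
  const rest = parts.pop() ?? "";
  return { frames: parts.filter((p) => p.length > 0), rest };
}

// Return the JSON payload of a single-line `data: ...` frame, or null for any other frame.
function parseDataFrame(frame: string): unknown {
  const prefix = "data: ";
  if (!frame.startsWith(prefix)) return null;
  return JSON.parse(frame.slice(prefix.length));
}

// Simulate one event arriving split across two network reads.
const first = splitFrames("", 'data: {"step":1,"te');
const second = splitFrames(first.rest, 'xt":"ok"}\n\ndata: {"st');
const events: unknown[] = second.frames.map(parseDataFrame);
```

After the first read nothing is parsed, because no delimiter has arrived yet. After the second read exactly one event is emitted and the start of the next one stays staged in `rest`.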

The Results

After pushing both fixes to production, we ran a load test simulating 100 concurrent users hitting the full six-stage agentic pipeline simultaneously.

The numbers were stark:

Metric	Before	After
Pipeline Completion Rate	67%	99.2%
Time-to-First Log	28 seconds	1.1 seconds
JSON Syntax Crashes	Constant	Zero

I DM'd the user back at 1:30 AM. They tested it again and replied: "okay yeah this is actually really cool now."

The Engineering Takeaway

The real lesson here wasn't about SSE or TCP fragmentation specifically. It was about the gap between local development and containerized production.

Locally, you are your own proxy. There is no GFE, no Nginx, no buffering layer between your Node process and the browser. Every byte arrives instantly. Every test passes. Everything feels perfect.

The moment you deploy behind a reverse proxy, your assumptions about when data arrives collapse. Streaming code that assumes each chunk is a complete message, whether it reads SSE or any other chunked HTTP response, needs to be designed defensively from the first line of code, not patched reactively at 11 PM when a user DM slides in.

Build your readers as byte accumulators. Treat every incoming chunk as potentially incomplete. Use X-Accel-Buffering: no from day one on any SSE endpoint that goes near a proxy. And test in an environment that actually mirrors production — because localhost will lie to you every time.

CareerPilot AI is an open-source AI-powered career roadmap agent. You can explore the full codebase at github.com/aashitanegii/-CareerPilotAI and try the live app at careerpilot-ai-667889155113.asia-southeast1.run.app.

Solstice Panic! — Protect the Sun on the Longest Day of the Year

Aashita — Wed, 17 Jun 2026 14:36:45 +0000

This is a submission for the June Solstice Game Jam

What I Built

On June 21 — the summer solstice — the sun is at its most powerful. The longest day. The peak of light. And in Solstice Panic!, shadows are trying to steal it.

Solstice Panic! is a fast-paced arcade game where you defend a golden Sun core against waves of encroaching shadow blobs on the longest day of the year. Tap them, smash them, swipe through them — don't let a single one reach the center or the sun starts to dim. The day is long. The shadows keep coming. Keep the light alive.

But here's what makes it more than a clicker:

Every power-up in the game is a real June celebration. The Summer Solstice. Pride Month. Juneteenth. The FIFA World Cup 2026. International Sushi Day. World Bicycle Day. International Yoga Day. World Music Day. Each one drops into the arena as a collectible, triggers a visual spectacle, and shows an educational banner about the real event that inspired it.

The metaphor is intentional and direct: the sun is light, joy, and freedom. The shadows are everything that tries to extinguish them. On the solstice — the day light wins — you make sure it stays that way. Every June celebration you've ever heard of shows up as a weapon in your hands.

Play it here: https://solstice-panic-750247458250.asia-southeast1.run.app/

Video Demo

Code

aashitanegii / SolsticePanicGame-

☀️ A fast-paced June Solstice arcade game where players defend the Sun using celebration-themed power-ups inspired by real June events.

☀️ Solstice Panic!

Defend the solstice sun light against falling shadows on the longest day of the year!

A fast-paced arcade survival game inspired by the June Solstice and June celebrations.

🎮 About The Game

Solstice Panic! is a fast-paced arcade survival game inspired by the June Solstice and a month full of global celebrations. In this game, players are tasked with defending a glowing Sun from relentless waves of incoming shadows. As you play, you can collect themed power-ups inspired by real-world events that happen in June, providing dynamic and engaging gameplay that also serves to celebrate these occasions!

✨ Features

Three Difficulty Modes: Tailor the challenge with Easy, Medium, and Hard modes.
Eight Unique June-Themed Power-Ups: Collect special abilities inspired by real-world June events.
Educational Event Banners: Learn about historical and global events as you play.
Combo & Multiplier System: Chain successful defenses for…

View on GitHub

How I Built It

Stack

Tool	Role
React 18 + Vite	State-driven menus, difficulty screens, view transitions
TypeScript	Strict interfaces for every game entity — shadows, particles, projectiles, power-up state queues
HTML5 Canvas API	All gameplay rendering: vector movement, glow gradients, rotation, collision math
Tailwind CSS	Warm golden UI palette, HUD modules, responsive layout
Framer Motion	Spring animations on cards, start screen, and slide-in educational banners
Web Audio API	Real-time synthesized sound — zero audio files, everything generated from raw math
LocalStorage	Persistent high score across sessions

No audio files. No sprite sheets. No external game engine. Everything is generated by code.

Eight Power-Up Physics Systems in One Canvas Loop

The real technical challenge was coordinating eight completely distinct active power-up states simultaneously inside a single high-performance canvas rendering loop — without dropping frames.

Each power-up has its own physics:

World Cup Shot — real bounce vectors off arena boundaries, velocity preserved on each wall collision
Sushi Cannon — heat-seeking projectiles that recalculate their target vector every frame toward the nearest shadow
Bike Day Zoom — cursor drag velocity converted into a sweeping collision radius in real time
Pride Wave — gradient arc swept across the full canvas width in a single render pass
Solstice Crown — radial freeze state with simultaneous petal particle system running independently

Getting all eight coexisting — multiple power-ups active at once without corrupting each other's render cycle — required a clean TypeScript state queue where each power-up owns its update and draw functions independently.
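One way to picture that state queue is the sketch below. It is an illustration under assumptions, not the game's actual code: `ActivePowerUp`, `createBouncingBall`, and `tickPowerUps` are hypothetical names, drawing is omitted, and only the wall-bounce power-up is shown.

```typescript
interface Vec2 { x: number; y: number; }

// Hypothetical shape: each active power-up carries its own timer and update logic.
interface ActivePowerUp {
  name: string;
  remainingMs: number;
  update(dtMs: number): void;
}

interface BouncingBall extends ActivePowerUp { pos: Vec2; vel: Vec2; }

// A ball that reflects off the arena walls; speed is preserved on every bounce.
function createBouncingBall(
  pos: Vec2, vel: Vec2, arena: { width: number; height: number }, durationMs: number
): BouncingBall {
  const ball: BouncingBall = {
    name: "worldCupShot",
    remainingMs: durationMs,
    pos: { ...pos },
    vel: { ...vel },
    update(dtMs: number): void {
      ball.pos.x += ball.vel.x * dtMs;
      ball.pos.y += ball.vel.y * dtMs;
      // Flip the velocity component that hit a wall and clamp back inside the arena.
      if (ball.pos.x < 0 || ball.pos.x > arena.width) {
        ball.vel.x = -ball.vel.x;
        ball.pos.x = Math.min(Math.max(ball.pos.x, 0), arena.width);
      }
      if (ball.pos.y < 0 || ball.pos.y > arena.height) {
        ball.vel.y = -ball.vel.y;
        ball.pos.y = Math.min(Math.max(ball.pos.y, 0), arena.height);
      }
    },
  };
  return ball;
}

// Advance every active power-up independently, then drop the expired ones.
function tickPowerUps(queue: ActivePowerUp[], dtMs: number): ActivePowerUp[] {
  for (const p of queue) {
    p.update(dtMs);
    p.remainingMs -= dtMs;
  }
  return queue.filter((p) => p.remainingMs > 0);
}

const ball = createBouncingBall({ x: 95, y: 50 }, { x: 0.5, y: 0 }, { width: 100, height: 100 }, 1000);
// The ball overshoots the right wall, so it is clamped to x = 100 and its x velocity flips.
const stillActive = tickPowerUps([ball], 20);
```

Because the loop only calls `update` and checks `remainingMs`, adding a ninth power-up would mean adding one more factory function, not touching the loop.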

The Proudest Feature: The Web Audio Synthesizer

Every sound in the game — smash pops, power-up sparkles, damage thuds, the freeze sweep, combo escalation tones — is generated in real time using the Web Audio API. No audio files anywhere in the project.

The detail I'm happiest with: the combo pitch system. As your streak climbs, the base frequency of each smash sound rises. A 3x combo sounds different from an 8x combo. The game literally sounds faster and more intense as you get better at it, without a single pre-recorded file.
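A combo-to-pitch mapping of this kind could look like the sketch below. The exact curve used in the game is not stated here, so the one-semitone-per-combo-step rule, the one-octave cap, and the name `comboFrequency` are all assumptions made for illustration.

```typescript
// Hypothetical mapping: each combo step above 1 raises the pitch by one semitone,
// capped at maxSemitones so long streaks do not become painfully shrill.
function comboFrequency(baseHz: number, combo: number, maxSemitones = 12): number {
  const steps = Math.min(Math.max(combo - 1, 0), maxSemitones);
  return baseHz * Math.pow(2, steps / 12);
}

const hzAtCombo1 = comboFrequency(220, 1);   // no streak yet: the base pitch
const hzAtCombo3 = comboFrequency(220, 3);   // two semitones up
const hzAtCombo8 = comboFrequency(220, 8);   // seven semitones up
const hzAtCombo50 = comboFrequency(220, 50); // capped at one octave above the base
```

In the browser, the result would be applied to an oscillator before each smash sound, for example with `oscillator.frequency.setValueAtTime(comboFrequency(220, combo), audioCtx.currentTime)`.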

The Decoy System (Why It's Not Just a Clicker)

Purple decoy pulses mix into the shadow waves. Clicking them costs score and breaks your combo streak. This one mechanic is the difference between an arcade game and a spam-clicking toy — it forces players to actually look before they tap. On Hard mode (Endless Twilight), decoy frequency increases significantly. That's what makes Hard genuinely hard rather than just fast.

Three Difficulty Modes as a Design Requirement

A jam game that only works for one skill level loses half its potential audience.

Easy (Golden Morning) — 3 shield charges, slower shadows, frequent drops. Lets casual players experience all eight power-ups before they're overwhelmed.
Medium (Solstice Balance) — Standard everything. The intended experience.
Hard (Endless Twilight) — Fast shadows, rare drops, 1.5x score multiplier. A real reflex test.

Every Power-Up Is a Real June Event

Power-Up	June Event	Effect
🌞 Solstice Crown	Summer Solstice — June 21	Freezes all shadows 5s, golden rays burst, petals drift
🌈 Pride Wave	Pride Month — all June	Rainbow arc clears every shadow on screen
✊ Juneteenth Burst	Juneteenth — June 19	2x score multiplier 10s, confetti explosion
⚽ World Cup Shot	FIFA World Cup 2026	Bouncing soccer ball ricochets, deflects shadows
🍣 Sushi Cannon	International Sushi Day — June 18	Heat-seeking sushi auto-targets shadows for 8s
🧘 Yoga Pose	International Yoga Day — June 21	3s full pause, sun meter fully restored
🎵 World Music Day	Fête de la Musique — June 21	Musical notes radiate outward, destroy on contact
🚲 Bike Day Zoom	World Bicycle Day — June 3	Cursor becomes a high-speed shadow-sweeping trail

June 21 — the jam deadline — is the most power-up-dense day in the calendar. The Solstice, Yoga Day, and World Music Day all fall on the same date. If you're playing Solstice Panic! on submission day, every event you're collecting as a power-up is happening in the real world at the same time.

Prize Category

Best Google AI Usage

Solstice Panic! was built using multiple Google AI tools throughout the entire development lifecycle—from ideation and UI generation to coding, debugging, and deployment.

Google AI Studio + Gemini 2.0 Flash

Google AI Studio served as my primary development environment and Gemini 2.0 Flash acted as a collaborative engineering partner throughout the project.

I used Gemini to:

Design the game architecture and entity systems
Structure TypeScript interfaces and state management
Implement collision detection and power-up mechanics
Debug gameplay logic and rendering issues
Optimize Canvas performance
Generate educational content for June celebration banners
Refine game balance across Easy, Medium, and Hard modes

Rather than generating code once, development was highly iterative. I would describe a gameplay mechanic, test the implementation, provide feedback, and refine the system through multiple cycles until it behaved exactly as intended.

Stitch

I used Stitch to rapidly prototype and generate portions of the game's interface and visual structure.

This accelerated the transition from concept to playable experience, allowing me to spend more time refining gameplay systems, accessibility, animations, and overall player experience.

Gemini CLI

Gemini CLI became part of my development workflow for code generation, debugging, refactoring, and feature implementation.

Complex systems such as:

Multi-state power-up management
Projectile targeting logic
Audio synthesis architecture
Difficulty scaling systems
Persistent score storage

were developed and refined through an iterative CLI-assisted workflow.

Cloud Run Deployment

The final game was deployed using Google Cloud Run, allowing the application to be packaged and delivered as a scalable web experience.

Using Cloud Run simplified deployment and provided a production-ready environment for sharing the game with judges and players.

Why Google AI Was Meaningful

The scope of Solstice Panic! is significantly larger than a typical jam project:

Eight unique power-up systems
Three difficulty modes
Dynamic combo mechanics
Educational event banners
Particle effects
Canvas-based physics systems
Real-time audio synthesis
Persistent high-score tracking

Google AI tools helped transform a large idea into a finished, polished experience within a limited jam timeline.

Most importantly, they accelerated experimentation.

Instead of spending hours searching documentation or rebuilding systems from scratch, I could focus on design decisions, gameplay feel, player experience, and creative direction.

Google AI didn't replace development—it amplified it.

Solstice Panic! is the result of combining creativity, engineering, and Google's AI ecosystem to build a complete game experience in just a few days.

Built for the June Solstice Game Jam 2026. The sun held. The shadows didn't win.

Havendew — I Turned My Childhood Journal into an AI App

Aashita — Sat, 06 Jun 2026 15:04:39 +0000

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

Havendew is an AI-powered journaling system built around one idea: your journal should learn who you're becoming, not just store what you wrote.

I've journaled for as long as I can remember, so this project is very personal to me. It started the way it does for most people: writing about my day, one entry at a time, on paper.

But somewhere along the way, it stopped being that.

I started watching self-improvement videos and something shifted. I started writing differently. Not just recording my day but interrogating it. What did I learn today? What patterns am I repeating? What do I actually want? Then gratitude sections. Then goal reviews. Then affirmations written out by hand every single day. Then manifestation: scripting my future in present tense as if it had already happened. Then identity tracking. Then shadow work. Then weekly reviews where I'd read everything back and ask: what was I avoiding? What emotion was actually running things this week?

Every few months I'd tear the system apart and rebuild it. Keep what worked. Remove what felt performative.

Eventually I had something that wasn't really a journal anymore. It was a personal operating system. One I'd spent years building prompt by prompt, section by section. And for years I had one thought sitting quietly in the back of my mind: what if this wasn't trapped in notebooks?

A while ago I built the first version of that idea. It was basic — you could write an entry and save it. That was it. It worked technically but it had none of the soul of the actual system. Life got busy. The project sat there unfinished.

The GitHub Finish-Up-A-Thon gave me the reason to go back. Not to polish the basic version. To finally build the real one.

The Comeback Story

Before:
A single text input. A save button. An entries list that said "No previous entries yet."

That was the entire app.

After:

The design went from plain HTML to a full premium system: warm cream gradient canvas, liquid glass bento cards, a forest green Soul Report that feels like a completely different room, editorial Cormorant Garamond typography, polaroid photo memories, affirmation cards with soft gradients.

The daily entry became a 5-section scrapbook workspace:

Daily Gratitude with quick chips and photo memories
Today's Chapter — the main free-writing canvas with rich text, stickers, and auto-save drafts
What I Learned — lesson extraction
Daily Affirmations — rendered as beautiful gradient cards
Extra Prompts drawer — shadow work, pattern analysis, identity proof, emotional analysis hidden behind a collapsible toggle

The AI layer appears only after you save, never before. It generates: Guidance From Today (warm, specific to your actual words), Growth Pattern, Tomorrow's Invitation, Gentle Affirmation, and a Letter From Your Future Self. All from Google Gemini reading what you actually wrote, not a generic template.

The Soul Report lives on a full forest green canvas (#24352C). It reads like a letter. Four sections with colored left borders: 🌿 Growth Pattern, 🍃 Hidden Root, 🌾 Gentle Truth, 🌱 New Growth. Three personalized prompts for next week. A closing affirmation in gold Cormorant serif. It feels like receiving a letter from the wisest version of yourself.

The Character Arc groups entries by week, calls Gemini to name each era in 3-4 words (Quiet Progress, Builder Era, Finding My Rhythm), and displays them as a vertical timeline with proof tags and dominant mood.

The Visual Trajectory shows 4 charts from your actual data: mood flow line chart, journaling rhythm bars, moments of growth area chart, dominant moods.

The Identity tab is a scrapbook profile with your initial, desired identity badge, streak counter, 4 operating modes (Builder, Study, Reset, Career), goals tracker, and JSON export.

The most important decision in the entire build:

Writing first. AI second. Always.

Every AI journal I've tried interrupts the writing with prompts and analysis before you've written one honest sentence. Havendew waits. You write your gratitude, your chapter, your lessons, your affirmations. You close the day. Only then does the AI speak.

That one decision made Havendew feel less like software and more like the journaling system I actually wanted to exist.

Demo

GitHub: https://github.com/aashitanegii/journalAI

Live Demo: https://havendew-881393616978.us-central1.run.app/

Demo Video: https://youtu.be/ecBR1ldITy8

My Experience with GitHub Copilot

Copilot didn't just help me write code faster. It helped me actually finish this time.

1. The schema architecture & Firebase migration
My entry model needed nested morning/evening blocks, identity proof arrays, pattern intelligence tags, anti-waste metrics, and an auto-calculated efficiency score. I described the structure in plain English. Copilot helped me seamlessly migrate the entire architecture from a local database to a scalable Google Cloud Firestore structure, writing the queries and the efficiency score calculation logic on the first attempt.

2. The AI prompt engineering
Returning reliable structured JSON (shadow prompt, pattern note, morning/evening gap detection, affirmation) required very precise system prompting. The tone had to be right: not a therapist, not a chatbot, but the wisest version of the user talking back. Copilot helped iterate the system prompt for gemini-1.5-flash until the output was consistent enough to render directly in the UI.

3. The bento dashboard
Four distinct card types (glass, dark, warm amber, sage) in a responsive grid, each with its own visual language. Copilot generated all four variants and the grid composition cleanly. I spent my time on the content inside the cards.

4. The weekly report aggregation
Compiling 7 days of nested entry objects into a single weighted prompt for the Soul Report was the most complex logic in the codebase. Copilot wrote the array-mapping function and suggested weighting shadow work entries more heavily than micro-wins when building the summary — which actually improved the AI output quality.

5. The character arc era naming
The entries list groups entries by week and asks Gemini to name each era in 3-4 words. Copilot generated the grouping logic and the API call structure in under 10 minutes. It's the feature people will remember most, and it almost didn't make the build.

6. The Visual Trajectory charts
Four stacked Recharts graphs with real data streaming directly from Firestore. Copilot generated the data-mapping functions that convert raw entry arrays into the shape Recharts expects — mood strings mapped to numeric values, entries grouped by week for the consistency bars, and cumulative proof counts for the area chart.
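A rough sketch of what such data-mapping functions can look like follows. It is not Havendew's actual code: the `Entry` shape, the `MOOD_SCORE` scale, and the function names are hypothetical, and weeks are assumed to start on Monday (UTC).

```typescript
// Hypothetical entry shape; date is an ISO "YYYY-MM-DD" string.
interface Entry { date: string; mood: string; }

// Hypothetical mood scale for the line chart.
const MOOD_SCORE: Record<string, number> = { low: 1, anxious: 2, neutral: 3, calm: 4, joyful: 5 };

// Map mood strings to numeric points; entries with an unknown mood are skipped.
function toMoodSeries(entries: Entry[]): { date: string; score: number }[] {
  const points: { date: string; score: number }[] = [];
  for (const e of entries) {
    const score = MOOD_SCORE[e.mood];
    if (score !== undefined) points.push({ date: e.date, score });
  }
  return points;
}

// Key for the week containing the date: the ISO date of that week's Monday.
function weekKey(isoDate: string): string {
  const d = new Date(isoDate + "T00:00:00Z");
  const daysSinceMonday = (d.getUTCDay() + 6) % 7;
  d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  return d.toISOString().slice(0, 10);
}

// Count entries per week, sorted chronologically, for the rhythm bars.
function entriesPerWeek(entries: Entry[]): { week: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const e of entries) {
    const key = weekKey(e.date);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts, ([week, count]) => ({ week, count }))
    .sort((a, b) => a.week.localeCompare(b.week));
}

const sampleEntries: Entry[] = [
  { date: "2026-06-01", mood: "calm" },
  { date: "2026-06-03", mood: "joyful" },
  { date: "2026-06-08", mood: "neutral" },
];
const moodSeries = toMoodSeries(sampleEntries);
const weeklyCounts = entriesPerWeek(sampleEntries);
```

Both outputs are plain arrays of objects, which is the shape a Recharts `data` prop takes.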

Without Copilot the gap between "basic text box app" and "full AI identity journaling system" would have been too wide to close in 6 days.

Tech Stack

Frontend: React 18 + Vite + TailwindCSS + Recharts
Backend: Node.js + Express.js REST API
Database: Firebase Firestore
Auth: JWT + bcryptjs
AI: Google Gemini API (gemini-1.5-flash)
Deployment: Google Cloud Run
Design: Cormorant Garamond + Inter, custom glass card system, forest green Soul Report palette

I've kept journals in notebooks, apps, notes apps,
Google Docs, voice memos — everything. None of them
did what I actually needed.

Building Havendew felt less like shipping a project
and more like finally giving the system a home it
deserved.

It took years of notebooks to figure out how to
journal well.

It took 6 days and GitHub Copilot to build it.

Google's "Budget" Model Just Beat Its Own Flagship. Here's What That Actually Means for Developers.

Aashita — Fri, 22 May 2026 10:56:30 +0000

This is a submission for the Google I/O Writing Challenge

Flash models were supposed to be the budget option.
Faster and cheaper — that was the deal. You
used Flash when you needed speed and could tolerate a
quality tradeoff. You used Pro when the task actually
mattered.

At Google I/O 2026, Gemini 3.5 Flash beat Gemini 3.1 Pro
on every benchmark that matters for building agents.

That sentence should sound impossible. It isn't. And
understanding why it happened tells you everything about
where AI development is actually heading right now.

The Numbers First

Benchmark	What it measures	3.1 Pro	3.5 Flash
MCP Atlas	Tool-use reliability	78.2%	83.6%
Terminal-Bench 2.1	Agentic coding	70.3%	76.2%
GDPval-AA	Long-horizon tasks	1314 Elo	1656 Elo

Not close. Not one cherry-picked test. Across coding,
tool-use reliability, and long-horizon task completion —
the three things that actually matter if you are building
something real — the cheaper model won.

It also runs 4x faster than comparable frontier models
and costs roughly half as much.

Artificial Analysis put 3.5 Flash alone in the top-right
quadrant of their Intelligence vs Speed index. The only
frontier model right now combining top-tier intelligence
with exceptional speed.

Why This Happened

The old assumption was that intelligence scaled with model
size. Bigger parameters, better reasoning, end of story.

3.5 Flash breaks that assumption because it was not
optimized for general intelligence. It was optimized
specifically for tool use, multi-agent coordination,
and live environment execution.

The benchmarks it beats 3.1 Pro on are exactly those
benchmarks. The one benchmark where 3.1 Pro still leads
is long-context retrieval — passive reading, not active
doing.

Google essentially asked: what does a model need to be
good at to power the agentic era? Then they built
directly toward that target instead of chasing a general
leaderboard. The result is a model that is purpose-built
for the exact moment we are in.

What This Feels Like to Actually Build With

Let me make this concrete.

You are working on a mid-sized Node.js project. You have
route files, a database schema, authentication middleware,
environment config, and architectural decisions documented
in a README from six months ago.

Previously you copy-pasted relevant pieces into a chat
window and hoped the model could infer the missing
connections. You were the context manager, manually
bridging gaps the model could not hold.

3.5 Flash has a 1,048,576 token context window —
roughly 786,000 words. You paste everything once. Then
you just talk:

My checkout flow is failing silently on orders over $500.
It works fine below that. Walk me through what could
cause this given everything you can see.

The model sees your payment middleware, database schema,
route handlers, environment variables, and error logging
simultaneously. It does not need you to guess which file
is relevant. It knows.

That is not faster at the same thing. That is different
in kind.

The FinOps Detail Nobody Is Talking About

Context caching on 3.5 Flash costs $0.15 per million
tokens — a 90% discount from the standard $1.50 input
price.

For agent loops this changes the production math entirely.
The expensive part of running persistent agents is not
generating responses. It is re-sending your system prompt,
tool descriptions, and conversation history on every
single turn.

Turn 1: Send 100k tokens of context     → $0.150
Turn 2: Read same context from cache    → $0.015
Turn 3: Read same context from cache    → $0.015
Turn 4: Read same context from cache    → $0.015

Over the four turns above, context fees drop from $0.60
to about $0.20. Over ten turns they drop from $1.50 to
about $0.29. At production scale that difference is not
marginal: it is what makes a product financially viable
or not.
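The arithmetic behind those per-turn figures can be checked with a few lines of code. This sketch uses only the two prices quoted above and a 100k-token context; the function name `contextCost` is hypothetical, and it deliberately ignores output tokens and any cache storage fees.

```typescript
// Prices quoted in this article, in USD per million input tokens.
const INPUT_PRICE_PER_M = 1.5;
const CACHED_PRICE_PER_M = 0.15;

// Context cost for a session that needs the same context on every turn.
// With caching, the first turn pays full price and later turns pay the cached rate.
function contextCost(contextTokens: number, turns: number, useCache: boolean): number {
  const full = (contextTokens / 1_000_000) * INPUT_PRICE_PER_M;
  const cached = (contextTokens / 1_000_000) * CACHED_PRICE_PER_M;
  if (!useCache || turns === 0) return full * turns;
  return full + cached * (turns - 1);
}

const fourTurnsFull = contextCost(100_000, 4, false);   // 4 × $0.15
const fourTurnsCached = contextCost(100_000, 4, true);  // $0.15 + 3 × $0.015
const tenTurnsFull = contextCost(100_000, 10, false);   // 10 × $0.15
const tenTurnsCached = contextCost(100_000, 10, true);  // $0.15 + 9 × $0.015
```

The saving grows with session length because only the first turn pays the full input price.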

The API Detail Worth Getting Right

3.5 Flash ships with dynamic thinking on by default.
You can control it explicitly:

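# Uses the google-genai SDK, where the thinking level is set through ThinkingConfig
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
MODEL = "gemini-3.5-flash"

# Fast path for simple tool calls
response = client.models.generate_content(
    model=MODEL,
    contents="Extract the order ID from this receipt: ...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="minimal")
    ),
)

# Deep reasoning for hard decisions
response = client.models.generate_content(
    model=MODEL,
    contents=(
        "Given this entire codebase, recommend how to "
        "restructure auth for multi-tenancy"
    ),
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="high")
    ),
)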

Options: minimal, low, medium (default), high.

Use minimal for deterministic tool calls where you need
speed. Use high for planning steps where you need real
reasoning. Your cost and latency per session shifts
significantly depending on how you tune this.

Why Gemini Spark Could Only Exist Now

This is the part that connects everything.

Gemini Spark — the 24/7 personal agent Google announced
at I/O, running on dedicated Cloud VMs, managing your
inbox and calendar while you sleep — could not have
existed six months ago.

Not because the idea was new. Because the economics did
not work. A persistent personal agent holding full context
and calling tools reliably across millions of users would
have required flagship model pricing and still been too
slow to feel responsive.

3.5 Flash's combination of 1M context window, 83.6%
tool reliability, 4x speed, and 90% caching discount
is precisely what unlocks Spark as a product.

Google did not build Spark and then find a model to run
it. They built 3.5 Flash for this exact workload —
always on, context-heavy, multi-step, running at scale —
and Spark is what becomes possible on top of it.

When Google says I/O 2026 is the shift from "prompting"
to "acting" — 3.5 Flash is the technical foundation that
makes acting affordable enough to actually ship.

What Changes for What You Build

The practical takeaway is simpler than the benchmarks
make it sound.

The model tier you reach for by default just changed.

Previously: start with Flash, upgrade to Pro when quality
is not good enough.

Now: start with Flash, stay on Flash unless you
specifically need deep passive long-context retrieval.

For anything involving tool calls, agents, coding
assistance, or multi-step workflows — which is most of
what developers are building with AI right now — 3.5
Flash is not the budget option anymore.

It is the right option.

Breaking the Stateless Curse: Hermes Agent and the Case for Persistent AI Agents

Aashita — Sat, 16 May 2026 16:58:04 +0000

This is a submission for the Hermes Agent Challenge

The most expensive thing most AI agents forget is not your name. It’s the work they just did.

You let an agent spend thousands of tokens learning your environment, inspecting your repository, debugging issues, figuring out project conventions, and working through a useful engineering task—sometimes even discovering a workflow that reliably works. Then you close the session, come back the next day, and much of that context is gone. The repository is still there, but the agent has forgotten what it learned about it, how it solved the problem, and which workflow actually got the job done.

For short-lived tasks, that’s not a huge issue. If all you need is a summary, a SQL query, or a quick browser automation task, stateless execution works fine. But once agents start touching repeated engineering work, automation, or operational workflows, forcing them to rediscover the same solutions over and over becomes an expensive design flaw.

That’s what makes Hermes Agent from Nous Research interesting.

Hermes is not being pitched as just another coding copilot or a chatbot wrapper with tool access bolted on. The more ambitious idea is that successful execution should create reusable operational knowledge. If an agent solves a meaningful problem once, it should not have to relearn the same workflow from scratch the next time.

If that works reliably, it changes what open-source agents can become.

The Stateless Agent Problem

Most current agent frameworks effectively behave like this:

Goal → Plan → Tool Calls → Execute → Return Result → Forget

That lifecycle works surprisingly well—until the work becomes repetitive.

Imagine asking an agent to scan a repository, identify missing license headers, generate patches, run tests, and commit validated changes. A capable system might spend significant time inspecting the filesystem, inferring project conventions, handling failures, and refining its approach before it gets the task right.

Now run that exact same task a week later. Most agents will start from zero as though the previous execution never happened.

The same thing happens with recurring operational issues. If an agent spends twenty minutes discovering that a flaky CI failure came from one dependency mismatch and a bad environment variable, you would reasonably expect that discovery to be reusable. Instead, most systems replay the entire debugging process.

That’s the inefficiency. A human engineer would either remember the pattern or document the solution. Stateless agents generally do neither.

What Hermes Is Trying to Change

Hermes attempts to change that lifecycle by inserting a learning loop. Instead of behaving like a linear sequence, the intended model looks more like this:

Observe → Plan → Execute → Evaluate → Crystallize Skill → Reuse

The important difference is what happens after execution. Rather than treating task completion as the end of the interaction, Hermes evaluates whether the workflow it just used is worth keeping.

Did the task succeed?
Which actions actually mattered?
Was the solution a one-off workaround, or does it represent a reusable operational pattern?

If the task succeeded and the pattern looks reusable, Hermes can retain that workflow as a reusable Skill instead of forcing the model to rediscover the same process later. That’s a much more compelling idea than simply preserving chat history.
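
To make the loop concrete, here is a minimal sketch of the idea in Python. It is not Hermes' implementation: the `Agent` and `Skill` classes, the stubbed `plan` and `execute` methods, and the `planning_runs` counter are all hypothetical stand-ins that only show where reuse replaces rediscovery.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """A reusable workflow kept after a successful run (hypothetical shape)."""
    name: str
    steps: list[str]


@dataclass
class Agent:
    skills: dict[str, Skill] = field(default_factory=dict)
    planning_runs: int = 0  # how often the agent had to plan from scratch

    def plan(self, task: str) -> list[str]:
        # Stand-in for expensive, model-driven planning.
        self.planning_runs += 1
        return [f"inspect environment for {task}", f"apply fix for {task}", "run tests"]

    def execute(self, steps: list[str]) -> bool:
        # Stand-in for tool calls; a real agent would run and verify each step.
        return len(steps) > 0

    def run(self, task: str) -> list[str]:
        # Reuse: skip planning when a crystallized skill already exists.
        if task in self.skills:
            steps = self.skills[task].steps
            self.execute(steps)
            return steps
        # Observe -> Plan -> Execute
        steps = self.plan(task)
        succeeded = self.execute(steps)
        # Evaluate -> Crystallize: keep the workflow only if it worked.
        if succeeded:
            self.skills[task] = Skill(name=task, steps=steps)
        return steps
```

Running the same task twice on one `Agent` plans once and reuses the stored steps the second time, which is the whole point of the extra Evaluate and Crystallize stages.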

What a Skill Actually Looks Like

“Procedural memory” sounds abstract until you think about what is actually being stored. Hermes’ approach to procedural workflows is much closer to inspectable skill artifacts than opaque memory blobs, which is a much healthier model than treating memory as hidden vendor state.

Conceptually, a crystallized skill artifact looks something like this:

# repo-license-remediation
version: 1.2

tags:
  - python
  - repository
  - compliance

inputs:
  - repo_path
  - license_header

required_tools:
  - filesystem
  - regex_match
  - file_write
  - terminal
  - git

steps:
  - Scan Python files
  - Detect missing headers
  - Generate patch
  - Run tests
  - Commit validated changes

This is fundamentally different from remembering prior prompts or conversation snippets. This is operational knowledge.

Remembering that I prefer concise responses is personalization. Remembering how to safely repair a repository issue is actual capability. That distinction matters.

Procedural Memory Is More Interesting Than Chat Memory

A lot of AI products advertise memory, but most of the time that means conversation continuity, user preferences, or retained prompt context. That is useful, but it does not necessarily make the system better at doing work.

The more meaningful distinction is between remembering facts and remembering procedures. Humans become effective engineers because they internalize repeatable workflows. You do not memorize every exact command forever, but you do remember how to approach a recurring integration issue or how to remediate a familiar failure pattern.

Hermes is aiming much closer to that model. Its architecture can be thought of in three distinct layers:

Working Memory: Short-lived execution state including the current task context, temporary variables, and recent tool outputs. This is standard agent behavior.
Episodic Memory: Longer-lived contextual recall mapping project metadata, user preferences, and prior historical decisions to improve continuity.
Procedural Memory: The interesting layer. It stores reusable workflows like debugging routines, deployment procedures, remediation pipelines, and integration playbooks.

If this layer works well, the system improves with repetition instead of simply remembering conversations.

The Scaling Problem

Persistent procedural memory sounds great until you hit the obvious question: What happens when the agent accumulates hundreds of workflows? Dumping all of them into the context window every time would be terrible for token efficiency, latency, and reasoning quality. A staged retrieval model makes much more sense:

Discovery Stub (~20 tokens): Start with the minimum: just the skill name and a short description to determine relevance. Example: “Python repository license remediation workflow”.
Signature Layer (~200 tokens): If the workflow looks useful, retrieve expected inputs, required tools, and configuration assumptions to validate applicability.
Blueprint Layer (~1,000+ tokens): Only when the workflow is actually executed do you load the full steps, command sequences, and tool invocation logic.

This is dramatically more scalable than brute-force memory stuffing. One caveat: if you are evaluating Hermes critically, it is worth checking which parts of this are implemented exactly as described today versus which represent broader architectural direction. But conceptually, this is the right shape of solution.
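
The three stages above can be sketched in a few lines. This is an illustration of the retrieval shape only, under my own assumptions: `StoredSkill`, `discover`, `is_applicable`, `load_blueprint`, and the substring matching are hypothetical, not Hermes APIs, and a real system would likely use embeddings or a model call for the discovery step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredSkill:
    name: str
    description: str                  # discovery stub
    inputs: tuple[str, ...]           # signature layer
    required_tools: tuple[str, ...]   # signature layer
    steps: tuple[str, ...]            # blueprint layer


def discover(skills, query_terms):
    """Stage 1: match on name and description only."""
    hits = []
    for skill in skills:
        stub = f"{skill.name} {skill.description}".lower()
        if all(term.lower() in stub for term in query_terms):
            hits.append(skill)
    return hits


def is_applicable(skill, available_inputs, available_tools):
    """Stage 2: check inputs and tools before loading any steps."""
    has_inputs = set(skill.inputs) <= set(available_inputs)
    has_tools = set(skill.required_tools) <= set(available_tools)
    return has_inputs and has_tools


def load_blueprint(skill):
    """Stage 3: only now pull the full step list into context."""
    return list(skill.steps)


def retrieve(skills, query_terms, available_inputs, available_tools):
    """Return the steps of the first relevant, applicable skill, or None."""
    for skill in discover(skills, query_terms):
        if is_applicable(skill, available_inputs, available_tools):
            return load_blueprint(skill)
    return None


# Example library, using the skill artifact shown earlier in this post.
LIBRARY = [
    StoredSkill(
        name="repo-license-remediation",
        description="Python repository license remediation workflow",
        inputs=("repo_path", "license_header"),
        required_tools=("filesystem", "regex_match", "file_write", "terminal", "git"),
        steps=(
            "Scan Python files",
            "Detect missing headers",
            "Generate patch",
            "Run tests",
            "Commit validated changes",
        ),
    )
]
```

The design choice that matters is ordering: the cheap checks run against every skill, and the expensive blueprint is loaded for at most one.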

Trying Hermes Yourself

Setup itself looks straightforward, but the more interesting question is not whether Hermes can run—it’s whether repeated tasks actually become smarter.

The baseline setup follows a quick terminal sequence:

curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
hermes

Hermes also supports running beyond the terminal—across messaging interfaces and isolated execution backends—which makes the architecture feel operational rather than purely conversational.

A worthwhile test is giving it something that actually requires multi-step reasoning, like scanning a test repository, identifying missing headers, and generating a local patch. The real validation is whether the second execution feels noticeably cleaner, faster, and less wasteful.

Where Hermes Fits Compared to Other Frameworks

LangChain: Gives developers raw building blocks and enormous flexibility. That is great if you want full architectural control, but it also means assembling everything yourself. Hermes feels more opinionated out of the box.
AutoGen: AutoGen shines in multi-agent conversational workflows, but conversation-heavy systems can become noisy and expensive fast. Hermes feels less focused on agent dialogue and more focused on raw execution workflows.
OpenDevin: Clearly aligned with software engineering automation and workspace environments. Hermes feels slightly broader, aiming at general operational agent behavior.
OpenClaw: Adjacent rather than directly competitive. OpenClaw is strong around orchestration and communication routing, while Hermes is more interesting around procedural learning and self-improving execution.

Infrastructure Design Matters

A powerful agent without execution isolation is a liability. Giving unrestricted shell access to an LLM is not a serious production strategy.

Hermes supports multiple execution backends, including restricted local execution, Docker containers, SSH environments, Singularity, and remote sandbox environments like Modal. This matters because practical automation requires isolation.

A realistic workflow might involve a Slack alert triggering an isolated environment, Hermes validating deployment state, detecting a known failure pattern, applying a learned remediation workflow, and reporting back. That starts looking much less like a toy demo and much more like something operations teams could actually use.

Where This Can Go Wrong

The risks here are real:

Skill Drift: A workflow that worked six weeks ago may be wrong today because dependencies changed, APIs evolved, or CLI flags broke. Without revalidation, procedural memory becomes stale automation debt.
Faulty Generalization: An agent might incorrectly promote a brittle edge-case fix into a reusable standard workflow. That becomes dangerous quickly because repeated incorrect automation is often worse than forcing fresh reasoning every time.
Security Risk: Persistent procedural memory can preserve unsafe commands, environment assumptions, or patterns that risk credential leakage. Any self-improving execution system needs strict governance.

Practical Safeguards

If anyone plans to use something like this in production, the minimum checklist probably includes:

Human approval before newly created skills are allowed into active reuse.
Verification across multiple successful runs before allowing autonomous reuse.
Automated smoke testing, version control, and strict least-privilege execution boundaries.

Without those controls, procedural memory becomes accumulated operational risk instead of useful automation.
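
As a rough sketch of how the first two checklist items plus a drift check could be enforced in code, consider a single gate in front of autonomous reuse. Everything here is an assumption for illustration: the `SkillRecord` fields, the threshold of three runs, and the 30-day revalidation window are placeholders, not Hermes behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MIN_SUCCESSFUL_RUNS = 3                  # assumed threshold
MAX_VALIDATION_AGE = timedelta(days=30)  # assumed revalidation window


@dataclass
class SkillRecord:
    name: str
    human_approved: bool
    successful_runs: int
    last_validated: datetime


def may_reuse_autonomously(skill: SkillRecord, now: datetime) -> tuple[bool, str]:
    """Return (allowed, reason) for running a stored skill without review."""
    if not skill.human_approved:
        return False, "awaiting human approval"
    if skill.successful_runs < MIN_SUCCESSFUL_RUNS:
        return False, "not enough verified runs"
    if now - skill.last_validated > MAX_VALIDATION_AGE:
        return False, "stale: needs revalidation"
    return True, "ok"
```

Returning a reason alongside the decision keeps the gate auditable: a rejected skill tells you which safeguard stopped it.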

Why This Matters for Open Source

The bigger issue here is ownership.

A lot of proprietary AI systems assume persistent memory belongs inside vendor infrastructure. That creates lock-in, opaque automation logic, poor auditability, and painful migration stories.

If an open agent can retain reusable operational knowledge in inspectable, version-controlled artifacts, teams can audit agent behavior, share playbooks, migrate freely, and actually own the workflows the system learns. If your agent discovers something useful, that knowledge should live in infrastructure you control—not disappear into a hosted memory layer.

That may be the more important shift.

Discussion

Would you trust a self-improving agent for non-critical automation today? Why or why not?
What specific safeguards would you require before letting one touch production infrastructure?

The End of Renting Intelligence? Why Gemma 4 Makes Local AI Feel Viable

Aashita — Fri, 08 May 2026 13:43:46 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

There’s a specific kind of developer anxiety that has nothing to do with bugs. It’s the mental math that happens while using cloud AI tools.

How many requests have I burned?
Will this hit a usage cap?
Do I really want to send these notes, drafts, or code snippets to a third-party server?

For the past year, that has been part of my workflow.
Cloud AI is undeniably useful. But it often feels less like ownership and more like access that can be throttled, billed, or restricted at someone else’s discretion.
That’s why Gemma 4 caught my attention. Not because it’s another flashy model release but because it made capable local AI feel practical.

For students, indie developers, creators, and curious builders, that’s a meaningful shift. The conversation stops being just about access to intelligence. It starts becoming about ownership.

What Gemma 4 Actually Is

Gemma 4 is Google’s newest open-model family, built using the same research foundation behind Gemini but released openly so developers can download, run, fine-tune, and integrate the models into their own workflows.

The current lineup includes four major variants:

Model	Description
Gemma 4 E2B	Lightweight edge model optimized for phones, Raspberry Pi devices, and low-resource systems
Gemma 4 E4B	Balanced model for laptops and local creator workflows
Gemma 4 26B A4B	Mixture-of-Experts (MoE) model designed for efficient reasoning and fast inference
Gemma 4 31B	Large dense model focused on advanced reasoning, coding, and long-context tasks

One of the most impressive parts of Gemma 4 is that even the smaller models support features that used to feel “enterprise-only,” including:

Native multimodal understanding (text + images)
Massive 128K–256K context windows
Efficient quantization for local deployment
LoRA and QLoRA fine-tuning workflows
Strong reasoning capabilities

Instead of treating local AI as a stripped-down compromise, Gemma 4 treats it as a serious development environment.

Which Gemma 4 Model Should You Actually Use?

The smartest way to approach Gemma 4 is not by asking “Which model is best?” but:

“Which model fits my hardware and workflow?”

Here’s the practical breakdown:

Model	Hardware Sweet Spot	Best Use Cases
E2B	Phones, Raspberry Pi, low-RAM laptops	Fast experimentation, lightweight assistants, offline tools
E4B	Standard laptops (8–16 GB RAM)	Writing, research, social content, local copilots
26B A4B	Strong GPUs or cloud boxes	Multi-step reasoning, coding workflows, agent-style systems
31B Dense	High-end GPUs/workstations	Deep reasoning, long-form generation, advanced coding

For my own testing, I intentionally chose the E4B model instead of jumping straight to the larger variants.

Why?

Because I wanted to evaluate Gemma 4 the way most independent developers, students, and creators realistically would—not on expensive infrastructure, but on hardware that feels accessible.

The 31B model is clearly more powerful, and the 26B A4B MoE variant is especially interesting for heavier reasoning workloads. But for writing workflows, research summarization, screenshot analysis, and lightweight experimentation, E4B felt like the most honest test of whether Gemma 4 is actually practical for everyday builders.

That tradeoff matters.

A model can be impressive on paper and still unusable for the people it claims to empower.

Running Gemma 4 Locally in Minutes

One of the best things about Gemma 4 is how approachable the setup process has become. Using tools like Ollama or LM Studio, you can run a capable AI model locally with almost no friction.

For example, using Ollama:

# Run the Gemma 4 E4B model locally
ollama run gemma4:4b

That’s it.
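
If you would rather call the model from a script than from the interactive prompt, Ollama also serves a local HTTP API. The sketch below assumes Ollama is running on its default port (11434) and that the model tag matches the command above; `build_payload` and `generate` are helper names I made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(prompt: str, model: str = "gemma4:4b") -> bytes:
    """Encode a non-streaming generate request as JSON bytes."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def generate(prompt: str, model: str = "gemma4:4b", timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, `generate("Summarize these notes in three bullet points: ...")` returns the model's reply as a plain string. Nothing leaves your machine, which is the whole appeal.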

What surprised me most while testing Gemma 4 locally wasn’t just performance. I threw in a mix of rough research notes, screenshots, and an unfinished content outline to see whether the workflow would feel clunky.

It didn’t. It was useful enough that I immediately understood the appeal. But the bigger difference was psychological.

I wasn’t thinking about token usage, request limits, or whether I should save prompts for later. That kind of friction quietly changes how you work. Local AI felt less like a demo and more like an actual tool I could build around.

Why Long Context Actually Matters

A lot of AI announcements focus on benchmark scores.

But in real-world usage, the larger context window might be the most important feature for creators and developers. With Gemma 4, you can:

feed in long PDFs,
analyze entire research collections,
summarize lecture notes,
process large codebases,
or maintain continuity across long conversations.

For students and indie builders, that changes the workflow completely. Instead of constantly compressing information into smaller prompts, the model can work with larger chunks of context naturally.

That makes the interaction feel less fragmented and significantly more useful.
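
A quick way to sanity-check whether a pile of notes will fit is a back-of-the-envelope token estimate. This sketch leans on a loud assumption: roughly four characters per token for English text, which is only a heuristic, and real tokenizers will differ. The 128K default and the output reserve are also illustrative values.

```python
CHARS_PER_TOKEN = 4  # rough heuristic for English text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(
    documents: list[str],
    context_window: int = 128_000,
    reserve_for_output: int = 4_000,
) -> bool:
    """Check whether all documents fit, leaving room for the model's reply."""
    budget = context_window - reserve_for_output
    return sum(estimate_tokens(doc) for doc in documents) <= budget
```

It is crude, but it is enough to tell "paste everything in" apart from "this needs chunking" before you wait on a long local run.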

The Most Interesting Shift: AI Ownership

For years, the AI conversation has mostly been framed around access.

Who has the biggest models? Who has the fastest APIs? Who can afford the most compute?

Gemma 4 points toward a slightly different conversation: ownership.
Running capable models locally means you can experiment more freely, protect sensitive work, and build without every workflow depending on a third-party service.

If you're a student, indie developer, or creator working with personal notes, drafts, experiments, or prototypes, that flexibility matters. It changes the relationship. You’re not just consuming AI anymore. You’re shaping how it fits into your workflow.

Fine-Tuning Feels More Accessible Than Ever

Another reason Gemma 4 stands out is how approachable fine-tuning has become. Using LoRA or QLoRA workflows, developers can adapt models using relatively affordable hardware.
For creators, that opens interesting possibilities:

a writing assistant trained on your content style,
a research copilot specialized for your niche,
a local AI assistant customized for your own workflow.

That kind of personalization used to feel reserved for large AI companies. Now it’s increasingly available to independent developers and curious students.
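
The reason LoRA fits on modest hardware is mostly arithmetic: instead of updating a full weight matrix, it trains two thin matrices whose product has the same shape. The numbers below are a worked example only. The layer width of 4096 and the rank of 8 are assumptions I picked for illustration, not Gemma 4's actual dimensions.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable weights when updating the whole matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable weights for a rank-r update: B (d_out x r) times A (r x d_in)."""
    return rank * (d_out + d_in)


d = 4096  # assumed layer width, for illustration only
r = 8     # assumed LoRA rank

full = full_params(d, d)     # 16,777,216
lora = lora_params(d, d, r)  # 65,536
ratio = lora / full          # 0.00390625, i.e. about 0.4%
```

For this one layer, the adapter trains 1/256 of the weights the full update would. QLoRA pushes memory down further by keeping the frozen base weights quantized.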

Why This Feels Different

One thing I learned from exploring Gemma 4 is that benchmark discussions are only part of the story.

What changes real workflows is friction. If a model is technically impressive but expensive to use, hard to integrate, or awkward to experiment with, most independent builders won’t actually build around it. Gemma 4 gets something important right: it lowers that friction.

What I Think Gemma 4 Gets Right

The biggest strength of Gemma 4 isn’t just performance.

It’s accessibility.

The future of AI is not only about bigger cloud systems. It’s also about lightweight, efficient models that people can run, study, and experiment with locally.
That shift lowers the barrier to entry for:

students learning AI,
developers building side projects,
creators experimenting with workflows,
and people outside major tech hubs.

And honestly, that part feels exciting not because local AI replaces the cloud entirely, but because it gives more people the ability to participate.

The most exciting thing about Gemma 4 isn’t that it’s the biggest or most dramatic AI release of the year. It’s that it makes capable local AI feel practical for more people.

Students can experiment without enterprise budgets. Developers can prototype without building everything around API dependency. Creators can explore more private, personalized workflows.

That doesn’t mean cloud AI disappears. But it does mean the balance is shifting. And I think that’s where things get interesting. Not when AI feels distant and infrastructure heavy. When it feels accessible enough that more people can actually build with it.

Decoding Democracy: How ELARA is Transforming Election Education Through Specialized AI

Aashita — Fri, 01 May 2026 12:39:12 +0000

Submission for Virtual: PromptWars
In every democracy, participation matters. Yet for many first-time voters, the biggest barrier is not willingness — it is confusion.

Registration steps, verification timelines, polling-day procedures, deadlines, and unfamiliar terminology can make the election process feel intimidating. Many citizens want to participate, but don’t know where to begin.

That is exactly why I built ELARA — the Election Assistance & Resource Assistant.

ELARA is an AI-powered civic education platform designed to make the election journey clear, interactive, and beginner-friendly. Instead of functioning like a generic chatbot, ELARA acts like a smart civic guide: structured, contextual, and focused on helping users take the next right step.

Why Election Education Needs Better Technology

Millions of people have questions such as:

How do I register to vote?
What documents are required?
What happens during verification?
What should I expect on polling day?
What do election terms like NOTA, EVM, or VVPAT mean?

Traditional sources often bury answers inside dense documents or scattered pages. Generic AI tools may respond, but frequently without context, trust signals, or practical guidance.

ELARA was built to close that gap.

The Core Innovation: Intent-Based AI Guidance

Rather than routing every query through one generic AI prompt, ELARA uses Intent-Based Routing powered by Google Gemini.

This allows the platform to understand what kind of help the user actually needs and respond with focused, relevant guidance.

1. Journey Mode

For users at different stages of the voting process:

Not Registered
Registration in Progress
Ready to Vote

ELARA provides personalized next steps, expected timelines, and required documents.

2. Timeline Mode

Users can explore the full election lifecycle:

Registration
Verification
Polling Day
Counting
Results

Each stage is explained in simple language, including what happens, who is involved, and what comes next.

3. Jargon Mode

Complex civic language becomes understandable instantly.

Examples include:

NOTA
EVM
VVPAT
Constituency
Manifesto
Gerrymandering

4. General Guidance Mode

A neutral, non-partisan assistant for broader election process questions.
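
The routing idea behind the four modes can be sketched as a classify-then-dispatch step. This is a simplified illustration, not ELARA's code: the real backend runs on Node.js and uses Gemini for classification, while this Python sketch swaps in a keyword matcher so it runs without an API key. The keyword lists and function names are hypothetical.

```python
# Keyword lists are illustrative stand-ins for a model-based classifier.
INTENT_KEYWORDS = {
    "jargon": ("what is", "what does", "meaning of", "define"),
    "timeline": ("timeline", "stages", "lifecycle", "counting", "results"),
    "journey": ("register", "registration", "documents", "how do i vote"),
}


def classify_intent(query: str) -> str:
    """Return the first intent whose keywords appear in the query."""
    text = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "general"  # fallback: General Guidance Mode


def respond(intent: str, query: str) -> str:
    """Placeholder for the mode-specific prompt; returns a tagged stub."""
    return f"[{intent}] guidance for: {query}"


def route(query: str) -> str:
    """Classify the query, then hand it to the matching mode."""
    return respond(classify_intent(query), query)
```

The useful property is the explicit fallback: a query that matches nothing still lands in General Guidance Mode instead of failing.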

Built for Trust, Not Hype

Many AI products create false confidence through vague certainty scores.

ELARA takes a different path.

Instead of a confidence score, it uses meaningful trust indicators such as:

Beginner Friendly
Step-by-Step Guidance
Timeline Included
Neutral Educational Response
Official Resource Support

This helps users understand the type of assistance they are receiving without pretending AI outputs are infallible.

Interactive Learning Experience

ELARA is more than a text box.

Guided Walkthrough

Users can launch a complete walkthrough of the election process — from registration to final results.

Quick Learning Chips

Tap common questions or election terms to get instant simplified explanations.

Context-Aware Responses

Guidance changes depending on whether the user is unregistered, registered, or preparing to vote.

Hindi + English Support

Designed for broader accessibility and inclusion across Indian users.

Built Like a Real Product

ELARA combines thoughtful UX with production-grade engineering.

Frontend

React + Vite
Responsive UI
Accessible semantic components
Keyboard-friendly navigation
Reduced-motion support

Backend

Node.js + Express
Google Gemini integration
Intent routing engine
Graceful fallback responses
Caching for performance

Reliability & Security

Rate limiting
Security headers
Input sanitization
Automated test coverage
Cloud deployment readiness

Google Ecosystem

Gemini API
Firebase / Firestore
Google Cloud Run

Why This Matters

Technology should reduce friction during important civic moments.

When voting feels confusing, participation drops.
When systems feel understandable, confidence rises.

ELARA demonstrates how AI can be used responsibly — not to influence political choices, but to improve civic literacy, clarity, and access.

The Bigger Vision

The future of AI is not about assistants that try to do everything.

It is about focused systems that solve real problems well.

ELARA is one example of how specialized AI can strengthen democratic participation by making complex processes easier to understand and easier to navigate.

Live Prototype

https://elara-app-174971475950.us-central1.run.app/

Final Thought

Democracy works best when people understand how to participate.

ELARA’s mission is simple:

Make the election process easier to understand, easier to navigate, and easier to trust.

So honored to be selected as one of the winners for the Earth Day Challenge! 🌍 It was such a blast building this and seeing all the other incredible sustainable solutions. Huge thanks to the DEV team and the sponsors for putting this together! 🚀

Aashita — Fri, 01 May 2026 03:50:43 +0000

DEV Weekend Challenge: Earth Day

Jess Lee for The DEV Team

Apr 30

Announcing the Winners of the DEV Weekend Challenge: Earth Day Edition 🌍

Google Cloud NEXT '26 Challenge Submission

Aashita — Wed, 29 Apr 2026 06:33:47 +0000

Vibes Don't Scale: Moving from AI Prototypes to Production-Grade Systems

Aashita — Thu, 23 Apr 2026 03:05:30 +0000

This is a submission for the Google Cloud NEXT Writing Challenge

Last week, I was at Microsoft R&D diving into agentic workflows. As a 4th-semester CS student, I’ve spent the last few months in that "I’m basically a senior dev now" stage of AI development—using natural language prompting to bridge the gap between my ideas and my actual coding ability. It feels like magic until you try to build something that needs to work in the real world, at scale.

My project is CrowdCommand, a crowd-safety platform designed to monitor thousands of fans to prevent crowd crushes. During early tests, I realized that simple prompting hit a hard ceiling. You can't "vibe" your way out of networking lag or data drift when someone’s safety is on the line.

Watching the Google Cloud NEXT '26 keynotes, it finally clicked: the future isn't about "better AI," it's about Agentic Infrastructure. Here is how the new blueprint is helping me move my project from a classroom demo to something production-ready.

The Latency Problem

In a stadium, if an AI takes five seconds to process a crowd camera feed, it’s useless. I used to think the "model" was the bottleneck, but it’s actually the data movement.

The announcement of TPU v8i (Inference-optimized) and the broader AI Hypercomputer architecture is the hardware fix I needed. By keeping model weights on-chip, it eliminates the lag of moving data back and forth. But the real star is the Virgo network.

In an agentic system, you don't just have one "brain"; you have a fleet. I have "Gate Agents" and "Emergency Agents" that need to stay in constant sync. Without Virgo’s high-throughput fabric, they end up lagging or talking over each other. With it, a collection of scripts becomes a synchronized Agentic Taskforce.

Solving "Reasoning Drift" with the Knowledge Catalog

The scariest thing in AI is when an agent makes a decision based on stale data. Imagine a safety agent suggesting an evacuation route that is currently blocked because its last "knowledge update" was ten minutes ago.

The Agentic Data Cloud and the new Knowledge Catalog solve this. Instead of my agents "hallucinating" a path, they are now grounded in a live Knowledge Graph of the venue. I’ll start playing with Firebase Genkit to build these flows locally. It allows me to force the AI to verify real-time sensor data before it acts. By utilizing AlloyDB and Lightning Engine for Apache Spark, we can provide agents with durable, stateful memory. It moves the project from a "chatbot" that talks about safety to a "system" that enforces it.
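
The "verify before acting" rule is easy to state in code even before any framework is involved. The sketch below is plain Python, not Genkit or Knowledge Catalog code, and the `SensorReading` shape and the 30-second freshness limit are assumptions I made up for illustration.

```python
from dataclasses import dataclass

MAX_READING_AGE_SECONDS = 30.0  # assumed freshness limit


@dataclass
class SensorReading:
    route_id: str
    blocked: bool
    timestamp: float  # seconds since epoch


def safe_routes(readings, now, max_age=MAX_READING_AGE_SECONDS):
    """Return routes confirmed open by a fresh reading.

    Routes with stale data are excluded rather than assumed open.
    """
    open_routes = []
    for reading in readings:
        is_fresh = (now - reading.timestamp) <= max_age
        if is_fresh and not reading.blocked:
            open_routes.append(reading.route_id)
    return open_routes
```

The important choice is the default: when the data is old, the route is treated as unknown, so the agent can only recommend paths it has recent evidence for.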

The Underrated MVP: Cloud Run Billing Caps & Event Compaction

While the headlines are dominated by new models, the announcement I think is most overlooked is the addition of Cloud Run Billing Caps.

Let’s be real: as a student, the biggest barrier to entry isn't the code, it’s the credit card bill. Experimenting with agentic fleets is notoriously expensive because agents can be very "chatty" with APIs. One recursive loop or a surprise traffic spike can be financially devastating.

For a student founder, these billing caps are the ultimate "Founder Mode" feature. They let me deploy specialized models (like Gemma 2 via NVIDIA L4 GPUs in Cloud Run) with a hard financial guardrail. But the Developer Keynote introduced a technical partner to this: Event Compaction. This technique manages token limits during long-running agent workflows by summarizing an agent's reasoning, keeping the "intelligence" high while keeping the API costs (and my stress levels) low.
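
Here is a toy version of the compaction idea as I understand it from the keynote: keep the most recent events verbatim and collapse everything older into one summary entry. This is my own sketch, not the ADK's implementation. The `compact_events` name and the placeholder summarizer are hypothetical, and a real system would call a model to write the summary.

```python
def compact_events(events, keep_recent=3, summarize=None):
    """Collapse older events into one summary entry, keeping the newest verbatim."""
    if summarize is None:
        def summarize(older):
            # Placeholder; a real system would ask a model for the summary.
            return f"[summary of {len(older)} earlier events]"

    if len(events) <= keep_recent:
        return list(events)

    split = len(events) - keep_recent
    older, recent = events[:split], events[split:]
    return [summarize(older)] + list(recent)
```

The history stays bounded at `keep_recent + 1` entries no matter how long the agent runs, which is what keeps the token bill predictable.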

Security as a Guardrail

When you're handling large crowd data, you can't just hope for the best. The integration of the new security agent, 'Whis', into the Agentic Defense framework is a huge relief. It provides autonomous security scans that watch the agent's code to identify attack paths and suggest remediations in real time. I can focus on the crowd-safety logic while the infrastructure handles the "autonomous guardrails" for the agent's lifecycle.

Orchestration at Scale: ADK, Genkit, and A2A

Today’s Developer Keynote introduced three things that bridge the gap between "coding" and "architecting": Firebase Genkit, the Agent Development Kit (ADK), and the Agent-to-Agent (A2A) protocol.

Genkit and the ADK allow me to move away from messy prompt strings and into modular, code-first agent development. But the real breakthrough is A2A. In CrowdCommand, my "Gate Agents" and "Emergency Agents" can now use A2A to negotiate priorities autonomously—like deciding which gate to open first during an evacuation without waiting for a central server to mediate.

Even more game-changing is the Agent-to-User Interface (A2UI) standard. It allows agents to dynamically generate their own expressive user interfaces on the fly. It means the system can build a tailored emergency dashboard for stadium staff without me writing a single line of CSS. It’s the difference between a scripted sequence and a living, breathing system.

Moving Forward: From Prompter to Orchestrator

The "Agentic Cloud" has shifted my perspective as I head into my 5th semester. I’m realizing that we aren't just building "apps" anymore; we are orchestrating systems of intelligence.

Google Cloud NEXT '26 provided the missing architectural pieces for my project. If you're still just "prompting," you're building for the past. The future is about building Agentic Enterprises that actually reason, act, and scale.

Note: These are my personal reflections on the Google Cloud NEXT '26 Keynotes. CrowdCommand is my ongoing project exploring the intersection of AI and public safety.

#devchallenge #googlecloud #cloudnextchallenge #agenticai #agenticenterprise #csstudent

CrowdCommand — AI Powered System to optimize crowd flow and reduce large-scale event waste

Aashita — Sat, 18 Apr 2026 10:40:41 +0000

This is a submission for Weekend Challenge: Earth Day Edition

🌍 What I Built

I built CrowdCommand, an AI system that predicts crowd chaos and reduces real-world resource waste. It is a real-time system designed to manage large-scale human movement efficiently, predict congestion before it happens, and enable immediate action.

At large events, crowd movement is rarely optimized. People cluster, queues grow unpredictably, and entry points overload.
This doesn’t just cause inconvenience — it leads to:

unnecessary energy wastage
inefficient crowd routing
operational strain on infrastructure
increased resource consumption at scale

Most existing systems react only after congestion becomes visible.

CrowdCommand changes that.

It introduces a system that:

monitors crowd density in real time
predicts congestion before it escalates
generates AI-driven recommendations
enables operators to take instant action

Real-World Impact Potential:

Inefficient crowd movement means wasted time, wasted energy, and unnecessary resource usage.

By optimizing how thousands of people move through a space, CrowdCommand contributes to:

smoother flow → reduced operational overhead
faster movement → less idle congestion
smarter decisions → more efficient use of infrastructure

At scale, inefficient crowd movement directly translates into:

higher energy consumption (lighting, cooling, operations)
increased idle congestion and emissions
unnecessary infrastructure strain

CrowdCommand reduces this by improving flow efficiency in real time.

Even small optimizations across thousands of people can lead to measurable reductions in energy usage and operational waste during large-scale events.

This project explores how AI-driven decision systems can make physical environments not just smarter—but more sustainable.

🎥 Demo

🔗 Live Deployment (Google Cloud Run):
https://crowdcommand-866673965866.asia-south1.run.app/

The system simulates a fully operational control center with:

🗺️ Live crowd heatmap across 8 zones
🚪 Smart gate optimization (wait time + throughput)
⏳ Virtual queue system (10 concessions)
🧠 AI recommendations (Critical / Warning / Info)
🎛️ Operator action panel with real-time feedback

💻 Code

🔗 GitHub Repository:
https://github.com/aashitanegii/crowdcommand

⚙️ How I Built It

🧩 Tech Stack

Technology	Purpose
React + Vite	Frontend UI
Node.js + Express	Backend API
Socket.IO	Real-time updates
Google Cloud Run	Deployment
Google Gemini	AI advisory generation

🔄 Real-Time Simulation Engine

The system continuously generates:

crowd density per zone
gate wait times and throughput
queue lengths

Updates are pushed through Socket.IO (WebSockets where available) every few seconds, giving operators a live operational view.
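To make the simulation idea concrete, here is a minimal sketch of what one simulation tick could look like. The `ZoneState` shape, the `simulateTick` name, and the 5% drift value are assumptions for illustration, not CrowdCommand's actual schema or tuning.

```typescript
// Hypothetical zone shape: field names are assumptions, not the project's real schema.
interface ZoneState {
  id: string;
  capacity: number;
  occupancy: number;
}

// Clamp a value into the range [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Advance every zone by one tick and return new objects (inputs are not mutated).
// `rand` is injectable so a run can be made deterministic for testing.
function simulateTick(
  zones: ZoneState[],
  rand: () => number = Math.random,
  maxDrift: number = 0.05 // assumed: up to 5% of capacity may move per tick
): ZoneState[] {
  return zones.map((zone) => {
    // Map rand() from [0, 1) onto a signed drift in [-maxDrift, +maxDrift).
    const delta = (rand() * 2 - 1) * maxDrift * zone.capacity;
    const occupancy = Math.round(clamp(zone.occupancy + delta, 0, zone.capacity));
    return { ...zone, occupancy };
  });
}

// Density as a fraction of capacity, the value a heatmap would colour by.
function density(zone: ZoneState): number {
  return zone.capacity === 0 ? 0 : zone.occupancy / zone.capacity;
}
```

On a server, a timer could call `simulateTick` every few seconds and broadcast the result with Socket.IO's `io.emit`. The event name and interval would be whatever the project already uses.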

🧠 AI Decision Layer (Google Gemini)

CrowdCommand integrates Google Gemini to generate real-time operational advisories based on live system data.

Examples:

“Food Court nearing capacity → reroute crowd + open alternate exits”
“Gate congestion detected → redirect to faster entry point”

These are surfaced in the UI as:

AI Advisory (Generated by Gemini)

This transforms the system from passive monitoring → active decision support.
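A rough sketch of the step before the model call: classifying each zone into the Critical / Warning / Info levels and assembling the text that would be sent to Gemini. The thresholds (70% and 90%), the `ZoneReading` shape, and the prompt wording are all assumptions. The actual Gemini request is left out here, since the post does not show how it is made.

```typescript
type Severity = "Critical" | "Warning" | "Info";

// Hypothetical reading shape; `density` is occupancy / capacity in the range 0..1.
interface ZoneReading {
  name: string;
  density: number;
}

// Map a density to an alert level. Threshold defaults are illustrative assumptions.
function classify(density: number, warnAt: number = 0.7, criticalAt: number = 0.9): Severity {
  if (density >= criticalAt) return "Critical";
  if (density >= warnAt) return "Warning";
  return "Info";
}

// Build a compact plain-text prompt that summarises the live state for the model.
function buildAdvisoryPrompt(readings: ZoneReading[]): string {
  const lines = readings.map(
    (r) => `- ${r.name}: ${Math.round(r.density * 100)}% full (${classify(r.density)})`
  );
  return [
    "You are assisting a venue operations team.",
    "Current zone occupancy:",
    ...lines,
    "Suggest one short, actionable advisory for the most urgent zone.",
  ].join("\n");
}
```

Keeping the severity decision in plain code means the alert level stays deterministic, while the model only writes the advisory text.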

In addition, Gemini was used during development to:

refine system architecture and logic
accelerate backend/API design
assist in UI interaction planning

⚡ Operator Action Loop

AI detects a risk
Recommendation is generated
Operator applies action
System recalculates crowd distribution
Updated state is broadcast instantly

A complete real-time feedback loop.
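The "operator applies action, system recalculates" part of the loop could be sketched as a pure function like the one below. `RerouteAction` and `applyReroute` are hypothetical names, and a simple "move N people from one zone to another" action is assumed. The real system may support other action types.

```typescript
// Hypothetical shapes for illustration only.
interface Zone {
  id: string;
  capacity: number;
  occupancy: number;
}

interface RerouteAction {
  from: string;
  to: string;
  people: number;
}

// Apply a reroute and return the recalculated distribution without mutating the input.
function applyReroute(zones: Zone[], action: RerouteAction): Zone[] {
  const source = zones.find((z) => z.id === action.from);
  const target = zones.find((z) => z.id === action.to);
  // Unknown or identical zones: nothing to recalculate.
  if (!source || !target || source === target) return zones;

  // Move no more people than the source holds or the target can still absorb.
  const room = target.capacity - target.occupancy;
  const moved = Math.max(0, Math.min(action.people, source.occupancy, room));

  return zones.map((z) => {
    if (z.id === source.id) return { ...z, occupancy: z.occupancy - moved };
    if (z.id === target.id) return { ...z, occupancy: z.occupancy + moved };
    return z;
  });
}
```

The returned array is the "updated state" that would then be broadcast to every connected dashboard.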

🎯 Key Features

Live Heatmap — Real-time occupancy + predictive trends
Smart Gates — Fastest entry recommendations
Virtual Queues — Dynamic wait-time simulation
AI Engine — Multi-level alerts and suggestions
Action Panel — Immediate execution + system feedback
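As one example of how a Smart Gates recommendation could be computed, here is a small sketch that estimates wait time as queue length divided by throughput and picks the lowest. The `Gate` fields and this formula are assumptions. The post only says that gates are ranked by wait time and throughput.

```typescript
// Hypothetical gate shape; throughput is people admitted per minute.
interface Gate {
  id: string;
  open: boolean;
  queueLength: number;
  throughputPerMin: number;
}

// Estimated minutes for the current queue to clear; closed or stalled gates never clear.
function estimatedWaitMinutes(gate: Gate): number {
  if (!gate.open || gate.throughputPerMin <= 0) return Infinity;
  return gate.queueLength / gate.throughputPerMin;
}

// Return the usable gate with the shortest estimated wait, or undefined if none is usable.
function fastestGate(gates: Gate[]): Gate | undefined {
  let best: Gate | undefined;
  for (const gate of gates) {
    const wait = estimatedWaitMinutes(gate);
    if (wait === Infinity) continue;
    if (!best || wait < estimatedWaitMinutes(best)) best = gate;
  }
  return best;
}
```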

🏆 Prize Categories

✅ Best Use of Google Gemini

Gemini API powers real-time advisory generation
AI outputs are contextual, actionable, and integrated into decision-making
Used across both runtime intelligence and development workflows

✨ What Makes This Different

Most dashboards show data.

CrowdCommand makes decisions.

It doesn’t just answer:

“What is happening?”

It answers:

“What should we do next?”

This project goes beyond building interfaces — it focuses on designing systems that:

analyze
predict
respond

in real time.

CrowdCommand is a step toward environments that are not just monitored — but intelligently controlled and optimized for sustainability.

#devchallenge #weekendchallenge #ai #googlecloud #gemini #sustainability #webdev
