DEV Community: NITHESH SARAVANAN

Solstice Cipher - A Bletchley Park Tribute to Alan Turing

NITHESH SARAVANAN — Sun, 21 Jun 2026 17:48:34 +0000

This is a submission for the June Solstice Game Jam

What I Built

Solstice Cipher is a browser-based codebreaking puzzle game where you play a WWII-era codebreaker at a Bletchley Park-style station, racing to decrypt intercepted enemy transmissions before daylight runs out.

Each level introduces a different real cryptographic technique — starting with a simple Caesar shift, moving into substitution ciphers, and finishing with a Vigenère cipher — all decoded using an in-game "Decrypt-O-Matic" terminal helper that teaches the underlying logic instead of just asking you to guess.
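For readers who have not met these ciphers before, here is a minimal sketch of the first and last techniques. It is written in Python for brevity and is not the game's actual JavaScript; the function names and the uppercase-only alphabet are assumptions made for illustration.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text: str, shift: int) -> str:
    """Shift every letter by `shift` positions; other characters pass through."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Vigenère: a Caesar shift whose amount cycles through the key letters."""
    sign = -1 if decrypt else 1
    out = []
    k = 0  # position in the key; advances only when a letter is processed
    for ch in text.upper():
        if ch in ALPHABET:
            shift = ALPHABET.index(key[k % len(key)].upper())
            out.append(ALPHABET[(ALPHABET.index(ch) + sign * shift) % 26])
            k += 1
        else:
            out.append(ch)
    return "".join(out)


secret = caesar("ATTACK AT DAWN", 3)
print(secret)                               # DWWDFN DW GDZQ
print(caesar(secret, -3))                   # ATTACK AT DAWN
print(vigenere("ATTACKATDAWN", "LEMON"))    # LXFOPVEFRNHR
```

Decrypting a Caesar message is just the same shift with the sign flipped, which is roughly what a shift slider lets a player explore by hand.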

The solstice theme isn't just cosmetic: every level is timed by a daylight meter that drains at different speeds depending on whether the in-game date falls on an "odd" or "even" day, mirroring how the June solstice means radically different day lengths depending on which hemisphere you're in. The in-game calendar counts up toward Day 21 — June 21, the solstice itself, which is also the final cipher level.

After the last cipher, the game shifts gears entirely: you're presented with four short text passages and asked to judge which were written by a human and which were generated by a machine — a direct nod to Alan Turing's 1950 "Imitation Game" proposal. The passages are intentionally ambiguous (one "machine" passage sounds very natural, one "human" passage sounds a little stiff) so the moment actually makes you think, rather than being an obvious gimme.

The game closes with a short, factual dedication to Alan Turing — his codebreaking work at Bletchley Park, his foundational contributions to computer science, and the persecution he faced for being gay — tying the whole experience back to both the historical "Ode to Turing" prompt and Pride Month.

Play it live: https://red-coder-27.github.io/solstice-cipher/

Video Demo

Code

https://github.com/red-coder-27/solstice-cipher

How I Built It

Solstice Cipher is a single self-contained HTML file — no build step, no backend, just vanilla JavaScript, CSS, and the Web Audio API, so it runs instantly in any browser with nothing to install.

I used Google Antigravity to generate the full initial build from a single detailed prompt covering the concept, all four cipher mechanics, the visual day-to-night theming, and the Turing Test ending. From there, I tested every level by hand-verifying the cipher math against the actual encryption logic, which caught a one-character bug in the Level 2 ciphertext that would have made that level mathematically unsolvable — a good reminder that AI-generated puzzle logic still needs to be checked against itself, not just read for plausibility.
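The kind of check that catches a one-character ciphertext bug can be automated. The sketch below is a hypothetical Python version of that hand-verification, not code from the game: it re-encrypts the intended answer with the level's key and reports any position where the stored ciphertext disagrees. The Atbash key and the sample words are made up for the example.

```python
import string

ALPHABET = string.ascii_uppercase


def check_substitution_level(plaintext: str, ciphertext: str, key: dict) -> list:
    """Return a list of problems; an empty list means the level is consistent."""
    problems = []
    # A substitution key must be one-to-one, or decryption becomes ambiguous.
    if len(set(key.values())) != len(key):
        problems.append("key is not one-to-one")
    # Re-encrypt the intended answer and compare with the stored ciphertext.
    expected = "".join(key.get(ch, ch) for ch in plaintext)
    if len(expected) != len(ciphertext):
        problems.append("length mismatch")
    for i, (want, got) in enumerate(zip(expected, ciphertext)):
        if want != got:
            problems.append(f"position {i}: expected {want!r}, found {got!r}")
    return problems


# Illustrative key: Atbash (A<->Z, B<->Y, ...), not the game's real key.
atbash = dict(zip(ALPHABET, reversed(ALPHABET)))

print(check_substitution_level("ENIGMA", "VMRTNZ", atbash))  # []
print(check_substitution_level("ENIGMA", "VMRTNA", atbash))  # ["position 5: expected 'Z', found 'A'"]
```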

A few design choices I'm happy with:

The daylight timer bar literally shrinks faster on "even" in-game days, which makes the solstice's day-length asymmetry something you feel under time pressure rather than just read about.
The Decrypt-O-Matic helper changes shape per cipher type (a shift slider for Caesar, fill-in-the-blank pairs for substitution, a tabbed lookup grid for Vigenère) so each level teaches its own cipher instead of reusing one generic input.
The Turing Test ending has no timer and gives a score bonus regardless of how many you get right — it's meant as a reflective moment, not a fail state.
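To make the daylight bar concrete, here is a rough Python sketch of a meter that drains faster on even in-game days. The 120-second level length and the 1.5x multiplier are invented for illustration; the game's real numbers and code are not shown in this post.

```python
def daylight_remaining(day: int, elapsed_s: float,
                       total_s: float = 120.0,
                       even_day_multiplier: float = 1.5) -> float:
    """Fraction of the daylight bar left: 1.0 is full, 0.0 is dark."""
    # Even days drain faster (hypothetical multiplier); odd days drain at 1x.
    rate = even_day_multiplier if day % 2 == 0 else 1.0
    return max(0.0, 1.0 - rate * elapsed_s / total_s)


print(daylight_remaining(day=21, elapsed_s=60))  # 0.5
print(daylight_remaining(day=20, elapsed_s=60))  # 0.25
```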

Prize Category

Submitting to both:

Best Ode to Alan Turing — the entire back half of the game is a direct tribute: a literal Turing Test mechanic at the climax, followed by a factual dedication to his codebreaking legacy and his place in LGBTQIA+ history during Pride Month.
Best Google AI Usage — the full game was built end-to-end using Google Antigravity from a single structured prompt, then debugged and polished afterward.

I Finally Shipped FlowDesk — My All-in-One Productivity Dashboard Built with GitHub Copilot ⚡

NITHESH SARAVANAN — Fri, 05 Jun 2026 18:35:56 +0000

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

FlowDesk is a fully offline, production-quality productivity dashboard that combines three tools I always wanted in one place — a habit tracker, a Pomodoro focus timer, and a Kanban task board — all in a single beautiful React app with zero backend and zero accounts required.

🔗 Live Demo: https://flow-desk-lovat.vercel.app/
💻 GitHub: https://github.com/red-coder-27/flow-desk

Everything runs entirely in your browser via localStorage. Your data never leaves your device.

Core Features

🎯 Habit Tracker

GitHub-style 84-day contribution heatmap
Streak tracking with fire badges 🔥
Emoji + color customization per habit
Confetti celebration when you hit 100% for the day 🎉
Daily/Weekdays/Weekends frequency options

⏱️ Pomodoro Focus Timer

Animated SVG countdown ring with glow effect
Web Audio API chimes — no audio files needed
Session history log with weekly focus stats
Keyboard shortcuts: Space / R / S from any page
Auto-switches between work and break sessions

📋 Kanban Task Board

Full drag-and-drop via @dnd-kit (mouse + touch)
Priority badges: 🔴 High / 🟡 Medium / 🟢 Low
Live search + priority filter
Three columns: To Do → In Progress → Done

📊 Unified Dashboard

Real stats pulled from all three modules
Weekly focus bar chart (Recharts)
Daily motivational quote
Quick-action buttons to jump into any module

And more: Dark/Light/System theme, PWA installable, full keyboard shortcuts, data export/import, mobile bottom nav, glassmorphism UI.

Demo

🚀 Try FlowDesk Live →

Works best in Chrome. Install as a PWA for the full experience (look for the Install button in the top nav).

Loom walkthrough video here:
https://www.loom.com/share/f3c750d782694baf876229ab598695dc

The Comeback Story

Where It Started (The "Before")

I originally started FlowDesk about 6 months ago during a weekend hackathon. The idea was simple: I was tired of switching between three different apps — one for habits, one for a Pomodoro timer, one for tasks. I wanted them all in one dashboard.

What I had after that hackathon:

A broken timer that reset on every page navigation
A habit list with no streak logic whatsoever
A Kanban board with hardcoded placeholder tasks that couldn't be deleted
Zero mobile support
An index.css file that was 47 lines of chaos

I pushed it to GitHub, opened 11 issues I never closed, and abandoned it. The repo sat untouched for months.

What Changed (The "After")

When the Finish-Up-A-Thon challenge dropped, I knew FlowDesk was the project. Here's what I shipped in this revival:

Before	After
Timer reset on navigation	Timer persists globally via React Context
No streak logic	Bulletproof streak calc across timezone boundaries
Hardcoded tasks	Full drag-and-drop Kanban with localStorage
No heatmap	Pixel-perfect 84-day GitHub-style heatmap
Desktop only	Fully responsive + PWA installable
0 animations	Glassmorphism, micro-animations, confetti
Broken build	Clean Vercel deploy, 0 console errors

The biggest technical challenge was the timer persistence — React state resets on component unmount, so navigating away killed the timer. The fix was lifting all timer state into a React Context that wraps the entire app, using a useInterval custom hook that only lives at the context level. Once I understood that, everything clicked.

The heatmap was the most satisfying piece to build — calculating 84 days of data, mapping completions across all habits per day, and rendering it in an SVG grid with correct month labels and tooltips took way more thought than I expected.
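The data side of that heatmap can be sketched in a few lines. This is a Python approximation of the idea, not the React code from the repo: the `completions` shape (habit name mapped to a set of completed dates) and the function names are assumptions.

```python
from datetime import date, timedelta


def heatmap_cells(completions: dict, today: date, days: int = 84) -> list:
    """Return one (day, count) pair per day, oldest first, ending at `today`."""
    start = today - timedelta(days=days - 1)
    cells = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        # Count how many habits were completed on this day.
        count = sum(1 for done in completions.values() if day in done)
        cells.append((day, count))
    return cells


def to_week_columns(cells: list) -> list:
    """Split the cells into columns of 7 days (84 days gives 12 columns)."""
    return [cells[i:i + 7] for i in range(0, len(cells), 7)]


today = date(2026, 6, 5)
completions = {
    "read": {today, today - timedelta(days=1)},
    "run": {today},
}
cells = heatmap_cells(completions, today)
weeks = to_week_columns(cells)
print(len(cells), len(weeks))  # 84 12
print(cells[-1])               # (datetime.date(2026, 6, 5), 2)
```

Note that this simple version starts each column 7 days after the previous one rather than aligning columns to calendar weeks, which a GitHub-style grid with month labels would also need to handle.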

My Experience with GitHub Copilot

GitHub Copilot was the difference between "I'll finish this someday" and "it's shipped."

Where It Helped Most

1. Boilerplate elimination
The moment I described the useHabits hook structure in a comment, Copilot generated the entire localStorage read/write pattern, the streak calculation logic, and the heatmap data transformation in one autocomplete. What would've been 45 minutes of typing was done in 3.

2. The streak algorithm
I described what I wanted in plain English as a comment:

// Calculate current streak: consecutive days ending today or yesterday
// Use local timezone toDateString() comparison, NOT UTC timestamps

Copilot wrote the correct algorithm on the first try, including the edge case where today isn't checked yet (streak = consecutive days ending yesterday). I verified it, and it was right.
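For reference, the logic those two comments describe looks roughly like this. It is a Python sketch, not the generated JavaScript from the repo; it assumes completions are stored as local calendar dates, which plays the same role as the local-timezone comparison in the comment.

```python
from datetime import date, timedelta


def current_streak(completed: set, today: date) -> int:
    """Consecutive completed days ending today, or yesterday if today is unchecked."""
    # If today has not been checked off yet, the streak may still be alive:
    # start counting from yesterday instead.
    day = today if today in completed else today - timedelta(days=1)
    streak = 0
    while day in completed:
        streak += 1
        day -= timedelta(days=1)
    return streak


today = date(2026, 6, 5)
done = {date(2026, 6, 2), date(2026, 6, 3), date(2026, 6, 4)}
print(current_streak(done, today))            # 3 (today not checked yet)
print(current_streak(done | {today}, today))  # 4
```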

3. Web Audio API sounds
I had zero experience with the Web Audio API. I described "ascending 3-tone chime using OscillatorNode, no audio files" and Copilot generated a working playWorkComplete() function using AudioContext, GainNode, and scheduled oscillator timing. I tested it — it played a perfect chime.

4. SVG timer ring
The animated stroke-dashoffset trick for the circular countdown was something I knew conceptually but hadn't coded before. Copilot filled in the exact math:

const circumference = 2 * Math.PI * radius
const offset = circumference * (1 - progress)

...and wired it to the timeLeft state automatically.
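Spelled out as a function, the ring math is small. The Python sketch below assumes `progress` means the fraction of time remaining, so the ring starts full (offset 0) and empties as the timer runs down; the post does not say which way the real component defines it, and the function name is hypothetical.

```python
import math


def ring_dash_offset(radius: float, time_left: float, total: float) -> float:
    """stroke-dashoffset for a ring whose stroke-dasharray equals its circumference."""
    circumference = 2 * math.pi * radius
    progress = time_left / total  # assumed: 1.0 at the start, 0.0 when done
    return circumference * (1 - progress)


# 25-minute session (1500 s) on a ring of radius 50
print(ring_dash_offset(50, 1500, 1500))  # 0.0, full ring
print(ring_dash_offset(50, 750, 1500))   # half the circumference
print(ring_dash_offset(50, 0, 1500))     # the full circumference, empty ring
```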

5. Unsticking moments
Whenever I hit a wall — like the @dnd-kit DragOverlay not rendering correctly, or the confetti only firing on refresh — I described the bug to Copilot in a comment and it suggested the fix. The DragOverlay issue was a missing createPortal wrapper. Copilot caught it immediately.

What I Still Had To Do Myself

Copilot isn't magic. I had to:

Review every generated function for correctness (especially date logic)
Architect the Context structure myself before Copilot could fill it in
Write the validation prompt to catch bugs across the whole app
Make design decisions about layout and UX
Test everything manually and catch edge cases

The mental model I settled on: Copilot is a very fast junior developer who needs clear instructions and code review. When I gave it precise comments and clear function signatures, it was extraordinarily fast. When I was vague, it guessed wrong.

Tech Stack

React 18 + Vite — UI + build
@dnd-kit — Drag and drop
Recharts — Weekly analytics chart
Lucide React — Icons
react-hot-toast — Notifications
canvas-confetti — Habit celebrations
vite-plugin-pwa — PWA + offline support
Web Audio API — Timer sounds (no files!)
localStorage — All persistence, zero backend

What's Next

Cloud sync option (optional, privacy-first)
Habit templates (Morning Routine, Fitness, etc.)
Timer integrations with task list (focus on specific task)
Browser extension version

Thanks for reading! If you try FlowDesk, I'd love to hear what you think.
Drop a comment or a ❤️ if the heatmap made you smile.

The Agent That Actually Remembers You: A Deep Dive into Hermes Agent

NITHESH SARAVANAN — Sun, 31 May 2026 14:00:06 +0000

This is a submission for the Hermes Agent Challenge

I'll be honest — I was skeptical.

Every few weeks something in the AI space gets described as "turning heads quietly" and it turns out to be a wrapper around the OpenAI API with a nicer README. So when Hermes Agent started showing up in my feeds with that kind of language, I filed it under "probably fine, not urgent" and moved on.

Then it crossed 95,000 GitHub stars in seven weeks. That's not hype-shaped. That's word-of-mouth shaped. So I actually installed it.

This post is about what I found.

The problem it's solving (which is real and annoying)

Here's a thing that happens to me constantly. I open an AI assistant, spend the first few minutes re-establishing context — my stack, my project names, my preferences — get into a groove, do good work, close the session. Next day: blank slate. I'm typing the same paragraph again.

It's not a dealbreaker. It's just... friction that compounds. Over weeks it starts to feel like working with a very talented colleague who has anterograde amnesia. You like them. You just have to brief them every single morning.

Most agents acknowledge this problem by shipping a vector database and calling it "long-term memory." Which is fine. Vector search is genuinely useful. But it's passive — you query it, it retrieves, nothing actually changes. The agent doesn't learn anything. It just has better notes.

Hermes Agent is built around a different idea entirely.

What actually makes it different

The core insight is distinguishing between two kinds of memory:

Episodic — what happened in past conversations. Hermes stores this with SQLite FTS5 (full-text search), not vector embeddings. That sounds like a step backward and I thought so too at first. But keyword search has real precision advantages for the stuff that actually matters in developer workflows: project names, service names, variable names, team-specific terminology. If I mention "Gatekeeper" (our auth service) in session 12, Hermes finds it when I mention it in session 89. Vector search would find semantically similar things, which is sometimes what you want and sometimes really not.

Procedural — how to do things. This is the part I hadn't seen before. After completing a multi-step task, Hermes can convert that workflow into a Skill — a concrete, versioned, human-readable procedure saved to disk. Next time a similar task comes up, it loads the Skill, runs through the steps, and if something goes better or worse than expected, it updates the Skill accordingly.

This is closer to how I actually retain knowledge than any AI memory system I've used. I don't re-derive my Docker deployment process from first principles every time. I have a procedure. I refine it when it breaks. I get faster. Hermes is doing a version of that.

There's also a user modeling layer via something called Honcho that builds a representation of who you are across sessions — your preferences, communication style, work context. I haven't been running it long enough to have strong opinions on this part yet, but the architecture makes sense.
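To get a feel for the episodic side, here is a small standalone experiment with SQLite FTS5 using Python's built-in sqlite3 module. It is not Hermes's actual schema or code: the table name, columns, and sample rows are invented, and it assumes your SQLite build has FTS5 enabled (most do).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A full-text index over past conversation snippets (hypothetical schema).
conn.execute("CREATE VIRTUAL TABLE episodes USING fts5(session, content)")
conn.executemany(
    "INSERT INTO episodes (session, content) VALUES (?, ?)",
    [
        ("12", "Gatekeeper is our auth service; it issues short-lived tokens."),
        ("40", "Refactored the billing worker to retry failed jobs."),
        ("89", "Login is failing again, probably Gatekeeper token expiry."),
    ],
)


def recall(term: str) -> list:
    """Return the sessions whose text contains the exact keyword."""
    rows = conn.execute(
        "SELECT session FROM episodes WHERE episodes MATCH ? ORDER BY rank",
        (term,),
    ).fetchall()
    return [session for (session,) in rows]


print(recall("gatekeeper"))      # sessions 12 and 89
print(recall("authentication"))  # nothing: keyword search does not guess synonyms
```

The second query shows the trade-off from the other direction: exact project and service names are found reliably, but a related word that never appeared in the text returns nothing.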

Installation (genuinely fast)

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

That's it for Linux/macOS/WSL2. There's a PowerShell version for Windows in early beta. After installing:

hermes setup --portal

One OAuth flow and you have a model plus web search, image gen, TTS, and browser access. No juggling four separate API keys.

hermes chat

I had it running in under three minutes, which is not always how these things go.

The infrastructure stuff that actually matters

One thing I didn't expect to care about but do: Hermes is designed to run somewhere other than your laptop.

It supports six backends — local, Docker, SSH, Daytona, Singularity, and Modal. The Daytona and Modal options are serverless, meaning the environment hibernates when idle. Practically what this means: you spin it up on a cheap VPS, connect to it from Telegram, and the agent is doing work on a machine you never SSH into. Your laptop is just the interface.

Most agents I've used are tethered to wherever I'm sitting. Close the laptop, the agent stops. Hermes is a persistent process that happens to communicate with you — a subtle but real difference for anything involving longer workflows.

The messaging integrations lean into this: CLI, Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Teams, Google Chat... 20+ platforms. You configure once. I've been using it from Telegram while commuting and it's a genuinely different experience from "AI assistant that only exists when I'm at my desk."

The parallelization thing

Hermes can spawn isolated subagents — separate processes with their own terminals and contexts — to run workstreams in parallel. Research task, data pipeline, and file conversion simultaneously, each sandboxed.

I want to be careful not to oversell this because I've only tested it in limited ways, but the architecture is sound. And it's where the word "agent" starts to feel earned rather than marketing.

The thing that actually tripped me up

One thing worth knowing before you dive in: the first time I let it create a Skill automatically, it over-generalized. I'd asked it to pull a summary of my GitHub issues and it turned that into a Skill called something like "fetch repository data" — which then tried to apply that same approach when I later asked about a completely different repo with different auth. Took me a minute to figure out why it was behaving weirdly.

The fix was easy — Skill files are just readable text on disk, so I went in, renamed it to something specific, and tightened the scope. But it wasn't what I expected. Lesson I'd pass on: watch what it names Skills in the first few sessions and rename the vague ones early. A Skill called "fetch repository data" will haunt you. A Skill called "fetch open issues from acme-api repo" will not.

Where I'd actually use it vs. not

Being straight with you:

Good fit:

You want something that genuinely compounds value over weeks of use
You need a private, self-hosted setup — no telemetry, your data stays on your machine
Long-horizon tasks where context continuity matters
You're already living in Claude Code and wish it had cross-project memory

Probably not the right fit:

You want a quick throwaway session with zero setup cost — just open a chat tab
Your workflow is entirely in-IDE
Domains where you can't reliably judge output quality — and this matters more than it sounds

That last point is worth being honest about. The self-improvement loop is only as good as the feedback it gets. If you're working in a domain where you can't confidently tell when the agent's output is correct, the Skills system can make it faster at doing the wrong thing. Hermes gives you control. It can't make you exercise it.

Why it matters beyond the product

The Stanford HAI AI Index 2026 made a point that stuck with me: agents moved from question answering toward task completion in 2025, but still fail about a third of attempts on structured benchmarks. On OSWorld specifically, accuracy went from ~12% to 66.3% — within six points of human performance.

What that trajectory suggests is that the bottleneck is increasingly not raw model intelligence. It's memory, orchestration, recovery from failure, and repeatability. Which is exactly what Hermes is designed around.

There's also just a values bet embedded in the whole project — MIT license, local-first, readable skills you can inspect and modify, no cloud lock-in. Whether or not Hermes wins the agentic framework wars, that design philosophy being competitive is something worth rooting for.

Try it yourself

# Install
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Set up model + tools
hermes setup --portal

# Talk to it
hermes chat

Give it real context about your work on day one. Let it run something non-trivial. Come back two sessions later and notice what it already knows.

That's the test that settled it for me.

Hermes Agent docs · GitHub · Nous Research

Gemma 4 Runs on a Raspberry Pi. Let That Sink In.

NITHESH SARAVANAN — Sat, 23 May 2026 18:52:26 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

I've been thinking about AI wrong for the past two years.

Not completely wrong. But there's an assumption I'd quietly accepted without realizing it — that serious AI models require serious hardware. That if you wanted something capable enough to reason, handle images, work through multi-step problems, you needed a cloud API, or at minimum a machine with a real GPU. That the tradeoff between capability and accessibility was fixed: more of one meant less of the other.

Gemma 4 broke that assumption for me. And the thing that broke it wasn't a benchmark number or a blog post. It was one sentence buried in the release notes:

The E2B model runs on a Raspberry Pi 5.

What Gemma 4 Actually Is

Google DeepMind released Gemma 4 on April 2, 2026, under an Apache 2.0 license — fully open-weight, commercially usable, no strings attached. It's built from the same research as Gemini 3, which is a meaningful statement: this isn't Google's B-team effort. It's frontier-level research packaged for the open ecosystem.

The family ships as four distinct variants, each targeting a different tier of hardware:

Model	Architecture	Active Params	Target Hardware
E2B	PLE (edge)	~2.3B	Raspberry Pi, smartphones
E4B	PLE (edge)	~4.5B	Mobile devices, laptops
26B A4B	MoE (8 of 128 experts)	~4B active	Consumer GPU
31B	Dense	30.7B	Workstation / multi-GPU

Every single one of them is multimodal from the ground up — text, images, video. The smaller models also handle audio natively. Context window is 128K for the edge models and 256K for the larger ones. All support function calling, multi-step reasoning, and configurable thinking modes.

That's the table stakes. Here's what's actually interesting.

The Part That Stopped Me

The E2B model runs in approximately 1.5 GB of RAM at INT4 quantization.

A Raspberry Pi 5 with 8 GB of RAM can run it. Google published numbers confirming this — it's not theoretical, it's a tested deployment target.
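A quick back-of-envelope check shows why a figure around 1.5 GB is plausible. The Python sketch below estimates weights-only memory from the parameter counts in the table above; it ignores the KV cache, activations, and runtime overhead, so treat the results as rough lower bounds rather than measured numbers.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only footprint in GB (10^9 bytes), excluding cache and overhead."""
    return params_billion * bits_per_weight / 8


print(round(weight_memory_gb(2.3, 4), 2))    # 1.15  (E2B at INT4)
print(round(weight_memory_gb(2.3, 16), 2))   # 4.6   (E2B at 16-bit)
print(round(weight_memory_gb(30.7, 4), 2))   # 15.35 (31B dense at INT4)
```

Roughly 1.15 GB of weights plus working memory lands in the neighborhood of the quoted 1.5 GB, and the same arithmetic explains why the 31B model sits in a different hardware class.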

I keep coming back to what that actually means. A Raspberry Pi 5 costs around $80. It's a credit-card-sized single-board computer. And it can now run a multimodal AI model that handles text, images, and audio, with a 128K context window, with reasoning capabilities, offline, with no API calls, no cloud dependency, no subscription.

For the past two years, "run AI locally" has been something that required either a powerful laptop or a desktop with a dedicated GPU. The conversation around local AI models has largely been for people with the hardware to run 7B+ parameter models comfortably. E2B changes who that conversation is for.

Why This Is a Bigger Deal Than the Benchmarks

I want to be honest about something: the benchmarks for Gemma 4 are genuinely impressive, but I don't think they're the most important part of this release.

The 31B model sits at #3 on Arena AI's open model leaderboard (ELO 1452 as of release), scores 89.2% on AIME 2026, and hits 80% on LiveCodeBench. Those are strong numbers that hold up against Qwen 3.5 and Llama 4 Scout in most categories.

But benchmark performance is something you read and nod at. The Raspberry Pi deployment is something that changes what you can build.

Think about the categories of projects that become possible when a capable multimodal model can run locally on $80 hardware:

Privacy-first applications. Medical data, personal journals, private documents — things you'd never send to a cloud API. A model that runs entirely on your own device means sensitive data never leaves it.

Offline-first tooling. Field work, remote locations, environments with unreliable connectivity. A capable AI model that works without internet is a different category of useful.

Embedded systems. NVIDIA Jetson devices, edge computing nodes, IoT hardware with more capability than a microcontroller. Gemma 4 E2B was specifically built with these targets in mind.

Projects for students and developers in contexts where cloud API costs are a real barrier. $80 hardware is still not nothing. But it's a different category of accessible than "pay per token forever."

How the Four Variants Actually Fit Together

After spending time with the model card and the community documentation, here's how I'd think about which variant to reach for:

E2B — for genuine edge deployment. Raspberry Pi, smartphones, embedded hardware. If your constraint is RAM and you need offline capability, this is the one. Don't reach for it if you're on a laptop — you're unnecessarily limiting yourself.

E4B — the sweet spot for most personal projects and local laptop experimentation. Fits in roughly 5 GB RAM. Strong enough for real tasks, accessible enough to run comfortably on most modern machines.

26B A4B (MoE) — deceptively efficient. It has 26 billion total parameters but only activates around 4 billion per token pass thanks to Mixture-of-Experts routing. If you have a consumer GPU with 8-12 GB VRAM, this is where serious capability starts without the full cost of running a dense model. The MoE architecture means inference is faster than the parameter count suggests.

31B Dense — for when you need the ceiling. All 30.7 billion parameters active on every pass. Highest benchmark scores, highest hardware requirements. Realistically a multi-GPU or high-VRAM workstation setup.
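The gap between total and active parameters in the 26B A4B variant is easy to quantify from the numbers in the table. This Python sketch only does that arithmetic; the function name is made up, and the interpretation in the comments is a general property of Mixture-of-Experts models rather than something taken from the Gemma 4 model card.

```python
def moe_summary(total_b: float, active_b: float,
                experts_total: int, experts_active: int) -> dict:
    """Compare how much of an MoE model is used per token."""
    return {
        # Share of experts the router selects for each token.
        "expert_fraction": experts_active / experts_total,
        # Share of all parameters that actually run for each token.
        "active_param_fraction": active_b / total_b,
    }


summary = moe_summary(total_b=26, active_b=4, experts_total=128, experts_active=8)
print(summary["expert_fraction"])                  # 0.0625
print(round(summary["active_param_fraction"], 3))  # 0.154
```

The active share of parameters (about 15%) is larger than the share of experts (about 6%) because some layers typically run for every token. Per-token compute follows the active count, but the weights for all experts still have to be stored somewhere, whether in VRAM or offloaded to system RAM.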

One thing I appreciate about how Google structured this: these aren't just size variants of the same model. The E2B and E4B use Per-Layer Embeddings rather than traditional MoE routing — it's a genuinely different architectural approach to making models efficient at the edge, not just a quantized-down version of the larger model.

What I'm Actually Thinking About Building

I'll be honest: my hardware situation right now is a mid-range laptop and a Raspberry Pi 4 I've had sitting around for two years mostly running Pi-hole. Not exactly a GPU workstation.

Before Gemma 4, my options for local AI were limited enough that I mostly reached for APIs. After reading through the E2B specs, I'm genuinely reconsidering that default.

What I want to explore: a local document analysis tool that processes my own notes and files without any of that data leaving my machine. The 128K context window on E2B means I could feed in reasonably sized documents. The multimodal support means I could include images and diagrams. The offline capability means it works whether I'm connected or not.

That's not a groundbreaking project. But it's exactly the kind of project I'd been mentally filing under "not possible without a paid API" — and Gemma 4 moves it into the "actually try it this weekend" category.

The Honest Caveat

The E2B running on a Raspberry Pi is real, but context matters: Google tested with INT4 quantization, which trades some precision for memory efficiency. Performance at INT4 is lower than at higher precision levels. For tasks requiring nuanced reasoning or precise outputs, you'll notice the difference.

The 31B dense model's hardware requirements are also substantial — up to 19 GB RAM in some configurations. "Runs on consumer hardware" is true for the family, but not uniformly true across all variants.

And the MoE architecture of the 26B, while efficient, behaves differently in practice than a dense model of equivalent active parameter count. Worth benchmarking for your specific use case rather than assuming the numbers translate directly.

Why Open-Weight Matters Here

Gemma 4 ships under Apache 2.0. That's not just a licensing detail — it's what makes the edge deployment story meaningful for the long term.

A proprietary model that runs locally is still a dependency on the vendor's continued goodwill, roadmap, and pricing decisions. An Apache 2.0 model is something you can fork, fine-tune, redistribute, and build on without those constraints. The 100,000+ community variants already built on earlier Gemma models exist because the license made that experimentation possible.

For developers building on top of Gemma 4 — fine-tuning it for a specific domain, integrating it into a product, deploying it at the edge — the license is as important as the capability.

Where I Land

I started this thinking about Gemma 4 primarily as another open model release in a busy year of open model releases. I'm ending it thinking about it as something slightly different: a signal that the capability-accessibility tradeoff in AI is more flexible than I'd assumed.

Running a capable multimodal model with a 128K context window offline on an $80 single-board computer is not the ceiling of what Gemma 4 can do. It's the floor.

That reframe matters for what I think about building next.

Wrote this after going through the Gemma 4 model card, release blog, and community documentation. To get started: Gemma 4 on Hugging Face, Ollama setup guide, or the official Google AI docs.

Google Antigravity 2.0: The End of the IDE Era (And What Comes Next)

NITHESH SARAVANAN — Sat, 23 May 2026 18:41:58 +0000

This is a submission for the Google I/O Writing Challenge

I'll be upfront about something: I've been burned by AI coding tool hype before.

Over the past couple of years, I've tried most of them — Copilot, Cursor, a few others. And my honest experience is that they're useful right up until they aren't. They suggest something plausible-looking, I accept it, and then twenty minutes later I'm digging through the mess they made while I wasn't paying close enough attention. I still end up fixing things manually. The "AI handles it" promise always seems to quietly expire somewhere in the middle.

So when Google I/O 2026 rolled around and the keynote was wall-to-wall AI announcements, I watched with that specific kind of developer skepticism you develop after a few too many over-promised demos.

But then something made me stop scrolling.

The Demo I Wasn't Expecting

Google Director of Software Engineering Varun Mohan stood on the I/O stage and, using Antigravity 2.0's parallel agent system, built a working operating system core from scratch — live. Total compute cost: under $1,000. Then, on top of that OS, he ran a Doom clone.

I sat with that for a moment.

Not because I think I'll be building operating systems with AI agents next week. But because it reframed something for me. The tools I'd been frustrated with were all trying to assist me while I coded. This was something else entirely — agents executing in parallel, autonomously, producing something that would have taken a team of engineers weeks.

Whether that's ready for everyday use or not, the direction it points in is hard to ignore. And the more I looked at what Google actually shipped alongside that demo, the more I think Antigravity 2.0 deserves a more careful look than the usual I/O hype cycle gets.

What Antigravity 2.0 Actually Is (And What It Isn't)

The original Antigravity launched in November 2025 as, essentially, Google's answer to Cursor. An AI-powered IDE. Useful, but not something that changed how I thought about development.

Version 2.0 is different in a way that took me a while to articulate. It's not trying to be a better editor. It's trying to replace the editor as the central surface of development altogether.

The 2.0 release ships as five distinct pieces:

1. The Standalone Desktop App

The new desktop app isn't a code editor with AI bolted on. It's built around managing parallel agents — you describe what you want, multiple specialized subagents spin up concurrently, and the app is your dashboard for what they're doing.

The conceptual shift here is real. Every editor I've used puts me at the center — my cursor, my decisions at every line. Antigravity 2.0 puts the agent workflow at the center and positions me as the person reviewing and directing, not typing.

I'm genuinely not sure how I feel about that yet. But it's a different bet than anyone else is making.

2. The Antigravity CLI (`agy`)

Built in Go, it brings the full agent harness to your terminal. If you live in the command line (which I do more and more), this matters. It has built-in sandboxing, credential masking, and hardened Git policies baked into the foundation, not added as an afterthought.

3. The Antigravity SDK

This is for teams who want to deploy custom agents on their own infrastructure rather than routing everything through Google's cloud. For anyone with compliance requirements or data concerns, this is what makes the platform actually usable in a professional context.

4. Managed Agents in the Gemini API

One API call. Full agent. Sandboxed Linux environment. Persistent state. More on this in a moment — it's the part I actually tested.

5. The Gemini Enterprise Agent Platform

Managed deployment path for large organizations inside Google Cloud. More restricted access right now, but the direction is clear.

I Actually Tried It (Here's What I Found)

My instinct after any big I/O announcement is to go read the actual docs rather than the blog posts. So that's what I did.

The Managed Agents quickstart is genuinely simple. Here's the call that provisions a full agent environment:

```python
from google import genai

client = genai.Client()

# One call: provisions the remote environment and hands the task to the agent
interaction = client.interactions.create(
    agent="antigravity-preview-05-2026",
    environment="remote",
    input="Write a Python script that generates the first 20 Fibonacci numbers and saves them to fibonacci.txt."
)

# Print the agent's final text output
print(interaction.output_text)
```

That single call provisions a fresh Linux sandbox, loads Gemini 3.5 Flash, equips the agent with web search, code execution, and file management — and then runs the whole thing autonomously.

What I kept thinking about as I read through it: the setup I used to have to do manually to get anywhere close to this — provisioning containers, wiring up tool access, managing execution state — was genuinely hours of work before I could even start on the actual problem. This collapses that completely.

The part that actually surprised me was multi-turn session continuity:

```python
# Pick up exactly where the last call left off
# (reuses `client` and `interaction` from the previous snippet)
interaction2 = client.interactions.create(
    agent="antigravity-preview-05-2026",
    previous_interaction_id=interaction.id,
    environment=interaction.environment_id,
    input="Now plot the Fibonacci sequence as a line chart and save it as chart.png."
)
```

Same sandbox. Same file system. Same state. Just pass the ID. No serialization, no manual context management.

That's the thing I'd been doing manually for months in my own projects — carefully threading state between agent calls so nothing got lost. Seeing it handled by passing a single ID was a quiet "oh" moment.
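For contrast, here's roughly what threading state by hand looks like. This is a minimal sketch with a stubbed-out agent: `run_agent_step` and the `state` dictionary are hypothetical stand-ins of mine, not part of the Gemini API. The only point is that the caller has to carry every file and every bit of history forward itself.

```python
import json

def run_agent_step(prompt: str, state: dict) -> dict:
    """Hypothetical stand-in for one stateless agent call.

    Records the prompt, pretends to write one file per step,
    and returns the updated state for the caller to keep.
    """
    history = state.get("history", []) + [prompt]
    files = dict(state.get("files", {}))
    files[f"step_{len(history)}.txt"] = f"output for: {prompt}"
    return {"history": history, "files": files}

# The caller owns the state: serialize after each call, restore before the next.
state = {}
for prompt in ["generate fibonacci.txt", "plot chart.png"]:
    restored = json.loads(json.dumps(state))  # simulate a save/load round trip
    state = run_agent_step(prompt, restored)

print(sorted(state["files"]))
```

Forget to pass `state` back in once and the second step starts from nothing, which is exactly the class of bug a session ID removes.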

The honest caveats though, because there are real ones: this is in preview. The only supported base agent is `antigravity-preview-05-2026`. There's no agent versioning or rollback yet. Subagent nesting isn't supported. The schema may still change. I wouldn't build production-critical workflows on this today, but as a signal of where the API is heading, it's concrete enough to take seriously.

The Idea Behind It All That Most Coverage Is Missing

Here's what I keep coming back to when I think about Antigravity 2.0.

Every AI coding tool I've used — Cursor, Copilot, even Antigravity 1.0 — made the same underlying assumption: AI should live inside the editor. The editor is where developers work, so meet them there.

Antigravity 2.0 is built on a completely different assumption: the editor itself is the wrong primitive for the agentic era.

Think about what an IDE actually is. It's a surface optimized for a human to produce text at human speed. Every feature — syntax highlighting, file trees, the cursor — exists to support you doing the writing.

But if agents can write, test, and refactor code faster than you can read it, you don't need a writing surface anymore. You need something more like a workflow management dashboard — something that tells you what's running, what's done, what needs your input.

That's what the Antigravity 2.0 desktop app is trying to be.
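To make the "dashboard, not editor" idea a bit more tangible, here's a toy sketch of the shape of such a tool: several agents run concurrently, and the human only looks at a status board. Everything in it is hypothetical (`fake_agent`, the task names, the status strings). It uses plain `asyncio` and says nothing about how Antigravity is actually built.

```python
import asyncio

async def fake_agent(name: str, board: dict, needs_review: bool) -> None:
    """Hypothetical agent: updates a shared status board as it progresses."""
    board[name] = "running"
    await asyncio.sleep(0.01)  # stands in for real work
    board[name] = "needs input" if needs_review else "done"

async def run_all() -> dict:
    board: dict = {}
    # All three agents run concurrently; the board is the only shared surface.
    await asyncio.gather(
        fake_agent("write-tests", board, needs_review=False),
        fake_agent("refactor-auth", board, needs_review=True),
        fake_agent("update-docs", board, needs_review=False),
    )
    return board

board = asyncio.run(run_all())
for task, status in sorted(board.items()):
    print(f"{task:15} {status}")
```

Notice there's no cursor anywhere in that loop. The human's job is reduced to reading the board and responding to whatever says "needs input".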

I find this genuinely interesting to think about, even while holding it somewhat skeptically. The editor has been the central tool of software development for four decades. That's a long time for something to calcify into "just how things work." The question of whether it's actually the right abstraction for AI-assisted development is worth asking out loud.

Where I'm Still Not Convinced

I want to be fair here, because I've been in enough hype cycles to know that interesting architecture doesn't automatically mean good outcomes.

Agent reliability is still the hard problem. My frustration with current AI coding tools is precisely that they make plausible-looking mistakes, and I catch them too late. Running parallel subagents in the background amplifies that risk: one wrong assumption by one agent, built upon by three others, and you have a mess that's much harder to unwind than a single bad autocomplete suggestion. The sandboxing and Git policies help, but they're guardrails around execution, not guarantees of correctness.
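A back-of-the-envelope way to see why this worries me: if each step in a chain is correct with probability p, and every step builds on the one before, the chance that the whole chain is right is p to the power n. This assumes errors are independent, which is a simplification, and the 95% figure below is made up for illustration. It is not a measured reliability number for any tool.

```python
def chain_success(p: float, n: int) -> float:
    """Probability that n dependent steps all succeed, assuming independent errors."""
    return p ** n

# Illustrative numbers only: one agent alone, then one agent plus three building on it.
for n in (1, 4):
    print(f"{n} step(s) at 95% each -> {chain_success(0.95, n):.1%}")
```

Under those assumptions, four chained steps at 95% each land at roughly 81%, so about one run in five has something wrong somewhere in it.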

The pricing signals who this is really for. The new AI Ultra plan is $100/month for 5x higher usage limits. That's an enterprise line item. The free and low-cost tiers that independent developers and students typically use to build ecosystem familiarity around a platform are not the story here. Google is making a deliberate choice to prioritize enterprise adoption, and that's worth naming explicitly.

Gemini CLI is being retired for consumer users. This is the consolidation that bothers me most personally. A lightweight, free CLI path is being replaced by Antigravity — which costs money. If you're not on a paid plan, your path to the agent harness is the raw API. That's a real narrowing of access.

"Vibe coding" is a phrase I'm suspicious of. It shows up in the Google AI Studio announcement — "native Kotlin support to vibe code Android apps." I understand what they mean. But "vibe coding" is language that frames effortlessness as the goal, when for a lot of us, understanding what our code does is the goal. The complexity doesn't disappear just because the interface hides it. And for someone still building their fundamentals, that gap between "it works" and "I understand why it works" matters a lot.

What's Actually Different Starting This Week

Setting aside the bigger questions, here's what concretely changed at I/O for different kinds of developers:

If you use the Gemini API: Managed Agents is worth exploring now. The quickstart is genuinely accessible and you can have something running in under 15 minutes. The friction reduction on multi-step agent workflows is real.

If you work on Android: The Android CLI is now stable, and the open-sourced Android skills give LLMs verified scaffolding for complex migrations (Jetpack Compose, Jetpack Navigation 3) rather than the hallucinated patterns that have burned people before. The migration agent that converts React Native or iOS codebases to native Kotlin is notable if you're carrying that kind of technical debt.

If you work on the web: WebMCP is the slow-burn announcement. It's a proposed open standard that lets browser-based AI agents interact with JavaScript functions and HTML forms, currently in origin trial in Chrome 149. That's infrastructure-level stuff. Watch this one over the next year.

If you run or work on a team: The SDK and Enterprise Agent Platform are what make this usable in real organizational contexts. The security primitives aren't bolted on.

Where I Land

I came into Google I/O 2026 skeptical about AI development tooling, and I'm leaving it with that skepticism mostly intact — but pointed at more specific things.

The architecture of Antigravity 2.0 is genuinely interesting. The Managed Agents API simplifies something I've been doing manually for a long time. The thesis — that the editor is a legacy abstraction — is a real idea worth taking seriously, not just keynote positioning.

But the pricing, the accessibility concerns, and the unresolved question of agent reliability are all real. Interesting architecture and practical daily usefulness are different things, and only one of them matters when you're trying to ship.

What I'd say is: try the Managed Agents API if you use Gemini. Read the Antigravity 2.0 release notes carefully rather than the blog posts. And if the IDE-as-legacy-abstraction thesis turns out to be right, you'll want to have been paying attention when this shipped.

The direction is clear. Whether the execution catches up to it is the question the rest of 2026 will answer.

Wrote this after spending an evening with the Google I/O 2026 developer keynote and the actual Gemini API docs. Try Managed Agents at the official quickstart and explore Antigravity 2.0 at antigravity.google.
