Everyone's Talking About Gemini 3.5 Flash. The Real Story at Google I/O 2026 Was a Skill File.

Sreejit Pradhan on May 24, 2026

This is a submission for the Google I/O Writing Challenge. Everyone walked away from Google I/O 2026 talking about Gemini 3.5 Flash benchmarks. ...
 
Ofri Peretz

The part where the agent read your git status and project structure before proposing the hybrid approach (LLM skill + static Python checker + git hook) is exactly how I'd expect a competent engineer to scope the work — not just generate code, but understand the constraint surface first. I've been running static analysis at scale for years, and the hardest part is always the same: getting developers to actually run the checks before pushing. A pre-commit hook that an AI wired up end-to-end, including the enforcement layer, is legitimately useful if it doesn't produce a flood of false positives. The real test is whether that check-a11y.py script is maintainable six months from now when WCAG 2.2 rules change or your component patterns evolve.

Sreejit Pradhan

Exactly. The impressive part wasn’t the codegen, it was the agent understanding the repo and constraint surface before deciding on architecture. Hybrid enforcement (LLM skill + deterministic checker + git hook) is the only approach that scales realistically. The real benchmark isn’t “does it work today”, it’s whether that checker survives evolving WCAG rules and component drift 6 months later without becoming noise developers bypass.
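To make the "deterministic checker" half of that hybrid concrete, here is a minimal sketch of what such a script could look like. It is not the check-a11y.py from the article: the single rule (an img tag with no alt attribute) and the names MissingAltChecker, check_markup, and main are assumptions chosen for illustration, and only the Python standard library is used.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects the positions of <img> tags that have no alt attribute."""

    def __init__(self):
        super().__init__()
        self.violations = []  # list of (line, column) tuples

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img /> are also routed here by default.
        if tag != "img":
            return
        # alt="" is valid for decorative images, so only a missing attribute is flagged.
        if "alt" not in dict(attrs):
            self.violations.append(self.getpos())


def check_markup(markup: str) -> list:
    """Return the (line, column) position of every <img> without alt."""
    checker = MissingAltChecker()
    checker.feed(markup)
    checker.close()
    return checker.violations


def main(paths) -> int:
    """Hypothetical hook entry point: returns 1 if any file has a violation."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line, col in check_markup(handle.read()):
                print(f"{path}:{line}:{col}: <img> is missing an alt attribute")
                failed = True
    return 1 if failed else 0
```

A pre-commit hook would pass the staged files to main and exit with its return value, so a non-zero result blocks the commit. Keeping each rule this small is one way to limit false positives and keep the script maintainable as the rules change.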

UnitBuilds

I've been loving it. I'm currently porting the SDK to Windows (a rather annoying task, thanks Google for the support) to create a swarmer and mobile interface, so it's easier to interface with and manage multiple projects at once without sitting at my desk. That custom skills system fits perfectly with my framework and what I built over the past year. I had built the system from scratch to utilize Vertex AI, but with the new Gemini Builds I'm out of the cloud market, so I'm pivoting to Sovereign systems, where those skills make a huge difference for multi-agent management. The orchestrator no longer needs to share the scope of a worker agent, and worker agents don't all need to be confined to the same skillset. It's awesome!

Sreejit Pradhan

Exactly. The biggest win is capability isolation. Orchestrator no longer needs full worker scope, and workers no longer need bloated shared cognition. Skills turn agents into modular execution contexts instead of monolithic assistants. That changes multi-agent orchestration completely.

UnitBuilds

Not to mention scalability. My previous swarm system had to manually assign skills and the customized MCP protocol at runtime, but they were fixed: set skills, set MCP. Now it's modular and dynamic. I can have the orchestrator create a custom skill file (which I highly recommend if you use Gemini 3.5 Flash; I suspect it's a MoE, which would explain the speed and Pro-level knowledge), which can cut your baseline context window considerably while improving code quality. E.g., my pipeline uses an orchestrator, Tier N managers, then per-file workers. I can now have managers carry skill files that hold the context of their module in the scope of the refactor, so worker skills get written with the end goal in mind.

The other awesome features they added were /schedule and /goal. /schedule means you can have regular-interval actions, e.g. checking a discourse DB for proposed changes to shared files, so there's no more toe-stepping, and structural re-alignment happens at set intervals. /goal means you can set and forget: it'll continue till it's done. If you want to optimize a system, you set the goalpost and leave it. Stupid example, but theoretically possible: you can tell it 'here's a compression system, improve it until we've reached 90% reduction, while remaining lossless'. Yes, you'll need to put in anti-loop guardrails, but theoretically you could leave that overnight and wake up to a successful algo, whereas previously it would hit a wall. Combined with scheduled continues, even if it hits a 5h window quota, you've scheduled a restart for a minute after it hits, so it can run continuously, indefinitely.
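For anyone wondering what "anti-loop guardrails" could look like around a goal like the compression example, here is a rough sketch. It does not use or imitate the /goal command itself. The function run_goal_loop, the CANDIDATES list, and the max_stalls parameter are all assumptions for illustration, and three standard-library compressors stand in for the attempts an agent would generate.

```python
import bz2
import lzma
import zlib


def run_goal_loop(data: bytes, candidates, target_reduction: float, max_stalls: int = 2):
    """Try candidates in order until the target is met or progress stalls."""
    if not data:
        raise ValueError("data must be non-empty")
    best_name, best_reduction, stalls = None, 0.0, 0
    # Guardrail 1: the candidate list is finite, so the loop has a hard budget.
    for name, compress, decompress in candidates:
        packed = compress(data)
        # Guardrail 2: lossless is a hard constraint, so a bad round trip is rejected.
        if decompress(packed) != data:
            continue
        reduction = 1 - len(packed) / len(data)
        if reduction > best_reduction:
            best_name, best_reduction, stalls = name, reduction, 0
        else:
            stalls += 1
        if best_reduction >= target_reduction:
            return best_name, best_reduction, "goal reached"
        # Guardrail 3: stop after several consecutive attempts with no improvement.
        if stalls >= max_stalls:
            return best_name, best_reduction, "stalled"
    return best_name, best_reduction, "budget exhausted"


# Stand-ins for agent-generated attempts: (name, compress, decompress).
CANDIDATES = [
    ("zlib", zlib.compress, zlib.decompress),
    ("bz2", bz2.compress, bz2.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]
```

Calling run_goal_loop(payload, CANDIDATES, 0.9) returns the best attempt plus the reason the loop stopped, which is the part a scheduled restart would need in order to decide whether to continue.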

Sreejit Pradhan

Exactly. This is the first time these systems actually feel architecturally scalable instead of just “bigger prompts + more agents.” Dynamic skill generation completely changes orchestration because cognition becomes modular and runtime-scoped instead of globally shared. Having managers carry module-level intent and constraints down to per-file workers is far cleaner than stuffing everything into a giant shared context window.

The /goal and /schedule additions are honestly the bigger breakthrough though. That introduces persistence, temporal continuity, and autonomous iteration into agent systems. At that point they stop behaving like session-bound chat assistants and start looking more like distributed execution systems. Continuous scheduled recovery, long-horizon optimization loops, discourse/state synchronization between workers — that’s a very different category of infrastructure than most people realize.

UnitBuilds

Exactly. Add to that JIT MCP configuration to enable/disable tools for each worker, and you have a lean, mean development machine.

Sreejit Pradhan

Exactly. JIT MCP configuration is a massive part of making this actually scalable in practice. Workers no longer need permanent access to every tool or protocol upfront — capabilities become ephemeral and task-scoped. That keeps agents leaner, reduces unnecessary context/tool exposure, and makes orchestration far more deterministic. Combined with dynamic skills, it starts looking less like “AI agents” and more like distributed cognitive infrastructure.
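As a rough illustration of task-scoped capabilities, the sketch below grants each worker only the tools its task kind needs at spawn time. It does not use any real MCP client API: ALL_TOOLS, TASK_TOOLS, Worker, and spawn_worker are hypothetical names, and the tool names are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical catalogue of everything the orchestrator could expose.
ALL_TOOLS = {"read_file", "write_file", "run_tests", "git_commit", "web_search"}

# Hypothetical mapping from task kind to the tools that task needs.
TASK_TOOLS = {
    "review": {"read_file"},
    "refactor": {"read_file", "write_file", "run_tests"},
}


@dataclass
class Worker:
    name: str
    tools: frozenset = field(default_factory=frozenset)

    def call(self, tool: str) -> str:
        # Anything outside the grant is refused instead of silently ignored.
        if tool not in self.tools:
            raise PermissionError(f"{self.name} has no access to {tool}")
        return f"{self.name} ran {tool}"


def spawn_worker(name: str, task_kind: str) -> Worker:
    """Create a worker whose tool grant is fixed by its task kind."""
    granted = TASK_TOOLS[task_kind] & ALL_TOOLS
    return Worker(name, frozenset(granted))
```

A "review" worker created this way can read files but raises PermissionError on write_file, so the grant lives and dies with the task instead of being a permanent property of the agent.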

Syed Ahmer Shah

Shifting the focus from raw model benchmarks to the 'skill file' standard highlights what actually matters for production: execution boundaries and context management.

Sreejit Pradhan

Exactly. Models are becoming commodities.
The real differentiation is shifting toward orchestration, memory, execution boundaries, and how intelligence is packaged into reusable skills.

shogun 444

This was a fascinating read. The most interesting part wasn’t Gemini 3.5 Flash. It was the shift from “AI assistant” to composable agent through SKILL.md. The accessibility-reviewer example made the whole thing feel very real very quickly.

Sreejit Pradhan

Exactly. That’s the moment it stopped feeling like a demo and started feeling like infrastructure.
SKILL.md turns AI from a chatbot into a composable execution layer and that changes the entire trajectory of agent design.

Vic Chen

Really enjoyed the distinction here between agent config and reusable skills. As someone building AI products, I think that “JSON for behavior, markdown for capability” mental model is much closer to how production systems actually evolve than the keynote version. The hybrid point also landed for me — pairing LLM judgment with deterministic checks and pre-commit enforcement is where these workflows start feeling durable instead of demo-friendly.

Sreejit Pradhan

Thanks man!
Your take on this system is absolutely correct. Pairing the new agentic coding systems and their enhanced agentic CoT reasoning with proper deterministic checks is extremely powerful, and it puts using AI to build production-grade systems on a much higher level. We are very close to the "Prompt, Build, Review and Ship" workflow being perfect. Or, who knows, some company pulls out a "Superhuman Coder" and we just hit Enter and do nothing 😂

Suny Choudhary

This is a good point. The model announcement gets the attention, but the skill file idea may matter more for actual builders.

A stronger model helps, but repeatable behavior comes from giving the AI clearer operating context: project rules, preferences, workflows, constraints, examples, and decision patterns.

That is what most teams are missing. They keep asking for smarter models when the real problem is that every session starts with too much missing context.

Skill files feel like a step toward making AI assistants more consistent inside real work. Not just “answer this prompt,” but “understand how this team or project wants work done.”

The risk is that people treat skill files like another prompt hack. The useful version needs versioning, review, and cleanup, otherwise it becomes stale context that quietly shapes bad outputs.

Sreejit Pradhan

Exactly. That’s the shift I was trying to point at in the article — intelligence alone doesn’t create consistency. Operational context does. Most failures in real workflows come from missing constraints, patterns, and team-specific expectations, not lack of raw model capability.

And I completely agree on the risk side too. If skill files just become giant unmaintained prompt dumps, they’ll decay fast and start introducing invisible behavioral drift. The useful long-term version probably looks much closer to software infrastructure: versioned, reviewed, modular, testable, and continuously refined alongside the codebase itself.
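One way to treat skill files as reviewed infrastructure is a small lint step that runs in CI. The sketch below is only an assumption about what that could look like: it supposes each skill file opens with a simple key: value front matter block delimited by --- lines, and the last_reviewed field, the 90-day limit, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Fields this hypothetical lint expects in every skill file.
REQUIRED = ("name", "description", "last_reviewed")


def parse_front_matter(text: str) -> dict:
    """Read flat key: value pairs between the leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def lint_skill(text: str, today: date, max_age_days: int = 90) -> list:
    """Return a list of problems; an empty list means the file passes."""
    meta = parse_front_matter(text)
    problems = [f"missing field: {key}" for key in REQUIRED if not meta.get(key)]
    reviewed = meta.get("last_reviewed")
    if reviewed:
        try:
            age = today - date.fromisoformat(reviewed)
        except ValueError:
            problems.append("last_reviewed is not an ISO date")
        else:
            if age > timedelta(days=max_age_days):
                problems.append(f"stale: last reviewed {age.days} days ago")
    return problems
```

Failing the build when lint_skill returns anything forces a periodic review, which is one guard against stale context quietly shaping outputs.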

xulingfeng

agent frameworks.

Sreejit Pradhan

Yess🔥

xulingfeng

Totally — the framework landscape moves fast, and picking the wrong one early can be costly. I've been gravitating toward composable, minimal abstractions rather than all-in-one platforms. What's your current go-to when you do reach for a framework?

xulingfeng

this landing for teams that aren't already using agent frameworks.

Sreejit Pradhan

This especially lands for teams that aren’t already deep into agent frameworks.
SKILL.md makes the shift feel practical instead of experimental.

xulingfeng

That's a fair point — agent frameworks can be overkill for simple automation tasks. For my setup, I started with raw function-calling and only introduced a lightweight decision layer when the branching logic got unwieldy. Would be curious what your threshold is for reaching for a framework vs keeping it simple.

Pranay Patikar

Fantastic description, really cool🔥🔥

Sreejit Pradhan

Thanks bhaii🥰

digiSal

Hmm, when I run /skills I only see the 2 I recently created. I don't see any of the preinstalled skills that you mentioned.

Sreejit Pradhan

Yeah, that’s because there currently aren’t a bunch of preinstalled/public skills exposed by default. The article was more about the underlying architecture and what it enables rather than a marketplace of bundled skills. The real power comes from creating scoped custom skills dynamically for specific workflows, modules, or agents. That’s where the orchestration and context-isolation benefits really start showing up.