This is a submission for the Gemma 4 Challenge: Write About Gemma 4
Every morning before I open Slack, I think about airspace.
Not metaphorically. I'm the Head of Engineering at a drone UTM (Unmanned Traffic Management) platform — software that keeps drones from colliding with each other and with manned aircraft over cities where large-scale commercial drone operators and FAA-regulated fleets are flying missions. Our system processes flight plans, validates airspace conflicts in real time, and communicates with the FAA's DroneZone infrastructure.
Airspace is genuinely complex. A single flight corridor over a city involves Class B shelves, TFRs that pop up with 20 minutes of notice, active NOTAMs, LAANC grid authorizations, and dynamic separation rules that change based on weather and traffic density.
Then Gemma 4 dropped on April 2nd, 2026, and I started rethinking some assumptions.
What I Actually Needed (And Why Previous Models Fell Short)
The pain point isn't data — we have plenty of that. It's translation. Drone operators range from ex-military pilots who know the FARs cold, to delivery companies that just hired someone last week to run a drone route. The same airspace data needs to be communicated completely differently to each audience.
NOTAMs (Notices to Air Missions) are a perfect example. They look like this:
!PHX 05/014 PHX NAV ILS RWY 8 LOC UNUSABLE 140DEG CW 220DEG BEYOND 18NM
BLW 4000FT MSL 2605161400-2605171400
For a Part 107 certified pilot, that's readable. For a logistics coordinator managing 40 simultaneous drone deliveries, that's a 500ms context-switch that shouldn't be their problem.
I'd tried using LLMs for this translation layer before. The issues were consistent:
- Hallucination on technical specifics: Models would confidently describe restrictions that didn't exist, or miss ones that did
- No reasoning about implications: Summarizing the NOTAM is one thing; telling the operator "this means your planned corridor at 350ft AGL through this zone is blocked from 2pm to 4pm" requires multi-step spatial reasoning
- Context window constraints: A real airspace brief for a complex metro area involves dozens of NOTAMs, the full Part 107 ruleset, current TFR data, and the specific flight plan. Earlier models hit walls fast
- Size vs. capability tradeoff: Running a 70B model locally on field hardware wasn't viable. The lighter models weren't smart enough.
Gemma 4 changes this calculus in specific, measurable ways.
The Model I Chose and Why It's the Right Tool
I'm using the Gemma 4 26B MoE (Mixture-of-Experts) variant, and the choice was deliberate.
The 26B MoE activates only ~4 billion parameters per inference pass — meaning it has the intelligence of a much larger model at a fraction of the compute cost. On a single RTX 4090 (24GB VRAM), it runs comfortably at full precision. For an application that needs to serve dozens of operators simultaneously with sub-3-second response times, this matters enormously.
Here's the tradeoff table I worked through:
| Model | Why I considered it | Why I didn't pick it |
|---|---|---|
| Gemma 4 E4B | Runs on mobile / edge hardware | Not enough reasoning depth for multi-NOTAM conflict detection |
| Gemma 4 31B Dense | Maximum capability, 256K context | 20GB+ VRAM, overkill for most queries |
| Gemma 4 26B MoE | Near-31B quality, 12GB VRAM, 256K context | — This is the one |
The 256K context window on the MoE is the other unlock. I can now dump an entire airspace brief — full NOTAM batch, the relevant CFR sections, the operator's specific flight plan, their certification history — into a single prompt and get coherent, cross-referenced reasoning back. No chunking, no retrieval gymnastics, no lost context.
And then there's the benchmark that actually made me sit up: 86.4% on τ2-bench Retail (multi-step agentic tool use). Up from 6.6% on Gemma 3. That's not an incremental improvement. That's a model that can now actually do things in a multi-step workflow, not just describe what should be done.
What I Built: SkyBrief
I built a tool called SkyBrief — a Gemma 4-powered airspace intelligence assistant for drone operators.
🔗 Live app: https://skybrief-nine.vercel.app
🔗 GitHub: https://github.com/ashishjsharda/skybrief
The workflow:
- Operator drops in a NOTAM (PDF, text, or image of a sectional chart)
- Or types a natural language query: "I need to fly at 400ft AGL over downtown Phoenix tomorrow at 10am — what do I need?"
- Gemma 4 26B MoE processes it with a system prompt encoding FAA Part 107 rules, airspace classification knowledge, and LAANC requirements
- Returns a structured response: plain-English summary, specific conflicts identified, and an operator-ready compliance checklist
The key architectural decision: Gemma 4 does the reasoning, not the retrieval. We keep a separate data layer for live NOTAM feeds and TFR data. Gemma's job is to take that data and turn it into something a human operator can act on in under 10 seconds.
The thinking mode is not a gimmick here. For complex multi-NOTAM scenarios, watching the model reason step-by-step through airspace geometry produces outputs that are verifiably correct in ways that single-pass models can't match.
The Shift That Actually Matters
Here's the thing I keep coming back to.
We've built an enormous amount of programmatic logic to parse airspace rules and flag conflicts. That logic is valuable — it's precise, auditable, and fast. But it's also brittle. New airspace designations, new waiver categories, new operator certification types — every change requires engineering cycles to handle.
What Gemma 4 represents is a system that understands the underlying regulatory intent, not just the current ruleset. When I prompt it with a scenario that doesn't fit neatly into our existing rule categories, it reasons through it from first principles.
The Apache 2.0 license matters here too. In aviation and airspace management, you cannot run mission-critical reasoning through an external API with opaque terms of service. The ability to deploy Gemma 4 on-premises, air-gapped if necessary, with full control over the model weights, is not a nice-to-have. It's table stakes.
Honest Limitations
Gemma 4 still hallucinates on specifics. NOTAM coordinates, frequency values, altitude limits — anything that requires exact numerical precision should be validated against ground truth data, not trusted from model output alone.
The 26B MoE still needs real hardware. An RTX 4090 or equivalent isn't cheap. For edge deployment or mobile scenarios, the E4B is a better choice — but you'll trade reasoning depth.
Thinking mode adds latency. For use cases where operators need answers in under two seconds, gate thinking mode based on query complexity signals.
What Local AI Actually Means for High-Stakes Domains
The drone industry is a useful lens because the stakes are concrete. A bad airspace brief doesn't just inconvenience someone — it creates collision risk.
Gemma 4 running locally changes what's possible. Not in a "LLMs can solve everything" way. But in a specific, bounded way: the gap between what rule-based systems can handle and what a human expert would reason through is meaningfully narrowed by a 26B model that fits on hardware we actually own.
That gap is where the interesting engineering work lives right now.
If you're building in aviation, healthcare, legal, defense, or any domain where data sovereignty and auditability aren't optional — this is the moment to start taking local inference seriously.
I built SkyBrief in an evening session. The harder work is integrating it thoughtfully into systems where the outputs matter. That's the engineering problem worth spending time on.
Ashish is Head of Engineering in the drone UTM space and an O'Reilly author and LinkedIn Learning instructor. Find him on DEV and Medium.
Top comments (0)