hiyoyo

Posted on May 20

Is Gemini 3.5 Flash Actually Better at Coding Than 3.1 Pro? I Tested It with Real Rust Code

#rust #gemini #llm #productivity

Background

Gemini 3.5 Flash launched at Google I/O 2026 with a bold claim: it beats Gemini 3.1 Pro on coding and agentic benchmarks — while running 4x faster.

At the same time, X (formerly Twitter) is full of posts saying it hallucinates constantly and doesn't even reach Claude Sonnet level.

So which is it? I ran a real benchmark using code from my actual dev stack to find out.

Who I am

Solo indie Mac app developer (Tauri + Rust + Swift stack)
I use Gemini daily as part of my coding workflow
Built 13 macOS utilities, mostly Android connectivity tools

The Test

Models compared

Gemini 3.1 Pro
Gemini 3.5 Flash (new)

What I tested

I gave both models a ~200-line Rust file (ADB device manager) with 14 intentional bugs and asked them to find and fix everything.

Why 200 lines? Because in my experience:

Under 50 lines: any model gets lucky sometimes
Over 100 lines: older Flash models produce near-unusable code
200 lines: a realistic production task that separates real understanding from pattern matching

Bug breakdown

Category	Count	Examples
Logic bugs (main)	7	Post-execution timeout check, APK install failure not detected
Async bugs	2	CPU-spinning busy loop, potential data race
ADB-specific traps	3	`\r` leftover in line endings, temp file not cleaned up
Missing tests	2	Edge cases not covered

The ADB-specific bugs were key — you need domain knowledge to catch them, not just Rust syntax awareness.
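To make the `\r` trap concrete, here is a minimal sketch. It is not the code from the test file: `parse_getprop` is a hypothetical helper, and it assumes the device output is `adb shell getprop` text arriving with `\r\n` line endings. Splitting on `\n` alone leaves a trailing `\r` on every line, so a check like `strip_suffix(']')` silently fails unless the `\r` is trimmed first.

```rust
use std::collections::HashMap;

/// Hypothetical parser for `adb shell getprop` output,
/// which prints one `[key]: [value]` pair per line.
pub fn parse_getprop(raw: &str) -> HashMap<String, String> {
    raw.split('\n')
        // Drop the `\r` that a CRLF line ending leaves behind.
        .map(|line| line.trim_end_matches('\r'))
        .filter_map(|line| {
            let (key, value) = line.split_once("]: [")?;
            let key = key.strip_prefix('[')?;
            let value = value.strip_suffix(']')?;
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}
```

Without the `trim_end_matches('\r')` step, every line ends in `]\r`, `strip_suffix(']')` returns `None`, and the map comes back empty with no error at all.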

Prompt (no hints)

The following Rust code contains several bugs.
Please identify all bugs and provide the corrected code.
Include an explanation for each bug.

No hints. No scaffolding. Raw capability test.

The Code (buggy version)

adb_device_manager.rs

use std::process::{Command, Stdio};
use std::time::{Duration, Instant};
use std::sync::{Arc, Mutex};
use tokio::time::sleep;
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct AdbDevice {
    pub serial: String,
    pub state: DeviceState,
    pub properties: HashMap<String, String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum DeviceState {
    Online, Offline, Unauthorized, Unknown,
}

pub struct AdbManager {
    devices: Arc<Mutex<Vec<AdbDevice>>>,
    adb_path: String,
    command_timeout: Duration,
}

impl AdbManager {
    pub fn new(adb_path: String) -> Self {
        AdbManager {
            devices: Arc::new(Mutex::new(Vec::new())),
            adb_path,
            // BUG 1: from_millis(5) — should be from_secs(5)
            command_timeout: Duration::from_millis(5),
        }
    }

    pub fn execute_command(&self, serial: &str, args: &[&str]) -> Result<String, String> {
        let start = Instant::now();

        // prepend "-s <serial>" before command args
        let mut cmd_args = vec!["-s", serial];
        cmd_args.extend_from_slice(args);

        let output = Command::new(&self.adb_path)
            .args(&cmd_args)
            .output()
            .map_err(|e| format!("Command failed: {}", e))?;

        // BUG 4: timeout check AFTER command completes — completely useless
        if start.elapsed() > self.command_timeout {
            return Err("timed out".to_string());
        }
        Ok(String::from_utf8_lossy(&output.stdout).to_string())
    }

    pub async fn wait_for_device(&self, serial: &str, timeout_secs: u64) -> Result<(), String> {
        let deadline = Instant::now() + Duration::from_secs(timeout_secs);
        loop {
            // BUG 8: no sleep → CPU at 100%
            let devices = self.get_connected_devices()?;
            if devices.iter().any(|d| d.serial == serial) {
                return Ok(());
            }
            if Instant::now() >= deadline {
                return Err("timeout".to_string());
            }
        }
    }

    pub fn install_apk(&self, serial: &str, apk_path: &str) -> Result<(), String> {
        let result = self.execute_command(serial, &["install", "-r", apk_path])?;
        // BUG 11: adb install returns exit code 0 even on failure
        // must check stdout for "Success"/"Failure" strings
        if result.contains("Failure") {
            return Err(format!("Install failed: {}", result));
        }
        Ok(())
    }

    pub fn take_screenshot(&self, serial: &str, save_path: &str) -> Result<(), String> {
        let temp_path = "/sdcard/screenshot_temp.png";
        self.execute_command(serial, &["shell", "screencap", "-p", temp_path])?;
        // BUG 12: temp file never deleted from device
        self.execute_command(serial, &["pull", temp_path, save_path])?;
        Ok(())
    }
}
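For reference, the usual shape of a fix for Bug 8 (the busy loop in `wait_for_device`) is to pause between polls. The sketch below is my own illustration, not either model's output: `poll_until` is a hypothetical, synchronous helper that uses only the standard library so it runs without a Tokio runtime. In the async method above, the equivalent pause would be `sleep(interval).await` using the `tokio::time::sleep` the file already imports.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `check` until it returns Ok(true) or `timeout` elapses,
/// sleeping `interval` between attempts so the loop does not spin the CPU.
pub fn poll_until<F>(timeout: Duration, interval: Duration, mut check: F) -> Result<(), String>
where
    F: FnMut() -> Result<bool, String>,
{
    let deadline = Instant::now() + timeout;
    loop {
        if check()? {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err("timeout".to_string());
        }
        // The missing piece in the buggy version: yield between polls.
        thread::sleep(interval);
    }
}
```

The interval is a trade-off: shorter values notice the device sooner but spawn `adb` more often.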

Results

Bug detection

Model	Bugs found	Score
Gemini 3.1 Pro	14	14/14 ✅
Gemini 3.5 Flash	14	14/14 ✅

Both models found every bug. Accuracy: identical.

Speed

Model	Response time
Gemini 3.1 Pro	~40 seconds
Gemini 3.5 Flash	A few seconds (10x+ faster)

This was the most striking difference by far.

Where the models diverged

Same score, but different approaches on a few interesting bugs:

Bug 4 — timeout check after execution

3.1 Pro rewrote it using tokio::time::timeout (fully async)

3.5 Flash used spawn() + try_wait() polling loop (sync-leaning approach)

Both are valid fixes. Different style choices.
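To show what the `spawn()` + `try_wait()` style looks like, here is a minimal standard-library sketch. It is my reconstruction of the general technique, not 3.5 Flash's actual output; `run_with_timeout` and the 20 ms poll interval are assumptions. The key difference from the buggy version is that the deadline is checked while the child is still running, and the child is killed when it is exceeded.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `cmd` and returns its stdout,
/// killing the child process if it runs longer than `timeout`.
pub fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> Result<String, String> {
    let mut child = cmd
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()
        .map_err(|e| format!("Command failed: {}", e))?;

    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait().map_err(|e| format!("wait failed: {}", e))? {
            Some(_status) => break, // child has exited
            None if Instant::now() >= deadline => {
                let _ = child.kill();
                let _ = child.wait(); // reap the killed process
                return Err("timed out".to_string());
            }
            None => thread::sleep(Duration::from_millis(20)),
        }
    }

    // The child has exited, so this reads whatever is buffered in the pipe.
    let mut bytes = Vec::new();
    if let Some(mut stdout) = child.stdout.take() {
        stdout
            .read_to_end(&mut bytes)
            .map_err(|e| format!("read failed: {}", e))?;
    }
    Ok(String::from_utf8_lossy(&bytes).to_string())
}
```

One limitation of this sketch: stdout is only read after the child exits, so a command that writes more than the OS pipe buffer holds would stall until the timeout. Draining stdout on a separate thread avoids that.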

Bug 10 — Mutex poison handling

3.1 Pro: into_inner() to safely recover the data

3.5 Flash: expect() for fail-fast behavior

Opposite design philosophies. Neither is wrong — depends on your error handling strategy.
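The two philosophies are easy to compare side by side. The sketch below is illustrative only: the function names are hypothetical, and the device list is simplified to `Vec<String>`. It poisons a mutex on purpose by panicking while the lock is held, then reads it both ways.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Builds a device list whose mutex is poisoned by a thread that panics while holding the lock.
pub fn poisoned_devices() -> Arc<Mutex<Vec<String>>> {
    let shared = Arc::new(Mutex::new(vec!["emulator-5554".to_string()]));
    let clone = Arc::clone(&shared);
    let _ = thread::spawn(move || {
        let _guard = clone.lock().unwrap();
        panic!("simulated failure while holding the lock");
    })
    .join();
    shared
}

/// Recovery style: take the guard out of the PoisonError and keep going.
pub fn read_recovering(devices: &Mutex<Vec<String>>) -> Vec<String> {
    let guard = devices.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.to_vec()
}

/// Fail-fast style: treat a poisoned lock as unrecoverable and panic.
pub fn read_fail_fast(devices: &Mutex<Vec<String>>) -> Vec<String> {
    devices.lock().expect("device list mutex poisoned").to_vec()
}
```

Recovery keeps the app running but may expose data a panicking thread left half-updated. Fail-fast refuses to read it at all.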

Bug 6 — spaces in remote path

3.1 Pro: correctly noted that std::process::Command passes each argument to the process as-is, with no shell splitting, so no quoting is needed. It left the code unchanged (accurate ADB knowledge)

3.5 Flash: added format!("\"{}\"", remote_path) quoting (technically unnecessary, slight overreach)

3.1 Pro showed deeper understanding of how ADB + Rust process spawning actually works.
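This behavior can be checked without running `adb` at all, because `Command::get_args` exposes the exact argument list that would be passed. In the sketch below, `pull_command` and the sample paths are hypothetical. A path with spaces stays one argument, while added quotes become literal characters in the path.

```rust
use std::process::Command;

/// Hypothetical builder for an `adb pull` invocation.
pub fn pull_command(adb_path: &str, serial: &str, remote: &str, local: &str) -> Command {
    let mut cmd = Command::new(adb_path);
    // Each element is handed to the process as its own argument; no shell is involved.
    cmd.args(&["-s", serial, "pull", remote, local]);
    cmd
}
```

Calling `pull_command("adb", "emulator-5554", "/sdcard/My Photos/shot.png", "out.png").get_args()` yields five arguments, with the remote path intact as the fourth. One caveat, as far as I understand it: arguments after `adb shell` are re-parsed by a shell on the device, so this reasoning applies to commands like `pull` and `push`, not to everything.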

Pricing reality check

App (free plan)

Model	Cost	Speed
3.1 Pro	Free	Slow (~40s)
3.5 Flash	Free	Fast (few seconds)

API (Pay-as-you-go)

Straight from Google AI Studio's official UI:

Model	Input / 1M tokens	Output / 1M tokens
Gemini 3.5 Flash	$1.50	$9.00
Gemini 3.1 Flash-Lite	$0.25	$1.50
Context Caching	$0.15	—

The $9.00 output price is 3x the previous generation (Gemini 3 Flash at $3.00). Google's "half the price of frontier models" pitch compares against competitors — not their own previous Flash tier.

For indie developers:

Prototyping / testing → Free tier is more than enough
Production / commercial → $1.50/$9.00. Budget carefully for output-heavy workloads.
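For budgeting, the arithmetic is simple enough to script. This sketch uses the per-million-token prices from the table above. The `Pricing` struct, the `cost_usd` function, and the token volumes in the usage note are illustrative assumptions, not measurements from my test.

```rust
/// Price per one million tokens, in USD.
pub struct Pricing {
    pub input_per_million: f64,
    pub output_per_million: f64,
}

/// Prices as listed in the table above.
pub const FLASH_3_5: Pricing = Pricing { input_per_million: 1.50, output_per_million: 9.00 };
pub const FLASH_LITE_3_1: Pricing = Pricing { input_per_million: 0.25, output_per_million: 1.50 };

/// Estimated cost in USD for a given number of input and output tokens.
pub fn cost_usd(pricing: &Pricing, input_tokens: u64, output_tokens: u64) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * pricing.input_per_million
        + (output_tokens as f64 / 1_000_000.0) * pricing.output_per_million
}
```

For a hypothetical workload of 2M input and 1M output tokens, this gives $12.00 on 3.5 Flash versus $2.00 on 3.1 Flash-Lite, and output accounts for $9.00 of the Flash total.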

Bonus: I asked Gemini about its own price. It hallucinated.

During testing I asked Gemini 3.5 Flash directly: "What's the API pricing for Gemini 3.5 Flash?"

It confidently answered:

"Input: ~$0.50 / Output: ~$3.00 per million tokens!"

That's the old Gemini 3 Flash Preview pricing. The actual price is $1.50/$9.00.

When I told it the real number, it immediately replied:

"I sincerely apologize! The information you provided is 100% correct!"

The model that aced a 14-bug Rust challenge couldn't accurately describe its own pricing.

A benchmark article prompted by hallucination complaints ending with a hallucination felt appropriate.

Conclusion

Category	Result
Coding accuracy (200-line bug fix)	3.1 Pro ≈ 3.5 Flash
Output speed	3.5 Flash wins by 10x+
API cost (output)	3.5 Flash at $9.00/1M tokens (3x previous gen)
Free tier usability	3.5 Flash is the clear winner

For free tier users: switch to 3.5 Flash immediately.

For API cost-conscious production use: consider 3.1 Flash-Lite at $0.25/$1.50.

On the "doesn't reach Claude Sonnet" criticism: at least for Rust bug-fix tasks, both models performed at a level I'd call genuinely useful. The hallucination complaints may apply more to conversational/knowledge tasks than to structured code review, though with only a single task type in my limited testing, I can't say for certain.

I build macOS utilities for Mac×Android workflows. If you're into Tauri, ADB, or MTP on macOS, feel free to follow.

Top comments (6)

Njx • May 20

Hey - trying to buy Hiyoko Shot but been getting an error at checkout on gumtree last few days now - anywhere else I can pay for this? :)

hiyoyo • May 20

Hi! Sorry about that — I'm looking into it on my end.
Gumroad should support UK payments, so something else might be going on.
I'll check the dashboard and get back to you. Thanks for letting me know!

hiyoyo • May 20

Hi! I've sent you an email with an update. Sorry again for the trouble!
