I benchmarked Gemma 4 E4B against the Gemma family — here are the results

harish venkatakrishnan — Wed, 08 Apr 2026 17:36:48 +0000

I've been running my own benchmarks on open-source LLMs to test behaviors the big labs don't publish results for. My latest experiment compares Gemma 4 E4B against the rest of the Gemma family on enterprise tasks.

Full results with methodology and limitations: https://aiexplorer-blog.vercel.app/post/gemma-4-e4b-enterprise-benchmark

I'm building a collection of these experiments at aiexplorer-blog.vercel.app/experiments — covering structured JSON output, context position bias, RAG compliance, and prompt injection defenses.

Feedback welcome — what models or tasks should I benchmark next?

DEV Community: harish venkatakrishnan
