DEV Community

harish venkatakrishnan

I benchmarked Gemma 4 E4B against the Gemma family — here are the results

I've been running real benchmarks on open-source LLMs to test things the big labs don't publish. My latest experiment compares Gemma 4 E4B against the rest of the Gemma family on enterprise tasks.

Full results with methodology and limitations: https://aiexplorer-blog.vercel.app/post/gemma-4-e4b-enterprise-benchmark

I'm building a collection of these experiments at aiexplorer-blog.vercel.app/experiments. So far they cover structured JSON output, context position bias, RAG compliance, and prompt injection defenses.

Feedback welcome — what models or tasks should I benchmark next?
