<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aman Bhargav</title>
    <description>The latest articles on DEV Community by Aman Bhargav (@aman_bhargav_f699ac83671a).</description>
    <link>https://dev.to/aman_bhargav_f699ac83671a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3917853%2F0dd261e1-e3b1-4e7d-8a6e-6959c72ce7ab.png</url>
      <title>DEV Community: Aman Bhargav</title>
      <link>https://dev.to/aman_bhargav_f699ac83671a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aman_bhargav_f699ac83671a"/>
    <language>en</language>
    <item>
      <title>After Testing Gemma 4 Locally, I Finally Understand Why MoE Models Matter</title>
      <dc:creator>Aman Bhargav</dc:creator>
      <pubDate>Fri, 08 May 2026 06:57:48 +0000</pubDate>
      <link>https://dev.to/aman_bhargav_f699ac83671a/after-testing-gemma-4-locally-i-finally-understand-why-moe-models-matter-ho0</link>
      <guid>https://dev.to/aman_bhargav_f699ac83671a/after-testing-gemma-4-locally-i-finally-understand-why-moe-models-matter-ho0</guid>
      <description>&lt;p&gt;I’ve tested enough local models at this point to stop trusting benchmark charts.&lt;/p&gt;

&lt;p&gt;Most of them look impressive until you actually give them a real project.&lt;/p&gt;

&lt;p&gt;Then things fall apart:&lt;/p&gt;

&lt;p&gt;context gets messy&lt;br&gt;
reasoning becomes inconsistent&lt;br&gt;
responses drift&lt;br&gt;
code suggestions start contradicting earlier answers&lt;/p&gt;

&lt;p&gt;So when Google released the Gemma 4 models, I wasn’t expecting much beyond another benchmark-heavy launch.&lt;/p&gt;

&lt;p&gt;But after spending a few days testing the 26B MoE model locally, I think this is the first open Mixture-of-Experts model that actually feels stable enough for real development work.&lt;/p&gt;

&lt;p&gt;Not perfect.&lt;/p&gt;

&lt;p&gt;But noticeably different.&lt;/p&gt;

&lt;h2&gt;My Test Was Simple&lt;/h2&gt;

&lt;p&gt;Instead of synthetic prompts, I used an actual Rails codebase I work on regularly.&lt;/p&gt;

&lt;p&gt;I fed the model:&lt;/p&gt;

&lt;p&gt;Sidekiq workers&lt;br&gt;
service objects&lt;br&gt;
serializers&lt;br&gt;
migrations&lt;br&gt;
API integrations&lt;br&gt;
ActiveRecord scopes&lt;br&gt;
some old messy business logic I never cleaned up&lt;/p&gt;

&lt;p&gt;Around 40 files in total.&lt;/p&gt;
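
&lt;p&gt;If you want to reproduce the setup, this is roughly how I fed the files in. A minimal sketch against Ollama’s local &lt;code&gt;/api/generate&lt;/code&gt; endpoint (the glob paths and the model tag are assumptions, not my exact project):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal sketch: concatenate project files and send them to a local
# Ollama server. The model tag is an assumption -- use whatever
# `ollama list` shows on your machine.
require "net/http"
require "json"

files = Dir.glob("app/{workers,services,serializers}/**/*.rb") +
        Dir.glob("db/migrate/*.rb")

prompt = "Review this Rails code. Point out stale migrations, " \
         "duplicated validations, and redundant retry logic.\n\n" +
         files.map { |f| "# #{f}\n" + File.read(f) }.join("\n\n")

res = Net::HTTP.post(
  URI("http://localhost:11434/api/generate"),
  { model: "gemma4", prompt: prompt, stream: false }.to_json,
  "Content-Type" =&amp;gt; "application/json"
)
puts JSON.parse(res.body)["response"]
&lt;/code&gt;&lt;/pre&gt;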

&lt;p&gt;This is usually where smaller or poorly optimized models start losing track of relationships between files.&lt;/p&gt;

&lt;p&gt;Especially once the context gets large.&lt;/p&gt;

&lt;p&gt;Gemma 4 held up longer than I expected.&lt;/p&gt;

&lt;p&gt;At one point it pointed out:&lt;/p&gt;

&lt;p&gt;a stale migration&lt;br&gt;
duplicated validation logic&lt;br&gt;
an unnecessary retry pattern inside a Sidekiq worker&lt;/p&gt;
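
&lt;p&gt;The retry finding is worth making concrete. Reconstructed from memory (hypothetical names, same shape as the real worker):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical reconstruction, not the actual worker: Sidekiq already
# retries failed jobs on its own, so the manual rescue/retry below
# just multiplies the attempts.
class SyncInvoicesWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5

  def perform(account_id)
    attempts = 0
    begin
      InvoiceSyncService.new(account_id).call
    rescue StandardError
      attempts += 1
      retry if attempts &amp;lt; 3 # redundant: Sidekiq handles this
      raise
    end
  end
end
&lt;/code&gt;&lt;/pre&gt;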

&lt;p&gt;Not groundbreaking individually.&lt;/p&gt;

&lt;p&gt;But the interesting part was that it maintained consistency while discussing those files together.&lt;/p&gt;

&lt;p&gt;That’s normally where local models start hallucinating.&lt;/p&gt;

&lt;h2&gt;The “Thinking” Behavior Felt Different&lt;/h2&gt;

&lt;p&gt;I tested the reasoning mode with a few debugging tasks.&lt;/p&gt;

&lt;p&gt;Adding:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;|think|&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;changed the responses more than I expected.&lt;/p&gt;

&lt;p&gt;Instead of immediately generating code, the model started breaking problems into smaller steps internally first.&lt;/p&gt;

&lt;p&gt;Sometimes it even corrected its own assumptions midway through reasoning.&lt;/p&gt;

&lt;p&gt;That sounds small, but behavior like that makes the model feel far more usable during debugging sessions.&lt;/p&gt;

&lt;p&gt;Less autocomplete.&lt;br&gt;
More actual reasoning.&lt;/p&gt;

&lt;p&gt;Still not comparable to frontier cloud models.&lt;/p&gt;

&lt;p&gt;But much closer than I expected from an open local model.&lt;/p&gt;

&lt;h2&gt;The MoE Architecture Finally Clicked for Me&lt;/h2&gt;

&lt;p&gt;Before this, most MoE models I tried felt inconsistent.&lt;/p&gt;

&lt;p&gt;You’d get:&lt;/p&gt;

&lt;p&gt;one excellent response&lt;br&gt;
then one completely confused answer&lt;br&gt;
then another strong answer again&lt;/p&gt;

&lt;p&gt;Gemma 4 felt more stable across longer sessions.&lt;/p&gt;

&lt;p&gt;After reading more about the “Always-On Shared Expert” design, that behavior started making sense.&lt;/p&gt;
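
&lt;p&gt;My rough mental model of that design, as a toy sketch -- not Gemma 4’s actual routing code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Toy sketch of an MoE layer with an always-on shared expert:
# every token goes through the shared expert, and only the top-k
# routed experts are added on top.
def moe_layer(token, shared_expert:, experts:, router:, k: 2)
  output = shared_expert.call(token) # always on, every token
  scores = experts.map { |e| router.call(token, e) }
  top_k  = scores.each_index.max_by(k) { |i| scores[i] }
  top_k.each { |i| output += experts[i].call(token) }
  output
end
&lt;/code&gt;&lt;/pre&gt;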

&lt;p&gt;The responses felt less chaotic between prompts.&lt;/p&gt;

&lt;p&gt;For coding workflows, that honestly matters more than benchmark spikes.&lt;/p&gt;

&lt;p&gt;I care less about leaderboard numbers and more about whether the model stays coherent after 30 minutes of back-and-forth debugging.&lt;/p&gt;

&lt;h2&gt;The Context Window Is Actually Useful&lt;/h2&gt;

&lt;p&gt;A lot of models advertise huge context windows now.&lt;/p&gt;

&lt;p&gt;Very few stay reliable when you push them hard.&lt;/p&gt;

&lt;p&gt;I tested Gemma 4 with larger chunks of project structure and it handled repository-level understanding surprisingly well.&lt;/p&gt;

&lt;p&gt;Not perfectly.&lt;/p&gt;

&lt;p&gt;But well enough that I’d realistically use it for:&lt;/p&gt;

&lt;p&gt;onboarding into old codebases&lt;br&gt;
tracing Sidekiq flows&lt;br&gt;
understanding legacy service layers&lt;br&gt;
finding duplicated business logic&lt;br&gt;
reviewing migrations&lt;/p&gt;

&lt;p&gt;That’s the first time I’ve seriously considered using a local model regularly for repository analysis.&lt;/p&gt;

&lt;h2&gt;Where It Still Struggles&lt;/h2&gt;

&lt;p&gt;There are still obvious limitations.&lt;/p&gt;

&lt;p&gt;I noticed weaker performance with:&lt;/p&gt;

&lt;p&gt;deeply nested metaprogramming&lt;br&gt;
very niche gems&lt;br&gt;
long autonomous coding loops&lt;br&gt;
highly abstract architecture discussions&lt;/p&gt;

&lt;p&gt;And realistically, the larger variants still require more hardware than most developers have access to.&lt;/p&gt;

&lt;p&gt;The AI industry keeps saying “local AI for everyone,” but large models are still expensive to run comfortably.&lt;/p&gt;

&lt;p&gt;That part hasn’t magically changed.&lt;/p&gt;

&lt;h2&gt;The Apache 2.0 License Might Matter More Than the Model&lt;/h2&gt;

&lt;p&gt;Honestly, this may end up being the biggest long-term win.&lt;/p&gt;

&lt;p&gt;The Apache 2.0 licensing removes a lot of hesitation around enterprise adoption.&lt;/p&gt;

&lt;p&gt;A powerful model with unclear licensing is still difficult to use safely inside real products.&lt;/p&gt;

&lt;p&gt;Gemma 4 finally feels deployable without legal uncertainty hanging over it.&lt;/p&gt;

&lt;p&gt;That changes things for startups and internal tooling teams immediately.&lt;/p&gt;

&lt;h2&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;I don’t think Gemma 4 replaces GPT-5 or Claude for difficult engineering work.&lt;/p&gt;

&lt;p&gt;That’s not the point.&lt;/p&gt;

&lt;p&gt;The important shift is this:&lt;/p&gt;

&lt;p&gt;Open local models are finally becoming practical enough that developers can genuinely build around them instead of just experimenting with them.&lt;/p&gt;

&lt;p&gt;And honestly, this is the first open MoE model I’ve used where that future felt believable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Ran Gemma 4 on an 8GB Laptop Expecting a Toy Model. I Was Completely Wrong.</title>
      <dc:creator>Aman Bhargav</dc:creator>
      <pubDate>Fri, 08 May 2026 06:54:52 +0000</pubDate>
      <link>https://dev.to/aman_bhargav_f699ac83671a/i-ran-gemma-4-on-an-8gb-laptop-expecting-a-toy-model-i-was-completely-wrong-42oo</link>
      <guid>https://dev.to/aman_bhargav_f699ac83671a/i-ran-gemma-4-on-an-8gb-laptop-expecting-a-toy-model-i-was-completely-wrong-42oo</guid>
      <description>&lt;p&gt;Every AI release claims to be “efficient now.”&lt;/p&gt;

&lt;p&gt;Most of the time, that translates to:&lt;/p&gt;

&lt;p&gt;still needs expensive hardware&lt;br&gt;
still feels slow locally&lt;br&gt;
still breaks on reasoning tasks&lt;/p&gt;

&lt;p&gt;So when Google released Gemma 4 E2B, I honestly assumed it would be another lightweight model that looked good in benchmarks and failed in real usage.&lt;/p&gt;

&lt;p&gt;I tested it anyway.&lt;/p&gt;

&lt;p&gt;And after a week of running it locally, I think small models just crossed an important line.&lt;/p&gt;

&lt;h2&gt;My Setup&lt;/h2&gt;

&lt;p&gt;Nothing fancy.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;ollama run gemma4:2b
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Hardware:&lt;/p&gt;

&lt;p&gt;MacBook Air M1&lt;br&gt;
8GB RAM&lt;br&gt;
Ollama&lt;br&gt;
No external GPU&lt;/p&gt;

&lt;p&gt;Performance I saw:&lt;/p&gt;

&lt;p&gt;~40 tokens/sec average&lt;br&gt;
First pull took around 3 minutes&lt;br&gt;
RAM usage stayed around 5GB&lt;br&gt;
Fan noise was surprisingly manageable&lt;/p&gt;

&lt;p&gt;Most importantly:&lt;br&gt;
it actually felt responsive enough to use continuously.&lt;/p&gt;

&lt;p&gt;That’s rare for local models on weak hardware.&lt;/p&gt;

&lt;h2&gt;The Moment That Changed My Opinion&lt;/h2&gt;

&lt;p&gt;I tested a simple logic puzzle first.&lt;/p&gt;

&lt;p&gt;The kind of question smaller models usually get wrong because they rush straight to an answer.&lt;/p&gt;

&lt;p&gt;Without reasoning enabled:&lt;br&gt;
wrong answer instantly.&lt;/p&gt;

&lt;p&gt;Then I tested Gemma 4’s Thinking Mode.&lt;/p&gt;

&lt;p&gt;I added:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&amp;lt;|think|&amp;gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;before the task.&lt;/p&gt;

&lt;p&gt;And the behavior changed completely.&lt;/p&gt;

&lt;p&gt;Instead of rushing, the model started breaking the problem into steps internally before responding.&lt;/p&gt;

&lt;p&gt;It literally looked like the model was “thinking out loud.”&lt;/p&gt;

&lt;p&gt;That was the first moment where a 2B local model genuinely felt different.&lt;/p&gt;

&lt;p&gt;Not smarter in a benchmark sense.&lt;/p&gt;

&lt;p&gt;Smarter in behavior.&lt;/p&gt;
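
&lt;p&gt;If you want to reproduce it, this is roughly what I ran against Ollama’s local API. The puzzle here is a stand-in, and the raw-token handling is an assumption:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal sketch: the same question with and without the thinking
# token, against a local Ollama server.
require "net/http"
require "json"

def ask(prompt)
  res = Net::HTTP.post(
    URI("http://localhost:11434/api/generate"),
    { model: "gemma4:2b", prompt: prompt, stream: false }.to_json,
    "Content-Type" =&amp;gt; "application/json"
  )
  JSON.parse(res.body)["response"]
end

puzzle = "A bat and a ball cost 1.10 together. The bat costs 1.00 " \
         "more than the ball. What does the ball cost?"

puts ask(puzzle)                       # rushes straight to an answer
puts ask("&amp;lt;|think|&amp;gt;\n" + puzzle)      # steps through it first
&lt;/code&gt;&lt;/pre&gt;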

&lt;h2&gt;The Most Underrated Feature: Native Audio&lt;/h2&gt;

&lt;p&gt;This honestly surprised me more than the reasoning.&lt;/p&gt;

&lt;p&gt;I tested raw audio input using a messy voice note where I explained a Rails debugging issue while walking outside.&lt;/p&gt;

&lt;p&gt;No Whisper pipeline.&lt;br&gt;
No speech-to-text preprocessing.&lt;br&gt;
No extra tooling.&lt;/p&gt;

&lt;p&gt;Gemma 4 understood the context directly from audio input.&lt;/p&gt;

&lt;p&gt;That matters more than people realize.&lt;/p&gt;

&lt;p&gt;Most “voice AI” stacks today are still multiple systems stitched together:&lt;/p&gt;

&lt;p&gt;speech recognition&lt;br&gt;
cleanup&lt;br&gt;
context formatting&lt;br&gt;
LLM inference&lt;/p&gt;

&lt;p&gt;Gemma 4 reducing that complexity is a huge deal for local privacy-first apps.&lt;/p&gt;

&lt;p&gt;Especially for:&lt;/p&gt;

&lt;p&gt;offline assistants&lt;br&gt;
internal enterprise tooling&lt;br&gt;
edge devices&lt;br&gt;
mobile AI workflows&lt;/p&gt;

&lt;h2&gt;I Tried Feeding It Real Code&lt;/h2&gt;

&lt;p&gt;Benchmarks are one thing.&lt;/p&gt;

&lt;p&gt;Real projects are another.&lt;/p&gt;

&lt;p&gt;So I gave it pieces of a Rails project:&lt;/p&gt;

&lt;p&gt;Sidekiq jobs&lt;br&gt;
service objects&lt;br&gt;
migrations&lt;br&gt;
serializers&lt;br&gt;
ActiveRecord scopes&lt;/p&gt;

&lt;p&gt;And honestly?&lt;/p&gt;

&lt;p&gt;It performed better than I expected on:&lt;/p&gt;

&lt;p&gt;debugging&lt;br&gt;
explaining legacy code&lt;br&gt;
identifying duplicated logic&lt;br&gt;
finding missing indexes&lt;/p&gt;
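
&lt;p&gt;The missing-index catch, reconstructed (hypothetical schema, but representative of what it flagged):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical reconstruction: a scope filtering on an unindexed
# column, plus the index migration the model suggested.
class Order &amp;lt; ApplicationRecord
  scope :pending_export, -&amp;gt; { where(export_status: "pending") }
end

class AddIndexToOrdersOnExportStatus &amp;lt; ActiveRecord::Migration[7.1]
  def change
    add_index :orders, :export_status
  end
end
&lt;/code&gt;&lt;/pre&gt;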

&lt;p&gt;Where it still struggles:&lt;/p&gt;

&lt;p&gt;large architectural refactors&lt;br&gt;
very deep Rails metaprogramming&lt;br&gt;
maintaining consistency across long sessions&lt;br&gt;
niche gems with poor documentation&lt;/p&gt;

&lt;p&gt;So no, this is not replacing larger cloud models yet.&lt;/p&gt;

&lt;p&gt;But that’s also the wrong comparison.&lt;/p&gt;

&lt;h2&gt;The Real Story Here&lt;/h2&gt;

&lt;p&gt;The important part isn’t that Gemma 4 beats massive cloud models.&lt;/p&gt;

&lt;p&gt;It doesn’t.&lt;/p&gt;

&lt;p&gt;The important part is this:&lt;/p&gt;

&lt;p&gt;For the first time, running capable AI locally feels practical without needing expensive hardware.&lt;/p&gt;

&lt;p&gt;That changes who gets access to AI development.&lt;/p&gt;

&lt;p&gt;Students.&lt;br&gt;
Indie hackers.&lt;br&gt;
Developers in low-resource environments.&lt;br&gt;
Privacy-focused teams.&lt;/p&gt;

&lt;p&gt;Small local models used to feel like demos.&lt;/p&gt;

&lt;p&gt;Gemma 4 is the first one I’ve used that feels like an actual tool.&lt;/p&gt;

&lt;p&gt;And honestly, I didn’t expect to say that.&lt;/p&gt;

</description>
      <category>gemmachallenge</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
