<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amit Raz</title>
    <description>The latest articles on DEV Community by Amit Raz (@amit_raz_4280cb3a49bb4086).</description>
    <link>https://dev.to/amit_raz_4280cb3a49bb4086</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3841271%2F0c668b0c-cb3d-4bc7-97a5-d9f2e90ae912.jpg</url>
      <title>DEV Community: Amit Raz</title>
      <link>https://dev.to/amit_raz_4280cb3a49bb4086</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/amit_raz_4280cb3a49bb4086"/>
    <language>en</language>
    <item>
      <title>I Ran Google's New Gemma 4 Models Locally (26B and 31B) — Here's What I Found</title>
      <dc:creator>Amit Raz</dc:creator>
      <pubDate>Mon, 06 Apr 2026 11:03:46 +0000</pubDate>
      <link>https://dev.to/amit_raz_4280cb3a49bb4086/i-ran-googles-new-gemma-4-models-locally-26b-and-31b-heres-what-i-found-4o2g</link>
      <guid>https://dev.to/amit_raz_4280cb3a49bb4086/i-ran-googles-new-gemma-4-models-locally-26b-and-31b-heres-what-i-found-4o2g</guid>
      <description>&lt;p&gt;Google dropped Gemma 4 a few days ago and I immediately wanted to know: can&lt;br&gt;
you actually run these things locally on consumer hardware? Not for a research&lt;br&gt;
project. For real use.&lt;/p&gt;

&lt;p&gt;I had two machines to test with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An i9 with 96GB RAM and an RTX 4090&lt;/li&gt;
&lt;li&gt;A 64-core / 128-thread AMD machine (CPU-only)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I ran the 26B and 31B variants. Here's what happened.&lt;/p&gt;


&lt;h2&gt;
  
  
  A quick note on the architecture
&lt;/h2&gt;

&lt;p&gt;Before the numbers, one thing worth knowing: these two models are&lt;br&gt;
architecturally different.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;26B is a Mixture-of-Experts (MoE)&lt;/strong&gt; model with 128 experts, but only&lt;br&gt;
~4B parameters are active at any given time. That's why it's fast and fits&lt;br&gt;
comfortably in VRAM despite the 26B label.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;31B is a dense model&lt;/strong&gt; — all 31 billion parameters are active on every&lt;br&gt;
token. That's why it hits the memory wall hard.&lt;/p&gt;

&lt;p&gt;This distinction explains everything you're about to see in the benchmarks.&lt;/p&gt;


&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;I used &lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; to pull and run both models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run gemma4:26b
ollama run gemma4:31b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both support a 256K context window and native function calling out of the box.&lt;/p&gt;
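&lt;p&gt;A side note on scripting: Ollama also exposes a local HTTP API (port 11434 by default), so you don't have to stay in the interactive CLI. Here's a minimal Python sketch for driving these runs and pulling the timing numbers; it assumes a default Ollama install with the model already pulled, and the helper names are my own:&lt;/p&gt;

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def ask(model: str, prompt: str) -> dict:
    """Send one non-streaming generation request to the local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Ollama reports token counts alongside durations in nanoseconds."""
    return token_count / (duration_ns / 1e9)
```

&lt;p&gt;The response dict carries the generated text under &lt;code&gt;response&lt;/code&gt; plus metadata like &lt;code&gt;eval_count&lt;/code&gt; and &lt;code&gt;eval_duration&lt;/code&gt;; dividing one by the other gives the same tokens/s metric shown in the benchmark tables below.&lt;/p&gt;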




&lt;h2&gt;
  
  
  Benchmarks
&lt;/h2&gt;

&lt;p&gt;I ran a mix of prompts: simple factual questions, some reasoning tasks, and&lt;br&gt;
something heavier — a complex trading algorithm that uses AI-based prediction.&lt;br&gt;
I asked the models to explain the logic and suggest improvements.&lt;/p&gt;

&lt;p&gt;I also compared the outputs directly against Claude Code on the same prompts.&lt;/p&gt;

&lt;h3&gt;
  
  
  26B (MoE) on RTX 4090
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt eval rate&lt;/td&gt;
&lt;td&gt;15.56 tokens/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eval duration&lt;/td&gt;
&lt;td&gt;~10.5s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generation rate&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;149.56 tokens/s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is fast. Like, actually fast. 149 tokens per second means you're not&lt;br&gt;
sitting and watching a cursor blink. It feels close to real-time. The MoE&lt;br&gt;
architecture earns its keep here — only 4B parameters are active, so the&lt;br&gt;
4090's 24GB VRAM handles it cleanly with room to spare.&lt;/p&gt;

&lt;h3&gt;
  
  
  31B (Dense) on RTX 4090
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt eval rate&lt;/td&gt;
&lt;td&gt;26.30 tokens/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eval duration&lt;/td&gt;
&lt;td&gt;~3m 5s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generation rate&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.84 tokens/s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Big drop. Unlike the 26B, the dense 31B has to load all its parameters for&lt;br&gt;
every token. It doesn't fit cleanly into the 4090's VRAM and spills into&lt;br&gt;
system RAM — you feel every bit of it. For interactive use, this is painful.&lt;/p&gt;

&lt;p&gt;The screenshot below shows what that looks like in Task Manager:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgjbivos3ngqlmh937dv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgjbivos3ngqlmh937dv.png" alt=" " width="680" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GPU dedicated memory is maxed out at ~45.9GB of the ~49GB available. The GPU usage reads&lt;br&gt;
low (around 24%) not because there's no work, but because the GPU spends most&lt;br&gt;
of its time waiting on data coming from system RAM.&lt;/p&gt;

&lt;h3&gt;
  
  
  26B (MoE) on AMD 64-core / 128-thread (CPU only)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt eval rate&lt;/td&gt;
&lt;td&gt;45.33 tokens/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eval duration&lt;/td&gt;
&lt;td&gt;~3m 20s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generation rate&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;8.80 tokens/s&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Slower for generation, but the prompt eval rate is actually higher than the&lt;br&gt;
4090 — all those cores load context fast. Generation at 8.80 tokens/s is slow&lt;br&gt;
for interactive chat, but more usable than you'd expect for background tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quality
&lt;/h2&gt;

&lt;p&gt;All three runs handled the trading algorithm task well. The output was&lt;br&gt;
structured, accurate, and included reasonable improvement suggestions.&lt;/p&gt;

&lt;p&gt;I compared the responses directly against Claude Code on the same prompts.&lt;br&gt;
They were practically identical. Not "close enough" — genuinely hard to tell&lt;br&gt;
apart on this type of task.&lt;/p&gt;

&lt;p&gt;That surprised me. A model running locally on your own hardware, for free,&lt;br&gt;
producing output indistinguishable from a frontier cloud API on complex&lt;br&gt;
reasoning tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  The part that surprised me most — and it applies to every setup
&lt;/h2&gt;

&lt;p&gt;Here's the thing that changed how I think about local models, regardless of&lt;br&gt;
whether you're running on a GPU or CPU:&lt;/p&gt;

&lt;p&gt;A local model isn't subject to API limits. No token limits per minute, no cost&lt;br&gt;
per call, no rate limiting. If you're running agents that need to process large&lt;br&gt;
contexts, search through a codebase, analyze documents, or run long autonomous&lt;br&gt;
tasks — you can just let them run overnight. The agent works while you sleep.&lt;/p&gt;

&lt;p&gt;For agentic workflows specifically, this is a bigger deal than the raw token/s&lt;br&gt;
numbers suggest. An 8.80 tok/s model running uninterrupted for 8 hours&lt;br&gt;
processes a lot more work than a faster cloud model that hits rate limits every&lt;br&gt;
few minutes.&lt;/p&gt;
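&lt;p&gt;The back-of-envelope math is worth doing once (my arithmetic, using the CPU-only rate from the table above):&lt;/p&gt;

```python
def tokens_overnight(rate_tok_per_s: float, hours: float) -> int:
    """Total tokens generated at a sustained rate over a wall-clock span."""
    return int(rate_tok_per_s * hours * 3600)


# The CPU-only 26B run, left alone for a full night:
# 8.80 tok/s sustained for 8 hours comes out to roughly 253k tokens,
# with no rate limits or per-call costs interrupting the agent.
overnight = tokens_overnight(8.80, 8)
```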




&lt;h2&gt;
  
  
  Verdict
&lt;/h2&gt;

&lt;p&gt;The 26B MoE on a 4090 is the sweet spot right now. It fits cleanly in VRAM,&lt;br&gt;
generates at 149 tok/s, and produces quality that holds up against frontier&lt;br&gt;
models on reasoning tasks. For most local development and agentic use cases,&lt;br&gt;
you won't feel a meaningful gap.&lt;/p&gt;

&lt;p&gt;The 31B dense needs more VRAM than most people have. Unless you have a&lt;br&gt;
multi-GPU setup or an M-series Mac with 64GB+, the memory pressure kills the&lt;br&gt;
speed advantage you'd expect from the larger model.&lt;/p&gt;

&lt;p&gt;The CPU-only path is more viable than I expected for non-latency-sensitive&lt;br&gt;
work. If you have a powerful server without a GPU, the 26B MoE is genuinely&lt;br&gt;
runnable for batch tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;In the next post I'll show how to connect Cursor, VS Code, and Claude Code to&lt;br&gt;
a locally running model like this. That's where it becomes practically useful&lt;br&gt;
for day-to-day development.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Amit Raz, a Software Architect specializing in AI and software&lt;br&gt;
development. I build tools and apps at &lt;a href="https://rzailabs.com" rel="noopener noreferrer"&gt;rzailabs.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>gemma</category>
    </item>
    <item>
      <title>Why Your Android Reminder App Is Silently Failing You (And How to Fix It)</title>
      <dc:creator>Amit Raz</dc:creator>
      <pubDate>Fri, 03 Apr 2026 14:29:54 +0000</pubDate>
      <link>https://dev.to/amit_raz_4280cb3a49bb4086/why-your-android-reminder-app-is-silently-failing-you-and-how-to-fix-it-2ff2</link>
      <guid>https://dev.to/amit_raz_4280cb3a49bb4086/why-your-android-reminder-app-is-silently-failing-you-and-how-to-fix-it-2ff2</guid>
      <description>&lt;p&gt;You set a reminder. You go about your day. The time passes. Nothing fires.&lt;/p&gt;

&lt;p&gt;You open the app and the task is sitting there, overdue, with no notification ever sent. Sound familiar?&lt;/p&gt;

&lt;p&gt;This isn't a bug in any one app. It's a systemic problem with how most reminder apps handle alarms on Android, and once you understand why it happens, you can't unsee it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Root Cause: Android's Battery Optimization
&lt;/h2&gt;

&lt;p&gt;Android has gotten increasingly aggressive about killing background processes to save battery. This is mostly good for users, but it creates a real problem for apps that need to wake up at a specific time and do something.&lt;/p&gt;

&lt;p&gt;The standard approaches most apps use are &lt;code&gt;WorkManager&lt;/code&gt;, &lt;code&gt;Handler.postDelayed()&lt;/code&gt;, or scheduled jobs. These are perfectly fine for non-time-critical background work. But for a reminder that needs to fire at exactly 9:00am, they're not reliable. Android can, and will, defer or skip them entirely when Doze mode is active or battery saver is on.&lt;/p&gt;

&lt;p&gt;Doze mode kicks in when the device is stationary and unplugged for a while. Battery saver can be triggered manually or automatically. On some manufacturers (Samsung, Xiaomi, OnePlus especially) there's additional proprietary battery optimization on top of Android's own system that makes this even worse.&lt;/p&gt;

&lt;p&gt;The result: your reminder app looks fine. It shows the task. It shows the time. But the alarm never fires.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Right API for Time-Critical Alarms
&lt;/h2&gt;

&lt;p&gt;Android has a specific API designed for exactly this use case: &lt;code&gt;AlarmManager&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;There are several methods, and the difference between them matters a lot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inexact, deferrable. Android can batch and delay this.&lt;/span&gt;
&lt;span class="n"&gt;alarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AlarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RTC_WAKEUP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;triggerTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pendingIntent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Exact, but still deferrable during Doze mode.&lt;/span&gt;
&lt;span class="n"&gt;alarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setExact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AlarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RTC_WAKEUP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;triggerTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pendingIntent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Exact, fires even during Doze mode. This is what you want.&lt;/span&gt;
&lt;span class="n"&gt;alarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setExactAndAllowWhileIdle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AlarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RTC_WAKEUP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;triggerTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pendingIntent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Treated like a clock alarm by the system. Highest priority.&lt;/span&gt;
&lt;span class="n"&gt;alarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setAlarmClock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alarmClockInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pendingIntent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;setAlarmClock()&lt;/code&gt; is the one I ended up using in &lt;a href="https://rzailabs.com/projects/sticky-tasks" rel="noopener noreferrer"&gt;Sticky Tasks&lt;/a&gt;. It shows a clock icon in the status bar (which is actually useful UX, users can see their next alarm is set) and Android won't suppress it. It's the same mechanism the built-in clock app uses.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;setExactAndAllowWhileIdle()&lt;/code&gt; is a solid alternative if you don't want the status bar indicator. Both will fire reliably during Doze mode and with battery saver active.&lt;/p&gt;

&lt;h2&gt;
  
  
  Permissions You Need to Declare
&lt;/h2&gt;

&lt;p&gt;Starting from Android 12 (API 31), you need to explicitly request the &lt;code&gt;SCHEDULE_EXACT_ALARM&lt;/code&gt; permission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;uses-permission&lt;/span&gt; &lt;span class="na"&gt;android:name=&lt;/span&gt;&lt;span class="s"&gt;"android.permission.SCHEDULE_EXACT_ALARM"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From Android 13 (API 33), you should also handle the case where this permission is revoked by the user. Check it before scheduling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alarmManager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;canScheduleExactAlarms&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// schedule the alarm&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// direct user to settings to grant permission&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;intent&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Intent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ACTION_REQUEST_SCHEDULE_EXACT_ALARM&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startActivity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For full-screen notifications (the kind that show even when the screen is off), you also need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;uses-permission&lt;/span&gt; &lt;span class="na"&gt;android:name=&lt;/span&gt;&lt;span class="s"&gt;"android.permission.USE_FULL_SCREEN_INTENT"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since Android 14, you need to request this at runtime, not just declare it in the manifest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Surviving Reboots
&lt;/h2&gt;

&lt;p&gt;Here's one that catches a lot of developers off guard. &lt;code&gt;AlarmManager&lt;/code&gt; alarms don't survive a device restart. The moment the phone reboots, all your scheduled alarms are gone.&lt;/p&gt;

&lt;p&gt;The fix is a &lt;code&gt;BroadcastReceiver&lt;/code&gt; that listens for &lt;code&gt;BOOT_COMPLETED&lt;/code&gt; and re-registers all pending alarms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BootReceiver&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;BroadcastReceiver&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;onReceive&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Intent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="p"&gt;==&lt;/span&gt; &lt;span class="nc"&gt;Intent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ACTION_BOOT_COMPLETED&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// fetch all pending tasks from your database&lt;/span&gt;
            &lt;span class="c1"&gt;// re-schedule their alarms&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register it in your manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;receiver&lt;/span&gt; &lt;span class="na"&gt;android:name=&lt;/span&gt;&lt;span class="s"&gt;".BootReceiver"&lt;/span&gt; &lt;span class="na"&gt;android:exported=&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;intent-filter&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;action&lt;/span&gt; &lt;span class="na"&gt;android:name=&lt;/span&gt;&lt;span class="s"&gt;"android.intent.action.BOOT_COMPLETED"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/intent-filter&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/receiver&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And add the permission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;uses-permission&lt;/span&gt; &lt;span class="na"&gt;android:name=&lt;/span&gt;&lt;span class="s"&gt;"android.permission.RECEIVE_BOOT_COMPLETED"&lt;/span&gt; &lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, any user who restarts their phone loses all their scheduled reminders silently. They'll never know why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing This Properly
&lt;/h2&gt;

&lt;p&gt;Manual testing is painful here. You can't just wait for alarms to fire. A few adb commands that help:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Force device into Doze mode immediately&lt;/span&gt;
adb shell dumpsys deviceidle force-idle

&lt;span class="c"&gt;# Check current Doze state&lt;/span&gt;
adb shell dumpsys deviceidle

&lt;span class="c"&gt;# Simulate battery saver on&lt;/span&gt;
adb shell settings put global low_power 1

&lt;span class="c"&gt;# Turn it off&lt;/span&gt;
adb shell settings put global low_power 0

&lt;span class="c"&gt;# Step through Doze states manually&lt;/span&gt;
adb shell dumpsys deviceidle step
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test that your alarm fires correctly in each of these states before shipping. It's tedious, but worth it. This is exactly the kind of thing that fails silently in production, and you'll never see it in your crash logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;I ran into all of this while building &lt;a href="https://rzailabs.com/projects/sticky-tasks" rel="noopener noreferrer"&gt;Sticky Tasks&lt;/a&gt;, a reminder app I built because I kept missing notifications from other apps. Once I switched to &lt;code&gt;setAlarmClock()&lt;/code&gt; and added the boot receiver, the reliability difference was immediate.&lt;/p&gt;

&lt;p&gt;Alarms fire with battery saver on. They fire after restarts. They fire on Samsung devices with aggressive battery optimization enabled.&lt;/p&gt;

&lt;p&gt;It's not magic. It's just using the right API for the job.&lt;/p&gt;

&lt;p&gt;If you're building anything time-critical on Android, whether it's reminders, medication alerts, scheduled notifications, or anything that has to fire at a specific moment, this is the approach. The standard background job APIs aren't designed for this and they'll let you down.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Amit Raz, a Software Architect and AI consultant based in Israel. I build Android apps under the RZApps brand and write about what I learn along the way. More at &lt;a href="https://rzailabs.com" rel="noopener noreferrer"&gt;rzailabs.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Why I Started Watching My Claude Code Context Window (And Built Something to Track It)</title>
      <dc:creator>Amit Raz</dc:creator>
      <pubDate>Thu, 02 Apr 2026 11:01:12 +0000</pubDate>
      <link>https://dev.to/amit_raz_4280cb3a49bb4086/why-i-started-watching-my-claude-code-context-window-and-built-something-to-track-it-22o2</link>
      <guid>https://dev.to/amit_raz_4280cb3a49bb4086/why-i-started-watching-my-claude-code-context-window-and-built-something-to-track-it-22o2</guid>
      <description>&lt;p&gt;If you're using Claude Code heavily and not paying attention to your context window, you're probably paying more than you need to. Here's why it matters and what I changed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing most people don't realize
&lt;/h2&gt;

&lt;p&gt;Every time you send a message in Claude Code, the entire conversation history gets sent with it. Not just your new question. Everything. Every file you pasted, every response Claude gave, every back-and-forth since you opened the session.&lt;/p&gt;

&lt;p&gt;This means cost doesn't scale with message length. It scales with accumulated context.&lt;/p&gt;

&lt;p&gt;If your context window is at 70% and you ask something simple like "can you rename this variable?", you're paying for the full 70% of history sitting behind that tiny question. The question itself is almost irrelevant to the token count.&lt;/p&gt;

&lt;p&gt;Once this clicked for me, I couldn't unsee it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually drives your token costs
&lt;/h2&gt;

&lt;p&gt;Let's make this concrete.&lt;/p&gt;

&lt;p&gt;Say you've been in a Claude Code session for two hours. You've pasted several files, iterated on a feature, debugged a few things. Your context is sitting at 65%. Now you ask a quick follow-up question.&lt;/p&gt;

&lt;p&gt;That API call includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All the files you pasted earlier&lt;/li&gt;
&lt;li&gt;Every response Claude gave&lt;/li&gt;
&lt;li&gt;All your messages&lt;/li&gt;
&lt;li&gt;Your new question (tiny, almost irrelevant to the total)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new question might be 20 tokens. The history behind it could be 40,000. That's what you're paying for.&lt;/p&gt;

&lt;p&gt;This is by design, not a bug. The model needs the history to maintain coherence. But it means your costs compound as a session grows, and most people don't notice because there's no obvious signal telling them to pay attention.&lt;/p&gt;
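&lt;p&gt;The compounding is easy to sketch. This is illustrative only; it ignores the system prompt, tool output, and any prompt caching, and just counts what happens when every request resends the full history:&lt;/p&gt;

```python
def session_input_tokens(message_sizes: list[int]) -> int:
    """Total input tokens billed over a session where each request
    resends the entire conversation so far."""
    total = 0
    context = 0
    for size in message_sizes:
        context += size   # the new message joins the accumulated context
        total += context  # the whole context ships with this request
    return total


# Ten exchanges of ~500 tokens each: the final request alone carries
# 5,000 tokens of context, and the session bills 27,500 input tokens
# in total. Cost grows quadratically with session length.
```

&lt;p&gt;The point of the sketch: the tenth question costs ten times what the first one did, even if the questions themselves are the same size.&lt;/p&gt;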

&lt;h2&gt;
  
  
  The fix is simple but you have to be deliberate about it
&lt;/h2&gt;

&lt;p&gt;When a session gets long, especially before starting a new feature or a significant refactor, I now do this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open a fresh session&lt;/li&gt;
&lt;li&gt;Write a short handoff note: what we built, current state of the code, what I need next&lt;/li&gt;
&lt;li&gt;Paste only the files relevant to the next task&lt;/li&gt;
&lt;li&gt;Continue from there&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. The handoff takes maybe two minutes. In exchange, I'm starting the next task with a lean context instead of dragging tens of thousands of tokens of history into every subsequent query.&lt;/p&gt;

&lt;p&gt;The responses often get sharper too. A packed context window can cause the model to lose focus on earlier content. Starting fresh with a tight, relevant context tends to produce more focused answers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I built a custom status bar
&lt;/h2&gt;

&lt;p&gt;The problem is that none of this is visible by default in Claude Code. You're flying blind. There's no indicator telling you how full your context is, how much of your 5-hour session budget you've used, or how much of your 7-day limit remains.&lt;/p&gt;

&lt;p&gt;So I built a custom status bar that shows all three in real time.&lt;/p&gt;

&lt;p&gt;It sits in the terminal and updates as I work. When I see the context creeping up, it's a clear signal: finish this thread, write the handoff, open a new session.&lt;/p&gt;

&lt;p&gt;Before I built it, I had no idea how fast context accumulates during a real coding session. Seeing the number climb in real time changes how you work.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(I shared how I built it in a previous post. Link in the comments.)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The mental model shift
&lt;/h2&gt;

&lt;p&gt;Think of the context window status like a fuel gauge, not a progress bar.&lt;/p&gt;

&lt;p&gt;A progress bar tells you how far you've come. A fuel gauge tells you when to stop and refuel before you run out. The context window is the latter. Watching it helps you make an active decision: keep going, or reset and start lean.&lt;/p&gt;

&lt;p&gt;Most developers I've talked to treat Claude Code sessions like a continuous conversation that they just let run. That works fine for short tasks. For longer sessions, it's quietly expensive.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Every Claude Code query sends the full conversation history&lt;/li&gt;
&lt;li&gt;Cost scales with accumulated context, not message length&lt;/li&gt;
&lt;li&gt;Heavy context also affects response quality&lt;/li&gt;
&lt;li&gt;The fix: start fresh sessions before big new tasks, with a short handoff summary&lt;/li&gt;
&lt;li&gt;Make context visible so you know when to reset&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're using Claude Code for serious development work, the context window is worth paying attention to. It's not just a technical detail. It's directly tied to what you're spending.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Amit Raz is a Software Architect and AI consultant based in Israel. I build AI-powered products and write about developer tools, Android development, and AI workflows at &lt;a href="https://rzailabs.com" rel="noopener noreferrer"&gt;rzailabs.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Stop Using Claude Code for Everything: How I Cut My Token Usage by Being Smarter About Which AI Does What</title>
      <dc:creator>Amit Raz</dc:creator>
      <pubDate>Mon, 30 Mar 2026 07:57:02 +0000</pubDate>
      <link>https://dev.to/amit_raz_4280cb3a49bb4086/stop-using-claude-code-for-everything-how-i-cut-my-token-usage-by-being-smarter-about-which-ai-1dm5</link>
      <guid>https://dev.to/amit_raz_4280cb3a49bb4086/stop-using-claude-code-for-everything-how-i-cut-my-token-usage-by-being-smarter-about-which-ai-1dm5</guid>
      <description>&lt;p&gt;Claude Code is genuinely impressive. It can navigate a large codebase, plan multi-step changes, write and run tests, and iterate on its own output. But it's also expensive to run, and if you're using it the way I was at first, you're probably burning a lot of tokens on tasks that don't need it.&lt;/p&gt;

&lt;p&gt;Here's the workflow shift that made the biggest difference for me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with using one tool for everything
&lt;/h2&gt;

&lt;p&gt;When Claude Code is open in your terminal, it's tempting to just throw everything at it. Rename this variable. Write a quick helper function. Add a comment here. Fix this typo.&lt;/p&gt;

&lt;p&gt;The thing is, Claude Code sends your full conversation context with every request. That means even a tiny ask like "rename this variable" is consuming tokens proportional to how long your session has been running. It adds up fast.&lt;/p&gt;

&lt;p&gt;Meanwhile, you probably already have a perfectly capable model sitting right there in your editor that's much cheaper to run for simple tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup: run Claude Code inside your editor
&lt;/h2&gt;

&lt;p&gt;Before getting into the workflow, there's a small change worth making. If you've been running Claude Code in an external terminal while VSCode or Cursor is open on the same folder, move it into the integrated terminal. You don't need to configure anything. Just open the terminal panel inside the editor and run Claude Code from there.&lt;/p&gt;

&lt;p&gt;What you gain from this is access to the full editor context while Claude is working. The git diff panel shows you exactly what's being changed in real time. You can stage or revert specific hunks without switching windows. And when something breaks, the debugger is right there. You can set breakpoints, inspect variables, and feed that information back to Claude without ever leaving the editor.&lt;br&gt;
It also makes the two-model workflow I'm about to describe much easier to actually use in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  The workflow: right tool for the right task
&lt;/h2&gt;

&lt;p&gt;Once Claude Code is running inside your editor, you have two AI tools in the same window:&lt;/p&gt;

&lt;p&gt;Claude Code in the terminal, for tasks that need real reasoning. Architecting a new feature. Debugging something complex. Refactoring across multiple files. Anything where you need the model to actually think through a problem.&lt;/p&gt;

&lt;p&gt;Cursor composer or GitHub Copilot (depending on your editor), for everything else. Renaming things. Writing a simple util. Adding a docblock. Generating a test for a function that's already clearly defined. These are pattern-matching tasks, and a smaller model handles them just fine.&lt;/p&gt;
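&lt;p&gt;If it helps to see the split as code, here's one way to encode it as a tiny routing heuristic. The function name and thresholds are my own illustration, not a real API:&lt;/p&gt;

```python
# Hypothetical routing heuristic for the two-tool split described above.
# The thresholds mirror a "one sentence, one or two files" rule of thumb.
def route(one_sentence_task: bool, output_known: bool, files_touched: int) -> str:
    """Pick the cheaper in-editor model when the task is small and well-defined."""
    if one_sentence_task and output_known and files_touched <= 2:
        return "composer"     # or Copilot: pattern-matching work
    return "claude-code"      # needs actual reasoning

route(True, True, 1)    # a rename or quick helper -> "composer"
route(False, False, 5)  # a cross-file refactor -> "claude-code"
```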

&lt;p&gt;The practical rule I use: if I can describe the task in one sentence and I already know roughly what the output should look like, it goes to the composer or Copilot. If I'm not sure how to solve it, or it touches more than one or two files, it goes to Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this actually matters
&lt;/h2&gt;

&lt;p&gt;Claude Code's billing is based on tokens, and context accumulates over a session. The longer you've been working, the more expensive each request gets, even the simple ones. By handling small tasks in the composer instead, you keep Claude Code sessions shorter and more focused, which means less context bloat and lower costs overall.&lt;/p&gt;

&lt;p&gt;There's also a quality argument here. Claude Code does its best work on hard problems. If you're constantly interrupting it with trivial requests, you're not getting the most out of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;A typical session for me now looks something like this:&lt;/p&gt;

&lt;p&gt;I open Cursor with Claude Code running in the integrated terminal. I'm building a new feature, so I ask Claude Code to plan the implementation and start on the core logic. While it's working, I can watch the diffs in the git panel. If it breaks something, I use the debugger to understand what happened and give Claude Code the specific info it needs.&lt;/p&gt;

&lt;p&gt;For smaller things that come up along the way, like tweaking a component or writing a quick helper, I switch to the composer. It's faster, it's cheaper, and it doesn't interrupt the Claude Code session or bloat the context.&lt;/p&gt;

&lt;p&gt;When I'm done with the feature, I start a fresh Claude Code session for the next task rather than letting the context grow indefinitely.&lt;/p&gt;

&lt;h2&gt;
  
  
  The short version
&lt;/h2&gt;

&lt;p&gt;Claude Code is a powerful tool, but it's not free, and not every task needs it. Running it inside your editor instead of an external terminal makes it easier to use alongside your editor's built-in AI for simpler tasks. Keep Claude Code for the hard stuff. Let the lighter models handle the rest.&lt;/p&gt;

&lt;p&gt;It's a small workflow change, but it makes a real difference over a full day of coding.&lt;/p&gt;

&lt;p&gt;I'm Amit Raz, a Software Architect specializing in AI and software development based in Haifa, Israel. I build AI-powered products and help businesses integrate AI into their workflows. More at rzailabs.com.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>tooling</category>
    </item>
    <item>
      <title>I Made a Command That Documents My Entire Repo Every Time I Take a Break</title>
      <dc:creator>Amit Raz</dc:creator>
      <pubDate>Tue, 24 Mar 2026 07:44:11 +0000</pubDate>
      <link>https://dev.to/amit_raz_4280cb3a49bb4086/i-made-a-command-that-documents-my-entire-repo-every-time-i-take-a-break-44nj</link>
      <guid>https://dev.to/amit_raz_4280cb3a49bb4086/i-made-a-command-that-documents-my-entire-repo-every-time-i-take-a-break-44nj</guid>
<description>&lt;p&gt;I work with AI coding agents every day. Cursor, Claude Code, sometimes both in the same project. And the thing that used to slow me down the most wasn't writing code. It was re-orienting the agent at the start of every session.&lt;/p&gt;

&lt;p&gt;"Here's the folder structure." "We use Provider for state." "Don't put new screens in the root of lib." "Android alarm testing needs a real device."&lt;/p&gt;

&lt;p&gt;Over and over. Every session.&lt;/p&gt;

&lt;p&gt;So I built a fix. I call it /document-project, and the prompt file is here if you want to skip straight to it: &lt;a href="https://gist.github.com/razamit/b28d7d8b0acaf995969673df47333d58" rel="noopener noreferrer"&gt;https://gist.github.com/razamit/b28d7d8b0acaf995969673df47333d58&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What it actually is
&lt;/h2&gt;

&lt;p&gt;It's a markdown prompt file I keep in my projects. When I type the command in Cursor or Claude Code, the agent reads the file, walks the entire repo, and produces or updates two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AGENTS.md at the root — a machine-oriented map with build commands, tech stack, layout, conventions, and known footguns&lt;/li&gt;
&lt;li&gt;Folder-level README.md files — short, only where they add navigation value&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The file tells the agent exactly what to write, what to skip, and how to format it. It prioritizes accurate and short over comprehensive and stale.&lt;/p&gt;

&lt;h2&gt;
  
  
  A real example
&lt;/h2&gt;

&lt;p&gt;Here's what AGENTS.md looks like after running on my StickyTasks Flutter app:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## One-line purpose&lt;/span&gt;
Cross-platform Flutter task app with sticky reminders, local
notifications, recurring tasks, optional daily recap, ads + Pro IAP.

&lt;span class="gu"&gt;## Tech stack&lt;/span&gt;
| Area  | Stack                                    |
|-------|------------------------------------------|
| App   | Flutter, Dart SDK ^3.9.0                 |
| State | provider                                 |
| DB    | Hive + hive_flutter                      |
| Notif | flutter_local_notifications, timezone    |

&lt;span class="gu"&gt;## Layout map&lt;/span&gt;
| Path              | Role                                          |
|-------------------|-----------------------------------------------|
| lib/              | screens, widgets, viewmodels, controllers...  |
| android/          | Kotlin native, alarms, manifest               |
| functions/        | Firebase callable sendFeedback                |

&lt;span class="gu"&gt;## Known footguns&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Android alarms: test on real device, not emulator
&lt;span class="p"&gt;-&lt;/span&gt; Hive model changes: run build_runner after
&lt;span class="p"&gt;-&lt;/span&gt; Firebase feedback: needs RESEND_API_KEY configured
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No walls of text. No dependency dumps. Just what an agent needs to orient fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  The key design decisions in the prompt
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AGENTS.md comes first. Many tools treat it as the machine-oriented entry point.&lt;/li&gt;
&lt;li&gt;Folder READMEs only where they pay off — top-level areas, confusing names, shared entry points. Not everywhere.&lt;/li&gt;
&lt;li&gt;Don't paste dependency lists or full directory trees. Summarize and point to files.&lt;/li&gt;
&lt;li&gt;Update, don't append. If structure changed, remove outdated sections instead of adding contradictions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to set it up
&lt;/h2&gt;

&lt;p&gt;Download the file from the Gist and drop it in your project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cursor: place it in .cursor/ named document-project.md&lt;/li&gt;
&lt;li&gt;Claude Code: place it in .claude/commands/ named document-project.md&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then just type /document-project in the chat. That's it.&lt;/p&gt;
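&lt;p&gt;As a sketch, the whole setup is two directories and a copy. The stub file below stands in for the real Gist contents, which you'd download manually; the paths are the ones from the steps above:&lt;/p&gt;

```shell
# Stand-in for the downloaded prompt file; replace with the real Gist contents.
printf '# document-project prompt\n' > document-project.md

# Paths per the article: Cursor uses .cursor/, Claude Code uses .claude/commands/.
mkdir -p .cursor .claude/commands
cp document-project.md .cursor/document-project.md
cp document-project.md .claude/commands/document-project.md
```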

&lt;h2&gt;
  
  
  When I run it
&lt;/h2&gt;

&lt;p&gt;I run it on a break. Literally: type the command, go make coffee, come back, and the whole repo is documented and up to date.&lt;/p&gt;

&lt;p&gt;The next session, the agent starts oriented. It puts new features in the right layer. It knows what build_runner is for. It doesn't ask me to re-explain the architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bigger principle
&lt;/h2&gt;

&lt;p&gt;AI agents don't have memory. That's not going to change soon. So the question is: how do you give them the context they need without spending 10 minutes typing it every single time?&lt;/p&gt;

&lt;p&gt;Good documentation written for machines, not humans, is the answer. Short, scannable, honest about footguns. AGENTS.md is that format.&lt;/p&gt;

&lt;p&gt;If you're using Cursor or Claude Code on any project longer than a weekend, this is worth setting up.&lt;/p&gt;

&lt;p&gt;Prompt file: &lt;a href="https://gist.github.com/razamit/b28d7d8b0acaf995969673df47333d58" rel="noopener noreferrer"&gt;https://gist.github.com/razamit/b28d7d8b0acaf995969673df47333d58&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I'm Amit Raz, a Software Architect and AI consultant. I build AI-powered products and help teams integrate AI into their workflows. Check out my work at rzailabs.com.&lt;/p&gt;

</description>
      <category>cursor</category>
      <category>claudecode</category>
      <category>productivity</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
