<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: quietpulse</title>
    <description>The latest articles on DEV Community by quietpulse (@quietpulse-social).</description>
    <link>https://dev.to/quietpulse-social</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3836119%2F963f59b9-8b4f-47a2-8cb0-bc3f8fa58c88.png</url>
      <title>DEV Community: quietpulse</title>
      <link>https://dev.to/quietpulse-social</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/quietpulse-social"/>
    <language>en</language>
    <item>
      <title>Rails Scheduled Job Monitoring: How to Catch Missed Jobs Before They Break Production</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Mon, 11 May 2026 06:13:47 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/rails-scheduled-job-monitoring-how-to-catch-missed-jobs-before-they-break-production-1k46</link>
      <guid>https://dev.to/quietpulse-social/rails-scheduled-job-monitoring-how-to-catch-missed-jobs-before-they-break-production-1k46</guid>
      <description>&lt;p&gt;Rails scheduled job monitoring is easy to forget because scheduled work usually lives in the background. Your web app is up, requests are fine, the database is responding, and dashboards look green. Meanwhile, a nightly billing sync, cleanup task, email digest, or data import may have stopped running three days ago.&lt;/p&gt;

&lt;p&gt;That is the dangerous part: scheduled jobs often fail quietly.&lt;/p&gt;

&lt;p&gt;A Rails app can look completely healthy while important recurring work is missing. Users may not notice right away. You may not notice right away. Then suddenly invoices are wrong, trial expirations did not happen, reports are stale, or a queue is full of old data.&lt;/p&gt;

&lt;p&gt;This guide covers how Rails scheduled jobs fail, why logs are not enough, and how to use heartbeat monitoring to catch missed executions before they become production incidents.&lt;/p&gt;

&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;Rails apps often rely on scheduled background work for things that are not directly tied to a web request.&lt;/p&gt;

&lt;p&gt;Common examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sending daily or weekly email digests&lt;/li&gt;
&lt;li&gt;charging subscriptions&lt;/li&gt;
&lt;li&gt;syncing data from third-party APIs&lt;/li&gt;
&lt;li&gt;expiring trials or temporary records&lt;/li&gt;
&lt;li&gt;cleaning old sessions, uploads, or audit logs&lt;/li&gt;
&lt;li&gt;generating reports&lt;/li&gt;
&lt;li&gt;enqueueing recurring jobs&lt;/li&gt;
&lt;li&gt;refreshing cached data&lt;/li&gt;
&lt;li&gt;retrying failed external operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tasks may be implemented with different tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;plain cron&lt;/li&gt;
&lt;li&gt;&lt;code&gt;whenever&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sidekiq-cron&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sidekiq-scheduler&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;good_job&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;solid_queue&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;delayed_job&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Heroku Scheduler&lt;/li&gt;
&lt;li&gt;Kubernetes CronJobs&lt;/li&gt;
&lt;li&gt;systemd timers&lt;/li&gt;
&lt;li&gt;custom Rake tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The implementation changes, but the monitoring problem stays the same.&lt;/p&gt;

&lt;p&gt;A scheduled job can fail in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it never starts&lt;/li&gt;
&lt;li&gt;it starts but crashes&lt;/li&gt;
&lt;li&gt;it hangs forever&lt;/li&gt;
&lt;li&gt;it runs on the wrong schedule&lt;/li&gt;
&lt;li&gt;it runs in one environment but not another&lt;/li&gt;
&lt;li&gt;it queues work but workers are down&lt;/li&gt;
&lt;li&gt;it silently skips important records&lt;/li&gt;
&lt;li&gt;it completes locally but fails in production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most frustrating failure mode is the missing run. Nothing explodes. No exception is raised. No user request fails. The scheduled job simply does not happen.&lt;/p&gt;

&lt;p&gt;That is exactly the kind of issue normal Rails monitoring often misses.&lt;/p&gt;

&lt;h2&gt;Why it happens&lt;/h2&gt;

&lt;p&gt;Rails scheduled job failures usually come from small operational details rather than dramatic bugs.&lt;/p&gt;

&lt;p&gt;One common cause is a broken cron environment. Cron does not load the same shell profile as your interactive terminal. Environment variables may be missing. Ruby, Bundler, or Rails paths may be different. A command that works perfectly over SSH may fail when cron runs it.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bundle &lt;span class="nb"&gt;exec &lt;/span&gt;rails runner &lt;span class="s2"&gt;"Billing::SyncJob.perform_now"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;might work in your shell, while cron fails because it cannot find &lt;code&gt;bundle&lt;/code&gt;, does not have &lt;code&gt;RAILS_ENV=production&lt;/code&gt;, or runs from the wrong directory.&lt;/p&gt;
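&lt;p&gt;One way to surface these differences early is a small preflight check that runs inside the cron-launched process itself. The sketch below is illustrative only: the helper names and the list of required variables are assumptions, not part of Rails or cron.&lt;/p&gt;

```ruby
# Hypothetical preflight helpers for a cron-launched Rails command.
# The variable names below are assumptions; list whatever your app needs.
REQUIRED_ENV_VARS = %w[RAILS_ENV DATABASE_URL].freeze

# Returns the names of required variables that are unset or blank.
def missing_env_vars(required = REQUIRED_ENV_VARS, env = ENV)
  required.select { |name| env[name].to_s.strip.empty? }
end

# Returns true when an executable with this name exists on the given PATH.
def executable_on_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) and File.executable?(candidate)
  end
end
```

&lt;p&gt;Logging the result of &lt;code&gt;missing_env_vars&lt;/code&gt; and &lt;code&gt;executable_on_path?("bundle")&lt;/code&gt; at the start of a cron run makes it easier to see when the cron environment differs from an interactive shell.&lt;/p&gt;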

&lt;p&gt;Another common issue is deployment drift. A scheduled task may be configured on an old server, a staging box, or a container that no longer exists. After an infrastructure migration, the app is still online, but the scheduler was never recreated.&lt;/p&gt;

&lt;p&gt;Queue-backed scheduling adds another layer. With Sidekiq, GoodJob, Solid Queue, or Delayed Job, there are two separate things to monitor:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Did the scheduler enqueue the job?&lt;/li&gt;
&lt;li&gt;Did a worker actually execute it?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the scheduler runs but workers are stopped, jobs pile up. If workers run but the scheduler is broken, nothing gets enqueued. Looking at only one side can give you a false sense of safety.&lt;/p&gt;

&lt;p&gt;Rails deployments also make scheduled work easy to accidentally duplicate or disable. You may have multiple app servers, multiple containers, or multiple release directories. If every instance runs the scheduler, the job may execute many times. If none of them run it, the job disappears completely.&lt;/p&gt;

&lt;p&gt;There are also application-level causes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;feature flags disable part of the job&lt;/li&gt;
&lt;li&gt;a database query becomes too slow and times out&lt;/li&gt;
&lt;li&gt;an API token expires&lt;/li&gt;
&lt;li&gt;a lock never releases&lt;/li&gt;
&lt;li&gt;a migration changes a column the job depends on&lt;/li&gt;
&lt;li&gt;the job rescues exceptions too broadly&lt;/li&gt;
&lt;li&gt;a retry loop hides the real failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In all of these cases, the Rails app can still serve web traffic normally.&lt;/p&gt;

&lt;p&gt;That is why Rails scheduled job monitoring needs to focus on the scheduled work itself, not just the app process.&lt;/p&gt;

&lt;h2&gt;Why it's dangerous&lt;/h2&gt;

&lt;p&gt;Silent scheduled job failures can be expensive because they often affect delayed, accumulated, or business-critical work.&lt;/p&gt;

&lt;p&gt;If a cleanup job stops running, the impact may start small. A few old records remain. Disk usage grows a little. Queries become slightly slower. Then, weeks later, storage fills up or a table becomes painfully large.&lt;/p&gt;

&lt;p&gt;If a billing job stops running, the damage is more direct. Customers may not be charged, invoices may not be sent, subscription states may drift, or payment retries may never happen.&lt;/p&gt;

&lt;p&gt;If a sync job stops running, your app may show stale data. Users may make decisions based on old information. Support tickets appear, but the root cause is not obvious.&lt;/p&gt;

&lt;p&gt;If an email digest job stops running, engagement drops quietly. Nobody gets paged. The app is up. But an important product loop is broken.&lt;/p&gt;

&lt;p&gt;The same pattern appears across many Rails systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;failed nightly reports&lt;/li&gt;
&lt;li&gt;missed customer notifications&lt;/li&gt;
&lt;li&gt;stuck import pipelines&lt;/li&gt;
&lt;li&gt;stale search indexes&lt;/li&gt;
&lt;li&gt;broken cache refreshes&lt;/li&gt;
&lt;li&gt;abandoned trial expiration tasks&lt;/li&gt;
&lt;li&gt;missed webhook retry jobs&lt;/li&gt;
&lt;li&gt;incomplete analytics rollups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional monitoring often does not catch these failures.&lt;/p&gt;

&lt;p&gt;Uptime checks only confirm that an HTTP endpoint responds. Error tracking catches exceptions only if the job raises and reports them. Logs help only if someone searches them or has log-based alerts configured correctly. Queue dashboards show queue state, but not always whether a recurring job was expected and missed.&lt;/p&gt;

&lt;p&gt;The dangerous question is not just “did something fail?”&lt;/p&gt;

&lt;p&gt;It is also:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Did the job run when it was supposed to?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the core question Rails scheduled job monitoring should answer.&lt;/p&gt;

&lt;h2&gt;How to detect it&lt;/h2&gt;

&lt;p&gt;The simplest reliable pattern is heartbeat monitoring.&lt;/p&gt;

&lt;p&gt;A heartbeat is a small signal sent by your scheduled job after it completes successfully. An external monitor expects that signal on a schedule. If the signal does not arrive within the expected time window, the monitor alerts you.&lt;/p&gt;

&lt;p&gt;Instead of only watching for errors, you watch for proof of success.&lt;/p&gt;

&lt;p&gt;For example, if a Rails job should run every night at 02:00, the monitor expects one successful ping every 24 hours. With a 30-minute grace period, no ping by 02:30 means something is wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cron did not run&lt;/li&gt;
&lt;li&gt;the scheduler is misconfigured&lt;/li&gt;
&lt;li&gt;the Rails command crashed&lt;/li&gt;
&lt;li&gt;the job hung before completion&lt;/li&gt;
&lt;li&gt;the worker never processed it&lt;/li&gt;
&lt;li&gt;the server was down&lt;/li&gt;
&lt;li&gt;the deploy broke the task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The monitor does not need to know which failure occurred. It knows the important outcome: the scheduled job did not complete successfully on time.&lt;/p&gt;

&lt;p&gt;That is the key advantage.&lt;/p&gt;

&lt;p&gt;For Rails apps, a heartbeat should usually be sent at the end of the job, after the important work is complete. This avoids false success signals.&lt;/p&gt;

&lt;p&gt;Bad pattern:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NightlyBillingJob&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationJob&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;perform&lt;/span&gt;
    &lt;span class="n"&gt;ping_monitor&lt;/span&gt;
    &lt;span class="no"&gt;Billing&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;RunNightlySync&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If billing fails after the ping, the monitor still sees success.&lt;/p&gt;

&lt;p&gt;Better pattern:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NightlyBillingJob&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationJob&lt;/span&gt;
  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;perform&lt;/span&gt;
    &lt;span class="no"&gt;Billing&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;RunNightlySync&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;
    &lt;span class="n"&gt;ping_monitor&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the heartbeat means the job actually reached the end.&lt;/p&gt;

&lt;p&gt;For jobs with multiple critical steps, you can ping only after all required steps finish. If the job partially completes and then fails, the missing heartbeat tells you something needs attention.&lt;/p&gt;
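&lt;p&gt;A minimal sketch of that idea, using plain Ruby callables so it stays independent of any job framework. The method name &lt;code&gt;run_critical_steps&lt;/code&gt; and the shape of its arguments are assumptions for illustration.&lt;/p&gt;

```ruby
# Runs every step in order, then sends the heartbeat.
# If any step raises, the exception propagates and the ping is never sent,
# so the external monitor sees a missing heartbeat.
def run_critical_steps(steps, ping)
  steps.each { |step| step.call }
  ping.call
end

# Example wiring (the step bodies are placeholders):
# run_critical_steps(
#   [-> { puts "charge subscriptions" }, -> { puts "send invoices" }],
#   -> { puts "ping monitor" }
# )
```

&lt;p&gt;The design choice here is to let failures propagate instead of rescuing them, so the missing ping and the error tracker both report the same incident.&lt;/p&gt;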

&lt;p&gt;This is different from logging. Logs describe what happened inside your system. Heartbeats prove, from the outside, that an expected scheduled outcome happened.&lt;/p&gt;

&lt;p&gt;A good Rails scheduled job monitoring setup usually tracks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;expected frequency&lt;/li&gt;
&lt;li&gt;grace period&lt;/li&gt;
&lt;li&gt;last successful run&lt;/li&gt;
&lt;li&gt;missed runs&lt;/li&gt;
&lt;li&gt;alert channel&lt;/li&gt;
&lt;li&gt;job identity&lt;/li&gt;
&lt;li&gt;environment (usually production only)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The grace period matters. If a job runs every hour, you may allow 10 or 15 extra minutes before alerting. If a nightly job usually takes 20 minutes, do not alert after 2 minutes. Monitor the real expected completion window.&lt;/p&gt;
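&lt;p&gt;To make the interval and grace period concrete, here is a rough sketch of the check a monitor might perform. It is a simplified model that measures time since the last successful ping. The method name and keyword arguments are assumptions, and real monitoring tools may use a different rule.&lt;/p&gt;

```ruby
# Returns true when the last successful ping is older than the expected
# interval plus the grace period. Durations are plain seconds.
def heartbeat_overdue?(last_ping_at:, interval:, grace:, now: Time.now)
  return true if last_ping_at.nil? # never pinged counts as overdue

  (now - last_ping_at) > (interval + grace)
end

# Example: a nightly job (24 hours) with a 30-minute grace period.
# heartbeat_overdue?(
#   last_ping_at: Time.utc(2026, 5, 10, 2, 5),
#   interval: 24 * 60 * 60,
#   grace: 30 * 60,
#   now: Time.utc(2026, 5, 11, 2, 40)
# )
```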

&lt;h2&gt;Simple solution (with example)&lt;/h2&gt;

&lt;p&gt;Here is a simple Rails example using a heartbeat ping at the end of a scheduled job.&lt;/p&gt;

&lt;p&gt;Imagine you have a job that runs every night and syncs subscription states:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/jobs/nightly_subscription_sync_job.rb&lt;/span&gt;
&lt;span class="nb"&gt;require&lt;/span&gt; &lt;span class="s2"&gt;"net/http"&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NightlySubscriptionSyncJob&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;ApplicationJob&lt;/span&gt;
  &lt;span class="n"&gt;queue_as&lt;/span&gt; &lt;span class="ss"&gt;:default&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;perform&lt;/span&gt;
    &lt;span class="no"&gt;SubscriptionSync&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run!&lt;/span&gt;
    &lt;span class="n"&gt;ping_monitor&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;

  &lt;span class="kp"&gt;private&lt;/span&gt;

  &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ping_monitor&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;unless&lt;/span&gt; &lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;production?&lt;/span&gt;

    &lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;URI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"https://quietpulse.xyz/ping/YOUR_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="no"&gt;Net&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;HTTP&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;use_ssl: &lt;/span&gt;&lt;span class="kp"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;open_timeout: &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;read_timeout: &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
      &lt;span class="n"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;Net&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;HTTP&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;Get&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;
    &lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Heartbeat ping failed: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;class&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the ping happens after &lt;code&gt;SubscriptionSync.run!&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;it only runs in production&lt;/li&gt;
&lt;li&gt;it has a short timeout&lt;/li&gt;
&lt;li&gt;ping failure is logged but does not break the job&lt;/li&gt;
&lt;li&gt;the URL uses a simple success ping endpoint&lt;/li&gt;
&lt;/ul&gt;
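&lt;p&gt;One detail worth knowing: &lt;code&gt;Net::HTTP&lt;/code&gt; does not raise on 4xx or 5xx responses, so a rejected ping can pass unnoticed. The sketch below shows one possible way to treat only 2xx responses as success. The helper names &lt;code&gt;ping_ok?&lt;/code&gt; and &lt;code&gt;send_heartbeat&lt;/code&gt; are assumptions, and the URL you pass in would be your own ping URL.&lt;/p&gt;

```ruby
require "net/http"
require "uri"

# True only for 2xx responses.
def ping_ok?(response)
  response.is_a?(Net::HTTPSuccess)
end

# Sends a GET to the ping URL with short connect and read timeouts.
# Returns true on a 2xx response, false on any other response or error.
def send_heartbeat(url, timeout: 5)
  uri = URI(url)
  response = Net::HTTP.start(
    uri.host, uri.port,
    use_ssl: uri.scheme == "https",
    open_timeout: timeout,
    read_timeout: timeout
  ) do |http|
    http.request(Net::HTTP::Get.new(uri))
  end
  ping_ok?(response)
rescue StandardError
  false
end
```

&lt;p&gt;Returning a boolean instead of raising keeps the ping from breaking the job, while still letting the caller log a warning when the monitor did not accept the heartbeat.&lt;/p&gt;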

&lt;p&gt;You can schedule this job with whichever tool your Rails app already uses.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;sidekiq-cron&lt;/code&gt;, the schedule might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;nightly_subscription_sync&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;
  &lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NightlySubscriptionSyncJob"&lt;/span&gt;
  &lt;span class="na"&gt;queue&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;whenever&lt;/code&gt;, you may schedule a Rails runner or Rake task:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# config/schedule.rb&lt;/span&gt;
&lt;span class="n"&gt;set&lt;/span&gt; &lt;span class="ss"&gt;:environment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"production"&lt;/span&gt;

&lt;span class="n"&gt;every&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;day&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="ss"&gt;at: &lt;/span&gt;&lt;span class="s2"&gt;"2:00 am"&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="s2"&gt;"NightlySubscriptionSyncJob.perform_later"&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a Rake task:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="c1"&gt;# lib/tasks/subscriptions.rake&lt;/span&gt;
&lt;span class="n"&gt;namespace&lt;/span&gt; &lt;span class="ss"&gt;:subscriptions&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
  &lt;span class="n"&gt;desc&lt;/span&gt; &lt;span class="s2"&gt;"Run nightly subscription sync"&lt;/span&gt;
  &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ss"&gt;nightly_sync: :environment&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="no"&gt;SubscriptionSync&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run!&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="no"&gt;Rails&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;production?&lt;/span&gt;
      &lt;span class="n"&gt;uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;URI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"https://quietpulse.xyz/ping/YOUR_TOKEN"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="no"&gt;Net&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;HTTP&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then cron could run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /var/www/myapp/current &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;RAILS_ENV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;production bundle &lt;span class="nb"&gt;exec &lt;/span&gt;rake subscriptions:nightly_sync
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A more defensive shell version can make sure the heartbeat only fires after the Rails task succeeds:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /var/www/myapp/current &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;span class="nv"&gt;RAILS_ENV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;production bundle &lt;span class="nb"&gt;exec &lt;/span&gt;rake subscriptions:nightly_sync &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; &lt;span class="nt"&gt;--max-time&lt;/span&gt; 10 https://quietpulse.xyz/ping/YOUR_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; matters. It means the ping only runs if the previous command exits successfully.&lt;/p&gt;

&lt;p&gt;If you use a heartbeat monitoring tool like QuietPulse, you create a check with the expected interval, add the generated ping URL to the end of your job, and receive an alert if the job misses its window. You can build something similar yourself, but using a small external monitor is usually simpler and more reliable than having the app monitor its own missing work.&lt;/p&gt;

&lt;p&gt;The main idea is not tool-specific: every important scheduled job should produce an external success signal.&lt;/p&gt;

&lt;h2&gt;Common mistakes&lt;/h2&gt;

&lt;h3&gt;1. Pinging at the start of the job&lt;/h3&gt;

&lt;p&gt;This is the most common mistake.&lt;/p&gt;

&lt;p&gt;If you ping at the start, you only prove that the job began. You do not prove that it completed.&lt;/p&gt;

&lt;p&gt;For short, simple jobs, that may feel good enough. But for billing, syncs, reports, imports, and cleanup tasks, completion matters much more than startup.&lt;/p&gt;

&lt;p&gt;Ping after the critical work finishes.&lt;/p&gt;

&lt;h3&gt;2. Monitoring only the queue&lt;/h3&gt;

&lt;p&gt;Queue dashboards are useful, but they are not the same as scheduled job monitoring.&lt;/p&gt;

&lt;p&gt;A queue may look healthy while a recurring job is never enqueued. Or the scheduler may enqueue the job successfully while workers are stuck. You need to monitor the expected completion of the scheduled task, not just the presence of a worker process.&lt;/p&gt;

&lt;h3&gt;3. Using one heartbeat for many jobs&lt;/h3&gt;

&lt;p&gt;One generic “daily jobs ran” heartbeat is tempting, but it hides which job failed.&lt;/p&gt;

&lt;p&gt;If you have separate billing, cleanup, report, and sync jobs, give important jobs their own checks. That way, the alert tells you exactly what is missing.&lt;/p&gt;
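&lt;p&gt;One possible way to keep checks separate is a small lookup from job name to its own ping URL. Everything named here is hypothetical: the job class names, the environment variable names, and the helper &lt;code&gt;heartbeat_url_for&lt;/code&gt;.&lt;/p&gt;

```ruby
# Hypothetical mapping from job class name to the environment variable
# that holds that job's own ping URL.
HEARTBEAT_ENV_KEYS = {
  "NightlyBillingJob" => "HEARTBEAT_URL_BILLING",
  "NightlyCleanupJob" => "HEARTBEAT_URL_CLEANUP",
  "WeeklyReportJob" => "HEARTBEAT_URL_REPORTS"
}.freeze

# Returns the ping URL for a job, or nil when the job has no check
# configured (unknown job name, or the variable is unset or blank).
def heartbeat_url_for(job_name, env = ENV)
  key = HEARTBEAT_ENV_KEYS[job_name]
  return nil if key.nil?

  url = env[key].to_s.strip
  url.empty? ? nil : url
end
```

&lt;p&gt;Because each job resolves its own URL, a missed billing run and a missed cleanup run produce two distinct alerts.&lt;/p&gt;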

&lt;h3&gt;4. Ignoring time zones&lt;/h3&gt;

&lt;p&gt;Rails, cron, Sidekiq, Kubernetes, and hosting platforms may use different time zones.&lt;/p&gt;

&lt;p&gt;A job scheduled for “2 AM” may not mean what you think it means. Daylight saving time can also surprise you.&lt;/p&gt;

&lt;p&gt;Use UTC where possible, document the expected schedule, and set heartbeat grace periods based on real execution times.&lt;/p&gt;
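&lt;p&gt;A short illustration of why “2 AM” is ambiguous. The fixed offsets below are chosen for the example (they match US Eastern standard and daylight time), and the helper name is an assumption.&lt;/p&gt;

```ruby
# Converts a wall-clock time at a fixed UTC offset into UTC.
def local_to_utc(year, month, day, hour, min, utc_offset)
  Time.new(year, month, day, hour, min, 0, utc_offset).utc
end

winter = local_to_utc(2026, 1, 15, 2, 0, "-05:00")
summer = local_to_utc(2026, 7, 15, 2, 0, "-04:00")

puts winter.hour # 7, so "2 AM" is 07:00 UTC in winter
puts summer.hour # 6, so "2 AM" is 06:00 UTC in summer
```

&lt;p&gt;A monitor configured in UTC would see this job arrive an hour earlier in summer, which is one reason to schedule in UTC or to size the grace period accordingly.&lt;/p&gt;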

&lt;h3&gt;5. Swallowing exceptions too broadly&lt;/h3&gt;

&lt;p&gt;Some Rails jobs rescue everything to avoid retry storms:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;rescue&lt;/span&gt; &lt;span class="no"&gt;StandardError&lt;/span&gt;
  &lt;span class="kp"&gt;nil&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That pattern can hide real failures. If the job also sends a heartbeat after the rescue, monitoring becomes misleading.&lt;/p&gt;

&lt;p&gt;Log exceptions clearly, report them to error tracking, and only send the heartbeat after the required work actually succeeded.&lt;/p&gt;
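&lt;p&gt;A narrower alternative might look like the sketch below. It rescues only one known error class, logs it, and skips the heartbeat. Any other exception still propagates. &lt;code&gt;TransientApiError&lt;/code&gt; and &lt;code&gt;run_with_heartbeat&lt;/code&gt; are hypothetical names, and the work, ping, and logger are passed in as plain callables to keep the example self-contained.&lt;/p&gt;

```ruby
# Hypothetical error class for failures you expect and choose to handle.
TransientApiError = Class.new(StandardError)

# Runs the work, then pings. Returns true when the heartbeat was sent.
# Only TransientApiError is rescued; it is logged and no ping is sent.
# Unexpected errors are not rescued, so error tracking still sees them.
def run_with_heartbeat(work, ping, log)
  work.call
  ping.call
  true
rescue TransientApiError => e
  log.call("Job failed, heartbeat skipped: #{e.class}: #{e.message}")
  false
end
```

&lt;p&gt;In a real job you would likely also report the rescued error to your error tracker before returning.&lt;/p&gt;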

&lt;h2&gt;Alternative approaches&lt;/h2&gt;

&lt;p&gt;Heartbeat monitoring is the most direct way to detect missed scheduled jobs, but it works best alongside other signals.&lt;/p&gt;

&lt;p&gt;Logs are still useful. Rails logs can show job start times, durations, record counts, API failures, and SQL issues. Structured logs make debugging much easier after an alert fires.&lt;/p&gt;

&lt;p&gt;Error tracking is also important. Tools like Sentry, Honeybadger, Rollbar, or AppSignal can catch exceptions inside jobs. They answer a different question: “Did the job crash with an error?” Heartbeats answer: “Did the job complete on time?”&lt;/p&gt;

&lt;p&gt;Queue monitoring helps too. For Sidekiq, GoodJob, Solid Queue, or Delayed Job, you should watch queue latency, retries, dead jobs, and worker availability. If a scheduled job misses its heartbeat, queue metrics often help explain why.&lt;/p&gt;

&lt;p&gt;Database checks can catch business-level symptoms. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;no invoices created in 24 hours&lt;/li&gt;
&lt;li&gt;no imports completed today&lt;/li&gt;
&lt;li&gt;no reports generated this week&lt;/li&gt;
&lt;li&gt;no webhook retries processed recently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These checks are powerful, but they are usually more custom. A heartbeat is easier to add first.&lt;/p&gt;

&lt;p&gt;Uptime checks are useful for the Rails web app itself, but they are not enough for scheduled work. Your homepage or health endpoint can return &lt;code&gt;200 OK&lt;/code&gt; while every recurring job is broken.&lt;/p&gt;

&lt;p&gt;The best setup is layered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;uptime monitoring for the web app&lt;/li&gt;
&lt;li&gt;error tracking for exceptions&lt;/li&gt;
&lt;li&gt;queue monitoring for background workers&lt;/li&gt;
&lt;li&gt;logs for debugging&lt;/li&gt;
&lt;li&gt;heartbeat monitoring for scheduled job completion&lt;/li&gt;
&lt;li&gt;business checks for critical outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each signal catches a different class of failure.&lt;/p&gt;

&lt;h2&gt;FAQ&lt;/h2&gt;

&lt;h3&gt;What is Rails scheduled job monitoring?&lt;/h3&gt;

&lt;p&gt;Rails scheduled job monitoring means tracking whether recurring Rails tasks run successfully on their expected schedule. These tasks may be cron jobs, Rake tasks, Active Job jobs, Sidekiq jobs, GoodJob jobs, or scheduler-triggered background work.&lt;/p&gt;

&lt;p&gt;The goal is to detect missed, failed, delayed, or silently broken jobs before they cause production problems.&lt;/p&gt;

&lt;h3&gt;How do I monitor Rails cron jobs?&lt;/h3&gt;

&lt;p&gt;The simplest approach is to send a heartbeat ping at the end of each important cron job. An external monitor expects that ping based on the job schedule and alerts you if it does not arrive.&lt;/p&gt;

&lt;p&gt;For example, if a Rails Rake task runs every night, add a success ping after the task completes. If cron fails, Rails crashes, or the job hangs, the ping will be missing.&lt;/p&gt;

&lt;h3&gt;Is Sidekiq monitoring enough for scheduled jobs?&lt;/h3&gt;

&lt;p&gt;Sidekiq monitoring is useful, but it is not always enough. It can show retries, dead jobs, queue latency, and worker status. But scheduled job monitoring should also confirm that each expected recurring job completed on time.&lt;/p&gt;

&lt;p&gt;A Sidekiq dashboard may not alert you when a scheduler stops enqueueing a job entirely. Heartbeat monitoring closes that gap.&lt;/p&gt;

&lt;h3&gt;Should I ping before or after a Rails job runs?&lt;/h3&gt;

&lt;p&gt;Usually after.&lt;/p&gt;

&lt;p&gt;A heartbeat should represent successful completion, not just startup. If you ping before the job runs and then the job fails halfway through, your monitor will show a false success.&lt;/p&gt;

&lt;p&gt;Ping only after the critical work finishes.&lt;/p&gt;

&lt;h3&gt;What Rails jobs should have heartbeat monitoring?&lt;/h3&gt;

&lt;p&gt;Start with jobs where a missed run would hurt users, revenue, data quality, or operations.&lt;/p&gt;

&lt;p&gt;Good candidates include billing syncs, subscription updates, imports, exports, email digests, cleanup tasks, report generation, webhook retries, search indexing, and analytics rollups.&lt;/p&gt;

&lt;p&gt;Not every tiny maintenance task needs its own alert, but important scheduled jobs should be visible.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Rails scheduled job monitoring is about proving that important background work actually happened.&lt;/p&gt;

&lt;p&gt;Your Rails app can be online while scheduled jobs are broken. Cron can miss runs. Schedulers can stop. Workers can fail. Environment variables can disappear. Jobs can hang or silently skip work.&lt;/p&gt;

&lt;p&gt;Logs, error tracking, and queue dashboards all help, but they do not fully answer the most important question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Did this scheduled job complete when expected?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Heartbeat monitoring gives you that answer. Add a success ping at the end of each critical Rails scheduled job, set the expected interval, and alert when the signal goes missing.&lt;/p&gt;

&lt;p&gt;That small pattern can save you from discovering a broken billing sync, stale report, or missing cleanup task days too late.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/rails-scheduled-job-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/rails-scheduled-job-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rails</category>
      <category>ruby</category>
      <category>cron</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Django Management Command Monitoring: How to Catch Missed Commands Before They Break Production</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Sun, 10 May 2026 06:14:57 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/django-management-command-monitoring-how-to-catch-missed-commands-before-they-break-production-4dol</link>
      <guid>https://dev.to/quietpulse-social/django-management-command-monitoring-how-to-catch-missed-commands-before-they-break-production-4dol</guid>
      <description>&lt;p&gt;Django management command monitoring is easy to overlook.&lt;/p&gt;

&lt;p&gt;A command works when you run it manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python manage.py sync_invoices
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So you put it in cron, Celery beat, systemd, Kubernetes, or a platform scheduler.&lt;/p&gt;

&lt;p&gt;Then one day it stops running.&lt;/p&gt;

&lt;p&gt;The app is still online. Uptime checks are green. But invoices are missing, reminder emails are not sent, reports are stale, and nobody notices until the data is already wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Django management commands often run outside the normal request/response path.&lt;/p&gt;

&lt;p&gt;They are commonly used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;billing reconciliation&lt;/li&gt;
&lt;li&gt;scheduled emails&lt;/li&gt;
&lt;li&gt;CRM or payment provider syncs&lt;/li&gt;
&lt;li&gt;CSV imports&lt;/li&gt;
&lt;li&gt;cleanup jobs&lt;/li&gt;
&lt;li&gt;search index rebuilds&lt;/li&gt;
&lt;li&gt;report generation&lt;/li&gt;
&lt;li&gt;expired trial handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These jobs usually run through something outside Django:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 2 * * * cd /srv/app &amp;amp;&amp;amp; /srv/app/venv/bin/python manage.py sync_invoices
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That creates a monitoring gap.&lt;/p&gt;

&lt;p&gt;Your web app can be healthy while the scheduled command quietly fails.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;Management commands are application code, but they are usually launched by infrastructure.&lt;/p&gt;

&lt;p&gt;That means failures can happen before your Django app has a good chance to report them.&lt;/p&gt;

&lt;p&gt;Common causes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cron not running&lt;/li&gt;
&lt;li&gt;disabled systemd timers&lt;/li&gt;
&lt;li&gt;stopped Celery beat processes&lt;/li&gt;
&lt;li&gt;missing environment variables&lt;/li&gt;
&lt;li&gt;wrong virtualenv paths&lt;/li&gt;
&lt;li&gt;changed working directories&lt;/li&gt;
&lt;li&gt;expired database credentials&lt;/li&gt;
&lt;li&gt;stuck external API calls&lt;/li&gt;
&lt;li&gt;commands hanging forever&lt;/li&gt;
&lt;li&gt;commands exiting successfully while processing nothing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A command may work perfectly in your shell but fail under cron because cron has a minimal environment.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python manage.py cleanup_expired_trials
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;may work manually, while cron does not know which &lt;code&gt;python&lt;/code&gt; to use or which Django settings module should be loaded.&lt;/p&gt;
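
&lt;p&gt;One way to make that kind of failure loud is to check the environment at the start of the command and exit non-zero when something is missing. The sketch below is a minimal illustration: the variable names in &lt;code&gt;REQUIRED_VARS&lt;/code&gt; and the helpers &lt;code&gt;missing_env&lt;/code&gt; and &lt;code&gt;check_env_or_exit&lt;/code&gt; are assumptions made for the example, not part of Django.&lt;/p&gt;

```python
import os
import sys

# Hypothetical list; replace with the variables your command really needs.
REQUIRED_VARS = ("DJANGO_SETTINGS_MODULE", "DATABASE_URL")


def missing_env(required=REQUIRED_VARS, environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


def check_env_or_exit():
    """Exit with status 1 so a chained heartbeat ping is skipped."""
    missing = missing_env()
    if missing:
        sys.stderr.write(
            "Missing environment variables: " + ", ".join(missing) + "\n"
        )
        sys.exit(1)
```

&lt;p&gt;Calling &lt;code&gt;check_env_or_exit()&lt;/code&gt; as the first step of the command turns a silent cron misconfiguration into a non-zero exit code that shows up in the cron log.&lt;/p&gt;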

&lt;h2&gt;
  
  
  Why it's dangerous
&lt;/h2&gt;

&lt;p&gt;Missed Django management commands rarely look like immediate outages.&lt;/p&gt;

&lt;p&gt;They look like slow operational damage.&lt;/p&gt;

&lt;p&gt;A missed billing job means invoices are not generated.&lt;/p&gt;

&lt;p&gt;A missed email job means users are not notified.&lt;/p&gt;

&lt;p&gt;A missed cleanup job means old data piles up until queries slow down.&lt;/p&gt;

&lt;p&gt;A missed sync job means your local database and external system drift apart.&lt;/p&gt;

&lt;p&gt;The painful part is that these failures are often discovered late. By then, you may need to figure out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which records were missed&lt;/li&gt;
&lt;li&gt;whether the command can be safely replayed&lt;/li&gt;
&lt;li&gt;whether duplicate emails or invoices might be created&lt;/li&gt;
&lt;li&gt;how long the job was broken&lt;/li&gt;
&lt;li&gt;whether reports from previous days can be trusted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why scheduled work needs a direct completion signal.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;The simplest approach is to monitor completion.&lt;/p&gt;

&lt;p&gt;Not just server uptime.&lt;/p&gt;

&lt;p&gt;Not just whether cron exists.&lt;/p&gt;

&lt;p&gt;Not just whether logs were written.&lt;/p&gt;

&lt;p&gt;Completion.&lt;/p&gt;

&lt;p&gt;The command should send a heartbeat ping after the important work succeeds. If the ping does not arrive within the expected time window, you get an alert.&lt;/p&gt;

&lt;p&gt;The flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a heartbeat check for the command.&lt;/li&gt;
&lt;li&gt;Configure the expected schedule.&lt;/li&gt;
&lt;li&gt;Run the Django command normally.&lt;/li&gt;
&lt;li&gt;Send a ping only after the command succeeds.&lt;/li&gt;
&lt;li&gt;Alert if the ping is missing or late.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the scheduler does not run, no ping arrives.&lt;/p&gt;

&lt;p&gt;If Django crashes, no ping arrives.&lt;/p&gt;

&lt;p&gt;If the command hangs, no ping arrives.&lt;/p&gt;

&lt;p&gt;If the server is down during the schedule window, no ping arrives.&lt;/p&gt;

&lt;p&gt;That makes heartbeat monitoring a good fit for Django management command monitoring.&lt;/p&gt;
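
&lt;p&gt;On the monitoring side, the "missing or late" decision can be sketched as a comparison against a deadline. This is a generic illustration, not the logic of any particular service: the function name &lt;code&gt;is_overdue&lt;/code&gt; and the grace period are assumptions made for the example.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_ping, interval, grace, now):
    """Return True when no ping has arrived within interval + grace."""
    deadline = last_ping + interval + grace
    return now > deadline


# Example: a nightly job that last pinged at 02:05 UTC, checked at 03:00 UTC
# the next day with a 30 minute grace period.
last_ping = datetime(2026, 5, 10, 2, 5, tzinfo=timezone.utc)
late = is_overdue(
    last_ping,
    interval=timedelta(hours=24),
    grace=timedelta(minutes=30),
    now=datetime(2026, 5, 11, 3, 0, tzinfo=timezone.utc),
)
```

&lt;p&gt;The grace period absorbs normal variation in run time, so a job that finishes a few minutes later than usual does not trigger an alert.&lt;/p&gt;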

&lt;h2&gt;
  
  
  Simple solution
&lt;/h2&gt;

&lt;p&gt;Start with a normal management command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# billing/management/commands/sync_invoices.py
&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.core.management.base&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseCommand&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;billing.services&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sync_invoices&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseCommand&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;help&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sync invoices from the payment provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;synced_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sync_invoices&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SUCCESS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Synced &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;synced_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; invoices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then schedule it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 2 * * * cd /srv/app &amp;amp;&amp;amp; /srv/app/.venv/bin/python manage.py sync_invoices
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To monitor successful completion, add a heartbeat ping after the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 2 * * * cd /srv/app &amp;amp;&amp;amp; /srv/app/.venv/bin/python manage.py sync_invoices &amp;amp;&amp;amp; curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; is important.&lt;/p&gt;

&lt;p&gt;It means the ping only runs if the Django command exits successfully.&lt;/p&gt;

&lt;p&gt;For production, add logging and a timeout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 2 * * * cd /srv/app &amp;amp;&amp;amp; timeout 30m /srv/app/.venv/bin/python manage.py sync_invoices &amp;gt;&amp;gt; /var/log/sync_invoices.log 2&amp;gt;&amp;amp;1 &amp;amp;&amp;amp; curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches cases where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the command never starts&lt;/li&gt;
&lt;li&gt;the command fails&lt;/li&gt;
&lt;li&gt;the command hangs&lt;/li&gt;
&lt;li&gt;the final completion signal is missing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can also send the ping from inside Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.conf&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;settings&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;django.core.management.base&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseCommand&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;billing.services&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sync_invoices&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseCommand&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;help&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sync invoices from the payment provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;synced_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sync_invoices&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SYNC_INVOICES_HEARTBEAT_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SUCCESS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Synced &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;synced_count&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; invoices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you do this, send the ping after the critical work completes, not before it starts.&lt;/p&gt;
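
&lt;p&gt;Whether a failed ping should fail the whole command is a design choice. The sketch below takes the lenient route: it logs the problem and returns &lt;code&gt;False&lt;/code&gt; instead of raising, so a monitoring outage does not mark a finished sync as failed. It uses the standard library's &lt;code&gt;urllib&lt;/code&gt; rather than &lt;code&gt;requests&lt;/code&gt; to stay dependency-free; the helper name &lt;code&gt;send_heartbeat&lt;/code&gt; and its &lt;code&gt;opener&lt;/code&gt; parameter (there to make testing easier) are assumptions made for the example.&lt;/p&gt;

```python
import logging
from urllib.request import urlopen

logger = logging.getLogger(__name__)


def send_heartbeat(url, timeout=10, opener=urlopen):
    """Send a completion ping; return True only on a 2xx response."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status // 100 == 2
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        logger.warning("Heartbeat ping failed: %s", exc)
        return False
```

&lt;p&gt;Inside &lt;code&gt;handle()&lt;/code&gt;, the call would go after &lt;code&gt;sync_invoices()&lt;/code&gt; returns, for example &lt;code&gt;send_heartbeat(settings.SYNC_INVOICES_HEARTBEAT_URL)&lt;/code&gt;. A ping that cannot be delivered still results in a missed-heartbeat alert, which is usually the behavior you want.&lt;/p&gt;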

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Sending the heartbeat at the start
&lt;/h3&gt;

&lt;p&gt;This only proves the command started.&lt;/p&gt;

&lt;p&gt;It does not prove the work completed.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Using &lt;code&gt;;&lt;/code&gt; instead of &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Avoid this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 2 * * * python manage.py sync_invoices; curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ping may run even if the command fails.&lt;/p&gt;

&lt;p&gt;Use this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 2 * * * python manage.py sync_invoices &amp;amp;&amp;amp; curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Relying only on logs
&lt;/h3&gt;

&lt;p&gt;Logs are useful after you know something went wrong.&lt;/p&gt;

&lt;p&gt;A command that never ran writes no log line, so there is nothing to search for and nothing to alert on.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Monitoring only the scheduler
&lt;/h3&gt;

&lt;p&gt;Knowing that cron or Celery beat is alive does not prove a specific Django command completed successfully.&lt;/p&gt;

&lt;p&gt;The scheduler can be running while one command fails every day.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Reusing one monitor for every command
&lt;/h3&gt;

&lt;p&gt;Important commands should have separate checks.&lt;/p&gt;

&lt;p&gt;If invoice sync fails, the alert should say invoice sync failed — not “some backend job might be broken.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;p&gt;Heartbeat monitoring works best when combined with other signals.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logs
&lt;/h3&gt;

&lt;p&gt;Good command logs should include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;start time&lt;/li&gt;
&lt;li&gt;finish time&lt;/li&gt;
&lt;li&gt;duration&lt;/li&gt;
&lt;li&gt;processed count&lt;/li&gt;
&lt;li&gt;skipped count&lt;/li&gt;
&lt;li&gt;external API failures&lt;/li&gt;
&lt;li&gt;exceptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Logs help explain failures, but they still need detection and alerting around them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error tracking
&lt;/h3&gt;

&lt;p&gt;Error tracking tools are great when a Django command raises an exception.&lt;/p&gt;

&lt;p&gt;But they may not catch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cron never starting&lt;/li&gt;
&lt;li&gt;server downtime during the schedule&lt;/li&gt;
&lt;li&gt;killed processes&lt;/li&gt;
&lt;li&gt;hung commands&lt;/li&gt;
&lt;li&gt;commands that exit successfully but process nothing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scheduler dashboards
&lt;/h3&gt;

&lt;p&gt;Celery, Kubernetes, and platform schedulers may show job history.&lt;/p&gt;

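&lt;p&gt;That helps, but the history lives inside the scheduler: it is only as reliable as the scheduler itself, and it does not move with the job if you change platforms.&lt;/p&gt;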

&lt;p&gt;A heartbeat ping is portable because it travels with the command.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database audit tables
&lt;/h3&gt;

&lt;p&gt;For critical workflows, writing run metadata to the database can be useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;command name&lt;/li&gt;
&lt;li&gt;started at&lt;/li&gt;
&lt;li&gt;finished at&lt;/li&gt;
&lt;li&gt;status&lt;/li&gt;
&lt;li&gt;processed count&lt;/li&gt;
&lt;li&gt;error message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives you history, but you still need alerting when a run is missing.&lt;/p&gt;
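
&lt;p&gt;As a rough illustration of such an audit table, the sketch below records each run with the fields listed above. It uses the standard library's &lt;code&gt;sqlite3&lt;/code&gt; so it stays self-contained; in a Django project this would more likely be a model. The table name &lt;code&gt;command_runs&lt;/code&gt; and the helper &lt;code&gt;record_run&lt;/code&gt; are assumptions made for the example.&lt;/p&gt;

```python
import sqlite3
from datetime import datetime, timezone


def record_run(conn, command_name, work):
    """Run work() and store one audit row, whether it succeeds or fails."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS command_runs ("
        "command_name TEXT, started_at TEXT, finished_at TEXT, "
        "status TEXT, processed_count INTEGER, error_message TEXT)"
    )
    started_at = datetime.now(timezone.utc).isoformat()
    status, processed, error = "success", None, None
    try:
        processed = work()  # work() returns the number of processed records
    except Exception as exc:
        status, error = "failed", str(exc)
        raise  # keep the non-zero exit so a chained ping is skipped
    finally:
        finished_at = datetime.now(timezone.utc).isoformat()
        conn.execute(
            "INSERT INTO command_runs VALUES (?, ?, ?, ?, ?, ?)",
            (command_name, started_at, finished_at, status, processed, error),
        )
        conn.commit()
    return processed
```

&lt;p&gt;The exception is re-raised on purpose: the audit row explains what happened, while the failing exit code keeps the heartbeat from being sent.&lt;/p&gt;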

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Django management command monitoring?
&lt;/h3&gt;

&lt;p&gt;It means tracking whether scheduled Django management commands run and complete successfully. A common pattern is to send a heartbeat ping after the command succeeds and alert if the ping is missing.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I monitor a Django management command in cron?
&lt;/h3&gt;

&lt;p&gt;Run the command normally, then send a heartbeat ping only after success:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0 2 * * * cd /srv/app &amp;amp;&amp;amp; /srv/app/.venv/bin/python manage.py my_command &amp;amp;&amp;amp; curl -fsS https://quietpulse.xyz/ping/YOUR_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Should the heartbeat ping happen before or after the command?
&lt;/h3&gt;

&lt;p&gt;Usually after. A ping before the command proves it started. A ping after the command proves it completed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is cron enough for Django scheduled tasks?
&lt;/h3&gt;

&lt;p&gt;Cron can run the task, but it does not reliably tell you when the task was missed, failed, or hung. For production, combine cron with logging, timeouts, and heartbeat monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this work with Celery beat or systemd timers?
&lt;/h3&gt;

&lt;p&gt;Yes. The same idea works with cron, Celery beat, systemd timers, Kubernetes CronJobs, GitHub Actions, and platform schedulers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Django management commands often handle important production work quietly in the background.&lt;/p&gt;

&lt;p&gt;That is exactly why they need monitoring.&lt;/p&gt;

&lt;p&gt;If a command syncs data, sends emails, generates invoices, or updates reports, you should know when it stops completing on schedule.&lt;/p&gt;

&lt;p&gt;Logs and error tracking help explain failures. Heartbeat monitoring catches the missing completion signal before stale data turns into an incident.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/django-management-command-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/django-management-command-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>django</category>
      <category>python</category>
      <category>cron</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>Firebase Scheduled Functions Monitoring: How to Catch Missed Runs Before They Break Production</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Sat, 09 May 2026 07:37:55 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/firebase-scheduled-functions-monitoring-how-to-catch-missed-runs-before-they-break-production-14a1</link>
      <guid>https://dev.to/quietpulse-social/firebase-scheduled-functions-monitoring-how-to-catch-missed-runs-before-they-break-production-14a1</guid>
      <description>&lt;p&gt;Firebase scheduled functions monitoring matters because scheduled backend work is easy to forget until it quietly stops doing its job.&lt;/p&gt;

&lt;p&gt;A Cloud Function might clean old records every night, sync subscription status from a payment provider, send reminder notifications, refresh search indexes, or export analytics data. When that function runs correctly, nobody thinks about it. When it stops running, the app may still look healthy from the outside.&lt;/p&gt;

&lt;p&gt;The website is up. The API responds. Users can log in.&lt;/p&gt;

&lt;p&gt;But the scheduled work is missing.&lt;/p&gt;

&lt;p&gt;That is the dangerous part: Firebase scheduled functions often fail in places that normal uptime monitoring cannot see.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;A Firebase scheduled function is backed by a Cloud Scheduler job that is created behind the scenes when you deploy. Depending on the generation and setup, that job triggers the function through a Pub/Sub topic (1st gen) or through a direct HTTP call (2nd gen).&lt;/p&gt;

&lt;p&gt;A typical job might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;onSchedule&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;firebase-functions/v2/scheduler&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cleanupExpiredSessions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;onSchedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;every 24 hours&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;cleanupExpiredSessions&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It looks simple. Once deployed, you expect it to run forever.&lt;/p&gt;

&lt;p&gt;But production systems are not that clean.&lt;/p&gt;

&lt;p&gt;Scheduled functions can stop working because of deployment mistakes, billing issues, IAM changes, runtime errors, dependency failures, quota problems, region mismatches, or configuration drift. Sometimes the function does run, but exits early before doing the important work. Sometimes it starts failing every night and nobody notices because the rest of the app keeps responding normally.&lt;/p&gt;

&lt;p&gt;The core problem is this:&lt;/p&gt;

&lt;p&gt;Your app can be up while your scheduled work is broken.&lt;/p&gt;

&lt;p&gt;That means uptime checks alone are not enough.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;Firebase scheduled functions rely on several moving parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud Scheduler&lt;/li&gt;
&lt;li&gt;Cloud Functions&lt;/li&gt;
&lt;li&gt;Pub/Sub or scheduler triggers&lt;/li&gt;
&lt;li&gt;IAM permissions&lt;/li&gt;
&lt;li&gt;runtime configuration&lt;/li&gt;
&lt;li&gt;external APIs&lt;/li&gt;
&lt;li&gt;database access&lt;/li&gt;
&lt;li&gt;billing and quota limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If any of those pieces changes, your scheduled task can fail.&lt;/p&gt;

&lt;p&gt;Common causes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the function was renamed or removed during deployment&lt;/li&gt;
&lt;li&gt;the schedule exists in one region while the function is deployed in another&lt;/li&gt;
&lt;li&gt;an environment variable is missing in production&lt;/li&gt;
&lt;li&gt;the service account lost permission to invoke the function&lt;/li&gt;
&lt;li&gt;Firestore or Realtime Database access changed (IAM roles, or security rules if the function uses a client SDK rather than the Admin SDK)&lt;/li&gt;
&lt;li&gt;a third-party API started returning errors&lt;/li&gt;
&lt;li&gt;the job times out on larger data sets&lt;/li&gt;
&lt;li&gt;Firebase billing or Google Cloud quotas block execution&lt;/li&gt;
&lt;li&gt;logs are noisy enough that nobody sees the failure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is also a more subtle failure mode: partial success.&lt;/p&gt;

&lt;p&gt;For example, a scheduled function might begin processing users, update the first 500 records, hit an exception, and stop. From a high level, you may see that the function ran. But the job did not actually complete the work it was responsible for.&lt;/p&gt;

&lt;p&gt;That is why Firebase scheduled functions monitoring should focus on completion, not just invocation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's dangerous
&lt;/h2&gt;

&lt;p&gt;Missed scheduled functions can create slow, silent damage.&lt;/p&gt;

&lt;p&gt;A broken cleanup job might leave expired sessions, temporary files, or stale documents in your database. A missed billing sync might fail to downgrade unpaid accounts. A failed notification job might leave users waiting for reminders that never arrive. A broken analytics export might create missing reports for several days before anyone notices.&lt;/p&gt;

&lt;p&gt;These failures are dangerous because they often do not produce an obvious incident right away.&lt;/p&gt;

&lt;p&gt;Instead, they accumulate.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;trial users are not converted or expired correctly&lt;/li&gt;
&lt;li&gt;stale Firestore documents keep growing storage costs&lt;/li&gt;
&lt;li&gt;email or push notifications stop being sent&lt;/li&gt;
&lt;li&gt;cache refresh jobs leave users seeing old data&lt;/li&gt;
&lt;li&gt;daily reports are missing&lt;/li&gt;
&lt;li&gt;webhook retry queues are never drained&lt;/li&gt;
&lt;li&gt;database maintenance tasks silently stop&lt;/li&gt;
&lt;li&gt;subscription state becomes inconsistent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the time someone notices, the fix is no longer just “restart the job.”&lt;/p&gt;

&lt;p&gt;You may need to backfill data, repair inconsistent records, explain missing notifications, or manually replay failed work.&lt;/p&gt;

&lt;p&gt;That is why waiting for user complaints is a bad monitoring strategy for scheduled functions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;The simplest reliable pattern is heartbeat monitoring.&lt;/p&gt;

&lt;p&gt;Instead of only checking whether the app is online, you check whether the scheduled function completed when expected.&lt;/p&gt;

&lt;p&gt;The idea is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a heartbeat check for the job.&lt;/li&gt;
&lt;li&gt;Give the job a deadline, such as “must complete every 24 hours.”&lt;/li&gt;
&lt;li&gt;At the end of the function, send a ping to the heartbeat URL.&lt;/li&gt;
&lt;li&gt;If the ping does not arrive on time, alert someone.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This detects the thing you actually care about: whether the scheduled function finished successfully.&lt;/p&gt;

&lt;p&gt;For Firebase scheduled functions monitoring, completion-based pings are usually better than start-based pings. A ping at the beginning only proves the function started. It does not prove the work finished.&lt;/p&gt;

&lt;p&gt;A good signal should happen after the important work completes.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dailyBillingSync&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;onSchedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;every 24 hours&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncBillingState&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendHeartbeatPing&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;syncBillingState()&lt;/code&gt; fails, the heartbeat is not sent.&lt;/p&gt;

&lt;p&gt;That means the missing heartbeat becomes a useful alert.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple solution with example
&lt;/h2&gt;

&lt;p&gt;Here is a practical Firebase scheduled function example using a heartbeat ping.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;onSchedule&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;firebase-functions/v2/scheduler&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;HEARTBEAT_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;QUIETPULSE_HEARTBEAT_URL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pingHeartbeat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;HEARTBEAT_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Missing QUIETPULSE_HEARTBEAT_URL&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;HEARTBEAT_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Heartbeat ping failed: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;syncBillingState&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Example business logic:&lt;/span&gt;
  &lt;span class="c1"&gt;// - fetch active subscriptions from your payment provider&lt;/span&gt;
  &lt;span class="c1"&gt;// - update Firestore user records&lt;/span&gt;
  &lt;span class="c1"&gt;// - expire unpaid accounts&lt;/span&gt;
  &lt;span class="c1"&gt;// - write audit logs&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dailyBillingSync&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;onSchedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;every 24 hours&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;timeZone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;UTC&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;timeoutSeconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;512MiB&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncBillingState&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pingHeartbeat&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your environment variable would contain a heartbeat URL like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;https://quietpulse.xyz/ping/&lt;span class="o"&gt;{&lt;/span&gt;token&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important detail is placement.&lt;/p&gt;

&lt;p&gt;Put the heartbeat ping after the critical work, not before it.&lt;/p&gt;

&lt;p&gt;If the scheduled function crashes, times out, or exits before finishing, the ping will not be sent. Your monitoring system can then alert you that the expected completion signal is missing.&lt;/p&gt;

&lt;p&gt;You can also use &lt;code&gt;finally&lt;/code&gt;, but be careful. If you always ping inside &lt;code&gt;finally&lt;/code&gt;, you may report success even when the job failed. For scheduled jobs, that is usually the wrong signal.&lt;/p&gt;

&lt;p&gt;This is risky:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dailyJob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;onSchedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;every 24 hours&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;doImportantWork&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pingHeartbeat&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That sends a heartbeat even after failure.&lt;/p&gt;

&lt;p&gt;This is usually better:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;exports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;dailyJob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;onSchedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;every 24 hours&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;doImportantWork&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pingHeartbeat&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the ping means “the job completed successfully,” not just “the job ran.”&lt;/p&gt;

&lt;p&gt;Instead of building the alerting layer yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create a check, copy the ping URL, call it after your Firebase scheduled function completes, and get alerted if the ping is late. The tool itself is not the point; what matters is having an external completion signal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Only checking Firebase logs
&lt;/h3&gt;

&lt;p&gt;Logs are useful when you already know something is wrong.&lt;/p&gt;

&lt;p&gt;They are not enough to tell you that a job never ran.&lt;/p&gt;

&lt;p&gt;If a scheduled function is not invoked, there may be no application log from that function at all. You might need to inspect Cloud Scheduler logs, Pub/Sub delivery (for 1st gen functions), function logs, IAM errors, and deployment history.&lt;/p&gt;

&lt;p&gt;That is a lot to rely on during an incident.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Pinging before the work finishes
&lt;/h3&gt;

&lt;p&gt;A heartbeat at the start of the function proves invocation, not completion.&lt;/p&gt;

&lt;p&gt;For scheduled functions, completion is usually what matters. If the job starts and then fails halfway through, an early ping can hide the failure.&lt;/p&gt;

&lt;p&gt;Put the ping after the important work succeeds.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Using one heartbeat for many jobs
&lt;/h3&gt;

&lt;p&gt;It is tempting to reuse one heartbeat URL for every scheduled function.&lt;/p&gt;

&lt;p&gt;Avoid that.&lt;/p&gt;

&lt;p&gt;A billing sync, cleanup job, report exporter, and notification sender should each have their own check. Otherwise, one healthy job can mask another broken one.&lt;/p&gt;

&lt;p&gt;Use separate heartbeat checks for separate responsibilities.&lt;/p&gt;
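
&lt;p&gt;As a rough sketch of what separate checks can look like, the snippet below keeps one environment variable per job and resolves the URL by job name. The job names, the variable names, and the &lt;code&gt;heartbeatUrlFor&lt;/code&gt; and &lt;code&gt;pingHeartbeatFor&lt;/code&gt; helpers are hypothetical; adapt them to your own functions.&lt;/p&gt;

```javascript
// Hypothetical mapping from job name to the environment variable that holds
// that job's own heartbeat URL. The names are illustrative only.
const HEARTBEAT_ENV_BY_JOB = {
  dailyBillingSync: "BILLING_SYNC_HEARTBEAT_URL",
  nightlyCleanup: "CLEANUP_HEARTBEAT_URL",
  weeklyReportExport: "REPORT_EXPORT_HEARTBEAT_URL",
};

// Look up the heartbeat URL for one job, failing loudly if it is not configured.
function heartbeatUrlFor(jobName, env = process.env) {
  const envName = HEARTBEAT_ENV_BY_JOB[jobName];
  if (!envName) {
    throw new Error(`No heartbeat check registered for job "${jobName}"`);
  }
  const url = env[envName];
  if (!url) {
    throw new Error(`Missing ${envName} for job "${jobName}"`);
  }
  return url;
}

// Ping the check that belongs to this job only.
async function pingHeartbeatFor(jobName, env = process.env) {
  const response = await fetch(heartbeatUrlFor(jobName, env));
  if (!response.ok) {
    throw new Error(`Heartbeat ping failed for ${jobName}: ${response.status}`);
  }
}
```

&lt;p&gt;With this shape, a job that is missing its own URL fails loudly instead of quietly pinging another job's check.&lt;/p&gt;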

&lt;h3&gt;
  
  
  4. Ignoring time zones
&lt;/h3&gt;

&lt;p&gt;Firebase schedules can run with a configured time zone. Your product logic may assume local time, while your monitoring window assumes UTC.&lt;/p&gt;

&lt;p&gt;That mismatch can create false alerts or hide real delays.&lt;/p&gt;

&lt;p&gt;Be explicit about time zones in both the scheduled function and monitoring configuration.&lt;/p&gt;
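
&lt;p&gt;To see why the mismatch matters, here is a small illustration using the standard &lt;code&gt;Intl.DateTimeFormat&lt;/code&gt; API. The helper name &lt;code&gt;hourInTimeZone&lt;/code&gt;, the &lt;code&gt;America/New_York&lt;/code&gt; zone, and the dates are only examples.&lt;/p&gt;

```javascript
// Return the wall-clock hour (0-23) that a given instant has in an IANA time zone.
function hourInTimeZone(date, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour: "2-digit",
    hourCycle: "h23",
  }).formatToParts(date);
  return Number(parts.find((part) => part.type === "hour").value);
}

// A job scheduled for 02:00 in America/New_York fires at different UTC hours
// in winter and summer because of daylight saving time.
const winterRun = new Date("2026-01-15T07:00:00Z"); // 02:00 EST
const summerRun = new Date("2026-07-15T06:00:00Z"); // 02:00 EDT

console.log(hourInTimeZone(winterRun, "America/New_York"), winterRun.getUTCHours());
console.log(hourInTimeZone(summerRun, "America/New_York"), summerRun.getUTCHours());
```

&lt;p&gt;Both runs happen at 02:00 local time, but at 07:00 UTC in winter and 06:00 UTC in summer. A monitoring window defined in UTC needs enough grace to absorb that shift, or it should be configured in the same time zone as the schedule.&lt;/p&gt;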

&lt;h3&gt;
  
  
  5. Not testing failure cases
&lt;/h3&gt;

&lt;p&gt;Do not only test the happy path.&lt;/p&gt;

&lt;p&gt;Test what happens when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the function throws an error&lt;/li&gt;
&lt;li&gt;an external API times out&lt;/li&gt;
&lt;li&gt;the heartbeat URL is missing&lt;/li&gt;
&lt;li&gt;the job takes longer than expected&lt;/li&gt;
&lt;li&gt;the scheduled function is disabled&lt;/li&gt;
&lt;li&gt;the deployment removes or renames the function&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Monitoring that has never seen a failure is often monitoring you cannot trust.&lt;/p&gt;
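
&lt;p&gt;One way to exercise the failure path without deploying anything is to inject the work and the ping as plain functions. This is a sketch, not Firebase test tooling: &lt;code&gt;runJob&lt;/code&gt; and &lt;code&gt;observeRun&lt;/code&gt; are hypothetical helpers that mirror the "work first, ping after" handler shape.&lt;/p&gt;

```javascript
// Same shape as the scheduled handler: do the work first, ping only on success.
// Both dependencies are injected so a test can replace them with fakes.
async function runJob(doWork, pingHeartbeat) {
  await doWork();
  await pingHeartbeat();
}

// Run the job with a fake ping and report whether the ping fired
// and whether the job rejected.
async function observeRun(doWork) {
  let pinged = false;
  let failed = false;
  try {
    await runJob(doWork, async () => {
      pinged = true;
    });
  } catch (error) {
    failed = true;
  }
  return { pinged, failed };
}

async function main() {
  const happy = await observeRun(async () => {});
  const broken = await observeRun(async () => {
    throw new Error("simulated external API timeout");
  });
  console.log("happy path:", happy);
  console.log("failure path:", broken);
}

main();
```

&lt;p&gt;The failure path should report that the job failed and that no ping was sent. If a test like this ever shows a ping after a failure, the heartbeat is in the wrong place.&lt;/p&gt;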

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;p&gt;Heartbeat monitoring is not the only option. It is just one of the clearest ways to detect missed scheduled work.&lt;/p&gt;

&lt;p&gt;Other useful signals include:&lt;/p&gt;

&lt;h3&gt;
  
  
  Firebase and Google Cloud logs
&lt;/h3&gt;

&lt;p&gt;Cloud Logging can show function errors, execution duration, and scheduler delivery events. This is useful for debugging.&lt;/p&gt;

&lt;p&gt;The downside is that logs are often reactive. Someone still needs to notice the failure, query the right logs, and understand what should have happened.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error tracking
&lt;/h3&gt;

&lt;p&gt;Tools like Sentry can catch exceptions inside scheduled functions.&lt;/p&gt;

&lt;p&gt;That helps when the function runs and throws.&lt;/p&gt;

&lt;p&gt;But error tracking may not catch missed invocations. If the function never starts, there may be no exception inside your application code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud Scheduler monitoring
&lt;/h3&gt;

&lt;p&gt;You can monitor Cloud Scheduler execution attempts and failures.&lt;/p&gt;

&lt;p&gt;This helps detect trigger-level issues, but it may not prove business-level completion. The scheduler can successfully invoke a function that later fails internally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database audit records
&lt;/h3&gt;

&lt;p&gt;Some teams write a &lt;code&gt;job_runs&lt;/code&gt; document to Firestore for each scheduled task.&lt;/p&gt;

&lt;p&gt;That can be very useful:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;job_runs&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;job&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;dailyBillingSync&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;finishedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a history of runs.&lt;/p&gt;

&lt;p&gt;But you still need something to watch that history and alert you when a run is missing.&lt;/p&gt;
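
&lt;p&gt;A sketch of that watcher logic, assuming you have already read the &lt;code&gt;finishedAt&lt;/code&gt; value of the latest successful &lt;code&gt;job_runs&lt;/code&gt; document. The function name &lt;code&gt;isRunOverdue&lt;/code&gt; and the interval and grace values are illustrative.&lt;/p&gt;

```javascript
// Decide whether a job is overdue, given the finishedAt value of its most
// recent successful job_runs document (an ISO 8601 string, or null if none).
function isRunOverdue(lastFinishedAt, expectedIntervalMs, graceMs, now = new Date()) {
  if (!lastFinishedAt) {
    return true; // no successful run has been recorded at all
  }
  const elapsedMs = now.getTime() - new Date(lastFinishedAt).getTime();
  return elapsedMs > expectedIntervalMs + graceMs;
}

const HOUR_MS = 60 * 60 * 1000;
const now = new Date("2026-05-11T06:00:00Z");

// Daily job with a one-hour grace period.
// 24.5 hours elapsed, 25 hours allowed: not overdue.
console.log(isRunOverdue("2026-05-10T05:30:00Z", 24 * HOUR_MS, HOUR_MS, now));
// 48.5 hours elapsed: overdue.
console.log(isRunOverdue("2026-05-09T05:30:00Z", 24 * HOUR_MS, HOUR_MS, now));
```

&lt;p&gt;Where this check runs matters. If it lives in another scheduled function in the same project, it can go missing for the same reasons as the job it watches.&lt;/p&gt;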

&lt;h3&gt;
  
  
  Custom dashboards
&lt;/h3&gt;

&lt;p&gt;You can build your own dashboard showing the last successful run of each job.&lt;/p&gt;

&lt;p&gt;That works well if you have time to maintain it. For small teams and indie projects, an external heartbeat check is often simpler and less fragile.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Firebase scheduled functions monitoring?
&lt;/h3&gt;

&lt;p&gt;Firebase scheduled functions monitoring is the process of checking whether scheduled Cloud Functions run and complete on time. It helps detect missed executions, runtime failures, delays, and silent scheduled job problems before they affect users or data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are Firebase logs enough to monitor scheduled functions?
&lt;/h3&gt;

&lt;p&gt;Firebase logs are helpful for debugging, but they are not enough by themselves. Logs show errors once you go looking for them, but on their own they do not alert you when a scheduled function never runs or never completes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I ping a heartbeat at the start or end of a scheduled function?
&lt;/h3&gt;

&lt;p&gt;For most production jobs, ping at the end. A start ping only proves that the function began. An end ping proves that the important work completed successfully.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Firebase scheduled functions fail silently?
&lt;/h3&gt;

&lt;p&gt;Yes. They can fail because of permissions, deployment changes, missing environment variables, timeouts, quota issues, external API failures, or scheduler configuration problems. Some failures may not be obvious from normal uptime checks.&lt;/p&gt;

&lt;h3&gt;
  
  
  How often should I monitor a Firebase scheduled function?
&lt;/h3&gt;

&lt;p&gt;Match the monitoring window to the schedule. If a function runs every hour, alert if it does not complete within a reasonable grace period after that hour. If it runs daily, use a daily check with enough grace time for normal delays.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Firebase scheduled functions are great for background work, but they can fail quietly.&lt;/p&gt;

&lt;p&gt;The app may stay online while billing syncs, cleanup tasks, reports, notifications, or maintenance jobs stop running.&lt;/p&gt;

&lt;p&gt;Good Firebase scheduled functions monitoring focuses on completion. Add a heartbeat ping after the important work finishes, give each job its own check, and alert when the expected signal does not arrive.&lt;/p&gt;

&lt;p&gt;That simple pattern catches the failures that uptime checks, dashboards, and logs often miss.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/firebase-scheduled-functions-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/firebase-scheduled-functions-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>firebase</category>
      <category>serverless</category>
      <category>monitoring</category>
      <category>devops</category>
    </item>
    <item>
      <title>Supabase Scheduled Functions Monitoring: How to Catch Missed Runs Before They Break Production</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Fri, 08 May 2026 06:11:06 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/supabase-scheduled-functions-monitoring-how-to-catch-missed-runs-before-they-break-production-373b</link>
      <guid>https://dev.to/quietpulse-social/supabase-scheduled-functions-monitoring-how-to-catch-missed-runs-before-they-break-production-373b</guid>
      <description>&lt;p&gt;Supabase scheduled functions monitoring matters because scheduled backend work can fail quietly while your app still looks completely healthy.&lt;/p&gt;

&lt;p&gt;Your frontend loads. Your API responds. The database is online. Auth works. But the scheduled function that cleans old rows, syncs billing data, sends reminders, refreshes materialized views, or calls an external API may have stopped running hours ago.&lt;/p&gt;

&lt;p&gt;That is the dangerous part about scheduled work in serverless and database-backed apps: the failure is often invisible until some downstream symptom appears.&lt;/p&gt;

&lt;p&gt;A scheduled function is not “up” in the same way a web endpoint is up. It either ran when expected and completed the important work, or it did not.&lt;/p&gt;

&lt;p&gt;Supabase gives developers a powerful stack for building quickly, including Edge Functions, Postgres, cron-like scheduling patterns, and database automation. But once scheduled tasks become part of production, you need monitoring that answers a very specific question:&lt;/p&gt;

&lt;p&gt;Did this scheduled function actually run and finish successfully?&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Scheduled backend work is easy to add and easy to forget.&lt;/p&gt;

&lt;p&gt;In a Supabase app, you might have scheduled work that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deletes expired sessions or temporary records&lt;/li&gt;
&lt;li&gt;sends daily or weekly email digests&lt;/li&gt;
&lt;li&gt;refreshes reporting tables&lt;/li&gt;
&lt;li&gt;syncs subscription status from a payment provider&lt;/li&gt;
&lt;li&gt;calls an external API on a schedule&lt;/li&gt;
&lt;li&gt;checks usage limits&lt;/li&gt;
&lt;li&gt;exports analytics&lt;/li&gt;
&lt;li&gt;cleans up old files in storage&lt;/li&gt;
&lt;li&gt;recalculates account metrics&lt;/li&gt;
&lt;li&gt;triggers notifications&lt;/li&gt;
&lt;li&gt;verifies backup or data consistency jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some of this work may live in Supabase Edge Functions. Some may be driven by Postgres cron extensions such as &lt;code&gt;pg_cron&lt;/code&gt;, database triggers, external schedulers, GitHub Actions, Vercel Cron, or another service that calls a Supabase function endpoint.&lt;/p&gt;

&lt;p&gt;The exact implementation varies, but the operational risk is the same:&lt;/p&gt;

&lt;p&gt;If the scheduled job does not run, your main app can still look fine.&lt;/p&gt;

&lt;p&gt;Traditional uptime monitoring might check your homepage or API route. That is useful, but it does not tell you whether yesterday’s cleanup ran, whether today’s digest was sent, or whether the scheduled sync completed.&lt;/p&gt;

&lt;p&gt;This creates a silent failure.&lt;/p&gt;

&lt;p&gt;The system is not fully down. There may be no obvious error on the public surface. But an important background process is missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;Supabase scheduled functions can fail silently for several reasons.&lt;/p&gt;

&lt;p&gt;One common cause is scheduler drift. The schedule may be configured outside the main application code, or it may depend on infrastructure that someone rarely checks. A cron expression can be wrong, disabled, duplicated, or moved to a different environment. A staging schedule may accidentally replace production behavior, or production may stop receiving triggers after a deployment change.&lt;/p&gt;

&lt;p&gt;Another cause is function deployment drift. Edge Functions are code, and code changes. A function can be renamed, removed, redeployed with different environment variables, or changed in a way that breaks scheduled execution while manual tests still pass.&lt;/p&gt;

&lt;p&gt;For example, a scheduled function might depend on an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Deno&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;EXTERNAL_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Missing EXTERNAL_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that secret is missing in production, the function may fail every time the scheduler invokes it.&lt;/p&gt;
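
&lt;p&gt;One defensive option, sketched here in plain JavaScript, is to check every required secret up front and fail with a single clear error. The helper names &lt;code&gt;missingSecrets&lt;/code&gt; and &lt;code&gt;assertSecrets&lt;/code&gt; are hypothetical, and the lookup function is passed in so the same code can be tried outside an Edge Function.&lt;/p&gt;

```javascript
// Return the names of required variables that are missing or empty.
// `getEnv` is any lookup function, for example (name) => Deno.env.get(name)
// in an Edge Function or (name) => process.env[name] in Node.js.
function missingSecrets(requiredNames, getEnv) {
  return requiredNames.filter((name) => !getEnv(name));
}

// Throw one error that lists everything missing, so the log shows the full picture.
function assertSecrets(requiredNames, getEnv) {
  const missing = missingSecrets(requiredNames, getEnv);
  if (missing.length > 0) {
    throw new Error(`Missing required secrets: ${missing.join(", ")}`);
  }
}
```

&lt;p&gt;In an Edge Function you might call &lt;code&gt;assertSecrets(["EXTERNAL_API_KEY"], (name) =&amp;gt; Deno.env.get(name))&lt;/code&gt; before doing any work. Because the error is thrown before the heartbeat ping, a missing secret shows up as a missing success signal.&lt;/p&gt;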

&lt;p&gt;Database permissions can also be a problem. Scheduled work often touches tables, service-role operations, storage buckets, or external APIs. A permission change that is safe for normal user requests can still break a privileged background job.&lt;/p&gt;

&lt;p&gt;External dependencies are another source of failure. A scheduled function might call Stripe, Resend, Slack, OpenAI, a webhook endpoint, or an internal API. If that dependency times out, rate limits requests, changes response shape, or rejects credentials, the scheduled job can fail even though Supabase itself is healthy.&lt;/p&gt;

&lt;p&gt;Timeouts and partial completion are especially tricky. A function might start successfully, process half the records, then time out. Logs may contain the failure, but unless someone checks them or receives an alert, the job can keep failing silently.&lt;/p&gt;
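
&lt;p&gt;To keep a slow dependency from consuming the whole execution budget, each external call can be given its own deadline. The &lt;code&gt;withTimeout&lt;/code&gt; helper below is a generic sketch built on &lt;code&gt;Promise.race&lt;/code&gt;, not a Supabase API, and the timeout values are up to you.&lt;/p&gt;

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
// Note: this only stops waiting; it does not cancel the underlying operation.
function withTimeout(promise, ms, label = "operation") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms} ms`));
    }, ms);
  });
  // Clear the timer either way so it does not keep the runtime alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

&lt;p&gt;For example, &lt;code&gt;await withTimeout(fetch(url), 10000, "billing API")&lt;/code&gt; rejects after ten seconds, so the job fails with a clear error and never reaches the heartbeat ping.&lt;/p&gt;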

&lt;p&gt;Finally, many teams confuse logs with monitoring. Supabase logs are useful when investigating a known issue. But logs do not automatically prove that a scheduled job ran on time and completed successfully.&lt;/p&gt;

&lt;p&gt;Monitoring should detect the missing success signal before a human goes digging through logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's dangerous
&lt;/h2&gt;

&lt;p&gt;Missed scheduled functions are dangerous because they usually manage work that users do not directly trigger.&lt;/p&gt;

&lt;p&gt;A failed scheduled function can cause:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stale reports or dashboards&lt;/li&gt;
&lt;li&gt;expired data staying in the database&lt;/li&gt;
&lt;li&gt;email digests not being sent&lt;/li&gt;
&lt;li&gt;payment status not syncing&lt;/li&gt;
&lt;li&gt;usage limits not updating&lt;/li&gt;
&lt;li&gt;notification queues backing up&lt;/li&gt;
&lt;li&gt;old files accumulating in storage&lt;/li&gt;
&lt;li&gt;cleanup tasks never running&lt;/li&gt;
&lt;li&gt;third-party integrations falling behind&lt;/li&gt;
&lt;li&gt;delayed or incorrect customer-facing data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These failures often compound.&lt;/p&gt;

&lt;p&gt;If a daily cleanup misses one run, maybe nothing obvious happens. If it misses seven runs, tables grow unexpectedly, queries slow down, storage costs rise, and users start seeing outdated data.&lt;/p&gt;

&lt;p&gt;If a billing sync stops, customers may keep access after cancellation, lose access after payment, or receive confusing account states.&lt;/p&gt;

&lt;p&gt;If a reporting refresh fails, business dashboards quietly become wrong. That can lead to bad decisions because the data looks normal, just stale.&lt;/p&gt;

&lt;p&gt;The painful part is that the first visible symptom usually appears far away from the root cause.&lt;/p&gt;

&lt;p&gt;Someone might report:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Why didn’t I get the digest?”&lt;/li&gt;
&lt;li&gt;“Why is this dashboard stale?”&lt;/li&gt;
&lt;li&gt;“Why is this user still marked active?”&lt;/li&gt;
&lt;li&gt;“Why is storage growing so fast?”&lt;/li&gt;
&lt;li&gt;“Why did this webhook not update the account?”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you have to reconstruct what happened:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the scheduler fire?&lt;/li&gt;
&lt;li&gt;Did Supabase receive the request?&lt;/li&gt;
&lt;li&gt;Did the Edge Function start?&lt;/li&gt;
&lt;li&gt;Did it have the right secrets?&lt;/li&gt;
&lt;li&gt;Did it finish?&lt;/li&gt;
&lt;li&gt;Did it fail halfway through?&lt;/li&gt;
&lt;li&gt;Did it retry?&lt;/li&gt;
&lt;li&gt;Did anyone get alerted?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is exactly the kind of uncertainty good scheduled function monitoring should remove.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;The most reliable way to detect missed scheduled functions is to make the function send a success signal after the important work completes.&lt;/p&gt;

&lt;p&gt;This is heartbeat monitoring.&lt;/p&gt;

&lt;p&gt;Instead of asking, “Is my Supabase project online?”, heartbeat monitoring asks:&lt;/p&gt;

&lt;p&gt;“Did this specific scheduled function report success inside the expected time window?”&lt;/p&gt;

&lt;p&gt;The pattern is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a heartbeat check for the expected schedule.&lt;/li&gt;
&lt;li&gt;Run your scheduled Supabase function normally.&lt;/li&gt;
&lt;li&gt;Send a heartbeat ping only after the critical work succeeds.&lt;/li&gt;
&lt;li&gt;Alert if the ping does not arrive on time.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This catches the failure mode that logs and uptime checks often miss: absence.&lt;/p&gt;

&lt;p&gt;If the scheduler never fires, no ping arrives.&lt;/p&gt;

&lt;p&gt;If the function crashes before completion, no ping arrives.&lt;/p&gt;

&lt;p&gt;If a secret is missing, no ping arrives.&lt;/p&gt;

&lt;p&gt;If an external API fails and the job exits early, no ping arrives.&lt;/p&gt;

&lt;p&gt;If the job hangs or times out before reaching the final step, no ping arrives.&lt;/p&gt;

&lt;p&gt;That makes the monitoring signal much more meaningful than checking whether a generic endpoint returns &lt;code&gt;200 OK&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The heartbeat should represent successful completion, not just startup.&lt;/p&gt;

&lt;p&gt;For example, if your function syncs subscription status, the ping should happen after the sync finishes. If your function sends a daily digest, the ping should happen after the digest job completes. If your function refreshes reporting tables, the ping should happen after the refresh succeeds.&lt;/p&gt;

&lt;p&gt;A startup ping only proves that the function began. A completion ping proves the work finished.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple solution with example
&lt;/h2&gt;

&lt;p&gt;Here is a simplified Supabase Edge Function example.&lt;/p&gt;

&lt;p&gt;Imagine you have a scheduled function that cleans up expired rows once per day.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// supabase/functions/cleanup-expired-records/index.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;serve&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://deno.land/std@0.224.0/http/server.ts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://esm.sh/@supabase/supabase-js@2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;supabaseUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Deno&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SUPABASE_URL&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;serviceRoleKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Deno&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SUPABASE_SERVICE_ROLE_KEY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;heartbeatUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Deno&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CLEANUP_HEARTBEAT_URL&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;supabaseUrl&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;serviceRoleKey&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Missing Supabase configuration&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;supabaseUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;serviceRoleKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;supabase&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;temporary_records&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;expires_at&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cleanup failed:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cleanup failed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;heartbeatUrl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ping&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;heartbeatUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Heartbeat ping failed:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Heartbeat failed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cleanup completed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then set the heartbeat URL as an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;CLEANUP_HEARTBEAT_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://quietpulse.xyz/ping/&lt;span class="o"&gt;{&lt;/span&gt;token&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exact scheduler can vary. You might call this Edge Function from an external cron service, from GitHub Actions, from another platform scheduler, or from a database-driven scheduling setup.&lt;/p&gt;

&lt;p&gt;The important part is not which scheduler triggers the function.&lt;/p&gt;

&lt;p&gt;The important part is that the function reports success only after the critical work completes.&lt;/p&gt;
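
&lt;p&gt;As a rough sketch of the scheduler side, a GitHub Actions workflow could invoke the function once a day. The workflow name, the job name, and the two repository secrets (&lt;code&gt;CLEANUP_FUNCTION_URL&lt;/code&gt; and &lt;code&gt;FUNCTION_AUTH_TOKEN&lt;/code&gt;) are placeholders chosen for illustration, and the authorization header is an assumption about how your function is protected:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Hypothetical workflow file, for example .github/workflows/nightly-cleanup.yml
name: nightly-cleanup

on:
  schedule:
    # GitHub Actions evaluates cron expressions in UTC: this is 02:00 UTC daily.
    - cron: "0 2 * * *"

jobs:
  invoke-cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Call the cleanup Edge Function
        env:
          # Assumed repository secrets; substitute your own names and values.
          CLEANUP_FUNCTION_URL: ${{ secrets.CLEANUP_FUNCTION_URL }}
          FUNCTION_AUTH_TOKEN: ${{ secrets.FUNCTION_AUTH_TOKEN }}
        run: |
          # --fail makes the step fail when the function returns an HTTP error.
          curl --fail --silent --show-error \
            --request POST \
            --header "Authorization: Bearer $FUNCTION_AUTH_TOKEN" \
            "$CLEANUP_FUNCTION_URL"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The workflow deliberately does not ping the heartbeat itself. The function sends the ping after the cleanup succeeds, so a workflow that never fires and a function that fails partway both show up as a missing ping.&lt;/p&gt;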

&lt;p&gt;If the scheduled function runs every day at 02:00 UTC, configure the heartbeat check to expect one ping per day, with a reasonable grace period.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;expected interval: 24 hours&lt;/li&gt;
&lt;li&gt;grace period: 30–60 minutes&lt;/li&gt;
&lt;li&gt;alert channel: Telegram, webhook, or another notification route&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That way, if no ping arrives within the expected interval plus the grace period, you get an alert.&lt;/p&gt;

&lt;p&gt;Instead of discovering the issue from stale data later, you know shortly after the scheduled work fails to report success.&lt;/p&gt;

&lt;p&gt;Rather than building this alerting flow yourself, you can use a heartbeat monitoring tool like QuietPulse. Create a check, copy the ping URL, and call it at the end of your scheduled function. If the ping does not arrive on time, QuietPulse can notify you through Telegram or webhooks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Pinging at the start of the function
&lt;/h3&gt;

&lt;p&gt;A heartbeat ping at the beginning only proves that the function started.&lt;/p&gt;

&lt;p&gt;That can create false confidence.&lt;/p&gt;

&lt;p&gt;If the function starts and then deletes nothing, fails on an external API call, times out, or crashes halfway through, the monitor has already recorded a successful ping.&lt;/p&gt;

&lt;p&gt;For scheduled functions, the heartbeat should usually be the final step after the important work completes.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Monitoring only the public app
&lt;/h3&gt;

&lt;p&gt;A homepage uptime check is useful, but it does not monitor background work.&lt;/p&gt;

&lt;p&gt;Your Supabase app can be online while scheduled functions fail for days.&lt;/p&gt;

&lt;p&gt;Use uptime checks for request/response availability. Use heartbeat checks for scheduled work.&lt;/p&gt;

&lt;p&gt;They answer different questions.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Relying only on logs
&lt;/h3&gt;

&lt;p&gt;Logs are valuable for debugging, but they are weak as the first line of detection.&lt;/p&gt;

&lt;p&gt;If nobody is watching the logs, a failure can sit there unnoticed.&lt;/p&gt;

&lt;p&gt;A heartbeat check gives you an explicit missing-success alert. Logs then help you investigate why the success signal did not arrive.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Using one heartbeat for multiple jobs
&lt;/h3&gt;

&lt;p&gt;If you have several scheduled functions, do not hide them behind one generic monitor.&lt;/p&gt;

&lt;p&gt;A cleanup job, billing sync, digest sender, and reporting refresh should usually have separate checks.&lt;/p&gt;

&lt;p&gt;Separate checks make alerts actionable. You immediately know which scheduled function missed its expected run.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Ignoring partial failures
&lt;/h3&gt;

&lt;p&gt;A function can return success even when part of the work failed.&lt;/p&gt;

&lt;p&gt;For example, a digest job might send 900 emails out of 1,000 and silently skip the rest. A sync might process one page of API results and fail before the next page.&lt;/p&gt;

&lt;p&gt;Make sure your function treats important partial failures as failures. Only ping the heartbeat after the job meets your real success criteria.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;p&gt;Heartbeat monitoring is one of the most direct ways to detect missed scheduled functions, but it is not the only useful signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supabase logs
&lt;/h3&gt;

&lt;p&gt;Supabase logs are important for debugging. They can show function invocations, errors, stack traces, and timing information.&lt;/p&gt;

&lt;p&gt;Use them to answer “what happened?”&lt;/p&gt;

&lt;p&gt;But logs are less reliable for answering “did the expected scheduled function finish on time?” unless you build alerting around them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database audit tables
&lt;/h3&gt;

&lt;p&gt;Some teams create a &lt;code&gt;job_runs&lt;/code&gt; table and insert a row for each scheduled execution.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt; &lt;span class="n"&gt;job_runs&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt; &lt;span class="k"&gt;primary&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;gen_random_uuid&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;job_name&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;started_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt; &lt;span class="k"&gt;not&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="n"&gt;finished_at&lt;/span&gt; &lt;span class="n"&gt;timestamptz&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;error_message&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This can be very useful, especially for internal dashboards and debugging history.&lt;/p&gt;

&lt;p&gt;But you still need something to check that table and alert when a run is missing or failed. Otherwise, it becomes another place where failures are recorded but not noticed.&lt;/p&gt;

&lt;h3&gt;
  
  
  External scheduler alerts
&lt;/h3&gt;

&lt;p&gt;Some schedulers provide failure notifications. That helps when the scheduler fires and receives a failing response.&lt;/p&gt;

&lt;p&gt;But scheduler alerts may not catch every important case. They might not know whether the function completed all internal work correctly. They also may not alert when the function returns &lt;code&gt;200 OK&lt;/code&gt; too early.&lt;/p&gt;

&lt;p&gt;Heartbeat monitoring works well alongside scheduler alerts because it focuses on completion of the actual work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application metrics
&lt;/h3&gt;

&lt;p&gt;Metrics are useful when scheduled functions affect measurable values: rows processed, emails sent, API records synced, duration, error count, and so on.&lt;/p&gt;

&lt;p&gt;If you already have metrics infrastructure, instrumenting scheduled functions is a good idea.&lt;/p&gt;

&lt;p&gt;But for many small teams and indie projects, a simple heartbeat is faster to set up and catches the most important failure mode: the job did not complete.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Supabase scheduled functions monitoring?
&lt;/h3&gt;

&lt;p&gt;Supabase scheduled functions monitoring is the practice of tracking whether scheduled backend work in a Supabase app runs and completes successfully. This can include Edge Functions, database jobs, cleanup tasks, sync jobs, reporting refreshes, and other recurring automation.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I know if a Supabase scheduled function did not run?
&lt;/h3&gt;

&lt;p&gt;The most direct way is to use a heartbeat check. Add a ping at the end of the scheduled function after the important work succeeds. If the ping does not arrive within the expected time window, the function probably did not run or did not complete successfully.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are Supabase logs enough for scheduled function monitoring?
&lt;/h3&gt;

&lt;p&gt;Supabase logs are useful for debugging, but they are not always enough for monitoring. Logs can show errors after you know there is a problem. Heartbeat monitoring alerts you when the expected success signal is missing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I ping before or after the scheduled function work?
&lt;/h3&gt;

&lt;p&gt;Usually after. A heartbeat ping should represent successful completion. If you ping at the beginning, the function can still fail halfway through while the monitor thinks everything is fine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I monitor multiple Supabase scheduled functions with one heartbeat?
&lt;/h3&gt;

&lt;p&gt;You can, but it is usually better to create one heartbeat check per important scheduled function. Separate checks make alerts clearer and help you quickly identify which job missed its run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Scheduled functions are easy to trust because they usually run quietly in the background.&lt;/p&gt;

&lt;p&gt;That quietness is also the risk.&lt;/p&gt;

&lt;p&gt;A Supabase app can look healthy while a cleanup job, billing sync, digest sender, or reporting refresh silently stops running. Uptime checks and logs help, but they do not always prove that the scheduled work completed on time.&lt;/p&gt;

&lt;p&gt;For production scheduled work, add a completion signal.&lt;/p&gt;

&lt;p&gt;Use heartbeat monitoring to confirm that each important Supabase scheduled function runs when expected and finishes successfully. If the heartbeat does not arrive, alert early, investigate quickly, and fix the issue before stale data or missed automation turns into a real incident.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/supabase-scheduled-functions-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/supabase-scheduled-functions-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>supabase</category>
      <category>serverless</category>
      <category>monitoring</category>
      <category>devops</category>
    </item>
    <item>
      <title>GitLab Scheduled Pipeline Monitoring: How to Catch Missed CI/CD Runs Before They Break Production</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Thu, 07 May 2026 06:22:08 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/gitlab-scheduled-pipeline-monitoring-how-to-catch-missed-cicd-runs-before-they-break-production-5don</link>
      <guid>https://dev.to/quietpulse-social/gitlab-scheduled-pipeline-monitoring-how-to-catch-missed-cicd-runs-before-they-break-production-5don</guid>
      <description>&lt;p&gt;GitLab scheduled pipeline monitoring matters because scheduled CI/CD jobs can fail quietly while the rest of your system looks healthy.&lt;/p&gt;

&lt;p&gt;Your application is up. Your GitLab project is reachable. Recent commits build successfully. But the scheduled pipeline that runs nightly tests, refreshes staging data, checks dependencies, builds reports, syncs artifacts, or runs cleanup jobs may have stopped hours or days ago.&lt;/p&gt;

&lt;p&gt;That is the uncomfortable part about scheduled pipelines: they often run outside the normal developer flow. Nobody is sitting there waiting for them. When they fail silently, the first visible symptom may be stale data, missed checks, broken deployments, or a production issue that should have been caught earlier.&lt;/p&gt;

&lt;p&gt;GitLab scheduled pipelines are useful, but they still need monitoring that answers one simple question:&lt;/p&gt;

&lt;p&gt;Did the scheduled pipeline actually run and complete successfully?&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;A GitLab scheduled pipeline is not the same thing as a pipeline triggered by a commit or merge request.&lt;/p&gt;

&lt;p&gt;Commit pipelines are visible because they happen during active development. Someone pushes code, reviews a merge request, and sees whether the pipeline passed or failed.&lt;/p&gt;

&lt;p&gt;Scheduled pipelines are different. They run in the background on a timer.&lt;/p&gt;

&lt;p&gt;For example, a team might use GitLab pipeline schedules to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run nightly end-to-end tests&lt;/li&gt;
&lt;li&gt;rebuild static assets or documentation&lt;/li&gt;
&lt;li&gt;refresh a staging database&lt;/li&gt;
&lt;li&gt;check dependency updates&lt;/li&gt;
&lt;li&gt;scan containers for vulnerabilities&lt;/li&gt;
&lt;li&gt;generate reports&lt;/li&gt;
&lt;li&gt;run cleanup scripts&lt;/li&gt;
&lt;li&gt;sync data between systems&lt;/li&gt;
&lt;li&gt;trigger periodic deployments&lt;/li&gt;
&lt;li&gt;validate backups or exports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If one of these scheduled pipelines stops running, normal uptime monitoring will not catch it. Your web app may still return &lt;code&gt;200 OK&lt;/code&gt;. Your API may still respond. GitLab may still be available. But the specific piece of scheduled work is missing.&lt;/p&gt;

&lt;p&gt;That creates a silent failure.&lt;/p&gt;

&lt;p&gt;The system is not completely down, so broad monitoring stays green. But an important recurring job did not happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;GitLab scheduled pipelines can fail silently for several reasons.&lt;/p&gt;

&lt;p&gt;The first cause is schedule configuration drift. A pipeline schedule may be disabled, edited, pointed at the wrong branch, or configured with a cron expression that does not mean what the team thinks it means. Time zones can also be confusing, especially when teams expect local business time but the schedule's cron time zone is set to something else, such as UTC.&lt;/p&gt;

&lt;p&gt;The second cause is CI configuration drift. A scheduled pipeline depends on &lt;code&gt;.gitlab-ci.yml&lt;/code&gt;. A refactor can rename a job, change rules, remove a stage, or accidentally make a scheduled job stop matching the schedule source.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;nightly_tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm run test:e2e&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;$CI_PIPELINE_SOURCE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"schedule"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is clear enough when it works. But if someone changes rules globally, updates stages, removes a variable, or changes the default branch, this job may no longer run as expected.&lt;/p&gt;
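
&lt;p&gt;To make the global-rules case concrete, here is a hedged sketch of a top-level &lt;code&gt;workflow&lt;/code&gt; block that someone might add during a refactor. It is an invented illustration, not taken from a real project. GitLab does not create a pipeline when none of the &lt;code&gt;workflow: rules&lt;/code&gt; entries match, so this block would stop scheduled pipelines entirely even though the &lt;code&gt;nightly_tests&lt;/code&gt; job itself is unchanged:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Illustrative only: a top-level workflow block added during a refactor.
workflow:
  rules:
    # Pipelines are created for merge requests...
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    # ...and for pushes.
    - if: '$CI_PIPELINE_SOURCE == "push"'
    # Nothing here matches "schedule", so scheduled pipelines are never
    # created and nightly_tests never gets a chance to run.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Adding a rule for &lt;code&gt;$CI_PIPELINE_SOURCE == "schedule"&lt;/code&gt; to that list would restore the scheduled run. Without monitoring, nothing fails visibly in the meantime: the pipeline simply never appears.&lt;/p&gt;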

&lt;p&gt;The third cause is expired or missing credentials. Scheduled pipelines often use tokens, deploy keys, API credentials, registry access, cloud credentials, or environment variables. A normal build might still work while the scheduled job fails because it needs a different secret.&lt;/p&gt;

&lt;p&gt;The fourth cause is dependency failure. A scheduled pipeline might call an external API, database, package registry, object storage bucket, internal service, or deployment endpoint. If that dependency fails, the pipeline may fail, hang, or exit early.&lt;/p&gt;

&lt;p&gt;The fifth cause is false confidence from GitLab status alone. A failed scheduled pipeline might be visible somewhere in GitLab, but visibility is not the same as alerting. If nobody checks the schedule page or pipeline history, the failure can sit there unnoticed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's dangerous
&lt;/h2&gt;

&lt;p&gt;Missed scheduled pipelines are dangerous because the work they perform is usually not covered by normal request/response monitoring.&lt;/p&gt;

&lt;p&gt;A failed or missed scheduled pipeline can mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;nightly tests stop catching regressions&lt;/li&gt;
&lt;li&gt;dependency checks stop running&lt;/li&gt;
&lt;li&gt;stale artifacts are served&lt;/li&gt;
&lt;li&gt;vulnerability scans are skipped&lt;/li&gt;
&lt;li&gt;staging data becomes outdated&lt;/li&gt;
&lt;li&gt;reports are not generated&lt;/li&gt;
&lt;li&gt;cleanup jobs never run&lt;/li&gt;
&lt;li&gt;backups are not verified&lt;/li&gt;
&lt;li&gt;scheduled deployments do not happen&lt;/li&gt;
&lt;li&gt;compliance or audit checks are missed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The risk is not always immediate. That is exactly what makes it easy to ignore.&lt;/p&gt;

&lt;p&gt;If your production API goes down, someone notices quickly. If a nightly scheduled pipeline fails three nights in a row, you might only discover it when a release breaks, a customer reports stale data, or a security scan that should have run never produced results.&lt;/p&gt;

&lt;p&gt;By then, debugging becomes harder.&lt;/p&gt;

&lt;p&gt;You have to answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did GitLab trigger the schedule?&lt;/li&gt;
&lt;li&gt;Did the pipeline start?&lt;/li&gt;
&lt;li&gt;Did the expected jobs run?&lt;/li&gt;
&lt;li&gt;Did a rule skip them?&lt;/li&gt;
&lt;li&gt;Did a secret expire?&lt;/li&gt;
&lt;li&gt;Did a dependency fail?&lt;/li&gt;
&lt;li&gt;Did the pipeline succeed but skip the important step?&lt;/li&gt;
&lt;li&gt;Did anyone get notified?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GitLab pipeline history is useful for investigation. But monitoring should tell you there is a problem before you need to investigate.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;The most reliable way to detect a missing scheduled pipeline is to make the pipeline send a success signal after the important work finishes.&lt;/p&gt;

&lt;p&gt;This is heartbeat monitoring.&lt;/p&gt;

&lt;p&gt;Instead of asking, “Is GitLab up?”, heartbeat monitoring asks, “Did this specific scheduled pipeline report success inside the expected time window?”&lt;/p&gt;

&lt;p&gt;For GitLab scheduled pipeline monitoring, the pattern looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a heartbeat check for the expected schedule.&lt;/li&gt;
&lt;li&gt;Run the scheduled GitLab pipeline normally.&lt;/li&gt;
&lt;li&gt;Put the heartbeat ping at the end of the job that proves success.&lt;/li&gt;
&lt;li&gt;If the ping does not arrive on time, send an alert.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This catches the failure mode that normal uptime checks miss: absence.&lt;/p&gt;

&lt;p&gt;A heartbeat monitor does not need to understand every detail of your pipeline. It only needs to know whether the completion signal arrived when expected.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a nightly test pipeline should ping once per night&lt;/li&gt;
&lt;li&gt;an hourly sync pipeline should ping once per hour&lt;/li&gt;
&lt;li&gt;a weekly vulnerability scan should ping once per week&lt;/li&gt;
&lt;li&gt;a daily report pipeline should ping after the report is generated&lt;/li&gt;
&lt;li&gt;a backup verification pipeline should ping after verification succeeds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important detail is placement.&lt;/p&gt;

&lt;p&gt;Send the heartbeat after the meaningful work completes, not at the start of the pipeline. If you ping first and the job fails later, your monitor will think the scheduled pipeline is healthy when it is not.&lt;/p&gt;

&lt;p&gt;A good heartbeat means:&lt;/p&gt;

&lt;p&gt;“The scheduled pipeline ran and reached the success point.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple solution
&lt;/h2&gt;

&lt;p&gt;Here is a simple GitLab CI job that runs only for scheduled pipelines and sends a heartbeat after the work succeeds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;notify&lt;/span&gt;

&lt;span class="na"&gt;nightly_tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;$CI_PIPELINE_SOURCE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"schedule"'&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm run test:e2e&lt;/span&gt;

&lt;span class="na"&gt;scheduled_pipeline_heartbeat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notify&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;$CI_PIPELINE_SOURCE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"schedule"'&lt;/span&gt;
  &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nightly_tests&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;curl --fail --silent --show-error "$QUIETPULSE_PING_URL"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The environment variable would contain a ping URL like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://quietpulse.xyz/ping/YOUR_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is intentionally simple.&lt;/p&gt;

&lt;p&gt;The scheduled work runs first. If &lt;code&gt;nightly_tests&lt;/code&gt; fails, the heartbeat job does not run. If the scheduled pipeline never starts, no heartbeat arrives. If the schedule is disabled, no heartbeat arrives. If a rule skips the job, no heartbeat arrives.&lt;/p&gt;

&lt;p&gt;That absence is the signal.&lt;/p&gt;

&lt;p&gt;For a more realistic pipeline, you might have multiple jobs before the heartbeat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;prepare&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;scan&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;notify&lt;/span&gt;

&lt;span class="na"&gt;prepare_staging_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;prepare&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;$CI_PIPELINE_SOURCE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"schedule"'&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./scripts/refresh-staging-data.sh&lt;/span&gt;

&lt;span class="na"&gt;nightly_e2e_tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;test&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;$CI_PIPELINE_SOURCE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"schedule"'&lt;/span&gt;
  &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;prepare_staging_data&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;npm run test:e2e&lt;/span&gt;

&lt;span class="na"&gt;dependency_scan&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scan&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;$CI_PIPELINE_SOURCE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"schedule"'&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./scripts/check-dependencies.sh&lt;/span&gt;

&lt;span class="na"&gt;scheduled_success_ping&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notify&lt;/span&gt;
  &lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;$CI_PIPELINE_SOURCE&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"schedule"'&lt;/span&gt;
  &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nightly_e2e_tests&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;dependency_scan&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;curl --fail --silent --show-error "$QUIETPULSE_PING_URL"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this setup, the heartbeat is only sent after the important scheduled work succeeds.&lt;/p&gt;

&lt;p&gt;You can store the ping URL as a protected CI/CD variable in GitLab:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QUIETPULSE_PING_URL=https://quietpulse.xyz/ping/YOUR_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



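&lt;p&gt;Then your pipeline can reference it without hardcoding the URL in the repository. Keep in mind that protected variables are only exposed to pipelines running on protected branches or tags, so the schedule should target a protected branch; otherwise &lt;code&gt;$QUIETPULSE_PING_URL&lt;/code&gt; will be empty.&lt;/p&gt;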

&lt;p&gt;If you have multiple scheduled pipelines, use separate heartbeat checks.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;NIGHTLY_TESTS_PING_URL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;DAILY_REPORT_PING_URL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WEEKLY_SCAN_PING_URL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HOURLY_SYNC_PING_URL&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not reuse one heartbeat URL for unrelated schedules. A weekly scan and an hourly sync have different expectations. Sharing a monitor between them makes alerts confusing and can hide failures.&lt;/p&gt;

&lt;p&gt;Instead of building alerting around GitLab schedule history yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create one check per scheduled pipeline, put the ping URL at the end of the successful job, and get alerted if the signal does not arrive on time. The important part is not the tool name; it is monitoring the actual completion signal.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Pinging at the start of the pipeline
&lt;/h3&gt;

&lt;p&gt;This is the most common mistake.&lt;/p&gt;

&lt;p&gt;If your first job sends the heartbeat and then the important work fails, your monitoring is lying to you.&lt;/p&gt;

&lt;p&gt;Bad pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;scheduled_start_ping&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;curl "$QUIETPULSE_PING_URL"&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./run-important-job.sh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;scheduled_job&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./run-important-job.sh&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;curl --fail --silent --show-error "$QUIETPULSE_PING_URL"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ping should mean success, not “the pipeline started.”&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Monitoring only GitLab availability
&lt;/h3&gt;

&lt;p&gt;GitLab being reachable does not mean your scheduled pipeline ran.&lt;/p&gt;

&lt;p&gt;A status page or uptime check can tell you whether GitLab is generally available. It cannot tell you whether your specific project schedule fired, whether the correct jobs ran, or whether your nightly task finished successfully.&lt;/p&gt;

&lt;p&gt;Scheduled work needs job-level monitoring.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Relying on someone to check pipeline history
&lt;/h3&gt;

&lt;p&gt;Pipeline history is useful, but it is passive.&lt;/p&gt;

&lt;p&gt;If the workflow depends on a human remembering to open GitLab and inspect yesterday’s scheduled run, the monitoring system is really just hope with a dashboard.&lt;/p&gt;

&lt;p&gt;Dashboards are for investigation. Alerts are for detection.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Using one monitor for many different jobs
&lt;/h3&gt;

&lt;p&gt;It is tempting to create one generic “GitLab scheduled jobs” monitor.&lt;/p&gt;

&lt;p&gt;That becomes messy quickly.&lt;/p&gt;

&lt;p&gt;If an alert fires, which job failed? The nightly tests? The weekly scan? The report builder? The cleanup script?&lt;/p&gt;

&lt;p&gt;Use separate heartbeat checks for separate responsibilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Ignoring skipped jobs
&lt;/h3&gt;

&lt;p&gt;A GitLab pipeline can “succeed” while the job you cared about was skipped because of &lt;code&gt;rules&lt;/code&gt;, &lt;code&gt;only&lt;/code&gt;, &lt;code&gt;except&lt;/code&gt;, branch filters, variables, or a config change.&lt;/p&gt;

&lt;p&gt;For scheduled pipeline monitoring, make sure the heartbeat depends on the actual jobs that prove the scheduled task succeeded.&lt;/p&gt;

&lt;p&gt;If the important job is skipped but the heartbeat still runs, your monitor will report success for work that never happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;p&gt;Heartbeat monitoring is usually the simplest way to detect missed scheduled pipelines, but it is not the only signal you can use.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitLab pipeline notifications
&lt;/h3&gt;

&lt;p&gt;GitLab can notify users about failed pipelines. This is useful, especially for failures that GitLab clearly detects.&lt;/p&gt;

&lt;p&gt;The limitation is that notification settings can be noisy, personal, or easy to ignore. They also generally do not cover the case where the expected scheduled pipeline never runs, because there is no pipeline to report on, or the case where the important job is skipped.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitLab API checks
&lt;/h3&gt;

&lt;p&gt;You can build a script that calls the GitLab API and checks the latest scheduled pipeline status.&lt;/p&gt;

&lt;p&gt;This gives you more control. For example, you can query the last pipeline for a schedule and alert if it is too old or failed.&lt;/p&gt;

&lt;p&gt;The tradeoff is complexity. You now need another scheduled job to check the scheduled job, plus authentication, API handling, retries, and alert routing.&lt;/p&gt;
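&lt;p&gt;As a rough illustration of the API approach, here is a minimal sketch that asks GitLab for the most recent schedule-triggered pipeline and decides whether it looks healthy. It assumes the project pipelines endpoint with its &lt;code&gt;source&lt;/code&gt; filter and a token sent in the &lt;code&gt;PRIVATE-TOKEN&lt;/code&gt; header. The function names, the use of &lt;code&gt;updated_at&lt;/code&gt; as the freshness timestamp, and the choice to flag every non-success status (including a run still in progress) are assumptions made for this example.&lt;/p&gt;

```typescript
// Sketch only: helper names are hypothetical and alert routing is left out.
type PipelineSummary = { status: string; updated_at: string };

// Pure decision step: returns a problem description, or null if all looks fine.
function evaluateLatestRun(
  latest: PipelineSummary | undefined,
  nowMs: number,
  maxAgeMs: number,
): string | null {
  if (latest === undefined) return "no scheduled pipeline found";
  if (latest.status !== "success") {
    // Note: this also flags a pipeline that is still running or pending.
    return `latest scheduled pipeline is ${latest.status}`;
  }
  const ageMs = nowMs - Date.parse(latest.updated_at);
  if (ageMs > maxAgeMs) return "latest successful scheduled pipeline is too old";
  return null;
}

async function checkScheduledPipeline(
  baseUrl: string,
  projectId: string,
  token: string,
  maxAgeMs: number,
) {
  // Pipelines are returned newest first by default, so one result is enough.
  const query = new URLSearchParams({ source: "schedule", per_page: "1" });
  const url = `${baseUrl}/api/v4/projects/${encodeURIComponent(projectId)}/pipelines?${query}`;
  const response = await fetch(url, { headers: { "PRIVATE-TOKEN": token } });
  if (!response.ok) throw new Error(`GitLab API returned ${response.status}`);
  const pipelines = (await response.json()) as PipelineSummary[];
  return evaluateLatestRun(pipelines[0], Date.now(), maxAgeMs);
}
```

&lt;p&gt;Even this small version shows the tradeoff: the checker itself has to run on a schedule somewhere, and something still has to notice if the checker stops.&lt;/p&gt;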

&lt;h3&gt;
  
  
  Logs and artifacts
&lt;/h3&gt;

&lt;p&gt;Logs and artifacts are excellent for debugging.&lt;/p&gt;

&lt;p&gt;They can show why a scheduled pipeline failed, which command broke, and what output was produced.&lt;/p&gt;

&lt;p&gt;But logs are not enough for detection. A log file sitting in GitLab does not help if nobody knows they need to look at it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uptime checks
&lt;/h3&gt;

&lt;p&gt;Uptime checks are good for public HTTP endpoints.&lt;/p&gt;

&lt;p&gt;They are not enough for scheduled pipelines.&lt;/p&gt;

&lt;p&gt;A website can be up while a scheduled pipeline is missing. Your production API can respond successfully while nightly tests have not run in three days.&lt;/p&gt;

&lt;p&gt;Use uptime monitoring for availability. Use heartbeat monitoring for scheduled work.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitLab status badges
&lt;/h3&gt;

&lt;p&gt;Pipeline badges are useful visual indicators in README files or dashboards.&lt;/p&gt;

&lt;p&gt;But they are not alerts. They also usually show the latest pipeline status, which might not represent a specific scheduled workflow.&lt;/p&gt;

&lt;p&gt;A badge can be green while a separate scheduled job is missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is GitLab scheduled pipeline monitoring?
&lt;/h3&gt;

&lt;p&gt;GitLab scheduled pipeline monitoring is the practice of checking that scheduled GitLab CI/CD pipelines run and complete successfully on time. It helps detect missed, failed, skipped, or silently broken scheduled jobs before they cause production issues.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I know if a GitLab scheduled pipeline did not run?
&lt;/h3&gt;

&lt;p&gt;The safest approach is to use a completion heartbeat. Put a ping at the end of the scheduled job or final success stage. If the expected ping does not arrive within the schedule window, the pipeline either did not run, failed before completion, or skipped the success path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can GitLab notify me when a scheduled pipeline fails?
&lt;/h3&gt;

&lt;p&gt;Yes, GitLab can send pipeline notifications, and those are useful. But they may not catch every silent failure mode, especially when a schedule is disabled, jobs are skipped, or alerts depend on individual notification preferences. A heartbeat check gives you an external signal that the scheduled work completed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I monitor every GitLab scheduled pipeline separately?
&lt;/h3&gt;

&lt;p&gt;Usually, yes. Each scheduled pipeline with a different responsibility or schedule should have its own monitor. This makes alerts easier to understand and prevents one successful job from hiding another failed or missed job.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where should I put the heartbeat ping in GitLab CI?
&lt;/h3&gt;

&lt;p&gt;Put the heartbeat ping after the important work succeeds. If you have multiple stages, place it in a final stage that depends on the jobs that must complete successfully. Do not put it at the beginning of the pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;GitLab scheduled pipelines are great for recurring CI/CD work, but they are easy to forget because they run in the background.&lt;/p&gt;

&lt;p&gt;A passing website uptime check does not prove that nightly tests ran. A green GitLab project does not prove that a scheduled cleanup, report, scan, or sync completed successfully.&lt;/p&gt;

&lt;p&gt;The practical fix is simple: make each important scheduled pipeline send a heartbeat after it finishes successfully. If that signal does not arrive on time, alert someone.&lt;/p&gt;

&lt;p&gt;That turns a silent failure into a visible one — before it becomes a production surprise.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/gitlab-scheduled-pipeline-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/gitlab-scheduled-pipeline-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>gitlab</category>
      <category>cicd</category>
      <category>monitoring</category>
      <category>cron</category>
    </item>
    <item>
      <title>Cloudflare Workers Cron Monitoring: How to Catch Missed Triggers Before They Break Production</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Wed, 06 May 2026 06:14:56 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/cloudflare-workers-cron-monitoring-how-to-catch-missed-triggers-before-they-break-production-4mkb</link>
      <guid>https://dev.to/quietpulse-social/cloudflare-workers-cron-monitoring-how-to-catch-missed-triggers-before-they-break-production-4mkb</guid>
      <description>&lt;p&gt;Cloudflare Workers Cron Monitoring matters because scheduled edge jobs can fail quietly while the rest of your app looks healthy.&lt;/p&gt;

&lt;p&gt;Your website can be up. Your API can return &lt;code&gt;200 OK&lt;/code&gt;. The Worker can be deployed. But the Cron Trigger that refreshes cached data, syncs records, sends reports, or cleans old state may have stopped completing successfully hours ago.&lt;/p&gt;

&lt;p&gt;That is the monitoring gap with cron-like systems: normal uptime checks tell you whether a public endpoint responds. They do not tell you whether scheduled background work actually happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Cloudflare Workers Cron Triggers are commonly used for small but important recurring tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;refreshing cached data&lt;/li&gt;
&lt;li&gt;syncing from third-party APIs&lt;/li&gt;
&lt;li&gt;generating reports&lt;/li&gt;
&lt;li&gt;cleaning expired records&lt;/li&gt;
&lt;li&gt;updating search indexes&lt;/li&gt;
&lt;li&gt;sending webhook retries&lt;/li&gt;
&lt;li&gt;warming edge data before traffic arrives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many of these jobs do not have a public URL. Cloudflare invokes the Worker on a schedule, the code runs, and the result is visible only through logs, metrics, or downstream state.&lt;/p&gt;

&lt;p&gt;If the job stops running or fails halfway through, your normal uptime monitor may stay green.&lt;/p&gt;

&lt;p&gt;That is a silent failure.&lt;/p&gt;

&lt;p&gt;The system is not fully down, but something important stopped happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;A scheduled Cloudflare Worker can fail for several practical reasons.&lt;/p&gt;

&lt;p&gt;Configuration can be wrong. The cron expression may not match the intended schedule. The trigger may exist in staging but not production. A deployment may accidentally remove or change the scheduled handler.&lt;/p&gt;

&lt;p&gt;Runtime code can fail. The Worker may throw while calling an API, parsing JSON, writing to KV, D1, R2, or an external database.&lt;/p&gt;

&lt;p&gt;Dependencies can fail. Third-party APIs can return errors, enforce rate limits, send malformed responses, or respond so slowly that requests time out.&lt;/p&gt;

&lt;p&gt;Jobs can also partially succeed. A Worker may process some records, skip others, log an error, and exit in a way nobody notices until stale data shows up.&lt;/p&gt;

&lt;p&gt;A simple scheduled Worker might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;scheduled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;refreshCache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That code may be fine. But nothing in it tells you that &lt;code&gt;refreshCache()&lt;/code&gt; completed successfully every time it was expected to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's dangerous
&lt;/h2&gt;

&lt;p&gt;Missed Cron Triggers usually break business logic, not basic availability.&lt;/p&gt;

&lt;p&gt;A failed scheduled job can mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stale data remains visible&lt;/li&gt;
&lt;li&gt;reports are not generated&lt;/li&gt;
&lt;li&gt;usage is not synced&lt;/li&gt;
&lt;li&gt;cleanup tasks do not run&lt;/li&gt;
&lt;li&gt;exports are missing&lt;/li&gt;
&lt;li&gt;old records pile up&lt;/li&gt;
&lt;li&gt;customers see outdated information&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The delay is what makes it painful.&lt;/p&gt;

&lt;p&gt;If a public API goes down, someone notices quickly. If an hourly scheduled Worker fails silently, the first symptom may appear much later. By then you are digging through logs and trying to reconstruct what happened.&lt;/p&gt;

&lt;p&gt;Logs help with investigation. They do not always help with detection.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;The simplest detection pattern is heartbeat monitoring.&lt;/p&gt;

&lt;p&gt;Instead of asking, “Is my website up?”, heartbeat monitoring asks:&lt;/p&gt;

&lt;p&gt;Did this specific scheduled job finish successfully within the expected time window?&lt;/p&gt;

&lt;p&gt;For Cloudflare Workers Cron Monitoring, the flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a heartbeat check for the job schedule.&lt;/li&gt;
&lt;li&gt;Run the scheduled Worker normally.&lt;/li&gt;
&lt;li&gt;Send a heartbeat ping after the job completes successfully.&lt;/li&gt;
&lt;li&gt;Alert if the ping does not arrive on time.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key detail is that the ping should happen at the end, not the beginning.&lt;/p&gt;

&lt;p&gt;A heartbeat at the start only proves the Worker began running. It does not prove the work finished.&lt;/p&gt;
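&lt;p&gt;To make the "alert if the ping does not arrive on time" step concrete, here is a minimal sketch of the decision a heartbeat monitor has to make. It is not the logic of any particular product; the type, field names, and the idea of a single grace period are assumptions for illustration.&lt;/p&gt;

```typescript
// Sketch of the monitor-side check. All names here are hypothetical.
type HeartbeatCheck = {
  intervalMs: number; // expected time between successful runs
  graceMs: number; // extra slack before alerting
  lastPingAtMs: number | null; // null until the first ping arrives
  createdAtMs: number; // reference point before any ping exists
};

// A check is overdue once more than interval + grace has passed
// since the last successful ping (or since creation, if none arrived).
function isOverdue(check: HeartbeatCheck, nowMs: number): boolean {
  const referenceMs = check.lastPingAtMs ?? check.createdAtMs;
  return nowMs - referenceMs > check.intervalMs + check.graceMs;
}
```

&lt;p&gt;With an hourly interval and a ten-minute grace period, a last ping that is 65 minutes old is still acceptable, while one that is 71 minutes old counts as a missed run.&lt;/p&gt;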

&lt;h2&gt;
  
  
  Simple solution
&lt;/h2&gt;

&lt;p&gt;Here is a basic Cloudflare Worker scheduled handler with a completion heartbeat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;scheduled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runScheduledJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;QUIETPULSE_PING_URL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;runScheduledJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.example.com/data&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`API request failed: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;saveDataSomewhere&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;saveDataSomewhere&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Write to KV, R2, D1, an external API, or another storage system.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The heartbeat URL can be stored as a Worker secret or environment variable named &lt;code&gt;QUIETPULSE_PING_URL&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;QUIETPULSE_PING_URL=https://quietpulse.xyz/ping/YOUR_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The job runs first. The heartbeat is sent only after the useful work completes.&lt;/p&gt;

&lt;p&gt;If the Worker throws before that point, the ping is not sent. The missing ping becomes the alert signal.&lt;/p&gt;

&lt;p&gt;A slightly more explicit version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;scheduled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;refreshImportantData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;QUIETPULSE_PING_URL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Scheduled Worker failed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;refreshImportantData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.example.com/latest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Upstream API failed with &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Store or process the payload here.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
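&lt;p&gt;One detail the handlers above leave open: &lt;code&gt;fetch&lt;/code&gt; only rejects on network-level failures, so a ping that comes back with a 404 or 500 would pass unnoticed. A small wrapper can surface that in the Worker logs. This is a sketch; &lt;code&gt;sendHeartbeat&lt;/code&gt; and &lt;code&gt;ensurePingAccepted&lt;/code&gt; are hypothetical names, and whether a rejected ping should fail the whole run is a design choice.&lt;/p&gt;

```typescript
// Hypothetical helpers; fetch and response.ok are standard Fetch API.
function ensurePingAccepted(response: { ok: boolean; status: number }): void {
  if (!response.ok) {
    throw new Error(`Heartbeat ping rejected with status ${response.status}`);
  }
}

async function sendHeartbeat(pingUrl: string) {
  // A non-2xx response does not reject the promise, so check it explicitly.
  const response = await fetch(pingUrl);
  ensurePingAccepted(response);
}
```

&lt;p&gt;In the scheduled handler, &lt;code&gt;await sendHeartbeat(env.QUIETPULSE_PING_URL)&lt;/code&gt; would then take the place of the bare &lt;code&gt;fetch&lt;/code&gt; call.&lt;/p&gt;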



&lt;p&gt;If one Worker handles multiple Cron Triggers, use separate heartbeat checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;scheduled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;0 * * * *&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;hourlySync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HOURLY_SYNC_PING_URL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;0 2 * * *&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;dailyCleanup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DAILY_CLEANUP_PING_URL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="nl"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`No handler for cron: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Separate checks make alerts more useful. “Hourly sync missed a run” is better than “some scheduled Worker may have failed.”&lt;/p&gt;

&lt;p&gt;Instead of building all the heartbeat timing, grace periods, and alert delivery yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create a check, copy the ping URL, and call it after your Cloudflare Worker Cron Trigger finishes successfully. If the expected ping is missing, QuietPulse can notify you through the alert channels you configured.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Monitoring only the website
&lt;/h3&gt;

&lt;p&gt;A public uptime monitor does not prove that a scheduled Worker ran. Use uptime checks for public URLs and heartbeat checks for scheduled jobs.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Pinging before the work is done
&lt;/h3&gt;

&lt;p&gt;If you send the heartbeat at the start, the monitor can show success even when the job fails later.&lt;/p&gt;

&lt;p&gt;Send the ping after successful completion.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Swallowing errors and still pinging
&lt;/h3&gt;

&lt;p&gt;Avoid this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runJob&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;QUIETPULSE_PING_URL&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The job failed, but the heartbeat still says success.&lt;/p&gt;
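&lt;p&gt;The same problem appears in batch jobs that catch errors per record and carry on. One way to keep the heartbeat honest is to collect the failures and throw once the loop ends, so the ping is never reached. The sketch below assumes that any failed record should count as a failed run; the function names and the &lt;code&gt;syncOne&lt;/code&gt; callback are hypothetical.&lt;/p&gt;

```typescript
// Sketch only: treats a single failed record as a failed run.
type RecordResult = { id: string; error?: string };

// Throws when any record failed, so the caller never reaches the ping.
function assertAllSucceeded(results: RecordResult[]): void {
  const failed = results.filter((result) => result.error !== undefined);
  if (failed.length > 0) {
    const ids = failed.map((result) => result.id).join(", ");
    throw new Error(`${failed.length} of ${results.length} records failed: ${ids}`);
  }
}

async function syncRecords(ids: string[], syncOne: (id: string) => unknown) {
  const results: RecordResult[] = [];
  for (const id of ids) {
    try {
      await syncOne(id);
      results.push({ id });
    } catch (error) {
      // Keep going so one bad record does not block the rest.
      results.push({ id, error: String(error) });
    }
  }
  assertAllSucceeded(results);
}
```

&lt;p&gt;Calling &lt;code&gt;await syncRecords(...)&lt;/code&gt; before the heartbeat means a partially failed batch shows up as a missing ping instead of a quiet log line.&lt;/p&gt;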

&lt;h3&gt;
  
  
  4. Sharing one monitor across unrelated jobs
&lt;/h3&gt;

&lt;p&gt;Different schedules should usually have different heartbeat checks. It makes alerts easier to understand and act on.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Forgetting time zones
&lt;/h3&gt;

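&lt;p&gt;Be careful with cron expressions and expected run times. Cloudflare Cron Triggers run on UTC, so a job meant for a business timezone has to be converted, and the result shifts when daylight saving time changes. Document which local time the schedule is intended to match.&lt;/p&gt;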

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Logs
&lt;/h3&gt;

&lt;p&gt;Cloudflare logs are useful for debugging after an alert. They are less useful as the only way to notice a missed run.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dashboard metrics
&lt;/h3&gt;

&lt;p&gt;Metrics can show invocations and errors, but they may not map directly to “this business job completed successfully every hour.”&lt;/p&gt;

&lt;h3&gt;
  
  
  Downstream state checks
&lt;/h3&gt;

&lt;p&gt;You can monitor the output of the job, such as a timestamp in storage or a recently updated file. This is powerful but often more custom than a heartbeat ping.&lt;/p&gt;

&lt;h3&gt;
  
  
  Status endpoint
&lt;/h3&gt;

&lt;p&gt;Some teams expose an endpoint that reports the last successful run time. An external monitor checks whether that timestamp is fresh. This works well, but for simple jobs a heartbeat ping is usually less code.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Cloudflare Workers Cron Monitoring?
&lt;/h3&gt;

&lt;p&gt;Cloudflare Workers Cron Monitoring means checking whether scheduled Cloudflare Worker jobs run and complete successfully. Heartbeat monitoring is a common way to do this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can uptime monitoring detect missed Cloudflare Cron Triggers?
&lt;/h3&gt;

&lt;p&gt;Not reliably. Uptime monitoring checks public endpoints. A Cron Trigger can fail while the rest of your app stays online.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where should the heartbeat ping go?
&lt;/h3&gt;

&lt;p&gt;After the scheduled work finishes successfully. If the job fails, the success heartbeat should not be sent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should every Cron Trigger have its own heartbeat?
&lt;/h3&gt;

&lt;p&gt;Usually yes. Separate heartbeat checks make alerts clearer and easier to debug.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are logs enough?
&lt;/h3&gt;

&lt;p&gt;Logs are helpful for investigation, but they are not always enough for alerting. A heartbeat check detects the missing successful run directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Cloudflare Workers Cron Triggers are great for lightweight scheduled work, but they still need monitoring.&lt;/p&gt;

&lt;p&gt;If a job matters, make it report successful completion. Send a heartbeat after the work finishes, alert when the heartbeat is missing, and treat scheduled jobs as production systems — not background magic.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/cloudflare-workers-cron-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/cloudflare-workers-cron-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cloudflare</category>
      <category>serverless</category>
      <category>monitoring</category>
      <category>devops</category>
    </item>
    <item>
      <title>Vercel Cron Monitoring: How to Catch Missed Executions Before They Break Production</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Tue, 05 May 2026 06:18:52 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/vercel-cron-monitoring-how-to-catch-missed-executions-before-they-break-production-3mbn</link>
      <guid>https://dev.to/quietpulse-social/vercel-cron-monitoring-how-to-catch-missed-executions-before-they-break-production-3mbn</guid>
      <description>&lt;p&gt;Vercel Cron monitoring matters because scheduled serverless work is easy to forget once it “usually works.” You add a cron job to rebuild cached data, sync billing state, send reports, clean up expired records, or call an internal API every hour. It runs fine during testing. The deployment looks healthy. The website stays online.&lt;/p&gt;

&lt;p&gt;Then one day the scheduled work silently stops.&lt;/p&gt;

&lt;p&gt;No page goes down. No uptime monitor turns red. Users may not notice immediately. But your database starts drifting, stale records pile up, notifications stop sending, or an external integration falls behind. By the time someone spots the problem, the failure has already become operational debt.&lt;/p&gt;

&lt;p&gt;This is the awkward part of scheduled serverless work: the absence of a run is itself the failure. If nobody is watching for that absence, Vercel Cron Jobs can fail quietly.&lt;/p&gt;

&lt;p&gt;This guide explains why Vercel Cron Jobs can be missed or broken, why logs alone are not enough, and how to monitor them with heartbeat checks so you know when an expected execution does not happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Vercel Cron Jobs let you schedule HTTP requests to routes in your Vercel project. That makes them a convenient way to trigger small recurring jobs without running your own server.&lt;/p&gt;

&lt;p&gt;Common examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;refreshing cached API data&lt;/li&gt;
&lt;li&gt;syncing subscription or payment status&lt;/li&gt;
&lt;li&gt;sending daily email digests&lt;/li&gt;
&lt;li&gt;cleaning up expired sessions or tokens&lt;/li&gt;
&lt;li&gt;rebuilding search indexes&lt;/li&gt;
&lt;li&gt;pulling data from a third-party API&lt;/li&gt;
&lt;li&gt;checking whether external workflows are still healthy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The setup is usually simple. You define a schedule in &lt;code&gt;vercel.json&lt;/code&gt;, point it at an API route, deploy, and Vercel calls that route on schedule.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"crons"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/api/cron/sync-customers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"schedule"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0 * * * *"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That looks clean, but there is a monitoring gap.&lt;/p&gt;

&lt;p&gt;Your app can be online while the cron job is not doing useful work. The route can return a response while the real sync failed halfway through. The job can time out, hit a third-party rate limit, throw an exception, or stop being called after a config change.&lt;/p&gt;

&lt;p&gt;Traditional uptime monitoring checks whether a URL responds. Vercel Cron monitoring needs to answer a different question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Did the scheduled job actually run successfully when it was supposed to?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the answer is no, you need to know quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;Vercel Cron Jobs are reliable enough for many scheduled tasks, but they still live inside a real production system. That means they can break for boring, ordinary reasons.&lt;/p&gt;

&lt;p&gt;A cron route might fail because of application code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;an unhandled exception&lt;/li&gt;
&lt;li&gt;a changed database schema&lt;/li&gt;
&lt;li&gt;a missing environment variable&lt;/li&gt;
&lt;li&gt;an expired API token&lt;/li&gt;
&lt;li&gt;a timeout during a slow external request&lt;/li&gt;
&lt;li&gt;a deployment that changed route behavior&lt;/li&gt;
&lt;li&gt;a bad assumption about time zones or dates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It can also fail because of platform or configuration issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the cron path was renamed&lt;/li&gt;
&lt;li&gt;the route was deleted or moved&lt;/li&gt;
&lt;li&gt;the project was redeployed with an invalid &lt;code&gt;vercel.json&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;the schedule was changed accidentally&lt;/li&gt;
&lt;li&gt;the function exceeds execution limits&lt;/li&gt;
&lt;li&gt;the job depends on a third-party service that is unavailable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is also a subtle category: partial success.&lt;/p&gt;

&lt;p&gt;Imagine a cron route that syncs invoices from a billing provider. It starts correctly, fetches the first page, updates a few records, then crashes before processing the rest. Depending on how the handler is written, the response might still look successful or the failure might only appear in logs.&lt;/p&gt;

&lt;p&gt;Another common problem is assuming that “no alert” means “everything ran.” For scheduled jobs, no alert often just means nothing is checking whether the job happened.&lt;/p&gt;

&lt;p&gt;That is why Vercel Cron monitoring should not only look for route errors. It should detect missing successful executions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's dangerous
&lt;/h2&gt;

&lt;p&gt;Missed cron executions rarely look dramatic at first. That is what makes them dangerous.&lt;/p&gt;

&lt;p&gt;If a public page goes down, someone notices. If a checkout flow breaks, customers complain. If a server crashes, metrics spike.&lt;/p&gt;

&lt;p&gt;But if a scheduled background task does not run, the damage is often delayed.&lt;/p&gt;

&lt;p&gt;A missed customer sync can leave billing state stale. A missed cleanup job can slowly fill a database table. A missed reporting job can make dashboards inaccurate. A missed notification job can break user trust without creating an obvious infrastructure incident.&lt;/p&gt;

&lt;p&gt;The risk is higher with serverless cron jobs because the system is distributed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the scheduler lives on the platform&lt;/li&gt;
&lt;li&gt;the handler lives in your app&lt;/li&gt;
&lt;li&gt;dependencies may live in external APIs&lt;/li&gt;
&lt;li&gt;logs may be spread across deployments&lt;/li&gt;
&lt;li&gt;retries may not match your business expectations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need a signal that represents the thing you actually care about: successful completion.&lt;/p&gt;

&lt;p&gt;Not “the app is up.”&lt;/p&gt;

&lt;p&gt;Not “the route exists.”&lt;/p&gt;

&lt;p&gt;Not “there are logs somewhere.”&lt;/p&gt;

&lt;p&gt;The useful signal is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This scheduled job finished its expected work within the expected time window.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If that signal does not arrive, you should get an alert.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;The most practical way to monitor Vercel Cron Jobs is heartbeat monitoring.&lt;/p&gt;

&lt;p&gt;A heartbeat is a small HTTP request your job sends after it completes successfully. An external monitor expects that request on a schedule. If the heartbeat does not arrive on time, the monitor alerts you.&lt;/p&gt;

&lt;p&gt;The key detail is where you place the heartbeat.&lt;/p&gt;

&lt;p&gt;Do not ping at the very beginning of the cron handler. If you do that, the monitor only knows the job started. It does not know whether the important work finished.&lt;/p&gt;

&lt;p&gt;Instead, send the heartbeat after the successful part of the job:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Vercel triggers your cron route.&lt;/li&gt;
&lt;li&gt;Your handler performs the scheduled work.&lt;/li&gt;
&lt;li&gt;The work completes successfully.&lt;/li&gt;
&lt;li&gt;The handler sends a heartbeat ping.&lt;/li&gt;
&lt;li&gt;The monitor resets the expected window.&lt;/li&gt;
&lt;li&gt;If no ping arrives next time, you get alerted.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a much better Vercel Cron monitoring signal.&lt;/p&gt;

&lt;p&gt;For example, if a job runs every hour, you might configure the monitor to expect a ping every 60 minutes with a grace period of 10–15 minutes. If no ping arrives within that window, the scheduled execution most likely did not complete successfully, or, less often, the ping itself could not be delivered.&lt;/p&gt;

&lt;p&gt;This catches problems like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the cron route was not called&lt;/li&gt;
&lt;li&gt;the handler crashed before completion&lt;/li&gt;
&lt;li&gt;the job timed out&lt;/li&gt;
&lt;li&gt;the deployment broke the route&lt;/li&gt;
&lt;li&gt;an external API caused the job to fail&lt;/li&gt;
&lt;li&gt;the code returned early before doing the real work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Heartbeat monitoring is especially useful because it detects silence. Logs and errors are helpful when something runs and fails loudly. Heartbeats catch the case where the expected success signal never arrives.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple solution
&lt;/h2&gt;

&lt;p&gt;Here is a simple Vercel Cron Job handler with a heartbeat ping after successful work.&lt;/p&gt;

&lt;p&gt;Example with a Next.js App Router route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// app/api/cron/sync-customers/route.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dynamic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;force-dynamic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;syncCustomers&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Replace this with your real scheduled work.&lt;/span&gt;
  &lt;span class="c1"&gt;// For example: fetch customers from Stripe, update your database,&lt;/span&gt;
  &lt;span class="c1"&gt;// refresh cached records, or call an internal service.&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Syncing customers...&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Customer sync finished&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;sendHeartbeat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Log ping errors instead of throwing, so a failed ping is not reported as a failed sync.&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://quietpulse.xyz/ping/YOUR_TOKEN&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;no-store&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Heartbeat ping failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Heartbeat ping failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;authHeader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;authorization&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;authHeader&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CRON_SECRET&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unauthorized&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncCustomers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Send the heartbeat only after the scheduled work succeeds.&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendHeartbeat&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Cron job failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Cron job failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the matching &lt;code&gt;vercel.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"crons"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/api/cron/sync-customers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"schedule"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0 * * * *"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this pattern, the heartbeat is not a replacement for logs or error tracking. It is a separate completion signal.&lt;/p&gt;

&lt;p&gt;If the job succeeds, the monitor receives the ping. If the job does not run, crashes, times out, or fails before completion, the ping never arrives. That missing ping becomes the alert.&lt;/p&gt;

&lt;p&gt;You can build a heartbeat monitor yourself, but it is usually easier to use a small tool built for this. Instead of building scheduling windows, grace periods, and alert delivery from scratch, you can use a heartbeat monitoring tool like QuietPulse. Create a monitored job, copy the ping URL, place it after successful completion, and configure alerts through Telegram or webhooks.&lt;/p&gt;

&lt;p&gt;The important part is not the specific tool. The important part is that your Vercel Cron monitoring should watch for successful completion, not just route availability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Pinging before the work starts
&lt;/h3&gt;

&lt;p&gt;This is the most common mistake.&lt;/p&gt;

&lt;p&gt;If your cron handler sends the heartbeat at the top of the function, the monitor only knows that the route started. The real job may still fail afterward.&lt;/p&gt;

&lt;p&gt;Bad pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://quietpulse.xyz/ping/YOUR_TOKEN&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncCustomers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncCustomers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://quietpulse.xyz/ping/YOUR_TOKEN&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The heartbeat should represent success, not just execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Treating Vercel logs as monitoring
&lt;/h3&gt;

&lt;p&gt;Logs are useful when you already know something went wrong. They are not enough for missed execution detection.&lt;/p&gt;

&lt;p&gt;If nobody checks the logs, they do not alert you. If the job never runs, there may be no useful application log at all. And if the failure is hidden inside partial work, the logs might not make the problem obvious.&lt;/p&gt;

&lt;p&gt;Use logs for debugging. Use heartbeats for detection.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Ignoring function time limits
&lt;/h3&gt;

&lt;p&gt;Cron jobs often start small and grow over time. A job that once took five seconds may eventually take forty seconds, then several minutes.&lt;/p&gt;

&lt;p&gt;If your function approaches platform limits, it may fail before sending the heartbeat. That is good in the sense that monitoring catches it, but you should also treat duration growth as a design warning.&lt;/p&gt;

&lt;p&gt;Long-running jobs may need batching, pagination, queues, or a different execution environment.&lt;/p&gt;
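
&lt;p&gt;As a rough illustration of batching under a time budget, the sketch below processes items until either the list or the budget runs out, then returns the position to resume from on the next run. The name &lt;code&gt;processWithinBudget&lt;/code&gt;, its parameters, and the idea of persisting the returned index between runs are assumptions made for this example, not part of any Vercel API.&lt;/p&gt;

```typescript
// Hypothetical helper: handles items in order until the list or the time budget is exhausted.
// Returns the index to resume from next time (equal to items.length when everything is done).
function processWithinBudget(
  items: string[],
  startIndex: number,
  budgetMs: number,
  handle: (item: string) => void,
  now: () => number = Date.now,
): number {
  const deadline = now() + budgetMs;
  let processed = startIndex;

  for (const item of items.slice(startIndex)) {
    // Stop before starting another item once the budget is used up.
    if (now() >= deadline) break;
    handle(item);
    processed += 1;
  }

  return processed;
}
```

&lt;p&gt;A handler built this way could store the returned index and send the heartbeat only once it equals the list length, though whether a partial run should count as success depends on the job.&lt;/p&gt;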

&lt;h3&gt;
  
  
  4. Not protecting the cron route
&lt;/h3&gt;

&lt;p&gt;A Vercel Cron route is still an HTTP endpoint. If it triggers real production work, protect it.&lt;/p&gt;

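&lt;p&gt;Use a secret header or token check so random requests cannot trigger the job manually. When a &lt;code&gt;CRON_SECRET&lt;/code&gt; environment variable is set, Vercel sends it as a bearer token in the &lt;code&gt;Authorization&lt;/code&gt; header of each cron request, and your handler should compare against it, as the example handler above does.&lt;/p&gt;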

&lt;p&gt;A simple bearer token check is often enough for small projects.&lt;/p&gt;
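
&lt;p&gt;A minimal sketch of such a check, assuming the secret lives in a &lt;code&gt;CRON_SECRET&lt;/code&gt; environment variable as in the handler above. The function name &lt;code&gt;isAuthorizedCronRequest&lt;/code&gt; is hypothetical. The one detail worth noticing is the guard for an unset secret: without it, the template string would evaluate to &lt;code&gt;Bearer undefined&lt;/code&gt;, and a request carrying exactly that header would pass.&lt;/p&gt;

```typescript
// Hypothetical helper: compares the Authorization header with the expected bearer token.
function isAuthorizedCronRequest(
  authHeader: string | null,
  secret: string | undefined,
): boolean {
  // Reject everything when the secret is missing or empty,
  // so "Bearer undefined" can never be treated as valid.
  if (!secret) return false;
  return authHeader === `Bearer ${secret}`;
}

console.log(isAuthorizedCronRequest("Bearer s3cret", "s3cret")); // true
console.log(isAuthorizedCronRequest("Bearer undefined", undefined)); // false
console.log(isAuthorizedCronRequest(null, "s3cret")); // false
```

&lt;p&gt;In a route handler this might be called as &lt;code&gt;isAuthorizedCronRequest(request.headers.get('authorization'), process.env.CRON_SECRET)&lt;/code&gt;.&lt;/p&gt;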

&lt;h3&gt;
  
  
  5. Using the wrong schedule window
&lt;/h3&gt;

&lt;p&gt;If your cron runs every hour, do not alert at exactly 60 minutes unless you are comfortable with occasional noise. Real systems have small delays.&lt;/p&gt;

&lt;p&gt;Use a grace period. For an hourly job, expecting a heartbeat every 60 minutes with a 10–15 minute grace period is often reasonable. For daily jobs, a larger grace period may make sense.&lt;/p&gt;

&lt;p&gt;The goal is to catch real misses without creating alert fatigue.&lt;/p&gt;
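
&lt;p&gt;To make the arithmetic concrete, here is a small sketch of how a monitor might decide that a heartbeat is overdue. It is an illustration only, not QuietPulse's actual logic, and &lt;code&gt;isHeartbeatOverdue&lt;/code&gt; and its parameters are names invented for this example.&lt;/p&gt;

```typescript
// Hypothetical helper: a heartbeat is overdue once more than
// (interval + grace) minutes have passed since the last ping.
function isHeartbeatOverdue(
  lastPingAt: Date,
  now: Date,
  intervalMinutes: number,
  graceMinutes: number,
): boolean {
  const allowedMs = (intervalMinutes + graceMinutes) * 60_000;
  return now.getTime() - lastPingAt.getTime() > allowedMs;
}

const lastPing = new Date("2026-05-05T06:00:00Z");

// 65 minutes later: still inside the 60 + 15 minute window.
console.log(isHeartbeatOverdue(lastPing, new Date("2026-05-05T07:05:00Z"), 60, 15)); // false

// 80 minutes later: the window has passed, so an alert would fire.
console.log(isHeartbeatOverdue(lastPing, new Date("2026-05-05T07:20:00Z"), 60, 15)); // true
```

&lt;p&gt;Widening &lt;code&gt;graceMinutes&lt;/code&gt; trades slower detection for fewer false alarms, which is usually the right trade for daily jobs.&lt;/p&gt;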

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;p&gt;Heartbeat monitoring is usually the cleanest signal for missed Vercel Cron Jobs, but it is not the only useful monitoring layer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vercel logs
&lt;/h3&gt;

&lt;p&gt;Vercel logs help you debug what happened inside a function. They can show errors, response status, runtime output, and timing information.&lt;/p&gt;

&lt;p&gt;They are good for investigation, but weaker for proactive detection. Logs answer “what happened?” after you look. Heartbeats answer “did the expected success happen?” automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error tracking
&lt;/h3&gt;

&lt;p&gt;Tools like Sentry or similar error trackers are useful when your cron handler throws an exception.&lt;/p&gt;

&lt;p&gt;But missed executions do not always throw exceptions. If the route is not called, the schedule is wrong, or the function exits early without raising an error, error tracking may stay silent.&lt;/p&gt;

&lt;p&gt;Use error tracking for exceptions. Use heartbeat monitoring for missing success.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uptime checks
&lt;/h3&gt;

&lt;p&gt;You can point an uptime monitor at the cron route, but that can be risky.&lt;/p&gt;

&lt;p&gt;A cron route often performs side effects. Calling it from an uptime monitor might trigger real work at the wrong time. If you create a separate health endpoint, that only tells you the app is reachable, not that the scheduled job completed.&lt;/p&gt;

&lt;p&gt;Uptime checks are great for public endpoints. They are not enough for scheduled background work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database markers
&lt;/h3&gt;

&lt;p&gt;Some teams store a &lt;code&gt;last_success_at&lt;/code&gt; timestamp in the database and check it from an admin dashboard.&lt;/p&gt;

&lt;p&gt;This can work well, especially for internal systems. But you still need something to alert when the timestamp gets too old. Otherwise it becomes another value that nobody checks until after an incident.&lt;/p&gt;

&lt;p&gt;A heartbeat monitor is basically this idea turned into an external alerting mechanism.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How do I monitor Vercel Cron Jobs?
&lt;/h3&gt;

&lt;p&gt;The most practical approach is to send a heartbeat ping after your cron handler completes successfully. Configure an external monitor to expect that ping on the same schedule as your Vercel Cron Job. If the ping does not arrive within the expected window, you get alerted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Vercel Cron monitoring different from uptime monitoring?
&lt;/h3&gt;

&lt;p&gt;Yes. Uptime monitoring checks whether an endpoint responds. Vercel Cron monitoring checks whether scheduled work completed successfully. Your app can be online while a cron job is missed, broken, or failing silently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where should I put the heartbeat ping in a Vercel Cron Job?
&lt;/h3&gt;

&lt;p&gt;Place the heartbeat ping after the important scheduled work succeeds. Do not put it at the beginning of the handler. A heartbeat should mean “the job completed,” not merely “the route started.”&lt;/p&gt;

&lt;h3&gt;
  
  
  What schedule should I use for heartbeat alerts?
&lt;/h3&gt;

&lt;p&gt;Match the heartbeat schedule to the cron schedule, then add a grace period. For example, if the cron runs every hour, you might alert after 70–75 minutes without a heartbeat. The right grace period depends on how much delay is acceptable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Vercel logs catch missed cron executions?
&lt;/h3&gt;

&lt;p&gt;Logs help you debug failures, but they are not a reliable way to detect missed runs on their own. If a cron job never runs, there may be no application log entry to find. Heartbeat monitoring is better at detecting absence.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Vercel Cron Jobs are a convenient way to run scheduled serverless work, but they still need monitoring.&lt;/p&gt;

&lt;p&gt;The dangerous failures are not always loud. Sometimes the job simply does not run, exits early, times out, or fails before completing the important work. Your app may stay online while the scheduled task quietly stops doing its job.&lt;/p&gt;

&lt;p&gt;Good Vercel Cron monitoring should focus on successful completion. Add a heartbeat ping after the cron handler finishes its real work, configure an expected schedule and grace period, and alert when the ping goes missing.&lt;/p&gt;

&lt;p&gt;That simple signal turns silent missed executions into visible, actionable failures.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/vercel-cron-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/vercel-cron-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vercel</category>
      <category>cron</category>
      <category>monitoring</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Zapier Monitoring: How to Catch Silent Automation Failures</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Mon, 04 May 2026 06:11:39 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/zapier-monitoring-how-to-catch-silent-automation-failures-4b4d</link>
      <guid>https://dev.to/quietpulse-social/zapier-monitoring-how-to-catch-silent-automation-failures-4b4d</guid>
      <description>&lt;p&gt;Zapier monitoring sounds simple until an important Zap quietly stops doing its job.&lt;/p&gt;

&lt;p&gt;Maybe a lead should be copied from a form into your CRM. Maybe an invoice should trigger a Slack message. Maybe a paid signup should create a user record, tag the customer, and notify your team. When everything works, nobody thinks about it.&lt;/p&gt;

&lt;p&gt;The problem is that automation failures are often silent. A Zap can be turned off, skipped because of a changed field, blocked by an expired token, or delayed long enough that nobody notices until the downstream mess is already real.&lt;/p&gt;

&lt;p&gt;This guide explains how to monitor Zapier Zaps in a practical way, what usually breaks, and how heartbeat monitoring can help you detect missing automation runs before users or customers find the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Zapier is great at connecting tools quickly. That is also why it often becomes part of production workflows without being treated like production infrastructure.&lt;/p&gt;

&lt;p&gt;A typical Zap might do something like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New form submission in Typeform&lt;/li&gt;
&lt;li&gt;Create contact in HubSpot&lt;/li&gt;
&lt;li&gt;Add row to Google Sheets&lt;/li&gt;
&lt;li&gt;Send Slack notification&lt;/li&gt;
&lt;li&gt;Add subscriber to Mailchimp&lt;/li&gt;
&lt;li&gt;Trigger an internal webhook&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On paper, this is simple. In reality, the Zap may be responsible for sales, support, onboarding, reporting, billing operations, or customer communication.&lt;/p&gt;

&lt;p&gt;The dangerous part is not always a visible error. The dangerous part is missing work.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A customer fills out a form, but no CRM contact is created.&lt;/li&gt;
&lt;li&gt;A payment happens, but the onboarding message is never sent.&lt;/li&gt;
&lt;li&gt;A support escalation is created, but nobody gets notified.&lt;/li&gt;
&lt;li&gt;A daily sync should run every morning, but stops for three days.&lt;/li&gt;
&lt;li&gt;A webhook step silently fails because the receiving app changed its schema.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If nobody is watching for expected Zap activity, the workflow can look “fine” from the outside while important business operations are stuck.&lt;/p&gt;

&lt;p&gt;That is the core Zapier monitoring problem: you do not only need to know when a Zap errors. You need to know when expected automation work does not happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;Zapier workflows can fail or stop for many normal reasons.&lt;/p&gt;

&lt;p&gt;The most common ones are not dramatic. They are small operational issues that accumulate quietly.&lt;/p&gt;

&lt;p&gt;One common cause is account authentication. A connected app token expires, permissions change, or someone removes access from the external service. The Zap may stop at the affected step until the account is reconnected.&lt;/p&gt;

&lt;p&gt;Another common cause is input shape changes. If a form field is renamed, a CRM property is removed, or a webhook payload changes, later Zap steps may no longer receive the data they expect.&lt;/p&gt;

&lt;p&gt;Filters and paths are another source of confusion. A Zap can trigger correctly but skip the important action because a filter condition no longer matches. From a monitoring perspective, that is tricky: the Zap technically ran, but the business outcome did not happen.&lt;/p&gt;

&lt;p&gt;Rate limits can also create partial failures. A busy workflow may hit API limits in Google Sheets, Slack, HubSpot, Airtable, or another connected app. Some steps may retry, delay, or fail depending on the integration.&lt;/p&gt;

&lt;p&gt;Scheduled Zaps have their own problems. A daily or hourly automation can be disabled, delayed, or misconfigured. If it runs at 06:00 every morning and stops, there may be no obvious signal unless you explicitly check for the run.&lt;/p&gt;

&lt;p&gt;Human changes matter too. Someone can edit a Zap, turn it off during debugging, change a filter, remove a step, or switch accounts. The change may be reasonable at the time, but the workflow can stay broken longer than expected.&lt;/p&gt;

&lt;p&gt;This is why Zapier monitoring needs to focus on the actual expected signal: did the automation complete the work it was supposed to complete within the expected time window?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's dangerous
&lt;/h2&gt;

&lt;p&gt;Silent Zap failures are dangerous because they usually sit between systems.&lt;/p&gt;

&lt;p&gt;When a backend job fails, you may see an error log. When a website goes down, an uptime monitor catches it. But when a Zap misses a business action, the symptom often appears somewhere else later.&lt;/p&gt;

&lt;p&gt;A missed CRM sync becomes a sales follow-up problem.&lt;/p&gt;

&lt;p&gt;A missed Slack notification becomes a support response problem.&lt;/p&gt;

&lt;p&gt;A missed spreadsheet update becomes a reporting problem.&lt;/p&gt;

&lt;p&gt;A missed webhook delivery becomes a customer onboarding problem.&lt;/p&gt;

&lt;p&gt;A missed daily automation becomes a pile of stale data.&lt;/p&gt;

&lt;p&gt;These failures are especially painful for small teams and indie products because Zapier often fills gaps between tools. It is not “just automation.” It is glue code, except the code lives in a visual workflow builder.&lt;/p&gt;

&lt;p&gt;The risk is higher when Zaps are used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lead capture&lt;/li&gt;
&lt;li&gt;Customer onboarding&lt;/li&gt;
&lt;li&gt;Payment and billing operations&lt;/li&gt;
&lt;li&gt;Support routing&lt;/li&gt;
&lt;li&gt;Internal alerts&lt;/li&gt;
&lt;li&gt;Daily reports&lt;/li&gt;
&lt;li&gt;Data synchronization&lt;/li&gt;
&lt;li&gt;No-code backend workflows&lt;/li&gt;
&lt;li&gt;Webhook-based integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The incident can also be hard to reconstruct. Zapier task history helps, but only after someone knows what to look for. If you discover the issue days later, you may need to replay data manually, deduplicate records, contact customers, or rebuild state across several tools.&lt;/p&gt;

&lt;p&gt;Good Zapier monitoring reduces the detection time. It does not make every integration perfect, but it gives you a fast signal when expected automation stops happening.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;The simplest monitoring model is this:&lt;/p&gt;

&lt;p&gt;If a Zap is expected to run regularly, it should emit a signal when it successfully reaches the important point.&lt;/p&gt;

&lt;p&gt;That signal is usually called a heartbeat.&lt;/p&gt;

&lt;p&gt;A heartbeat is just a small HTTP request that says, “this workflow reached this point.” If the heartbeat does not arrive within the expected interval, your monitor alerts you.&lt;/p&gt;

&lt;p&gt;This is different from only checking Zapier task history.&lt;/p&gt;

&lt;p&gt;Task history tells you what happened inside Zapier. Heartbeat monitoring tells you whether the expected external signal arrived on time.&lt;/p&gt;

&lt;p&gt;For scheduled Zaps, this is very straightforward:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Zap should run every hour.&lt;/li&gt;
&lt;li&gt;Add a webhook step near the end of the Zap.&lt;/li&gt;
&lt;li&gt;The webhook calls a heartbeat URL.&lt;/li&gt;
&lt;li&gt;If the heartbeat is missing for more than, for example, 75 minutes, alert someone.&lt;/li&gt;
&lt;/ul&gt;
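
&lt;p&gt;On the receiving side, the check described in the list above comes down to comparing the time since the last ping with the interval plus a grace period. The sketch below is a simplified in-memory model of that logic, not how any specific monitoring service implements it. The class name &lt;code&gt;HeartbeatMonitor&lt;/code&gt; is an assumption for this example.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


class HeartbeatMonitor:
    """Tracks the latest ping and reports when the next one is overdue."""

    def __init__(self, interval, grace, created_at):
        self.interval = interval
        self.grace = grace
        self.last_ping = created_at  # treat creation time as the starting point

    def record_ping(self, at):
        self.last_ping = at

    def is_overdue(self, now):
        # Overdue once more than interval + grace has passed since the last ping.
        return now - self.last_ping > self.interval + self.grace


# Hourly Zap with a 15-minute grace period, giving the 75-minute threshold.
start = datetime(2026, 5, 4, 6, 0, tzinfo=timezone.utc)
monitor = HeartbeatMonitor(timedelta(hours=1), timedelta(minutes=15), start)
monitor.record_ping(start)
```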

&lt;p&gt;For event-driven Zaps, the pattern depends on expected volume.&lt;/p&gt;

&lt;p&gt;If a Zap should run many times per day, you can monitor for activity gaps. For example, if your lead capture Zap normally runs every few hours during business days, a full day without any signal may be suspicious.&lt;/p&gt;

&lt;p&gt;If the Zap handles critical but irregular events, you can monitor a companion scheduled check instead. For example, a scheduled Zap can query whether new records are being processed and ping a heartbeat when the check completes.&lt;/p&gt;

&lt;p&gt;The key is to monitor completion, not just start.&lt;/p&gt;

&lt;p&gt;A heartbeat at the beginning of a Zap proves only that the Zap started. A heartbeat near the end proves that the important steps completed before the signal was sent.&lt;/p&gt;

&lt;p&gt;For Zapier workflows, a good heartbeat step is usually placed after:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CRM record is created&lt;/li&gt;
&lt;li&gt;The notification is sent&lt;/li&gt;
&lt;li&gt;The spreadsheet row is written&lt;/li&gt;
&lt;li&gt;The webhook succeeds&lt;/li&gt;
&lt;li&gt;The data sync finishes&lt;/li&gt;
&lt;li&gt;The final important action is complete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives you a practical signal for “the automation actually did the thing.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple solution (with example)
&lt;/h2&gt;

&lt;p&gt;Zapier can call external URLs using Webhooks by Zapier.&lt;/p&gt;

&lt;p&gt;A simple monitoring setup looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a monitor for the Zap.&lt;/li&gt;
&lt;li&gt;Copy the heartbeat URL.&lt;/li&gt;
&lt;li&gt;Add a Webhooks by Zapier step near the end of the Zap.&lt;/li&gt;
&lt;li&gt;Configure it to make a GET request to the heartbeat URL.&lt;/li&gt;
&lt;li&gt;Set the expected schedule and grace period in your monitoring tool.&lt;/li&gt;
&lt;li&gt;Alert if the heartbeat does not arrive on time.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example heartbeat URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://quietpulse.xyz/ping/{token}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Zapier, add an action step:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;App: Webhooks by Zapier&lt;/li&gt;
&lt;li&gt;Event: GET&lt;/li&gt;
&lt;li&gt;URL: &lt;code&gt;https://quietpulse.xyz/ping/{token}&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Payload Type: leave default unless needed&lt;/li&gt;
&lt;li&gt;Headers: usually not required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Place this step after the critical work.&lt;/p&gt;

&lt;p&gt;For example, imagine a Zap that handles new paid signups:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trigger: new successful payment&lt;/li&gt;
&lt;li&gt;Create customer in CRM&lt;/li&gt;
&lt;li&gt;Add customer to onboarding list&lt;/li&gt;
&lt;li&gt;Send Slack notification&lt;/li&gt;
&lt;li&gt;Call heartbeat URL&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The heartbeat should be last because that is the signal that the important automation path completed.&lt;/p&gt;

&lt;p&gt;If the Zap does not run, the heartbeat is missing.&lt;/p&gt;

&lt;p&gt;If the Zap fails before the final step, the heartbeat is missing.&lt;/p&gt;

&lt;p&gt;If someone turns the Zap off, the heartbeat is missing.&lt;/p&gt;

&lt;p&gt;If an app authorization breaks, the heartbeat is missing.&lt;/p&gt;

&lt;p&gt;That missing signal is what creates the alert.&lt;/p&gt;

&lt;p&gt;Instead of building this yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create a monitored job, copy the ping URL, and add it as a final Webhooks by Zapier step. If the expected ping does not arrive, QuietPulse can alert you before the broken automation quietly damages the rest of your workflow.&lt;/p&gt;

&lt;p&gt;For scheduled Zaps, choose an interval slightly longer than the expected schedule. If the Zap runs hourly, a 75- or 90-minute threshold is often safer than exactly 60 minutes because automation platforms can have delays.&lt;/p&gt;

&lt;p&gt;For daily Zaps, add a reasonable grace period too. If a Zap should run at 06:00, alerting at 06:01 may create noise. Alerting after 07:00 or 08:00 may be more practical depending on the workflow.&lt;/p&gt;
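
&lt;p&gt;For a daily schedule, the deadline is a wall-clock time rather than a simple interval. Here is one way to express the 06:00 run with a one-hour grace period. The function name &lt;code&gt;daily_run_missing&lt;/code&gt; is hypothetical, and the sketch assumes naive datetimes that all share the same local timezone.&lt;/p&gt;

```python
from datetime import datetime, time, timedelta


def daily_run_missing(last_ping, now, run_at=time(6, 0), grace=timedelta(hours=1)):
    """True when the most recent run whose deadline has passed produced no ping."""
    scheduled = datetime.combine(now.date(), run_at)
    if scheduled + grace > now:
        # Today's deadline has not passed yet, so judge yesterday's run instead.
        scheduled -= timedelta(days=1)
    # A ping counts only if it arrived at or after the scheduled run time.
    return last_ping is None or scheduled > last_ping
```

&lt;p&gt;With these defaults, nothing is reported at 06:30 as long as yesterday's ping arrived, and a missing ping is reported from 07:00 onward.&lt;/p&gt;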

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Monitoring only the trigger
&lt;/h3&gt;

&lt;p&gt;A Zap trigger firing does not mean the workflow completed.&lt;/p&gt;

&lt;p&gt;If you ping at the start, the monitor may stay green even when later steps fail. Put the heartbeat after the important action, not before it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Treating Zapier errors as the only failure mode
&lt;/h3&gt;

&lt;p&gt;Zapier task errors are useful, but they do not cover every business failure.&lt;/p&gt;

&lt;p&gt;A Zap can skip work because of filters, paths, changed data, or logic that no longer matches reality. Monitor the expected outcome, not just platform errors.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Using no grace period
&lt;/h3&gt;

&lt;p&gt;Automation platforms can be delayed.&lt;/p&gt;

&lt;p&gt;If a scheduled Zap runs every hour, do not alert the second the hour passes. Use a grace period that reflects real-world delays while still catching problems quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Forgetting about low-volume workflows
&lt;/h3&gt;

&lt;p&gt;Some Zaps do not run often, but they are still critical.&lt;/p&gt;

&lt;p&gt;For irregular workflows, consider a scheduled audit Zap that checks whether source and destination systems are in sync, then sends a heartbeat when the audit completes.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Not documenting ownership
&lt;/h3&gt;

&lt;p&gt;When a Zap fails, who fixes it?&lt;/p&gt;

&lt;p&gt;Many no-code automations are created by one person and later become team infrastructure. Keep a short note with the owner, expected schedule, connected apps, and what the heartbeat means.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;p&gt;Heartbeat monitoring is useful, but it is not the only signal.&lt;/p&gt;

&lt;p&gt;Zapier task history is still important. It helps you inspect failed tasks, replay data, and debug specific steps. The limitation is that someone has to look there or rely on Zapier's built-in notifications.&lt;/p&gt;

&lt;p&gt;Zapier built-in alerts can catch some platform-level failures. They are a good baseline, especially for broken app connections or task errors. But they may not tell you that expected business work is missing.&lt;/p&gt;

&lt;p&gt;Destination-system checks are another option. For example, you can check whether a CRM received new leads, whether a spreadsheet has fresh rows, or whether Slack messages were sent. This can be powerful, but it usually requires more custom logic.&lt;/p&gt;

&lt;p&gt;Logs can help if your Zap calls an internal service. If you own the receiving API, log every incoming Zapier request and monitor error rates. This is useful for webhook-heavy workflows, but less useful for purely no-code flows between third-party apps.&lt;/p&gt;
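
&lt;p&gt;If you do own the receiving API, the error-rate idea can be sketched as a small sliding window over logged requests. The class name &lt;code&gt;RequestLog&lt;/code&gt;, the window size, and the rule that any status of 400 or above counts as an error are assumptions for this example.&lt;/p&gt;

```python
from collections import deque


class RequestLog:
    """Keeps recent webhook outcomes and reports the error rate over a time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.entries = deque()  # (timestamp, status_code) pairs, oldest first

    def record(self, timestamp, status_code):
        self.entries.append((timestamp, status_code))

    def error_rate(self, now):
        # Drop entries that have fallen out of the window.
        while self.entries and now - self.entries[0][0] > self.window_seconds:
            self.entries.popleft()
        if not self.entries:
            return None  # no traffic at all, which may deserve its own alert
        errors = sum(1 for _, status in self.entries if status >= 400)
        return errors / len(self.entries)
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; for an empty window keeps "no requests arrived" distinct from "requests arrived and all succeeded", since the first case is the silent failure this article is about.&lt;/p&gt;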

&lt;p&gt;Manual review is sometimes enough for low-risk workflows. For example, a weekly personal productivity automation may not need alerting. But if the Zap affects customers, revenue, support, or production data, manual review is usually too slow.&lt;/p&gt;

&lt;p&gt;A practical setup often combines several layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zapier built-in error notifications&lt;/li&gt;
&lt;li&gt;Task history for debugging&lt;/li&gt;
&lt;li&gt;Heartbeat monitoring for missing runs&lt;/li&gt;
&lt;li&gt;Destination checks for critical data syncs&lt;/li&gt;
&lt;li&gt;Clear ownership and documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives you both fast detection and enough context to fix the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Zapier monitoring?
&lt;/h3&gt;

&lt;p&gt;Zapier monitoring means tracking whether your Zaps are running and completing the work they are supposed to do. Good monitoring does not only look for task errors. It also detects missing runs, skipped workflows, delayed automations, and broken downstream actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I know if a Zapier Zap stopped running?
&lt;/h3&gt;

&lt;p&gt;For scheduled Zaps, add a heartbeat ping near the end of the workflow and alert when the ping does not arrive on time. You can also check Zapier task history, connected app errors, and whether the destination system received the expected data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Zapier send a heartbeat ping?
&lt;/h3&gt;

&lt;p&gt;Yes. You can use Webhooks by Zapier to send a GET request to a heartbeat URL such as &lt;code&gt;https://quietpulse.xyz/ping/{token}&lt;/code&gt;. Put that step after the critical work so the ping means the Zap completed successfully.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Zapier task history enough for monitoring?
&lt;/h3&gt;

&lt;p&gt;Zapier task history is useful for debugging, but it is not always enough for proactive monitoring. It explains what happened once someone looks, whereas heartbeat monitoring alerts you when an expected Zap run or completion signal is missing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where should I place the heartbeat step in a Zap?
&lt;/h3&gt;

&lt;p&gt;Place the heartbeat step near the end of the Zap, after the most important action. If you ping at the beginning, your monitor may stay green even when later steps fail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Zapier automations are often more important than they look.&lt;/p&gt;

&lt;p&gt;If a Zap moves leads, customers, payments, support tickets, reports, or internal alerts, it deserves monitoring like any other production workflow.&lt;/p&gt;

&lt;p&gt;The most reliable pattern is simple: define what “successful completion” means, send a heartbeat when the Zap reaches that point, and alert when the heartbeat is missing.&lt;/p&gt;

&lt;p&gt;That turns silent automation failures into visible, fixable problems.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/zapier-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/zapier-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>zapier</category>
      <category>automation</category>
      <category>monitoring</category>
      <category>devops</category>
    </item>
    <item>
      <title>Make Scenario Monitoring: How to Catch Silent Automation Failures</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Sun, 03 May 2026 07:38:39 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/make-scenario-monitoring-how-to-catch-silent-automation-failures-59hb</link>
      <guid>https://dev.to/quietpulse-social/make-scenario-monitoring-how-to-catch-silent-automation-failures-59hb</guid>
      <description>&lt;p&gt;Make scenario monitoring is easy to overlook until an automation silently stops running.&lt;/p&gt;

&lt;p&gt;A Make.com scenario might sync leads, update a CRM, send reports, copy invoices, or notify a team when something important happens. When it works, it feels invisible. When it breaks quietly, the damage can build up for hours or days.&lt;/p&gt;

&lt;p&gt;The key is to monitor for missing successful runs, not only visible errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Make.com scenarios often become business-critical glue between tools.&lt;/p&gt;

&lt;p&gt;They might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;copy form submissions into a CRM&lt;/li&gt;
&lt;li&gt;sync orders into a spreadsheet&lt;/li&gt;
&lt;li&gt;send Slack alerts&lt;/li&gt;
&lt;li&gt;update Airtable or Notion&lt;/li&gt;
&lt;li&gt;trigger onboarding emails&lt;/li&gt;
&lt;li&gt;generate daily reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem is that many automation failures are quiet.&lt;/p&gt;

&lt;p&gt;A scenario can be disabled, a schedule can be wrong, an app connection can expire, or an upstream webhook can stop sending events. Sometimes the scenario runs, but a filter or router path prevents useful work from happening.&lt;/p&gt;

&lt;p&gt;If nobody checks, the failure can remain hidden.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;Make scenarios depend on many moving parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;connected app credentials&lt;/li&gt;
&lt;li&gt;third-party APIs&lt;/li&gt;
&lt;li&gt;schedules and timezones&lt;/li&gt;
&lt;li&gt;webhook payloads&lt;/li&gt;
&lt;li&gt;filters and routers&lt;/li&gt;
&lt;li&gt;account limits and quotas&lt;/li&gt;
&lt;li&gt;human configuration changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any of these can change after the scenario was originally built.&lt;/p&gt;

&lt;p&gt;A CRM token expires. A Google Sheets column is renamed. A teammate pauses a scenario for testing. A SaaS API starts returning rate limits. A webhook sender changes its payload shape.&lt;/p&gt;

&lt;p&gt;The automation platform may still be online, but your specific workflow is no longer doing its job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's dangerous
&lt;/h2&gt;

&lt;p&gt;Silent automation failures are dangerous because they rarely look urgent at first.&lt;/p&gt;

&lt;p&gt;Your website is still up. Your dashboard may still be green. Nobody sees a crash screen.&lt;/p&gt;

&lt;p&gt;But the work is not happening.&lt;/p&gt;

&lt;p&gt;That can mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;missed leads&lt;/li&gt;
&lt;li&gt;stale customer records&lt;/li&gt;
&lt;li&gt;incomplete finance reports&lt;/li&gt;
&lt;li&gt;delayed onboarding&lt;/li&gt;
&lt;li&gt;missing support notifications&lt;/li&gt;
&lt;li&gt;bad data in downstream systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The longer the failure stays hidden, the more manual cleanup it creates.&lt;/p&gt;

&lt;p&gt;For small teams, this is especially painful because Make scenarios often replace custom backend jobs. They may be no-code workflows, but they still handle production responsibilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;The most practical way to detect silent failures is heartbeat monitoring.&lt;/p&gt;

&lt;p&gt;A heartbeat is a small signal sent when a job or workflow reaches an important successful point. If the signal arrives on time, the workflow probably ran. If it does not arrive, something needs attention.&lt;/p&gt;

&lt;p&gt;For Make scenario monitoring, add the heartbeat near the end of the scenario, after the important work completes.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;after leads are copied into the CRM&lt;/li&gt;
&lt;li&gt;after a report is generated&lt;/li&gt;
&lt;li&gt;after invoices are synced&lt;/li&gt;
&lt;li&gt;after a Slack notification is sent&lt;/li&gt;
&lt;li&gt;after a batch of records is processed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns silence into something you can alert on.&lt;/p&gt;

&lt;p&gt;If the scenario is disabled, the heartbeat stops. If the schedule is wrong, the heartbeat is late. If an earlier module fails, the heartbeat is never sent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple solution (with example)
&lt;/h2&gt;

&lt;p&gt;Add an HTTP request module at the end of the Make scenario.&lt;/p&gt;

&lt;p&gt;Example heartbeat URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://quietpulse.xyz/ping/YOUR_TOKEN_HERE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A simple scenario might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Scheduler trigger
  → Search new rows in Google Sheets
  → Create or update contacts in CRM
  → Send Slack summary
  → HTTP request: GET https://quietpulse.xyz/ping/YOUR_TOKEN_HERE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Outside Make, the same ping would look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; https://quietpulse.xyz/ping/YOUR_TOKEN_HERE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Make's HTTP module, use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Method: &lt;code&gt;GET&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;URL: &lt;code&gt;https://quietpulse.xyz/ping/YOUR_TOKEN_HERE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Body: empty&lt;/li&gt;
&lt;li&gt;Headers: usually none required&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Place the heartbeat after the work you actually care about.&lt;/p&gt;

&lt;p&gt;If the scenario runs every hour, alert after something like 90 minutes without a ping. If it runs daily at 02:00, alert if no ping arrives by 03:00 or 04:00. The grace period prevents noisy alerts from normal delays.&lt;/p&gt;

&lt;p&gt;For scenarios with routers, consider separate heartbeats for separate important paths.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Webhook trigger
  → Router
    → New customer path
      → Create onboarding tasks
      → Ping onboarding heartbeat
    → Refund path
      → Update finance sheet
      → Ping refund heartbeat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you more precise alerts when only one branch breaks.&lt;/p&gt;
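
&lt;p&gt;With one heartbeat per branch, the monitor can report which path went quiet. The sketch below assumes timestamps in epoch seconds and a per-branch allowed gap. The function name &lt;code&gt;silent_branches&lt;/code&gt; and the branch names are illustrative only.&lt;/p&gt;

```python
def silent_branches(last_ping_by_branch, now, max_gap_by_branch):
    """Return the branches whose last ping is missing or older than their allowed gap."""
    silent = []
    for branch, max_gap in sorted(max_gap_by_branch.items()):
        last_ping = last_ping_by_branch.get(branch)
        if last_ping is None or now - last_ping > max_gap:
            silent.append(branch)
    return silent


# Hypothetical state: the refund path pinged recently, onboarding did not.
last_pings = {"onboarding": 1_000, "refund": 9_000}
allowed_gaps = {"onboarding": 3_600, "refund": 3_600}
quiet = silent_branches(last_pings, 10_000, allowed_gaps)
```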

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Putting the heartbeat at the beginning
&lt;/h3&gt;

&lt;p&gt;If the heartbeat runs right after the trigger, it only proves the scenario started. It does not prove the important work completed.&lt;/p&gt;

&lt;p&gt;Put it near the end.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Relying only on Make history
&lt;/h3&gt;

&lt;p&gt;Scenario history is useful for debugging, but it mostly helps after someone looks. It does not always catch missing runs quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Using no grace period
&lt;/h3&gt;

&lt;p&gt;Schedules are not always exact. APIs can be slow and scenarios can take longer than usual.&lt;/p&gt;

&lt;p&gt;Use a practical alert window instead of alerting immediately after the expected time.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Treating every branch as one workflow
&lt;/h3&gt;

&lt;p&gt;If a scenario has multiple router paths, one path can break while another still works.&lt;/p&gt;

&lt;p&gt;Monitor critical branches separately when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Sending heartbeats when no useful work happened
&lt;/h3&gt;

&lt;p&gt;For some automations, a successful run is not enough. If a lead sync processes zero leads because a filter broke, you may want the heartbeat only after useful data is processed.&lt;/p&gt;
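
&lt;p&gt;In Make this would typically be a filter in front of the HTTP module. The same rule as a small Python sketch looks like this, with &lt;code&gt;finish_run&lt;/code&gt; and the &lt;code&gt;minimum&lt;/code&gt; threshold as assumptions for the example:&lt;/p&gt;

```python
def finish_run(processed_count, ping, minimum=1):
    """Send the heartbeat only when the run handled at least `minimum` records."""
    if processed_count >= minimum:
        ping()
        return True
    return False  # no ping, so the monitor will alert if this keeps happening
```

&lt;p&gt;This only makes sense when zero records is genuinely abnormal for the schedule. For workflows with legitimate quiet periods, a conditional heartbeat will produce false alerts.&lt;/p&gt;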

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;p&gt;Heartbeat monitoring works best alongside other signals.&lt;/p&gt;

&lt;h3&gt;
  
  
  Make execution history
&lt;/h3&gt;

&lt;p&gt;Great for debugging failed modules, input bundles, output bundles, and error details. Less ideal as the only proactive monitor.&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in error notifications
&lt;/h3&gt;

&lt;p&gt;Useful for visible scenario errors, but not always enough for disabled scenarios, missed schedules, or logical failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logs in destination systems
&lt;/h3&gt;

&lt;p&gt;A CRM, database, or spreadsheet may show when data was last updated. This can help confirm results, but it is often harder to centralize.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uptime monitoring
&lt;/h3&gt;

&lt;p&gt;Good for checking whether a website or API is reachable. Not enough to prove a Make scenario processed records or sent a report.&lt;/p&gt;

&lt;h3&gt;
  
  
  Result-based checks
&lt;/h3&gt;

&lt;p&gt;For critical workflows, you can monitor the destination directly: does today's report exist, did new records arrive, did a timestamp update? This is precise, but usually takes more setup.&lt;/p&gt;

&lt;p&gt;A strong setup combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Make history for debugging&lt;/li&gt;
&lt;li&gt;built-in alerts for visible errors&lt;/li&gt;
&lt;li&gt;heartbeat monitoring for missing runs&lt;/li&gt;
&lt;li&gt;result checks for critical data correctness&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Make scenario monitoring?
&lt;/h3&gt;

&lt;p&gt;Make scenario monitoring means tracking whether Make.com scenarios run successfully and on time. It includes checking errors, execution history, schedules, and heartbeat signals.&lt;/p&gt;

&lt;h3&gt;
  
  
  How can I detect if a Make scenario stopped running?
&lt;/h3&gt;

&lt;p&gt;Add a heartbeat ping near the end of the scenario and alert when the ping is missing. If the scenario is disabled, delayed, or fails before completion, the heartbeat will not arrive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Make's built-in error history enough?
&lt;/h3&gt;

&lt;p&gt;It is useful, but it is not always enough. History helps debug executions that happened. Heartbeat monitoring also catches expected executions that did not happen.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where should the heartbeat go?
&lt;/h3&gt;

&lt;p&gt;Place it after the critical work succeeds: after syncing records, sending a report, updating a destination system, or completing a key branch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this work for webhook scenarios?
&lt;/h3&gt;

&lt;p&gt;Yes. A heartbeat can confirm that a webhook scenario processed an event successfully, not just that the scenario exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Make.com scenarios can quietly become production infrastructure.&lt;/p&gt;

&lt;p&gt;If a scenario matters, monitor it like any other scheduled job or background process. Add a heartbeat after the critical work, choose a reasonable alert window, and make missing runs visible before they become business problems.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/make-scenario-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/make-scenario-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>make</category>
      <category>automation</category>
      <category>monitoring</category>
      <category>devops</category>
    </item>
    <item>
      <title>Systemd Timer Monitoring: How to Detect Failed or Missed Timers</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Sat, 02 May 2026 08:46:57 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/systemd-timer-monitoring-how-to-detect-failed-or-missed-timers-469a</link>
      <guid>https://dev.to/quietpulse-social/systemd-timer-monitoring-how-to-detect-failed-or-missed-timers-469a</guid>
      <description>&lt;p&gt;Systemd timer monitoring matters when you use Linux timers for real production work: backups, imports, billing tasks, report generation, cleanup scripts, queue maintenance, certificate renewal, and dozens of other scheduled jobs that nobody wants to babysit.&lt;/p&gt;

&lt;p&gt;Systemd timers are often cleaner than cron. They integrate with &lt;code&gt;systemctl&lt;/code&gt;, log through journald, support dependencies, and can catch up on missed runs after boot when &lt;code&gt;Persistent=true&lt;/code&gt; is set. But they still have one uncomfortable weakness: a timer can stop doing useful work while the server itself looks perfectly healthy.&lt;/p&gt;

&lt;p&gt;The machine is up. SSH works. Your app responds. The timer unit exists.&lt;/p&gt;

&lt;p&gt;And yet the job did not run.&lt;/p&gt;

&lt;p&gt;That is the gap systemd timer monitoring should close.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;A systemd timer is usually made of two units:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /etc/systemd/system/example-backup.timer
&lt;/span&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Run example backup daily&lt;/span&gt;

&lt;span class="nn"&gt;[Timer]&lt;/span&gt;
&lt;span class="py"&gt;OnCalendar&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;daily&lt;/span&gt;
&lt;span class="py"&gt;Persistent&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;

&lt;span class="nn"&gt;[Install]&lt;/span&gt;
&lt;span class="py"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;timers.target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the service it triggers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /etc/systemd/system/example-backup.service
&lt;/span&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Example daily backup&lt;/span&gt;

&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;oneshot&lt;/span&gt;
&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/local/bin/example-backup.sh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You enable it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--now&lt;/span&gt; example-backup.timer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you check it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl list-timers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything looks fine.&lt;/p&gt;

&lt;p&gt;The problem is that “timer exists” does not mean “the work is being completed successfully.”&lt;/p&gt;

&lt;p&gt;A timer can be active while the service fails. A service can exit successfully while the script skipped the important part. A job can hang forever. A server can be off during the scheduled window. A deployment can replace the script path. Permissions can change. Environment variables can disappear.&lt;/p&gt;

&lt;p&gt;If nobody checks the actual execution signal, these failures can stay silent for days.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;Systemd timers are reliable, but they are not magic. They schedule execution. They do not automatically prove that the business task succeeded.&lt;/p&gt;

&lt;p&gt;Common failure modes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;.timer&lt;/code&gt; unit is enabled, but the &lt;code&gt;.service&lt;/code&gt; unit fails.&lt;/li&gt;
&lt;li&gt;The service exits with code &lt;code&gt;0&lt;/code&gt;, but the script did not complete meaningful work.&lt;/li&gt;
&lt;li&gt;The job depends on network access before the network is ready.&lt;/li&gt;
&lt;li&gt;The script works manually but fails under systemd’s limited environment.&lt;/li&gt;
&lt;li&gt;The timer was disabled during maintenance and never re-enabled.&lt;/li&gt;
&lt;li&gt;The server rebooted, and the timer did not catch up because &lt;code&gt;Persistent=true&lt;/code&gt; was missing.&lt;/li&gt;
&lt;li&gt;A long-running service is still active when the next run is due, so systemd skips that run instead of starting a second instance.&lt;/li&gt;
&lt;li&gt;Logs rotate or disappear before anyone checks them.&lt;/li&gt;
&lt;li&gt;A package update changes permissions, paths, or runtime behavior.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A classic example is a backup script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

pg_dump &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DATABASE_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /backups/app.sql
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; /backups/app.sql s3://example-backups/app.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This may work perfectly from your shell.&lt;/p&gt;

&lt;p&gt;But when systemd runs it, &lt;code&gt;$DATABASE_URL&lt;/code&gt; may not exist. The AWS credentials may not be loaded. The script may not have permission to write to &lt;code&gt;/backups&lt;/code&gt;. DNS may fail for a few minutes after boot.&lt;/p&gt;

&lt;p&gt;You will probably see the failure in journald if you look:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; example-backup.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But the whole point of monitoring is not needing to remember to look.&lt;/p&gt;
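
&lt;p&gt;The boot-time DNS case above can sometimes be reduced in the unit itself. The following is a minimal sketch, assuming a drop-in file at the path shown in the comment and assuming a wait-online service (such as &lt;code&gt;systemd-networkd-wait-online.service&lt;/code&gt; or &lt;code&gt;NetworkManager-wait-online.service&lt;/code&gt;) is enabled on the host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# /etc/systemd/system/example-backup.service.d/network.conf (assumed drop-in path)
&lt;/span&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="c"&gt;# Pull in the target and start the backup only after it has been reached
&lt;/span&gt;&lt;span class="py"&gt;Wants&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;network-online.target&lt;/span&gt;
&lt;span class="py"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;network-online.target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This only delays the start until the network is reported as configured. It does not prove that DNS or S3 is reachable, so the heartbeat is still what tells you whether the backup finished.&lt;/p&gt;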

&lt;h2&gt;
  
  
  Why it’s dangerous
&lt;/h2&gt;

&lt;p&gt;Missed systemd timers are dangerous because they usually affect work that happens behind the scenes.&lt;/p&gt;

&lt;p&gt;Users do not immediately notice that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;backups stopped running&lt;/li&gt;
&lt;li&gt;reports were not generated&lt;/li&gt;
&lt;li&gt;invoices were not sent&lt;/li&gt;
&lt;li&gt;expired sessions were not cleaned up&lt;/li&gt;
&lt;li&gt;data syncs stopped&lt;/li&gt;
&lt;li&gt;temporary files are filling the disk&lt;/li&gt;
&lt;li&gt;webhooks are not being retried&lt;/li&gt;
&lt;li&gt;usage counters are stale&lt;/li&gt;
&lt;li&gt;SSL renewal hooks did not run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The app can look healthy while important background work is broken.&lt;/p&gt;

&lt;p&gt;This is why uptime monitoring is not enough. An uptime check tells you that an HTTP endpoint responded. It does not tell you that last night’s backup finished. It does not tell you that a timer ran at 03:00. It does not tell you that your cleanup job is stuck waiting on a locked file.&lt;/p&gt;

&lt;p&gt;For small teams and side projects, this can be especially painful. You may not have a full observability stack. You may not check servers every morning. You may only discover the issue when something has already gone wrong.&lt;/p&gt;

&lt;p&gt;A missed timer is rarely dramatic at first. It is quiet.&lt;/p&gt;

&lt;p&gt;That is what makes it risky.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;Good systemd timer monitoring should answer a simple question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Did the expected job complete within the expected time window?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;There are a few signals you can use.&lt;/p&gt;

&lt;p&gt;First, systemd itself can show scheduled timers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl list-timers &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells you the next run, last run, and associated unit.&lt;/p&gt;

&lt;p&gt;Second, you can inspect service status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl status example-backup.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Third, you can check logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; example-backup.service &lt;span class="nt"&gt;--since&lt;/span&gt; &lt;span class="s2"&gt;"24 hours ago"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are useful debugging tools.&lt;/p&gt;

&lt;p&gt;But they are mostly pull-based. You have to remember to check them.&lt;/p&gt;

&lt;p&gt;For production monitoring, you usually want push-based detection. The job should emit a small success signal after it completes. If that signal does not arrive on time, your monitoring system alerts you.&lt;/p&gt;

&lt;p&gt;That is heartbeat monitoring.&lt;/p&gt;

&lt;p&gt;The timer runs the service. The service runs the script. At the end of a successful run, the script sends a heartbeat ping.&lt;/p&gt;

&lt;p&gt;If the ping arrives, the job completed.&lt;/p&gt;

&lt;p&gt;If the ping does not arrive by the expected deadline, something is wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the timer did not fire&lt;/li&gt;
&lt;li&gt;the service failed&lt;/li&gt;
&lt;li&gt;the script crashed&lt;/li&gt;
&lt;li&gt;the server was down&lt;/li&gt;
&lt;li&gt;the network was unavailable&lt;/li&gt;
&lt;li&gt;the job hung before completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Heartbeat monitoring does not replace logs. It answers a different question: “Did the scheduled work happen?”&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple solution
&lt;/h2&gt;

&lt;p&gt;Let’s say you have a daily backup job triggered by a systemd timer.&lt;/p&gt;

&lt;p&gt;Your service calls this script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;BACKUP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/var/backups/app-&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%F&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;.sql"&lt;/span&gt;

pg_dump &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DATABASE_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;gzip&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;&lt;span class="s2"&gt;.gz"&lt;/span&gt; &lt;span class="s2"&gt;"s3://example-backups/"&lt;/span&gt;

curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; &lt;span class="s2"&gt;"https://quietpulse.xyz/ping/YOUR_TOKEN"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important part is that the ping happens only after the meaningful work succeeds.&lt;/p&gt;

&lt;p&gt;Do not ping at the start. Do not ping before the upload. Do not ping before the database dump completes.&lt;/p&gt;

&lt;p&gt;Ping after success.&lt;/p&gt;

&lt;p&gt;Your service file might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Daily application backup&lt;/span&gt;

&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;oneshot&lt;/span&gt;
&lt;span class="py"&gt;EnvironmentFile&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/etc/example-backup.env&lt;/span&gt;
&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/local/bin/example-backup.sh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your timer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Run daily application backup&lt;/span&gt;

&lt;span class="nn"&gt;[Timer]&lt;/span&gt;
&lt;span class="py"&gt;OnCalendar&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;03:00&lt;/span&gt;
&lt;span class="py"&gt;Persistent&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;Unit&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;example-backup.service&lt;/span&gt;

&lt;span class="nn"&gt;[Install]&lt;/span&gt;
&lt;span class="py"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;timers.target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then enable it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl daemon-reload
systemctl &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--now&lt;/span&gt; example-backup.timer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check that systemd knows about it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl list-timers example-backup.timer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With heartbeat monitoring, you configure the expected interval externally. For example, if the backup runs every day at 03:00, you might expect one ping every 24 hours with a small grace period.&lt;/p&gt;

&lt;p&gt;If no ping arrives, you get alerted.&lt;/p&gt;

&lt;p&gt;Instead of building that alerting logic yourself, you can use a simple heartbeat monitoring tool like QuietPulse. Create a monitor, copy the ping URL, and call it from the end of your systemd-triggered script. The important idea is still the same: alert on missing success signals, not just server uptime.&lt;/p&gt;
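
&lt;p&gt;When choosing the grace period, it helps to account for how far the start time itself can move. A sketch of the timer section, where the ten-minute jitter is an illustrative value rather than a recommendation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[Timer]&lt;/span&gt;
&lt;span class="py"&gt;OnCalendar&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;*-*-* 03:00:00&lt;/span&gt;
&lt;span class="py"&gt;Persistent&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="c"&gt;# Illustrative value: start at a random point up to 10 minutes after 03:00
&lt;/span&gt;&lt;span class="py"&gt;RandomizedDelaySec&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10min&lt;/span&gt;
&lt;span class="c"&gt;# Scheduling slack systemd may add on top (1min is the default)
&lt;/span&gt;&lt;span class="py"&gt;AccuracySec&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1min&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these values a run can start as late as roughly 03:11, so the grace period should cover that delay plus the longest runtime you consider normal.&lt;/p&gt;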

&lt;h2&gt;
  
  
  A better pattern for scripts
&lt;/h2&gt;

&lt;p&gt;For more robust scripts, use a trap so failures are easier to debug locally, but keep the success ping at the end.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

log&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="nt"&gt;--iso-8601&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;seconds&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;] &lt;/span&gt;&lt;span class="nv"&gt;$*&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
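&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Log where the script stopped if any command fails&lt;/span&gt;
&lt;span class="nb"&gt;trap&lt;/span&gt; &lt;span class="s1"&gt;'log "Backup failed near line $LINENO"'&lt;/span&gt; ERR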

log &lt;span class="s2"&gt;"Starting backup"&lt;/span&gt;

&lt;span class="nv"&gt;BACKUP_FILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/var/backups/app-&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%F&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;.sql"&lt;/span&gt;

pg_dump &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DATABASE_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;gzip&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BACKUP_FILE&lt;/span&gt;&lt;span class="s2"&gt;.gz"&lt;/span&gt; &lt;span class="s2"&gt;"s3://example-backups/"&lt;/span&gt;

log &lt;span class="s2"&gt;"Backup completed successfully"&lt;/span&gt;

curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; &lt;span class="s2"&gt;"https://quietpulse.xyz/ping/YOUR_TOKEN"&lt;/span&gt;

log &lt;span class="s2"&gt;"Heartbeat sent"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you two layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;journald logs for investigation&lt;/li&gt;
&lt;li&gt;heartbeat monitoring for missed execution detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the script fails before the final &lt;code&gt;curl&lt;/code&gt;, the heartbeat does not fire. That is exactly what you want.&lt;/p&gt;
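
&lt;p&gt;If you prefer not to touch the script, a similar effect can be sketched in the service unit. This assumes &lt;code&gt;curl&lt;/code&gt; is installed at &lt;code&gt;/usr/bin/curl&lt;/code&gt; and that the script itself no longer sends the ping, so each run reports once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;oneshot&lt;/span&gt;
&lt;span class="py"&gt;EnvironmentFile&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/etc/example-backup.env&lt;/span&gt;
&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/local/bin/example-backup.sh&lt;/span&gt;
&lt;span class="c"&gt;# For Type=oneshot this runs only after ExecStart exited successfully.
&lt;/span&gt;&lt;span class="c"&gt;# The leading "-" means a failed ping does not mark the backup as failed.
&lt;/span&gt;&lt;span class="py"&gt;ExecStartPost&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;-/usr/bin/curl -fsS --max-time 10 https://quietpulse.xyz/ping/YOUR_TOKEN&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A ping that cannot be delivered still shows up as a missing heartbeat on the monitoring side, which is the behavior you want when the network is down.&lt;/p&gt;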

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Monitoring only the timer unit
&lt;/h3&gt;

&lt;p&gt;Checking that a timer is enabled is not enough.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl is-enabled example-backup.timer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This only tells you that systemd is configured to schedule it. It does not prove successful execution.&lt;/p&gt;

&lt;p&gt;You need to monitor completion, not configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Sending the heartbeat too early
&lt;/h3&gt;

&lt;p&gt;A common mistake is placing the ping at the top of the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; &lt;span class="s2"&gt;"https://quietpulse.xyz/ping/YOUR_TOKEN"&lt;/span&gt;

pg_dump &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DATABASE_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; backup.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a false success signal. The monitor records a ping even if the actual job fails immediately afterward.&lt;/p&gt;

&lt;p&gt;The ping should be the last step after the important work completes.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Ignoring the systemd environment
&lt;/h3&gt;

&lt;p&gt;Systemd services do not run with the same environment as your interactive shell.&lt;/p&gt;

&lt;p&gt;This often breaks scripts that depend on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;shell profile files&lt;/li&gt;
&lt;li&gt;local PATH changes&lt;/li&gt;
&lt;li&gt;exported secrets&lt;/li&gt;
&lt;li&gt;user-specific credentials&lt;/li&gt;
&lt;li&gt;working directories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use explicit paths, &lt;code&gt;EnvironmentFile=&lt;/code&gt;, and clear permissions.&lt;/p&gt;
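
&lt;p&gt;As a rough sketch of what that can look like for the backup service, assuming a dedicated &lt;code&gt;backup&lt;/code&gt; account exists on the host and owns &lt;code&gt;/var/backups&lt;/code&gt; (both are assumptions for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;oneshot&lt;/span&gt;
&lt;span class="c"&gt;# Hypothetical dedicated account instead of root
&lt;/span&gt;&lt;span class="py"&gt;User&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;backup&lt;/span&gt;
&lt;span class="c"&gt;# Relative paths in the script resolve from here
&lt;/span&gt;&lt;span class="py"&gt;WorkingDirectory&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/var/backups&lt;/span&gt;
&lt;span class="c"&gt;# Variables such as DATABASE_URL come from a file, not from a shell profile
&lt;/span&gt;&lt;span class="py"&gt;EnvironmentFile&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/etc/example-backup.env&lt;/span&gt;
&lt;span class="c"&gt;# Make PATH explicit so pg_dump, aws, and curl are found
&lt;/span&gt;&lt;span class="py"&gt;Environment&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;PATH=/usr/local/bin:/usr/bin:/bin&lt;/span&gt;
&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/local/bin/example-backup.sh&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After changing the unit, run &lt;code&gt;systemctl daemon-reload&lt;/code&gt; and start the service once by hand to confirm it works under this environment rather than under your own shell.&lt;/p&gt;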

&lt;h3&gt;
  
  
  4. Forgetting &lt;code&gt;Persistent=true&lt;/code&gt;
&lt;/h3&gt;

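&lt;p&gt;If a server is off at the scheduled time, &lt;code&gt;Persistent=true&lt;/code&gt; tells systemd to trigger the service once as soon as the timer is active again, typically right after boot. The setting only affects &lt;code&gt;OnCalendar=&lt;/code&gt; timers.&lt;/p&gt;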

&lt;p&gt;Without it, some jobs may simply be skipped.&lt;/p&gt;

&lt;p&gt;For daily maintenance jobs, backups, and syncs, this setting is often worth enabling.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Not setting timeouts
&lt;/h3&gt;

&lt;p&gt;For &lt;code&gt;Type=oneshot&lt;/code&gt; services the start timeout is disabled by default, so a command that waits forever keeps the service stuck in the activating state.&lt;/p&gt;

&lt;p&gt;Use systemd options like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;oneshot&lt;/span&gt;
&lt;span class="py"&gt;TimeoutStartSec&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;30min&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A hung job can be just as bad as a missed one: while the service is stuck, later timer runs do not start either.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;p&gt;Heartbeat monitoring is usually the simplest way to detect missed timers, but it is not the only useful signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Journald logs
&lt;/h3&gt;

&lt;p&gt;You can inspect logs with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; example-backup.service &lt;span class="nt"&gt;--since&lt;/span&gt; today
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is excellent for debugging.&lt;/p&gt;

&lt;p&gt;But logs are passive. They help after you know something is wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Systemd status checks
&lt;/h3&gt;

&lt;p&gt;You can check failed units:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl &lt;span class="nt"&gt;--failed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or inspect one service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl status example-backup.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This helps catch hard service failures.&lt;/p&gt;

&lt;p&gt;But it may not catch a script that exits successfully while doing incomplete work.&lt;/p&gt;
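
&lt;p&gt;For those hard failures, systemd can also start another unit when the service fails. A minimal sketch, in which the template unit name and &lt;code&gt;notify-failure.sh&lt;/code&gt; are hypothetical placeholders for your own alerting script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# Added to /etc/systemd/system/example-backup.service
&lt;/span&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="c"&gt;# %n expands to the full name of the failing unit
&lt;/span&gt;&lt;span class="py"&gt;OnFailure&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;example-alert@%n.service&lt;/span&gt;

&lt;span class="c"&gt;# /etc/systemd/system/example-alert@.service (hypothetical template unit)
&lt;/span&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Alert for failed unit %i&lt;/span&gt;

&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;oneshot&lt;/span&gt;
&lt;span class="c"&gt;# Placeholder script; %i is the instance name, here the failed unit
&lt;/span&gt;&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/usr/local/bin/notify-failure.sh %i&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reacts to runs that fail. It stays silent when the timer never fires at all, which is the case a missing heartbeat covers.&lt;/p&gt;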

&lt;h3&gt;
  
  
  Metrics and dashboards
&lt;/h3&gt;

&lt;p&gt;If you already use Prometheus, Grafana, or another monitoring stack, you can export timer metrics (for example through node_exporter’s systemd collector) and alert when the last trigger time is too old.&lt;/p&gt;

&lt;p&gt;This is powerful, but it may be too much for a small VPS, indie app, or simple background job.&lt;/p&gt;

&lt;h3&gt;
  
  
  Email from scripts
&lt;/h3&gt;

&lt;p&gt;Some scripts send email on failure. This can work, but it depends on mail delivery, spam filtering, and correct error handling.&lt;/p&gt;

&lt;p&gt;Also, failure-only alerts do not catch every missed run. If the script never starts, it may never send the email.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uptime checks
&lt;/h3&gt;

&lt;p&gt;Uptime checks are still useful for web apps.&lt;/p&gt;

&lt;p&gt;They just do not answer the systemd timer question. Your website can be up while your daily job is broken.&lt;/p&gt;

&lt;p&gt;Use uptime checks for endpoints. Use heartbeat checks for scheduled work.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is systemd timer monitoring?
&lt;/h3&gt;

&lt;p&gt;Systemd timer monitoring is the practice of checking whether scheduled systemd timer jobs actually run and complete successfully. It usually combines systemd status, logs, and heartbeat checks that alert when an expected job does not report success.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I know if a systemd timer failed?
&lt;/h3&gt;

&lt;p&gt;You can start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl list-timers &lt;span class="nt"&gt;--all&lt;/span&gt;
systemctl status your-service.service
journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; your-service.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For proactive detection, add a heartbeat ping at the end of the job and alert when the ping is missing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are systemd timers better than cron?
&lt;/h3&gt;

&lt;p&gt;Systemd timers are often better for Linux services because they integrate with unit dependencies, journald, boot behavior, and systemctl. Cron is simpler and widely known. Both still need monitoring if the scheduled work matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can uptime monitoring detect missed systemd timers?
&lt;/h3&gt;

&lt;p&gt;No, not reliably. Uptime monitoring checks whether a service or endpoint responds. A missed systemd timer can happen while the server and application are still online.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where should I put the heartbeat ping?
&lt;/h3&gt;

&lt;p&gt;Put the heartbeat ping at the end of the script, after the important work has completed successfully. If you ping at the beginning, you may hide failures that happen later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Systemd timers are a strong replacement for many cron jobs, but they still need monitoring.&lt;/p&gt;

&lt;p&gt;Do not stop at “the timer is enabled.” Monitor whether the job actually completed.&lt;/p&gt;

&lt;p&gt;Use systemd logs and status for debugging. Use heartbeat monitoring to catch missed or failed execution automatically. For backups, syncs, reports, cleanup scripts, and other scheduled production work, that small success ping can be the difference between a quiet failure and an early alert.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/systemd-timer-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/systemd-timer-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>systemd</category>
      <category>monitoring</category>
      <category>devops</category>
      <category>reliability</category>
    </item>
    <item>
      <title>Kubernetes CronJob Monitoring: How to Catch Missed Runs Before They Break Production</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Fri, 01 May 2026 07:30:50 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/kubernetes-cronjob-monitoring-how-to-catch-missed-runs-before-they-break-production-48g9</link>
      <guid>https://dev.to/quietpulse-social/kubernetes-cronjob-monitoring-how-to-catch-missed-runs-before-they-break-production-48g9</guid>
      <description>&lt;p&gt;Kubernetes CronJob monitoring sounds simple until the first scheduled job silently does not run.&lt;/p&gt;

&lt;p&gt;Your cluster is healthy. The pods look fine. The app is serving traffic. Prometheus is green. Then somebody asks why yesterday’s invoices were not generated, why cleanup did not happen, or why a customer export is missing.&lt;/p&gt;

&lt;p&gt;The problem is that Kubernetes can tell you a lot about pods and workloads, but a scheduled job is different: it matters that it ran at the right time, completed successfully, and keeps doing that every time.&lt;/p&gt;

&lt;p&gt;This guide explains what actually breaks with Kubernetes CronJobs, why missed runs are easy to miss, and how to monitor them with heartbeat checks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;A Kubernetes CronJob is a scheduled workload. You define a schedule, Kubernetes creates Jobs, and those Jobs create Pods.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CronJob&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nightly-invoice-sync&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;
  &lt;span class="na"&gt;jobTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OnFailure&lt;/span&gt;
          &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sync&lt;/span&gt;
              &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;example/invoice-sync:latest&lt;/span&gt;
              &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;node"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sync-invoices.js"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks clean. But in production, several things can go wrong:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CronJob never creates a Job.&lt;/li&gt;
&lt;li&gt;The Job starts but the Pod fails.&lt;/li&gt;
&lt;li&gt;The Pod hangs forever.&lt;/li&gt;
&lt;li&gt;The job runs too late.&lt;/li&gt;
&lt;li&gt;Multiple runs overlap.&lt;/li&gt;
&lt;li&gt;The job succeeds from Kubernetes’ point of view but does not finish the business task.&lt;/li&gt;
&lt;li&gt;The schedule is suspended and nobody notices.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kubernetes usually exposes these as separate signals: CronJob status, Job status, Pod events, logs, and metrics. That is useful, but it also means there is no single obvious signal that says:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This scheduled task did not complete when expected.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the core monitoring gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;Kubernetes CronJobs depend on several moving parts.&lt;/p&gt;

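&lt;p&gt;First, the CronJob controller must notice that a schedule is due and create a Job. If the controller is delayed, the cluster is under pressure, or the CronJob is configured in a way that blocks the run (for example &lt;code&gt;suspend: true&lt;/code&gt;, a short &lt;code&gt;startingDeadlineSeconds&lt;/code&gt;, or &lt;code&gt;concurrencyPolicy: Forbid&lt;/code&gt; while the previous Job is still active), the Job may be late or skipped.&lt;/p&gt;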

&lt;p&gt;Second, the Job must create a Pod. That can fail because of image pull errors, missing secrets, resource limits, node pressure, admission policies, or broken service accounts.&lt;/p&gt;

&lt;p&gt;Third, the Pod must actually run the task. This is where application-level failures appear: bad credentials, API rate limits, database locks, schema changes, network timeouts, or logic bugs.&lt;/p&gt;

&lt;p&gt;Finally, the task must complete the real business operation. A script can exit with code &lt;code&gt;0&lt;/code&gt; even if it processed zero records because a query changed or an upstream API returned an unexpected empty response.&lt;/p&gt;

&lt;p&gt;Kubernetes is good at managing containers. It is not automatically aware of your business expectation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This billing sync must finish once every night.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That expectation needs to be monitored directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it's dangerous
&lt;/h2&gt;

&lt;p&gt;Missed CronJobs are dangerous because they often fail quietly.&lt;/p&gt;

&lt;p&gt;A web server failure is visible quickly. Users complain. Error rates spike. Uptime checks fail.&lt;/p&gt;

&lt;p&gt;A missed scheduled task can sit unnoticed for hours or days.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A billing job does not run, so invoices are never created.&lt;/li&gt;
&lt;li&gt;A cleanup job stops, so storage usage grows until something breaks.&lt;/li&gt;
&lt;li&gt;A data import misses one night, so dashboards show stale numbers.&lt;/li&gt;
&lt;li&gt;A reminder job silently fails, so customers do not receive notifications.&lt;/li&gt;
&lt;li&gt;A reconciliation task skips a run, so financial state drifts.&lt;/li&gt;
&lt;li&gt;A backup verification job stops running, so nobody knows backups are broken.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The worst part is that many CronJob failures do not look urgent at the infrastructure level. The cluster can be perfectly healthy while the scheduled business process is failing.&lt;/p&gt;

&lt;p&gt;That is why Kubernetes CronJob monitoring should focus on expected completion, not just pod health.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;The most reliable way to detect missed CronJobs is to monitor the job from the outside.&lt;/p&gt;

&lt;p&gt;Instead of only asking Kubernetes “did a pod exist?”, ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Did this scheduled task finish within the expected time window?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is what heartbeat monitoring does.&lt;/p&gt;

&lt;p&gt;The pattern is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a unique heartbeat URL for the scheduled task.&lt;/li&gt;
&lt;li&gt;At the end of the CronJob, call that URL.&lt;/li&gt;
&lt;li&gt;Configure the monitor to expect a ping every schedule interval.&lt;/li&gt;
&lt;li&gt;If the ping does not arrive on time, send an alert.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For example, if a CronJob runs every night at 02:00 and normally finishes by 02:10, you might expect a heartbeat once every 24 hours with a grace period.&lt;/p&gt;
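&lt;p&gt;On the monitor side, that check comes down to simple date arithmetic. Here is a rough Python sketch of the idea, assuming a 30-minute grace period and a hypothetical &lt;code&gt;heartbeat_is_overdue&lt;/code&gt; helper; a real monitoring service may compute deadlines differently.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def heartbeat_is_overdue(last_ping, now, interval, grace):
    """Return True when no ping arrived within interval + grace of the last one."""
    deadline = last_ping + interval + grace
    return now > deadline


# Nightly job: expected every 24 hours, with a 30-minute grace period (assumed value).
INTERVAL = timedelta(hours=24)
GRACE = timedelta(minutes=30)

last_ping = datetime(2026, 5, 10, 2, 8)       # last successful run finished at 02:08
on_time_check = datetime(2026, 5, 11, 2, 20)  # next night, still inside the window
late_check = datetime(2026, 5, 11, 2, 45)     # after the 02:38 deadline

print(heartbeat_is_overdue(last_ping, on_time_check, INTERVAL, GRACE))
print(heartbeat_is_overdue(last_ping, late_check, INTERVAL, GRACE))
```

&lt;p&gt;The deadline here is measured from the last ping rather than from the cron schedule, which keeps the sketch short but lets the window drift with the job's finish time.&lt;/p&gt;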

&lt;p&gt;This detects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CronJob did not start.&lt;/li&gt;
&lt;li&gt;The Job failed before the end.&lt;/li&gt;
&lt;li&gt;The Pod crashed.&lt;/li&gt;
&lt;li&gt;The script hung.&lt;/li&gt;
&lt;li&gt;The CronJob was suspended (&lt;code&gt;spec.suspend: true&lt;/code&gt;) and never resumed.&lt;/li&gt;
&lt;li&gt;The task completed too late.&lt;/li&gt;
&lt;li&gt;Kubernetes created objects but the real work never finished.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is different from log monitoring or pod monitoring. It checks the outcome that matters: the job reached the point where it can say “I completed.”&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple solution with example
&lt;/h2&gt;

&lt;p&gt;A simple pattern is to send the heartbeat only after the task succeeds.&lt;/p&gt;

&lt;p&gt;For a shell-based Kubernetes CronJob, that might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CronJob&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nightly-report&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;
  &lt;span class="na"&gt;concurrencyPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Forbid&lt;/span&gt;
  &lt;span class="na"&gt;successfulJobsHistoryLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;failedJobsHistoryLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;jobTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;backoffLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
      &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OnFailure&lt;/span&gt;
          &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;report&lt;/span&gt;
              &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;curlimages/curl:latest&lt;/span&gt;
              &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/bin/sh&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
                  &lt;span class="s"&gt;set -e&lt;/span&gt;

                  &lt;span class="s"&gt;echo "Running nightly report..."&lt;/span&gt;

                  &lt;span class="s"&gt;# Replace this with your real command, and use an image that contains it plus curl.&lt;/span&gt;
                  &lt;span class="s"&gt;/app/generate-nightly-report.sh&lt;/span&gt;

                  &lt;span class="s"&gt;curl -fsS --max-time 10 https://quietpulse.xyz/ping/YOUR_TOKEN&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important detail is the order.&lt;/p&gt;

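&lt;p&gt;The heartbeat happens after the actual work. If the report command fails, &lt;code&gt;set -e&lt;/code&gt; stops the script and the ping never happens. Kubernetes then retries the run until &lt;code&gt;backoffLimit&lt;/code&gt; is exhausted; if every attempt fails, no ping is sent and the monitor alerts once the grace period has passed.&lt;/p&gt;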

&lt;p&gt;For a Node.js job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateReport&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

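  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://quietpulse.xyz/ping/YOUR_TOKEN&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AbortSignal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// fetch() resolves even on HTTP error statuses, so check the response explicitly.&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Heartbeat ping failed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;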
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;catch&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a Python job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;generate_report&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://quietpulse.xyz/ping/YOUR_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can build this yourself with a small service that stores last-seen timestamps and sends alerts. Or you can use a heartbeat monitoring tool like QuietPulse, create a monitor for the CronJob, and ping its URL when the job finishes.&lt;/p&gt;

&lt;p&gt;The key idea is not the tool. The key idea is that every important scheduled task should prove it completed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Pinging at the start of the job
&lt;/h3&gt;

&lt;p&gt;A start ping proves the job started. It does not prove the job completed.&lt;/p&gt;

&lt;p&gt;If the task hangs halfway through, crashes after processing some records, or fails during the final API call, a start ping gives a false sense of safety.&lt;/p&gt;

&lt;p&gt;For most CronJobs, send the heartbeat at the end.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Only watching pod status
&lt;/h3&gt;

&lt;p&gt;Pod status is useful, but it is not enough.&lt;/p&gt;

&lt;p&gt;A pod can exist and still fail the real task. A container can exit successfully while processing no data. A Job can be retried and eventually disappear from history.&lt;/p&gt;

&lt;p&gt;Infrastructure status should support CronJob monitoring, not replace it.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Ignoring execution time
&lt;/h3&gt;

&lt;p&gt;A job that normally finishes in 3 minutes but suddenly takes 2 hours may already be broken.&lt;/p&gt;

&lt;p&gt;Track duration when possible. At minimum, configure heartbeat grace periods based on realistic runtime, not just the schedule.&lt;/p&gt;
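&lt;p&gt;Tracking duration does not need much machinery. A minimal Python sketch, assuming a hypothetical &lt;code&gt;run_timed&lt;/code&gt; wrapper and a &lt;code&gt;max_seconds&lt;/code&gt; threshold that you pick from past runs:&lt;/p&gt;

```python
import time


def run_timed(job, max_seconds):
    """Run job() and return (result, elapsed_seconds, too_slow)."""
    started = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    result = job()
    elapsed = time.monotonic() - started
    return result, elapsed, elapsed > max_seconds
```

&lt;p&gt;The caller can log &lt;code&gt;elapsed&lt;/code&gt; on every run and treat &lt;code&gt;too_slow&lt;/code&gt; as a warning, even when the job itself succeeded and the heartbeat was sent.&lt;/p&gt;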

&lt;h3&gt;
  
  
  4. Allowing overlapping runs by accident
&lt;/h3&gt;

&lt;p&gt;If a CronJob runs every 10 minutes but sometimes takes 20 minutes, overlapping executions can create duplicates, locks, or inconsistent data.&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;concurrencyPolicy: Forbid&lt;/code&gt; when overlap is unsafe:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;concurrencyPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Forbid&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then monitor for missed completions so skipped or delayed work does not stay invisible.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Keeping too little job history
&lt;/h3&gt;

&lt;p&gt;Kubernetes lets you control how many successful and failed Jobs are retained. By default it keeps the last 3 successful Jobs but only 1 failed Job.&lt;/p&gt;

&lt;p&gt;If history limits are too low, useful debugging context disappears quickly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;successfulJobsHistoryLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;span class="na"&gt;failedJobsHistoryLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Heartbeat alerts tell you something is wrong. Job and pod history help you investigate why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;p&gt;Heartbeat monitoring is usually the cleanest way to detect missed CronJobs, but it should not be your only signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubernetes events
&lt;/h3&gt;

&lt;p&gt;Kubernetes events can show scheduling problems, failed pod creation, image pull errors, and resource issues.&lt;/p&gt;

&lt;p&gt;They are useful for debugging, but they are noisy and not always retained long enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  Logs
&lt;/h3&gt;

&lt;p&gt;Logs help explain what happened inside the job.&lt;/p&gt;

&lt;p&gt;They are less reliable for detecting jobs that never started. If there is no run, there may be no log line to search for.&lt;/p&gt;

&lt;h3&gt;
  
  
  Metrics
&lt;/h3&gt;

&lt;p&gt;Prometheus and kube-state-metrics can expose useful signals about CronJobs, Jobs, and Pods.&lt;/p&gt;

&lt;p&gt;This can work well if your team already has a strong Kubernetes monitoring setup. But it still requires careful alert rules around expected schedule, last successful completion, and delay tolerance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uptime checks
&lt;/h3&gt;

&lt;p&gt;Uptime monitoring checks whether a service responds.&lt;/p&gt;

&lt;p&gt;That is not the same as checking whether a scheduled job completed. Your app can be online while the nightly reconciliation job has not run in three days.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application-level checks
&lt;/h3&gt;

&lt;p&gt;For some jobs, the best signal is a business metric: “new report generated”, “backup verified”, “records imported”, or “emails sent”.&lt;/p&gt;

&lt;p&gt;These are excellent when available. Heartbeat monitoring is often the simplest baseline, and business metrics can add extra confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Kubernetes CronJob monitoring?
&lt;/h3&gt;

&lt;p&gt;Kubernetes CronJob monitoring is the practice of checking whether scheduled Kubernetes Jobs run and complete as expected. Good monitoring detects missed runs, failed pods, delayed execution, hangs, and broken business tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I know if a Kubernetes CronJob did not run?
&lt;/h3&gt;

&lt;p&gt;You can inspect CronJob, Job, and Pod status with &lt;code&gt;kubectl&lt;/code&gt;, but the most reliable production signal is an external heartbeat. If the expected heartbeat does not arrive after the scheduled run, the CronJob likely failed, missed its schedule, or did not complete.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is pod monitoring enough for Kubernetes CronJobs?
&lt;/h3&gt;

&lt;p&gt;No. Pod monitoring helps, but it does not fully prove that the scheduled task completed its business work. A pod can start and still fail internally, hang, process no records, or exit successfully with bad results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should the heartbeat happen at the start or end of the CronJob?
&lt;/h3&gt;

&lt;p&gt;Usually at the end. A heartbeat at the end proves that the job reached its completion point. A heartbeat at the start only proves that execution began.&lt;/p&gt;

&lt;h3&gt;
  
  
  What grace period should I use for a CronJob monitor?
&lt;/h3&gt;

&lt;p&gt;Use the normal schedule plus expected runtime and a small buffer. If a job runs every hour and usually finishes in 5 minutes, a 10–15 minute grace period may be reasonable. For long jobs, base the grace period on real historical runtime.&lt;/p&gt;
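&lt;p&gt;If you record runtimes, the grace period can be derived from them instead of guessed. The sketch below is one possible heuristic in Python, not a standard formula: &lt;code&gt;suggest_grace_minutes&lt;/code&gt;, the safety factor of 2, and the 5-minute floor are all assumptions to adjust for your own jobs.&lt;/p&gt;

```python
import math


def suggest_grace_minutes(recent_runtimes, safety_factor=2.0, floor=5):
    """Slowest recent runtime times a safety factor, never below the floor."""
    if not recent_runtimes:
        return floor  # no history yet: fall back to the minimum
    return max(floor, math.ceil(max(recent_runtimes) * safety_factor))
```

&lt;p&gt;For an hourly job with recent runtimes of 4, 5, and 6 minutes, this suggests a 12-minute grace period, which falls inside the 10–15 minute range mentioned above.&lt;/p&gt;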

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Kubernetes CronJobs are easy to create, but missed runs are easy to overlook.&lt;/p&gt;

&lt;p&gt;The safest monitoring pattern is simple: make each important CronJob send a heartbeat after successful completion, then alert when that heartbeat does not arrive on time.&lt;/p&gt;

&lt;p&gt;Kubernetes can tell you what happened to pods. Heartbeat monitoring tells you whether the scheduled task actually completed.&lt;/p&gt;

&lt;p&gt;For production CronJobs, that difference matters.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/kubernetes-cronjob-monitoring" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/kubernetes-cronjob-monitoring&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>cronjob</category>
      <category>monitoring</category>
      <category>devops</category>
    </item>
    <item>
      <title>Node.js Cron Job Monitoring Best Practices for Catching Silent Failures</title>
      <dc:creator>quietpulse</dc:creator>
      <pubDate>Thu, 30 Apr 2026 06:22:33 +0000</pubDate>
      <link>https://dev.to/quietpulse-social/nodejs-cron-job-monitoring-best-practices-for-catching-silent-failures-139b</link>
      <guid>https://dev.to/quietpulse-social/nodejs-cron-job-monitoring-best-practices-for-catching-silent-failures-139b</guid>
      <description>&lt;p&gt;Node.js cron job monitoring becomes important the first time a scheduled task quietly stops doing its job.&lt;/p&gt;

&lt;p&gt;Your API can be healthy. Your frontend can load. Your uptime monitor can stay green. Meanwhile, a billing sync, cleanup task, report generator, or import job may have stopped running days ago.&lt;/p&gt;

&lt;p&gt;That is the tricky part about cron-style work: the failure is often not visible from the outside.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;Node.js scheduled jobs usually run in the background, outside the normal request path.&lt;/p&gt;

&lt;p&gt;They might handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;daily email digests&lt;/li&gt;
&lt;li&gt;payment retries&lt;/li&gt;
&lt;li&gt;database cleanup&lt;/li&gt;
&lt;li&gt;cache refreshes&lt;/li&gt;
&lt;li&gt;scheduled notifications&lt;/li&gt;
&lt;li&gt;data imports&lt;/li&gt;
&lt;li&gt;report generation&lt;/li&gt;
&lt;li&gt;third-party API syncs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When one of these breaks, there may be no customer-facing error at first. The job is simply missing.&lt;/p&gt;

&lt;p&gt;That missing work can become stale data, failed billing, unprocessed records, or support tickets later.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it happens
&lt;/h2&gt;

&lt;p&gt;Node.js cron jobs can break in obvious and non-obvious ways.&lt;/p&gt;

&lt;p&gt;A simple job might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0 * * * *&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncCustomers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This can fail because &lt;code&gt;syncCustomers()&lt;/code&gt; throws. But scheduled jobs can also fail because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the worker process crashed&lt;/li&gt;
&lt;li&gt;the scheduler was not started after deploy&lt;/li&gt;
&lt;li&gt;environment variables changed&lt;/li&gt;
&lt;li&gt;the cron expression is wrong&lt;/li&gt;
&lt;li&gt;the job hangs on an external API&lt;/li&gt;
&lt;li&gt;database queries never return&lt;/li&gt;
&lt;li&gt;the job overlaps with itself&lt;/li&gt;
&lt;li&gt;multiple app instances run the same task&lt;/li&gt;
&lt;li&gt;a server timezone changed&lt;/li&gt;
&lt;li&gt;errors are caught and only logged&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A common mistake is forgetting proper async handling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*/15 * * * *&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;syncInventory&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// missing await / error handling&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



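&lt;p&gt;Without &lt;code&gt;await&lt;/code&gt; or a &lt;code&gt;.catch()&lt;/code&gt;, a rejected promise from &lt;code&gt;syncInventory()&lt;/code&gt; is never handled by the callback. Depending on the Node.js version and flags, that either crashes the process or leaves only an unhandled-rejection warning in the logs, and nothing alerts you that the run failed.&lt;/p&gt;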

&lt;h2&gt;
  
  
  Why it's dangerous
&lt;/h2&gt;

&lt;p&gt;Missed scheduled jobs rarely create one neat incident.&lt;/p&gt;

&lt;p&gt;They create slow damage.&lt;/p&gt;

&lt;p&gt;A sync that fails once may not matter. A sync that fails for three days can create stale data, missing records, broken reports, or customer confusion.&lt;/p&gt;

&lt;p&gt;The longer the issue continues, the more painful recovery becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;more data needs reprocessing&lt;/li&gt;
&lt;li&gt;duplicate work becomes more likely&lt;/li&gt;
&lt;li&gt;logs may rotate away&lt;/li&gt;
&lt;li&gt;manual fixes become risky&lt;/li&gt;
&lt;li&gt;customers may notice first&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Uptime monitoring does not solve this. It tells you whether an endpoint responds. It does not tell you whether your scheduled jobs actually completed.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to detect it
&lt;/h2&gt;

&lt;p&gt;The core monitoring question is simple:&lt;/p&gt;

&lt;p&gt;Did the job send a success signal within the expected time window?&lt;/p&gt;

&lt;p&gt;This is usually called heartbeat monitoring.&lt;/p&gt;

&lt;p&gt;The pattern is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The scheduled job runs.&lt;/li&gt;
&lt;li&gt;It completes the important work.&lt;/li&gt;
&lt;li&gt;It sends a heartbeat ping.&lt;/li&gt;
&lt;li&gt;A monitor expects that ping on schedule.&lt;/li&gt;
&lt;li&gt;If the ping does not arrive, someone gets alerted.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a 15-minute job should check in every 15–20 minutes&lt;/li&gt;
&lt;li&gt;an hourly job should check in every 60–70 minutes&lt;/li&gt;
&lt;li&gt;a daily job should check in every 24–26 hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This catches problems like missed runs, crashed workers, bad deploys, disabled schedulers, and jobs that hang before completion.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple solution
&lt;/h2&gt;

&lt;p&gt;Here is a basic example using &lt;code&gt;node-cron&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;node-cron
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;cron&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node-cron&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;runJob&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Starting customer sync&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncCustomers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://quietpulse.xyz/ping/{token}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Customer sync completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0 * * * *&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runJob&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Customer sync failed:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;exitCode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key detail: send the heartbeat after the work succeeds.&lt;/p&gt;

&lt;p&gt;Do not do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://quietpulse.xyz/ping/{token}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncCustomers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the sync fails after the ping, your monitor will think the job succeeded.&lt;/p&gt;

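&lt;p&gt;For Node.js versions before 18, which do not ship a global &lt;code&gt;fetch&lt;/code&gt;, use a small HTTP client such as &lt;code&gt;undici&lt;/code&gt;:&lt;br&gt;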
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;undici
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;fetch&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;undici&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://quietpulse.xyz/ping/{token}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also add a timeout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;sendHeartbeat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AbortController&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;timeout&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
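    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://quietpulse.xyz/ping/{token}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// fetch() does not reject on HTTP error statuses, so check explicitly.&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Heartbeat ping failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;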
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;clearTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then call it after the job finishes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;runJob&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncCustomers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendHeartbeat&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of building the monitoring side yourself, you can use a heartbeat monitoring service. The important part is the pattern: each successful job run should create an external signal, and missing signals should trigger alerts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Pinging too early
&lt;/h3&gt;

&lt;p&gt;If you send a heartbeat before the real work, failures after that point are hidden.&lt;/p&gt;

&lt;p&gt;Send the heartbeat after successful completion.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Relying only on process uptime
&lt;/h3&gt;

&lt;p&gt;A process can be running while the scheduled task is broken.&lt;/p&gt;

&lt;p&gt;PM2, Docker, systemd, or Kubernetes can tell you whether a process exists. They cannot always tell you whether a specific job completed.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Ignoring long runtimes
&lt;/h3&gt;

&lt;p&gt;A job that usually takes 20 seconds but now takes 30 minutes may be failing slowly rather than crashing outright.&lt;/p&gt;

&lt;p&gt;Long runtimes can cause overlap, stale data, and queue buildup.&lt;/p&gt;
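
&lt;p&gt;One way to notice slow runs is to time each run and compare it against a threshold. The sketch below is illustrative only: &lt;code&gt;runTimed&lt;/code&gt;, &lt;code&gt;maxDurationMs&lt;/code&gt;, and the injectable &lt;code&gt;clock&lt;/code&gt; are hypothetical names, and the threshold is something you would pick from the job's normal runtime.&lt;/p&gt;

```javascript
// Minimal sketch: time a job and flag runs that exceed an assumed threshold.
// runTimed, maxDurationMs, and clock are hypothetical names for illustration.
async function runTimed(job, maxDurationMs, clock = Date.now) {
  const startedAt = clock();
  await job();
  const durationMs = clock() - startedAt;

  // tooSlow is true when the run took longer than the chosen threshold.
  return { durationMs, tooSlow: durationMs > maxDurationMs };
}
```

&lt;p&gt;The caller can log &lt;code&gt;durationMs&lt;/code&gt; on every run and send an alert when &lt;code&gt;tooSlow&lt;/code&gt; is true.&lt;/p&gt;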

&lt;h3&gt;
  
  
  4. Running jobs on every app instance
&lt;/h3&gt;

&lt;p&gt;If your app runs on multiple servers and each one starts the scheduler, the same job may run multiple times.&lt;/p&gt;

&lt;p&gt;Use a dedicated worker, external scheduler, or distributed lock when needed.&lt;/p&gt;
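
&lt;p&gt;To make the locking idea concrete, here is a rough sketch. The &lt;code&gt;InMemoryLockStore&lt;/code&gt; class and &lt;code&gt;runExclusively&lt;/code&gt; function are hypothetical, and the in-memory &lt;code&gt;Map&lt;/code&gt; only stands in for a store that every instance can reach, such as a database row or a cache key. A real distributed lock needs that shared store and an atomic acquire step.&lt;/p&gt;

```javascript
// Hypothetical in-memory stand-in for a shared lock store.
// A Map only works within one process; it is used here to show the pattern.
class InMemoryLockStore {
  constructor() {
    this.locks = new Map();
  }

  // Takes the lock only if it is free or expired. Returns true when acquired.
  tryAcquire(name, owner, ttlMs, nowMs) {
    const current = this.locks.get(name);
    const stillHeld = current === undefined ? false : current.expiresAt > nowMs;
    if (stillHeld) return false;
    this.locks.set(name, { owner, expiresAt: nowMs + ttlMs });
    return true;
  }

  // Releases the lock only when the caller is the current owner.
  release(name, owner) {
    const current = this.locks.get(name);
    if (current === undefined) return;
    if (current.owner === owner) this.locks.delete(name);
  }
}

// Runs the job only if this instance wins the lock; otherwise skips the run.
async function runExclusively(store, name, owner, ttlMs, job) {
  if (!store.tryAcquire(name, owner, ttlMs, Date.now())) {
    return false;
  }
  try {
    await job();
    return true;
  } finally {
    store.release(name, owner);
  }
}
```

&lt;p&gt;The expiry time matters: if an instance crashes while holding the lock, the lock frees itself after &lt;code&gt;ttlMs&lt;/code&gt; instead of blocking every future run.&lt;/p&gt;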

&lt;h3&gt;
  
  
  5. Swallowing errors
&lt;/h3&gt;

&lt;p&gt;Logging errors is useful, but it is not the same as alerting.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;syncCustomers&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If nobody reads the logs, this is still a silent failure.&lt;/p&gt;
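
&lt;p&gt;A possible alternative is to keep the log line but also notify someone and rethrow. In this sketch, &lt;code&gt;runMonitored&lt;/code&gt; and &lt;code&gt;notify&lt;/code&gt; are hypothetical: &lt;code&gt;notify&lt;/code&gt; stands for whatever alerting channel you already use, passed in as an async function.&lt;/p&gt;

```javascript
// Minimal sketch: log the error, alert a human, then rethrow.
// notify is a hypothetical async function supplied by the caller.
async function runMonitored(job, notify) {
  try {
    await job();
  } catch (error) {
    console.error(error);
    await notify(`Scheduled job failed: ${error.message}`);
    // Rethrow so the caller does not treat this run as a success.
    throw error;
  }
}
```

&lt;p&gt;Because the error propagates, any code that runs after the job, such as a success heartbeat, is skipped for the failed run.&lt;/p&gt;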

&lt;h2&gt;
  
  
  Alternative approaches
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Logs
&lt;/h3&gt;

&lt;p&gt;Logs are useful for debugging what happened. They are weaker at detecting something that never happened.&lt;/p&gt;

&lt;p&gt;If the job never ran, there may be no log line.&lt;/p&gt;

&lt;h3&gt;
  
  
  Error tracking
&lt;/h3&gt;

&lt;p&gt;Error tracking tools can catch thrown exceptions and rejected promises.&lt;/p&gt;

&lt;p&gt;They help when a job starts and fails loudly. They cannot report a run that never started, such as one skipped by a disabled scheduler, and they often miss a process that hangs without throwing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Uptime checks
&lt;/h3&gt;

&lt;p&gt;Uptime checks are great for websites and APIs.&lt;/p&gt;

&lt;p&gt;They do not confirm that a background job completed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Queue dashboards
&lt;/h3&gt;

&lt;p&gt;If your scheduled job creates queue work, queue metrics can help. Watch queue depth, retries, failed jobs, and processing latency.&lt;/p&gt;

&lt;p&gt;But queue metrics may not catch the scheduler failing to enqueue work in the first place.&lt;/p&gt;

&lt;h3&gt;
  
  
  Database timestamps
&lt;/h3&gt;

&lt;p&gt;You can store &lt;code&gt;last_success_at&lt;/code&gt; in your database.&lt;/p&gt;

&lt;p&gt;This works, but you still need something that checks whether the timestamp is too old and sends an alert.&lt;/p&gt;
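
&lt;p&gt;The staleness check itself can be small. The sketch below assumes you have already loaded &lt;code&gt;last_success_at&lt;/code&gt; as a &lt;code&gt;Date&lt;/code&gt; (or &lt;code&gt;null&lt;/code&gt; if the job has never succeeded). The name &lt;code&gt;isHeartbeatStale&lt;/code&gt; and the &lt;code&gt;maxAgeMs&lt;/code&gt; parameter are hypothetical.&lt;/p&gt;

```javascript
// Minimal sketch: decide whether the last successful run is too old.
// lastSuccessAt is a Date, or null when the job has never succeeded.
function isHeartbeatStale(lastSuccessAt, maxAgeMs, now = new Date()) {
  if (lastSuccessAt === null) return true;
  return now.getTime() - lastSuccessAt.getTime() > maxAgeMs;
}
```

&lt;p&gt;This check has to run somewhere other than the job it watches. Otherwise a dead scheduler also stops the check.&lt;/p&gt;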

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Node.js cron job monitoring?
&lt;/h3&gt;

&lt;p&gt;It is the practice of checking whether scheduled Node.js tasks run successfully when expected. This includes jobs for syncs, cleanup, billing, reports, imports, and other background work.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I detect if a Node.js cron job stopped running?
&lt;/h3&gt;

&lt;p&gt;Send a heartbeat after each successful run. If the heartbeat does not arrive within the expected interval, alert someone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are logs enough for Node.js scheduled jobs?
&lt;/h3&gt;

&lt;p&gt;No. Logs help with debugging, but they do not reliably detect missed runs. If the job never starts, logs may not show anything useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should cron jobs run inside the main Node.js app?
&lt;/h3&gt;

&lt;p&gt;For small apps, it can work. For production systems, a dedicated worker, external scheduler, or distributed lock is usually safer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Node.js cron job monitoring is about detecting missing work, not just errors.&lt;/p&gt;

&lt;p&gt;A scheduled job can stop running while the rest of your app looks healthy. Add a heartbeat after successful completion, alert when it goes missing, and you will catch silent failures much earlier.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://quietpulse.xyz/blog/node-js-cron-job-monitoring-best-practices" rel="noopener noreferrer"&gt;https://quietpulse.xyz/blog/node-js-cron-job-monitoring-best-practices&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>node</category>
      <category>cron</category>
      <category>monitoring</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
