<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ajdin Halac</title>
    <description>The latest articles on DEV Community by Ajdin Halac (@ajdin_halac).</description>
    <link>https://dev.to/ajdin_halac</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3869924%2F12dec91e-0166-47c3-86a3-6efd5cd1a4a4.JPEG</url>
      <title>DEV Community: Ajdin Halac</title>
      <link>https://dev.to/ajdin_halac</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ajdin_halac"/>
    <language>en</language>
    <item>
      <title>Your cron job started. That does not mean it finished.</title>
      <dc:creator>Ajdin Halac</dc:creator>
      <pubDate>Thu, 09 Apr 2026 14:02:06 +0000</pubDate>
      <link>https://dev.to/ajdin_halac/your-cron-job-started-that-does-not-mean-it-finished-44ic</link>
      <guid>https://dev.to/ajdin_halac/your-cron-job-started-that-does-not-mean-it-finished-44ic</guid>
      <description>&lt;p&gt;A lot of cron monitoring is too shallow.&lt;/p&gt;

&lt;p&gt;It tells you that a job ran, or that the server was up when it was supposed to run, and that is where the thinking stops.&lt;/p&gt;

&lt;p&gt;That might be enough for throwaway jobs. It is not enough for anything important.&lt;/p&gt;

&lt;p&gt;Because for real scheduled work, "it started" and "it finished" are two different things.&lt;/p&gt;

&lt;p&gt;And that gap is where a lot of failures hide.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem with most cron monitoring
&lt;/h2&gt;

&lt;p&gt;A lot of teams monitor cron jobs as if the only failure mode is "it never ran."&lt;/p&gt;

&lt;p&gt;But that is not how these jobs usually fail.&lt;/p&gt;

&lt;p&gt;They fail like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the job starts and gets stuck halfway through&lt;/li&gt;
&lt;li&gt;the script exits before the important part&lt;/li&gt;
&lt;li&gt;the process times out after doing some of the work&lt;/li&gt;
&lt;li&gt;the backup starts but never finishes&lt;/li&gt;
&lt;li&gt;the sync starts on time and silently dies later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why "did it run?" is not a very useful question by itself.&lt;/p&gt;

&lt;p&gt;The better questions are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did it start on time?&lt;/li&gt;
&lt;li&gt;Did it finish successfully?&lt;/li&gt;
&lt;li&gt;If it did not finish, where did it stop?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you cannot answer those, your monitoring is giving you false confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  One heartbeat is fine for simple jobs
&lt;/h2&gt;

&lt;p&gt;There is nothing wrong with a single heartbeat if the job is short and simple.&lt;/p&gt;

&lt;p&gt;If a task runs quickly and one completion ping tells you the whole story, that is fine.&lt;/p&gt;

&lt;p&gt;But a lot of jobs are not like that.&lt;/p&gt;

&lt;p&gt;Backups are not.&lt;br&gt;&lt;br&gt;
Imports are not.&lt;br&gt;&lt;br&gt;
Sync jobs are not.&lt;br&gt;&lt;br&gt;
Billing tasks are not.&lt;br&gt;&lt;br&gt;
Scheduled reports are not.&lt;/p&gt;

&lt;p&gt;These jobs can start correctly and still fail in a way that matters.&lt;/p&gt;

&lt;p&gt;That is the class of failure basic cron monitoring often misses.&lt;/p&gt;
&lt;h2&gt;
  
  
  What you actually want to know
&lt;/h2&gt;

&lt;p&gt;For anything important, you usually care about two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the job started&lt;/li&gt;
&lt;li&gt;the job finished&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That sounds obvious, but it changes how you monitor the job.&lt;/p&gt;

&lt;p&gt;If there is no start signal, the job never began, or it could not reach the monitor.&lt;/p&gt;

&lt;p&gt;If there is a start signal but no finish signal, the job probably got stuck, timed out, or failed during execution.&lt;/p&gt;

&lt;p&gt;If both signals arrive, the run completed.&lt;/p&gt;

&lt;p&gt;That is a much better operational signal than a single generic success ping.&lt;/p&gt;
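
&lt;p&gt;The three cases above can be sketched as a small shell function. This is only an illustration of the logic, not how any particular monitoring service implements it: the function name &lt;code&gt;classify_run&lt;/code&gt;, its 0/1 arguments, and the state labels are all made up for this example.&lt;/p&gt;

```shell
#!/bin/bash
# Classify one scheduled run from which signals arrived.
# Hypothetical interface: each argument is 1 (signal received) or 0 (not received).
classify_run() {
  local got_start="$1" got_finish="$2"
  if [ "$got_start" != "1" ]; then
    # No start signal. A lone finish signal is also treated as
    # "never-started" here, which is a simplification.
    echo "never-started"
  elif [ "$got_finish" != "1" ]; then
    # Started, but stuck, timed out, or failed before the end.
    echo "started-not-finished"
  else
    echo "completed"
  fi
}

classify_run 0 0   # prints "never-started"
classify_run 1 0   # prints "started-not-finished"
classify_run 1 1   # prints "completed"
```

&lt;p&gt;The middle state is the one a single success ping cannot distinguish from a job that never ran at all.&lt;/p&gt;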
&lt;h2&gt;
  
  
  Track the lifecycle, not just one event
&lt;/h2&gt;

&lt;p&gt;This is really the core idea.&lt;/p&gt;

&lt;p&gt;Instead of treating a cron job as one event, treat it like a sequence.&lt;/p&gt;

&lt;p&gt;At minimum:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;start&lt;/li&gt;
&lt;li&gt;finish&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gives you a more useful signal immediately.&lt;/p&gt;

&lt;p&gt;You stop asking "did something happen?" and start asking "did the job complete the way it was supposed to?"&lt;/p&gt;

&lt;p&gt;That is much closer to how these jobs actually behave in production.&lt;/p&gt;
&lt;h2&gt;
  
  
  A practical example
&lt;/h2&gt;

&lt;p&gt;Say you have a nightly backup.&lt;/p&gt;

&lt;p&gt;A lot of setups send only one ping, after the backup command finishes. That works if everything goes well.&lt;/p&gt;

&lt;p&gt;But if the job starts and gets stuck halfway through, no ping arrives at all, and you cannot tell a backup that hung from one that never started.&lt;/p&gt;

&lt;p&gt;A better version looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
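&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Exit on the first failing command, so a failed dump never reaches the finish ping.&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="c"&gt;# Signal that the run began. "|| true" keeps a monitoring outage from aborting the backup.&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; 10 &lt;span class="s2"&gt;"https://heartbeats.upti.my/v1/heartbeat/&amp;lt;heartbeat-id&amp;gt;?step=start"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;

pg_dump mydb &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /backups/mydb.sql

&lt;span class="c"&gt;# Only reached if pg_dump succeeded.&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsS&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; 10 &lt;span class="s2"&gt;"https://heartbeats.upti.my/v1/heartbeat/&amp;lt;heartbeat-id&amp;gt;?step=finish"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;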
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the signal is more useful.&lt;/p&gt;

&lt;p&gt;No &lt;code&gt;start&lt;/code&gt; means the job never began.&lt;br&gt;&lt;br&gt;
&lt;code&gt;start&lt;/code&gt; without &lt;code&gt;finish&lt;/code&gt; means it got stuck, timed out, or failed during execution.&lt;br&gt;&lt;br&gt;
&lt;code&gt;start&lt;/code&gt; followed by &lt;code&gt;finish&lt;/code&gt; means it completed.&lt;/p&gt;

&lt;p&gt;That is the kind of monitoring that actually helps when something breaks.&lt;/p&gt;
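
&lt;p&gt;One way to reuse the pattern above is to wrap any command in a small helper. This is a minimal sketch, not an official upti.my client: the function names &lt;code&gt;ping_step&lt;/code&gt; and &lt;code&gt;with_heartbeat&lt;/code&gt; and the variables &lt;code&gt;HEARTBEAT_URL&lt;/code&gt; and &lt;code&gt;HEARTBEAT_DRY_RUN&lt;/code&gt; are made up for illustration. Only the &lt;code&gt;?step=start&lt;/code&gt; and &lt;code&gt;?step=finish&lt;/code&gt; query parameters come from the example above.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: send one heartbeat step.
# HEARTBEAT_URL is assumed to hold the full heartbeat URL without a query string.
ping_step() {
  # Dry-run mode (an assumption of this sketch) prints the step
  # instead of touching the network, which makes local testing easy.
  if [ "${HEARTBEAT_DRY_RUN:-0}" = "1" ]; then
    echo "ping $1"
    return 0
  fi
  # A monitoring outage should never fail the job itself, hence "|| true".
  curl -fsS -m 10 -o /dev/null "${HEARTBEAT_URL:-}?step=$1" || true
}

# Hypothetical wrapper: start ping, run the command, finish ping only on success.
with_heartbeat() {
  ping_step start
  # If the wrapped command fails, skip the finish ping and pass its exit code on.
  "$@" || return $?
  ping_step finish
}
```

&lt;p&gt;With &lt;code&gt;HEARTBEAT_URL&lt;/code&gt; set, a crontab entry could then call something like &lt;code&gt;with_heartbeat ./backup.sh&lt;/code&gt;, where &lt;code&gt;backup.sh&lt;/code&gt; stands in for your own script. The finish ping is skipped on failure on purpose, so a broken run shows up as &lt;code&gt;start&lt;/code&gt; without &lt;code&gt;finish&lt;/code&gt;.&lt;/p&gt;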

&lt;h2&gt;
  
  
  Why this matters more than people think
&lt;/h2&gt;

&lt;p&gt;Silent cron failures are annoying because they usually do not show up as immediate downtime.&lt;/p&gt;

&lt;p&gt;They show up later.&lt;/p&gt;

&lt;p&gt;You find out your backup was broken when you need to restore it.&lt;br&gt;&lt;br&gt;
You find out a sync stopped when someone notices stale data.&lt;br&gt;&lt;br&gt;
You find out invoices were not generated after a customer asks about billing.&lt;br&gt;&lt;br&gt;
You find out reports were never sent because somebody complains.&lt;/p&gt;

&lt;p&gt;By then, the failure is old news. You are already dealing with the fallout.&lt;/p&gt;

&lt;p&gt;That is why "the server is healthy" is not enough, and "the job probably ran" is definitely not enough.&lt;/p&gt;

&lt;p&gt;For important scheduled work, you want direct visibility into whether the job started and whether it actually reached the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple rule
&lt;/h2&gt;

&lt;p&gt;Use one heartbeat when the job is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;short&lt;/li&gt;
&lt;li&gt;simple&lt;/li&gt;
&lt;li&gt;easy to validate with one completion event&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use a chained heartbeat approach when the job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;takes time&lt;/li&gt;
&lt;li&gt;can fail midway&lt;/li&gt;
&lt;li&gt;has meaningful execution stages&lt;/li&gt;
&lt;li&gt;matters to the business if it only partially runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That usually covers backups, syncs, imports, exports, billing tasks, ETL pipelines, and reporting jobs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thought
&lt;/h2&gt;

&lt;p&gt;For important cron jobs, "did it run?" is a weak question.&lt;/p&gt;

&lt;p&gt;A better one is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Did it start on time, and did it finish successfully?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the difference between basic monitoring and useful monitoring.&lt;/p&gt;

&lt;p&gt;If you only track one heartbeat, you can miss a whole class of failures where the job started but never completed.&lt;/p&gt;




&lt;p&gt;I built this into upti.my as &lt;strong&gt;Heartbeat with Job Chain&lt;/strong&gt;, which lets jobs ping at different steps like &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;finish&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you want to see the original post, it is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.upti.my/blog/why-cron-job-started-does-not-mean-finished" rel="noopener noreferrer"&gt;https://www.upti.my/blog/why-cron-job-started-does-not-mean-finished&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>devops</category>
      <category>monitoring</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
