<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: All Quiet</title>
    <description>The latest articles on DEV Community by All Quiet (@allquiet).</description>
    <link>https://dev.to/allquiet</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F9123%2Fecf19124-1809-458a-9cc7-d1c82d97cc74.png</url>
      <title>DEV Community: All Quiet</title>
      <link>https://dev.to/allquiet</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/allquiet"/>
    <language>en</language>
    <item>
      <title>Migrating from Opsgenie to All Quiet: A Full Terraform-First Guide</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Tue, 12 May 2026 15:33:44 +0000</pubDate>
      <link>https://dev.to/allquiet/migrating-from-opsgenie-to-all-quiet-a-full-terraform-first-guide-1i1o</link>
      <guid>https://dev.to/allquiet/migrating-from-opsgenie-to-all-quiet-a-full-terraform-first-guide-1i1o</guid>
      <description>&lt;p&gt;Originally published on 12 May 2026 on the &lt;a href="https://allquiet.app/blog/migrating-from-opsgenie-to-all-quiet-terraform-first-guide" rel="noopener noreferrer"&gt;All Quiet Tech Blog.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your Opsgenie config already lives in Terraform, you can migrate methodically instead of clicking through two consoles side by side. This guide translates users, teams, integrations, on-call schedules, escalations, and routing into All Quiet, complete with example HCL, a migration checklist, and tips for running both tools in parallel before you switch.&lt;/p&gt;

&lt;p&gt;With the recent changes in the Atlassian ecosystem, many SRE and DevOps teams are finding themselves at a crossroads: adapt to the increasing complexity of Jira Service Management (JSM) or move to a leaner, more focused incident management platform.&lt;/p&gt;

&lt;p&gt;At All Quiet, we believe incident management should stay close to the code. That's why our platform is built to be managed via Terraform from day one. In this guide, we'll walk through a complete technical migration from Opsgenie Terraform resources to All Quiet, resource by resource, with real HCL on both sides.&lt;/p&gt;

&lt;p&gt;If you are still weighing vendors before you change tooling, start with our overview of &lt;a href="https://allquiet.app/opsgenie-alternative" rel="noopener noreferrer"&gt;All Quiet as an Opsgenie alternative&lt;/a&gt;, then use this article for the Terraform resource mapping and cutover checklist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a Terraform-First Migration?
&lt;/h2&gt;

&lt;p&gt;If you're already managing Opsgenie via Terraform, you have an advantage: your entire on-call configuration is already codified. Rather than clicking through two UIs in parallel, you can translate your &lt;code&gt;.tf&lt;/code&gt; files from one provider to the other, review the result with &lt;code&gt;terraform plan&lt;/code&gt;, and cut over with confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Strategy: "Logic-First" Migration
&lt;/h2&gt;

&lt;p&gt;In Opsgenie, configuration is fragmented across six or more resource types: &lt;code&gt;opsgenie_user&lt;/code&gt;, &lt;code&gt;opsgenie_team&lt;/code&gt;, &lt;code&gt;opsgenie_api_integration&lt;/code&gt;, &lt;code&gt;opsgenie_schedule&lt;/code&gt;, &lt;code&gt;opsgenie_schedule_rotation&lt;/code&gt;, and &lt;code&gt;opsgenie_escalation&lt;/code&gt;. All Quiet centralizes this logic into fewer, more cohesive resources, most notably &lt;code&gt;allquiet_team_escalations&lt;/code&gt;, which unifies schedules, rotations, and escalation policies into a single resource that can't get out of sync.&lt;/p&gt;

&lt;p&gt;Here's the full resource mapping at a glance:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Opsgenie Resource&lt;/th&gt;
&lt;th&gt;All Quiet Resource&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_user&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_user&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Standalone identity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_team&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;allquiet_team&lt;/code&gt; + &lt;code&gt;allquiet_team_membership&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;One membership resource per user–team pair&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_api_integration&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_integration&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Team-owned, strongly typed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_schedule&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_team_escalations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Merged into unified resource&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_schedule_rotation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_team_escalations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rotations live inside escalation tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_escalation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_team_escalations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rules become escalation tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(routing within integration)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_routing&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Explicit routing resource&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  1. Setting Up the Providers
&lt;/h2&gt;

&lt;p&gt;First, initialize your environment. You'll need an All Quiet API key, which you can generate in Organization Settings &amp;gt; API Keys (requires the Owner or Administrator role). See our &lt;a href="https://docs.allquiet.app/advanced/terraform" rel="noopener noreferrer"&gt;Terraform setup docs&lt;/a&gt; for the full walkthrough.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;required_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;allquiet&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AllQuietApp/allquiet"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;gt;= 3.0.0"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"allquiet"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;api_key&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;allquiet_api_key&lt;/span&gt;
  &lt;span class="nx"&gt;api_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eu"&lt;/span&gt; &lt;span class="c1"&gt;# or "us" — must match your organization's data region&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_api_key"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;sensitive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
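
&lt;p&gt;If you keep Opsgenie under Terraform while you migrate, both providers can live in the same configuration. The following is a minimal sketch rather than a definitive setup: it assumes the Opsgenie provider comes from the &lt;code&gt;opsgenie/opsgenie&lt;/code&gt; registry source, and the variable name &lt;code&gt;opsgenie_api_key&lt;/code&gt; is hypothetical. It is a variant of the &lt;code&gt;terraform&lt;/code&gt; block above, not an addition to it, because a module can declare &lt;code&gt;required_providers&lt;/code&gt; only once.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;terraform {
  required_providers {
    allquiet = {
      source  = "AllQuietApp/allquiet"
      version = "&amp;gt;= 3.0.0"
    }

    # Assumed registry source for the Opsgenie provider. Keep whatever
    # source and version constraint your existing configuration already pins.
    opsgenie = {
      source = "opsgenie/opsgenie"
    }
  }
}

# Hypothetical variable name, used here for illustration only.
variable "opsgenie_api_key" {
  type      = string
  sensitive = true
}

provider "opsgenie" {
  api_key = var.opsgenie_api_key
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With both providers declared, a single &lt;code&gt;terraform plan&lt;/code&gt; shows the existing Opsgenie resources next to the new All Quiet ones, which makes it easier to compare the two sides before cutover.&lt;/p&gt;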



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; We recommend creating your organization with a shared admin account (e.g., &lt;a href="mailto:admin@company.com"&gt;admin@company.com&lt;/a&gt;) rather than a personal email. This way, every "real" on-call user can be provisioned via Terraform, and you won't have a chicken-and-egg problem with the account that created the org.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Teams, Users, and Memberships
&lt;/h2&gt;

&lt;p&gt;In Opsgenie, users are standalone resources with roles, and team membership is defined inline within the team. This creates tight coupling: changing a user's team membership means editing the team resource.&lt;/p&gt;

&lt;p&gt;All Quiet separates these concerns into three distinct resources: the team, the user identity, and the membership link between them. Each membership is its own resource (&lt;code&gt;allquiet_team_membership&lt;/code&gt;), one resource per user–team pair. This allows for cleaner state management: adding or removing a single member doesn't trigger a plan change on the team or on any other member.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Opsgenie Way:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_user"&lt;/span&gt; &lt;span class="s2"&gt;"sre_lead"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;username&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alex@company.com"&lt;/span&gt;
  &lt;span class="nx"&gt;full_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Alex Rivera"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
  &lt;span class="nx"&gt;timezone&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Europe/Berlin"&lt;/span&gt;
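&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_user"&lt;/span&gt; &lt;span class="s2"&gt;"backend_eng"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;username&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"jordan@company.com"&lt;/span&gt;
  &lt;span class="nx"&gt;full_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Jordan Lee"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;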

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_team"&lt;/span&gt; &lt;span class="s2"&gt;"devops"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DevOps"&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Core DevOps and SRE team"&lt;/span&gt;

  &lt;span class="nx"&gt;member&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sre_lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"admin"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;member&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backend_eng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The All Quiet Way:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. Define the team&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_team"&lt;/span&gt; &lt;span class="s2"&gt;"devops"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DevOps"&lt;/span&gt;
  &lt;span class="nx"&gt;time_zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Europe/Berlin"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Define user identities&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_user"&lt;/span&gt; &lt;span class="s2"&gt;"sre_lead"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;email&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alex@company.com"&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Alex Rivera"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_user"&lt;/span&gt; &lt;span class="s2"&gt;"backend_eng"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;email&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"jordan@company.com"&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Jordan Lee"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Link each user to the team via a dedicated membership resource (one per user–team pair)&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_team_membership"&lt;/span&gt; &lt;span class="s2"&gt;"devops_sre_lead"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;team_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;user_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sre_lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Administrator"&lt;/span&gt; &lt;span class="c1"&gt;# "Administrator" or "Member"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_team_membership"&lt;/span&gt; &lt;span class="s2"&gt;"devops_backend_eng"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;team_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;user_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backend_eng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Member"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What changed:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Opsgenie's admin / user team roles map to All Quiet's Administrator / Member.&lt;/li&gt;
&lt;li&gt;Each membership is its own resource (&lt;code&gt;allquiet_team_membership&lt;/code&gt;), so adding or removing a single team member is a targeted change with no cascading diffs on the team or other members.&lt;/li&gt;
&lt;li&gt;Users provisioned via Terraform receive an invite to set their password. If you use SSO (OIDC, Google, or Microsoft), configure that first; see the &lt;a href="https://docs.allquiet.app/advanced/sso" rel="noopener noreferrer"&gt;SSO docs&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
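
&lt;p&gt;For larger teams, the one-resource-per-pair pattern can be generated with Terraform's &lt;code&gt;for_each&lt;/code&gt; instead of being written out by hand. This is a sketch under assumptions, not the only way to structure it: the &lt;code&gt;devops_members&lt;/code&gt; local and its keys are hypothetical, and it uses only the attributes shown in the example above. Treat it as an alternative to the individually named resources, not something to apply alongside them.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical map of team members, keyed by a stable identifier.
locals {
  devops_members = {
    sre_lead    = { email = "alex@company.com", display_name = "Alex Rivera", role = "Administrator" }
    backend_eng = { email = "jordan@company.com", display_name = "Jordan Lee", role = "Member" }
  }
}

# One user identity per map entry.
resource "allquiet_user" "devops" {
  for_each     = local.devops_members
  email        = each.value.email
  display_name = each.value.display_name
}

# One membership per user-team pair, addressed by the same key.
resource "allquiet_team_membership" "devops" {
  for_each = local.devops_members
  team_id  = allquiet_team.devops.id
  user_id  = allquiet_user.devops[each.key].id
  role     = each.value.role
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because each instance is keyed by the map key, removing one entry from &lt;code&gt;devops_members&lt;/code&gt; plans the removal of only that user and membership. Other resources would then reference a membership as, for example, &lt;code&gt;allquiet_team_membership.devops["sre_lead"].id&lt;/code&gt;.&lt;/p&gt;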

&lt;h2&gt;
  
  
  3. Integrations: Team-Owned and Strongly Typed
&lt;/h2&gt;

&lt;p&gt;Opsgenie requires separate resources for API integrations and their subsequent notification or routing actions. The integration itself is often a loose endpoint that you wire to teams via responders blocks.&lt;/p&gt;

&lt;p&gt;All Quiet treats integrations as team-owned endpoints with strongly typed integration types, so there's no guesswork about payload format.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Opsgenie Way:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_api_integration"&lt;/span&gt; &lt;span class="s2"&gt;"grafana"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Grafana-Alerts"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Grafana"&lt;/span&gt;
  &lt;span class="nx"&gt;owner_team_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;enabled&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;allow_write_access&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;responders&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"team"&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The All Quiet Way:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_integration"&lt;/span&gt; &lt;span class="s2"&gt;"grafana"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;team_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Grafana Production"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Grafana"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it: no &lt;code&gt;responders&lt;/code&gt; block and no &lt;code&gt;allow_write_access&lt;/code&gt; flag. The integration belongs to the team, and the team's escalation policy handles notification logic.&lt;/p&gt;

&lt;p&gt;Common type mappings:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Opsgenie type&lt;/th&gt;
&lt;th&gt;All Quiet type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Datadog&lt;/td&gt;
&lt;td&gt;Datadog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch&lt;/td&gt;
&lt;td&gt;AmazonCloudWatch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API / Webhook&lt;/td&gt;
&lt;td&gt;Webhook&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The full list of supported integration types is available at: &lt;a href="https://allquiet.app/api/public/v1/inbound-integration/types" rel="noopener noreferrer"&gt;https://allquiet.app/api/public/v1/inbound-integration/types&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Treat the integration type and team assignment as fixed once created. Changing them may require destroying and re-creating the resource, which generates a new webhook URL, so plan these carefully. The webhook URL of a new integration becomes available after &lt;code&gt;terraform apply&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For each supported inbound integration type, you can download a default Terraform snippet for payload mapping from &lt;code&gt;https://allquiet.app/api/integrations/terraform/default/&amp;lt;Type&amp;gt;.tf&lt;/code&gt;, where &lt;code&gt;&amp;lt;Type&amp;gt;&lt;/code&gt; is the exact integration type identifier (see the inbound integration types list above). For example, the Datadog snippet is at &lt;a href="https://allquiet.app/api/integrations/terraform/default/Datadog.tf" rel="noopener noreferrer"&gt;https://allquiet.app/api/integrations/terraform/default/Datadog.tf&lt;/a&gt;; Grafana, Webhook, AmazonCloudWatch, and every other supported type follow the same URL pattern with their own type name.&lt;/p&gt;
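
&lt;p&gt;To guard against the accidental re-creation described above, one option is Terraform's built-in &lt;code&gt;prevent_destroy&lt;/code&gt; lifecycle setting. The sketch below is the same Grafana integration shown earlier with a &lt;code&gt;lifecycle&lt;/code&gt; block added, not a second resource. &lt;code&gt;prevent_destroy&lt;/code&gt; is a Terraform core meta-argument, not an All Quiet feature, and whether you want this guard on every integration is a judgment call.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;resource "allquiet_integration" "grafana" {
  team_id      = allquiet_team.devops.id
  display_name = "Grafana Production"
  type         = "Grafana"

  lifecycle {
    # Terraform rejects any plan that would destroy this resource,
    # including a destroy-and-recreate, so the webhook URL cannot
    # change without someone removing this guard first.
    prevent_destroy = true
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the guard in place, a change that forces replacement fails at plan time instead of silently issuing a new webhook URL that your monitoring tools do not know about.&lt;/p&gt;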

&lt;p&gt;If you need to customize how payloads map to incidents (e.g., extracting severity from a specific JSON field), use the &lt;code&gt;allquiet_integration_mapping&lt;/code&gt; resource. If you don't define one, All Quiet uses sensible defaults for each integration type. The mapping supports JSONPath, XPath, regex, and static values. Every incident maps to three key attributes: Status (Open/Resolved), Severity (Minor/Warning/Critical), and an optional Title.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migration tip:&lt;/strong&gt; You can run both Opsgenie and All Quiet integrations in parallel during the transition period. Point your monitoring tools at both webhook URLs until you're confident in the All Quiet setup.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. On-Call Schedules and Escalations: The Unified Resource
&lt;/h2&gt;

&lt;p&gt;This is the most significant architectural difference between the two providers, and the biggest win in your Terraform-first migration.&lt;/p&gt;

&lt;p&gt;In Opsgenie, on-call configuration is spread across three separate resources that reference each other by ID. In All Quiet, it's all one resource: &lt;code&gt;allquiet_team_escalations&lt;/code&gt;. This resource follows a clear hierarchy: Escalation Tiers → Schedules → Rotations, which mirrors how on-call actually works: you have layers of people to notify (tiers), each layer has time-based coverage windows (schedules), and people rotate through those windows (rotations).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Opsgenie Way (3 resources, fragile cross-references):
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_schedule"&lt;/span&gt; &lt;span class="s2"&gt;"devops_oncall"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DevOps On-Call"&lt;/span&gt;
  &lt;span class="nx"&gt;timezone&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Europe/Berlin"&lt;/span&gt;
  &lt;span class="nx"&gt;enabled&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;owner_team_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_schedule_rotation"&lt;/span&gt; &lt;span class="s2"&gt;"devops_weekly"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;schedule_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_schedule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops_oncall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Weekly Rotation"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"weekly"&lt;/span&gt;
  &lt;span class="nx"&gt;length&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="nx"&gt;start_date&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2024-01-01T09:00:00Z"&lt;/span&gt;

  &lt;span class="nx"&gt;participant&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sre_lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;participant&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backend_eng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_escalation"&lt;/span&gt; &lt;span class="s2"&gt;"devops_escalation"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DevOps Escalation"&lt;/span&gt;
  &lt;span class="nx"&gt;owner_team_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;rules&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"if-not-acked"&lt;/span&gt;
    &lt;span class="nx"&gt;notify_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt;
    &lt;span class="nx"&gt;delay&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

    &lt;span class="nx"&gt;recipient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"schedule"&lt;/span&gt;
      &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_schedule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops_oncall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;rules&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"if-not-acked"&lt;/span&gt;
    &lt;span class="nx"&gt;notify_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt;
    &lt;span class="nx"&gt;delay&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;

    &lt;span class="nx"&gt;recipient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
      &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;repeat&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;wait_interval&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
    &lt;span class="nx"&gt;count&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="nx"&gt;reset_recipient_states&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's 3 resources, 50+ lines, with IDs threaded between them. Delete the schedule without updating the escalation and you get a dangling reference.&lt;/p&gt;

&lt;h3&gt;
  
  
  The All Quiet Way (1 resource, self-contained):
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_team_escalations"&lt;/span&gt; &lt;span class="s2"&gt;"devops_oncall"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;team_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;escalation_tiers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# TIER 1: On-call rotation — alert the person on duty&lt;/span&gt;
    &lt;span class="nx"&gt;repeats&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="nx"&gt;repeats_after_minutes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

    &lt;span class="nx"&gt;schedules&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DevOps Weekly Rotation"&lt;/span&gt;

      &lt;span class="nx"&gt;rotation_settings&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;rotation_mode&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"auto"&lt;/span&gt;
        &lt;span class="nx"&gt;auto_rotation_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;         &lt;span class="c1"&gt;# One person on-call at a time&lt;/span&gt;
        &lt;span class="nx"&gt;repeats&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"weekly"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="nx"&gt;rotations&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;members&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;team_membership_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team_membership&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops_sre_lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;members&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;team_membership_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team_membership&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops_backend_eng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;escalation_tiers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# TIER 2: If Tier 1 exhausts its repeats, escalate to the manager&lt;/span&gt;
    &lt;span class="nx"&gt;repeats&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="nx"&gt;schedules&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Manager Escalation"&lt;/span&gt;

      &lt;span class="nx"&gt;rotations&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;members&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;team_membership_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team_membership&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything that was spread across &lt;code&gt;opsgenie_schedule&lt;/code&gt;, &lt;code&gt;opsgenie_schedule_rotation&lt;/code&gt;, and &lt;code&gt;opsgenie_escalation&lt;/code&gt; is now a single &lt;code&gt;allquiet_team_escalations&lt;/code&gt; resource. Schedules and rotations live inside escalation tiers, so there's no way for them to become orphaned. Every person, whether they're in a rotating schedule or a single-person escalation target, is referenced via their &lt;code&gt;team_membership_id&lt;/code&gt;, which keeps the dependency graph clean.&lt;/p&gt;

&lt;p&gt;Key mapping:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Opsgenie concept&lt;/th&gt;
&lt;th&gt;All Quiet equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Escalation rule with delay&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;escalation_tiers&lt;/code&gt; with &lt;code&gt;repeats_after_minutes&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;opsgenie_schedule&lt;/code&gt; (time windows)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;schedules&lt;/code&gt; block within a tier, which defines the on-call times (e.g., weekdays 08:00–18:00)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_schedule_rotation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;rotation_settings&lt;/code&gt; within a &lt;code&gt;schedules&lt;/code&gt; block, in auto or explicit mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rotation participants&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;rotations&lt;/code&gt; → &lt;code&gt;members&lt;/code&gt; → &lt;code&gt;team_membership_id&lt;/code&gt; (always via membership)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple schedules for follow-the-sun&lt;/td&gt;
&lt;td&gt;Multiple &lt;code&gt;schedules&lt;/code&gt; blocks in the same tier, each covering different hours/days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;repeat&lt;/code&gt; block on escalation&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;repeats&lt;/code&gt; on the relevant tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recipient: schedule&lt;/td&gt;
&lt;td&gt;Tier with a &lt;code&gt;schedules&lt;/code&gt; block containing rotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recipient: user&lt;/td&gt;
&lt;td&gt;Tier with one schedule, one rotation, one member&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Advanced patterns:&lt;/strong&gt; Round-robin alerting distributes incidents evenly when multiple people are on-call simultaneously. On-call overrides, both personal and team-level, can be managed via the &lt;code&gt;allquiet_on_call_override&lt;/code&gt; Terraform resource without touching the escalation config. See the &lt;a href="https://docs.allquiet.app/essentials/escalations" rel="noopener noreferrer"&gt;escalation docs&lt;/a&gt; for the full set of options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note on complexity:&lt;/strong&gt; Opsgenie's delay/repeat model and All Quiet's tier-level &lt;code&gt;repeats&lt;/code&gt;, &lt;code&gt;repeats_after_minutes&lt;/code&gt;, and &lt;code&gt;auto_escalation_after_minutes&lt;/code&gt; settings don't map one-to-one in every case. Simple escalations translate cleanly, but complex multi-rule Opsgenie policies may need case-by-case tuning. We recommend testing each escalation path with a synthetic incident before cutting over.&lt;/p&gt;
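
&lt;p&gt;A synthetic incident can be as simple as a hand-built webhook call. The sketch below is one way to do it, not an official All Quiet client: the payload field names (&lt;code&gt;title&lt;/code&gt;, &lt;code&gt;severity&lt;/code&gt;, &lt;code&gt;environment&lt;/code&gt;), the helper function names, and the &lt;code&gt;ALLQUIET_WEBHOOK_URL&lt;/code&gt; variable are all assumptions. Match the fields to your own integration mapping and copy the real URL from the integration you created.&lt;/p&gt;

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a minimal JSON payload for a synthetic incident.
# Field names are illustrative (assumption); align them with your integration mapping.
# Inputs are inserted verbatim, so keep them free of double quotes.
build_test_payload() {
  local title="$1" severity="$2"
  printf '{"title":"%s","severity":"%s","environment":"Test"}' "$title" "$severity"
}

# POST the payload to a webhook URL.
# -f makes curl exit non-zero on HTTP errors, so a rejected payload is noticed.
send_test_incident() {
  local webhook_url="$1" payload="$2"
  curl -sf -X POST -H "Content-Type: application/json" -d "$payload" "$webhook_url"
}

# Usage (not executed here; ALLQUIET_WEBHOOK_URL is a hypothetical variable):
#   send_test_incident "$ALLQUIET_WEBHOOK_URL" "$(build_test_payload "Escalation drill" "Critical")"
```

&lt;p&gt;Send one such incident per escalation path and watch whether each tier fires after the delay you expect.&lt;/p&gt;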

&lt;p&gt;&lt;strong&gt;Note on time restrictions:&lt;/strong&gt; Opsgenie's &lt;code&gt;time_restriction&lt;/code&gt; blocks on rotations (&lt;code&gt;time-of-day&lt;/code&gt; and &lt;code&gt;weekday-and-time-of-day&lt;/code&gt;) map to All Quiet's schedule on-call times. In All Quiet, each schedule defines its active hours and days directly (e.g., "Monday–Friday, 09:00–17:00"), which is more intuitive than Opsgenie's separate restriction blocks. During migration, check each schedule's hours and days against the Opsgenie restriction it replaces.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Routing: The Incident Traffic Controller
&lt;/h2&gt;

&lt;p&gt;In Opsgenie, routing logic is often buried inside the integration itself (via &lt;code&gt;responders&lt;/code&gt; blocks) or handled by notification policies. In All Quiet, routing is an explicit, first-class resource with a powerful rules engine. Each rule has three components: Conditions (when to trigger), Actions (what to do), and Channels (how to notify).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_routing"&lt;/span&gt; &lt;span class="s2"&gt;"prod_alerts"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;team_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Production Alert Routing"&lt;/span&gt;

  &lt;span class="nx"&gt;rules&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;# Mute Slack for test environment alerts, only send email&lt;/span&gt;
      &lt;span class="nx"&gt;conditions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;statuses&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Open"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;attributes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Environment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;operator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Test"&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;change_severity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Minor"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;channels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;notification_channels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Email"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;# For escalated critical incidents, also trigger the PagerDuty outbound webhook&lt;/span&gt;
      &lt;span class="nx"&gt;conditions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;statuses&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Open"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;severities&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Critical"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;intents&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Escalated"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;channels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;outbound_integrations&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;allquiet_outbound_integration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pagerduty_webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



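&lt;p&gt;Valid &lt;code&gt;notification_channels&lt;/code&gt; values are &lt;code&gt;"Email"&lt;/code&gt;, &lt;code&gt;"Push"&lt;/code&gt;, &lt;code&gt;"SMS"&lt;/code&gt;, and &lt;code&gt;"VoiceCall"&lt;/code&gt;.&lt;/p&gt;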

&lt;p&gt;Routing conditions can filter on severity, status, specific integrations, incident intents (created, escalated, resolved), custom payload attributes, and even time-of-day restrictions. Actions include discarding incidents, changing severity, assigning to other teams (within an Organization), adding interactions, and delaying execution. This replaces Opsgenie's scattered notification policies with a single, auditable, version-controlled resource.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; For multi-team setups, you can create a "root team" that owns your integrations and uses routing rules to fan out incidents to the appropriate team based on payload attributes. See the &lt;a href="https://docs.allquiet.app/advanced/routing" rel="noopener noreferrer"&gt;routing docs&lt;/a&gt; for detailed examples.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration Checklist
&lt;/h2&gt;

&lt;p&gt;Here's the step-by-step order we recommend. The key dependency is that &lt;code&gt;allquiet_team_membership&lt;/code&gt; requires both the team and user to exist, and &lt;code&gt;allquiet_team_escalations&lt;/code&gt; references membership IDs, so teams, users, and memberships must all be in place before you build escalation tiers.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set up the All Quiet Organization and generate your API key.&lt;/li&gt;
&lt;li&gt;Create teams (&lt;code&gt;allquiet_team&lt;/code&gt;) and provision users (&lt;code&gt;allquiet_user&lt;/code&gt;). These two have no dependency on each other and can be created in any order or in parallel.&lt;/li&gt;
&lt;li&gt;Link users to teams (&lt;code&gt;allquiet_team_membership&lt;/code&gt;), one resource per user–team pair. This requires both the team and user to exist.&lt;/li&gt;
&lt;li&gt;Create integrations (&lt;code&gt;allquiet_integration&lt;/code&gt;) for each monitoring source. Note the new webhook URLs.&lt;/li&gt;
&lt;li&gt;Customize payload mappings (&lt;code&gt;allquiet_integration_mapping&lt;/code&gt;) if the defaults don't fit your payload structure.&lt;/li&gt;
&lt;li&gt;Configure notification preferences (&lt;code&gt;allquiet_user_incident_notification_settings&lt;/code&gt;). This controls how each user gets alerted (push, SMS, voice call, email) and with what delay.&lt;/li&gt;
&lt;li&gt;Build escalation policies (&lt;code&gt;allquiet_team_escalations&lt;/code&gt;) by merging your Opsgenie schedules, rotations, and escalation rules into unified tiers. Rotations reference &lt;code&gt;team_membership_id&lt;/code&gt;, so memberships must exist first.&lt;/li&gt;
&lt;li&gt;Set up routing rules (&lt;code&gt;allquiet_routing&lt;/code&gt;) for any advanced alert routing.&lt;/li&gt;
&lt;li&gt;Set up outbound integrations (&lt;code&gt;allquiet_outbound_integration&lt;/code&gt;) for Slack, Microsoft Teams, or webhook notifications.&lt;/li&gt;
&lt;li&gt;Run both systems in parallel: point your monitoring tools at both the Opsgenie and All Quiet webhook URLs for a burn-in period. Trigger test incidents to verify the full notification chain.&lt;/li&gt;
&lt;li&gt;Cut over: update webhook URLs to point only to All Quiet, then run &lt;code&gt;terraform destroy&lt;/code&gt; on the Opsgenie resources.&lt;/li&gt;
&lt;/ol&gt;
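
&lt;p&gt;The final cleanup step needs care if Opsgenie and All Quiet resources share one Terraform state, because a bare &lt;code&gt;terraform destroy&lt;/code&gt; removes everything in that state. A minimal sketch, assuming your Opsgenie resources sit in the root module and their addresses start with &lt;code&gt;opsgenie_&lt;/code&gt;: the hypothetical helper &lt;code&gt;opsgenie_targets&lt;/code&gt; turns the output of &lt;code&gt;terraform state list&lt;/code&gt; into &lt;code&gt;-target&lt;/code&gt; flags so only those resources are touched.&lt;/p&gt;

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read resource addresses from stdin (one per line, as printed by
# `terraform state list`) and print a -target flag for each Opsgenie resource.
# Only root-module addresses are matched; resources inside modules
# (module.x.opsgenie_...) would need an extra pattern.
opsgenie_targets() {
  local addr
  while IFS= read -r addr; do
    case "$addr" in
      opsgenie_*) printf -- '-target=%s\n' "$addr" ;;
    esac
  done
}

# Usage (not executed here): review the destroy plan first, then destroy.
#   terraform state list | opsgenie_targets | xargs terraform plan -destroy
#   terraform state list | opsgenie_targets | xargs terraform destroy
```

&lt;p&gt;If the Opsgenie resources live in their own workspace or state, none of this is needed and a plain destroy in that workspace is enough.&lt;/p&gt;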

&lt;h2&gt;
  
  
  Why Switch to All Quiet?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bootstrapped and independent.&lt;/strong&gt; We aren't beholden to Private Equity, Venture Capital firms or enterprise conglomerates. We build for SREs, not for quarterly earnings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as Code, natively.&lt;/strong&gt; Our Terraform provider isn't an afterthought; it's built to be the primary way you manage your on-call setup. Resources provisioned via Terraform are locked in the web app to prevent drift.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost efficiency.&lt;/strong&gt; Stop paying the "Atlassian Tax." All Quiet provides the same high-availability alerting at a fraction of the cost, with a transparent pricing model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less fragmentation, less drift.&lt;/strong&gt; Opsgenie spreads on-call logic across separate schedule, rotation, and escalation resources that reference each other by ID. All Quiet collapses that into a single &lt;code&gt;allquiet_team_escalations&lt;/code&gt; resource; fewer cross-references mean fewer ways for your Terraform state to diverge from reality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ready to simplify your stack? Check out our &lt;a href="https://docs.allquiet.app/advanced/terraform" rel="noopener noreferrer"&gt;Terraform Provider documentation&lt;/a&gt; and start your migration today.&lt;/p&gt;

</description>
      <category>opsgenie</category>
      <category>terraform</category>
      <category>devops</category>
      <category>sre</category>
    </item>
    <item>
      <title>AWS Elastic IP failover with Keepalived: how we keep self-managed loadbalancers redundant</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Mon, 11 May 2026 16:28:24 +0000</pubDate>
      <link>https://dev.to/allquiet/aws-elastic-ip-failover-with-keepalived-how-we-keep-self-managed-loadbalancers-redundant-489i</link>
      <guid>https://dev.to/allquiet/aws-elastic-ip-failover-with-keepalived-how-we-keep-self-managed-loadbalancers-redundant-489i</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on 10 May 2026 on the &lt;a href="https://allquiet.app/blog/elastic-ip-failover-with-keepalived-aws-ec2" rel="noopener noreferrer"&gt;All Quiet Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;At &lt;strong&gt;All Quiet&lt;/strong&gt; we build &lt;strong&gt;incident management&lt;/strong&gt;: alerting, on-call rotations, escalation, status pages, and integrations with the monitoring stacks teams already run. A meaningful slice of my job is keeping the boring edges boring, especially &lt;strong&gt;ingress&lt;/strong&gt;, when something breaks.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;parts of our stack&lt;/strong&gt; we &lt;strong&gt;run loadbalancers ourselves&lt;/strong&gt; on EC2 instead of putting every path behind an AWS-managed balancer. We do that in part to &lt;strong&gt;avoid leaning too hard on higher-level AWS abstractions&lt;/strong&gt; for those tiers: we still rely on EC2 for reliable virtual machines, and we keep the design close to &lt;strong&gt;portable building blocks&lt;/strong&gt; so we could run the same pattern in &lt;strong&gt;another data center or provider&lt;/strong&gt; without a ground-up redesign. Once we made that choice, we still had a plain &lt;strong&gt;high availability (HA)&lt;/strong&gt; problem for the &lt;strong&gt;active-passive&lt;/strong&gt; pair: &lt;strong&gt;keep the public edge redundant.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For those tiers we use a &lt;strong&gt;small pattern&lt;/strong&gt;: a stable &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html" rel="noopener noreferrer"&gt;Elastic IP&lt;/a&gt; (EIP), the address we publish in DNS and the stand-in for a &lt;strong&gt;floating IP&lt;/strong&gt; on a traditional network; &lt;a href="https://www.keepalived.org/" rel="noopener noreferrer"&gt;Keepalived&lt;/a&gt; running &lt;a href="https://www.rfc-editor.org/rfc/rfc5798" rel="noopener noreferrer"&gt;Virtual Router Redundancy Protocol (VRRP)&lt;/a&gt; between peers; and the &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Welcome.html" rel="noopener noreferrer"&gt;EC2 API&lt;/a&gt;, mainly &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssignPrivateIpAddresses.html" rel="noopener noreferrer"&gt;&lt;code&gt;AssignPrivateIpAddresses&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssociateAddress.html" rel="noopener noreferrer"&gt;&lt;code&gt;AssociateAddress&lt;/code&gt;&lt;/a&gt;, to &lt;strong&gt;move&lt;/strong&gt; that EIP when mastership changes. We wire this with &lt;a href="https://docs.ansible.com/ansible/latest/getting_started/index.html" rel="noopener noreferrer"&gt;Ansible&lt;/a&gt; and the &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/home.html" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (CDK)&lt;/a&gt; in our infrastructure repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem in AWS terms
&lt;/h2&gt;

&lt;p&gt;I grew up with patterns where a “floating IP” moves at layer 2 (L2) with gratuitous Address Resolution Protocol (ARP). &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html" rel="noopener noreferrer"&gt;Amazon Virtual Private Cloud (VPC)&lt;/a&gt; doesn’t work like your favorite rack fabric: public routing for Elastic IPs is enforced by AWS’s control plane, tied to a specific Elastic Network Interface (ENI) and private address on an instance.&lt;/p&gt;

&lt;p&gt;So we split responsibilities deliberately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Between our servers&lt;/strong&gt;, we use Keepalived / VRRP, almost always &lt;strong&gt;unicast&lt;/strong&gt;, to decide which node is primary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Against AWS&lt;/strong&gt;, we run a script on &lt;code&gt;notify_master&lt;/code&gt; that calls the command-line interface (CLI) or API so the EIP actually attaches to the winner.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we did only VRRP virtual-address tricks without &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssociateAddress.html" rel="noopener noreferrer"&gt;&lt;code&gt;AssociateAddress&lt;/code&gt;&lt;/a&gt;, we would not fix customer-visible public routing for that EIP. If we did only API moves without Keepalived, the pair would have no clean way to agree on which node should hold the address. &lt;strong&gt;We need both layers.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture at a glance
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    Elastic IP (stable in DNS)
                              │
                              ▼
              ┌───────────────────────────────┐
              │  EC2: EIP associated here     │
              │  (AssociateAddress, etc.)     │
              └───────────────────────────────┘
                              │
                  Our LB tier (e.g. HAProxy / nginx)
                              │
                           backends
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On both nodes we run Keepalived with unicast peers, priorities, and a &lt;code&gt;vrrp_script&lt;/code&gt; that reflects whether our LB process is actually alive (&lt;code&gt;systemctl&lt;/code&gt;, &lt;code&gt;curl&lt;/code&gt; to localhost, or whatever probe matches reality). When a node becomes MASTER, &lt;code&gt;notify_master&lt;/code&gt; runs our failover shell script: ensure a &lt;strong&gt;secondary private IP&lt;/strong&gt;, then associate the allocation.&lt;/p&gt;
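
&lt;p&gt;A probe that "matches reality" usually checks more than the process state. The sketch below is one possible shape, assuming an LB managed by systemd and a local HTTP endpoint; the function name &lt;code&gt;check_lb&lt;/code&gt;, the unit name, and the &lt;code&gt;/healthz&lt;/code&gt; path are placeholders, not part of our actual setup.&lt;/p&gt;

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed only when the LB unit is active AND it answers HTTP on localhost.
# Keepalived treats a non-zero exit status from a vrrp_script as a failed check.
check_lb() {
  local unit="$1" url="$2"
  # Process-level check: is the systemd unit running?
  systemctl is-active --quiet "$unit" || return 1
  # Behaviour-level check: does it serve a request within one second?
  curl -sf -o /dev/null --max-time 1 "$url" || return 1
}

# Usage as the body of a vrrp_script target (unit and URL are placeholders):
#   check_lb nginx http://127.0.0.1/healthz
```

&lt;p&gt;Keep the timeout shorter than the &lt;code&gt;vrrp_script&lt;/code&gt; interval so a hung probe does not overlap the next run.&lt;/p&gt;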

&lt;h2&gt;
  
  
  Implementation sketch
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kernel:&lt;/strong&gt; we often enable &lt;code&gt;ip_forward&lt;/code&gt; / &lt;code&gt;ip_nonlocal_bind&lt;/code&gt; where our &lt;a href="http://www.haproxy.org/" rel="noopener noreferrer"&gt;HAProxy&lt;/a&gt; or &lt;a href="https://nginx.org/en/docs/" rel="noopener noreferrer"&gt;nginx&lt;/a&gt; layout needs it. We validate per role, not globally.&lt;/p&gt;
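
&lt;p&gt;As a rough illustration of setting these per role, the hypothetical helper below prints the two settings in &lt;code&gt;sysctl.d&lt;/code&gt; format so each role can carry its own values. The full key names &lt;code&gt;net.ipv4.ip_forward&lt;/code&gt; and &lt;code&gt;net.ipv4.ip_nonlocal_bind&lt;/code&gt; are the standard Linux sysctls; the file name in the usage comment is an assumption.&lt;/p&gt;

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print sysctl settings for one LB role in sysctl.d format.
# Arguments: desired value (0 or 1) for ip_forward and for ip_nonlocal_bind.
lb_sysctl_conf() {
  local forward="$1" nonlocal_bind="$2"
  printf 'net.ipv4.ip_forward = %s\n' "$forward"
  printf 'net.ipv4.ip_nonlocal_bind = %s\n' "$nonlocal_bind"
}

# Usage (not executed here; needs root, file name is a placeholder):
#   lb_sysctl_conf 0 1 | sudo tee /etc/sysctl.d/90-lb.conf
#   sudo sysctl --system
```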

&lt;p&gt;&lt;strong&gt;Security groups:&lt;/strong&gt; &lt;a href="https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml" rel="noopener noreferrer"&gt;protocol 112&lt;/a&gt; (Virtual Router Redundancy Protocol, VRRP) allowed between peers, not the open internet.&lt;/p&gt;
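
&lt;p&gt;One way to express "between peers only" is a rule whose source is the load balancers' own security group. This is a sketch under that assumption (both nodes share one group); the helper name and the &lt;code&gt;LB_SG&lt;/code&gt; variable are hypothetical. It builds the shorthand value for the &lt;code&gt;--ip-permissions&lt;/code&gt; option of &lt;code&gt;aws ec2 authorize-security-group-ingress&lt;/code&gt;.&lt;/p&gt;

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build the --ip-permissions shorthand for a rule that allows IP protocol 112
# (VRRP) only from members of the given security group.
vrrp_ip_permissions() {
  local peer_sg="$1"
  printf 'IpProtocol=112,UserIdGroupPairs=[{GroupId=%s}]' "$peer_sg"
}

# Usage (not executed here; LB_SG is a placeholder for the shared group ID):
#   aws ec2 authorize-security-group-ingress \
#     --group-id "$LB_SG" \
#     --ip-permissions "$(vrrp_ip_permissions "$LB_SG")"
```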

&lt;p&gt;&lt;strong&gt;Keepalived:&lt;/strong&gt; &lt;code&gt;notify_master&lt;/code&gt; logs to a rotated file; credentials via an &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html" rel="noopener noreferrer"&gt;Identity and Access Management (IAM) instance profile&lt;/a&gt; where we can.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instance identity:&lt;/strong&gt; in production we fetch &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html" rel="noopener noreferrer"&gt;instance metadata&lt;/a&gt; using &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html" rel="noopener noreferrer"&gt;Instance Metadata Service version 2 (IMDSv2)&lt;/a&gt;.&lt;/p&gt;
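
&lt;p&gt;For reference, an IMDSv2 lookup is a two-step exchange: request a session token with &lt;code&gt;PUT&lt;/code&gt;, then send it as a header on the metadata request. A minimal sketch follows; the function names and the 60-second token lifetime are illustrative choices, not what our production script uses.&lt;/p&gt;

```shell
#!/usr/bin/env bash
set -euo pipefail

IMDS_BASE="http://169.254.169.254/latest"

# Step 1: request a short-lived session token (TTL in seconds).
imds_token() {
  curl -sf -X PUT "${IMDS_BASE}/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60"
}

# Step 2: fetch a metadata path, presenting the token as a header.
imds_get() {
  local path="$1" token
  token="$(imds_token)"
  curl -sf -H "X-aws-ec2-metadata-token: ${token}" "${IMDS_BASE}/meta-data/${path}"
}

# Usage (only works on an EC2 instance):
#   INSTANCE_ID="$(imds_get instance-id)"
```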

&lt;p&gt;Example Keepalived skeleton (placeholders only, not a drop-in):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;global_defs&lt;/span&gt; {
    &lt;span class="n"&gt;enable_script_security&lt;/span&gt;
    &lt;span class="n"&gt;script_user&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;
    &lt;span class="n"&gt;vrrp_startup_delay&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
}

&lt;span class="n"&gt;vrrp_script&lt;/span&gt; &lt;span class="n"&gt;check_service&lt;/span&gt; {
    &lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="s2"&gt;"/usr/bin/systemctl is-active --quiet nginx"&lt;/span&gt;
    &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
    &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
}

&lt;span class="n"&gt;vrrp_instance&lt;/span&gt; &lt;span class="n"&gt;VI_1&lt;/span&gt; {
    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="n"&gt;MASTER&lt;/span&gt;
    &lt;span class="n"&gt;interface&lt;/span&gt; &lt;span class="n"&gt;eth0&lt;/span&gt;
    &lt;span class="n"&gt;unicast_src_ip&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;10&lt;/span&gt;
    &lt;span class="n"&gt;unicast_peer&lt;/span&gt; {
        &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;11&lt;/span&gt;
    }
    &lt;span class="n"&gt;virtual_router_id&lt;/span&gt; &lt;span class="m"&gt;51&lt;/span&gt;
    &lt;span class="n"&gt;priority&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;
    &lt;span class="n"&gt;advert_int&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;track_script&lt;/span&gt; {
        &lt;span class="n"&gt;check_service&lt;/span&gt;
    }
    &lt;span class="n"&gt;notify_master&lt;/span&gt; &lt;span class="s2"&gt;"/etc/keepalived/aws-failover.sh &amp;gt;&amp;gt; /var/log/keepalived/aws-failover.log"&lt;/span&gt;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example failover script shape (replace IDs and IPs; use IMDSv2 in production):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;ALLOCATION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eipalloc-REPLACE_ME"&lt;/span&gt;
&lt;span class="nv"&gt;PRIVATE_IP_SECONDARY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10.0.0.50"&lt;/span&gt;
&lt;span class="nv"&gt;INTERFACE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eth0"&lt;/span&gt;

&lt;span class="nv"&gt;INSTANCE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-sf&lt;/span&gt; http://169.254.169.254/latest/meta-data/instance-id&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

ip addr add &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PRIVATE_IP_SECONDARY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/32"&lt;/span&gt; dev &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;INTERFACE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true

&lt;/span&gt;&lt;span class="nv"&gt;NI_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 describe-instances &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--instance-ids&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;INSTANCE_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Reservations[0].Instances[0].NetworkInterfaces[0].NetworkInterfaceId'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# --allow-reassignment lets the call succeed when the address is still assigned to the peer's interface&lt;/span&gt;
aws ec2 assign-private-ip-addresses &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network-interface-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NI_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--private-ip-addresses&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PRIVATE_IP_SECONDARY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-reassignment&lt;/span&gt;

aws ec2 associate-address &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allocation-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ALLOCATION_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--instance-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;INSTANCE_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--private-ip-address&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PRIVATE_IP_SECONDARY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tradeoffs of managing the edge ourselves
&lt;/h2&gt;

&lt;p&gt;When we &lt;strong&gt;self-manage&lt;/strong&gt; load balancer tiers instead of defaulting to AWS-managed front doors, we still need to evaluate the usual architectures: application or network load balancers, DNS failover, Kubernetes ingress, or the Elastic IP + Keepalived pattern this post describes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application Load Balancer (ALB)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros / cons (for anyone choosing ALB):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upside:&lt;/strong&gt; AWS-managed HA, &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html" rel="noopener noreferrer"&gt;health checks&lt;/a&gt;, Transport Layer Security (TLS) with &lt;a href="https://docs.aws.amazon.com/acm/latest/userguide/acm-overview.html" rel="noopener noreferrer"&gt;AWS Certificate Manager (ACM)&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-chapter.html" rel="noopener noreferrer"&gt;AWS Web Application Firewall (WAF)&lt;/a&gt;, and a clear scaling story for HTTP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; cost at scale, less hands-on control over every packet and knob than raw EC2.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At All Quiet:&lt;/strong&gt; we rely on managed load balancing for paths where we want AWS to own HA end-to-end, including customer-facing HTTP. We treat ALB-class tooling as the default when we do not want to operate the edge ourselves.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network Load Balancer (NLB)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros / cons (for anyone choosing NLB):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upside:&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html" rel="noopener noreferrer"&gt;TCP/UDP transparency&lt;/a&gt;, static IPs per Availability Zone (AZ), low listener overhead compared to full layer 7 (L7).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; fewer HTTP-specific features than ALB; still another billable and operated AWS component.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At All Quiet:&lt;/strong&gt; when we need AWS-managed HA but not full layer 7 (L7) termination at the edge, an NLB fits better than an ALB. We don’t replace every self-managed tier with an NLB, but it’s on the same “managed edge” side of the spectrum as ALB.&lt;/p&gt;

&lt;h3&gt;
  
  
  DNS failover (Route 53)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros / cons (for anyone using DNS failover):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upside:&lt;/strong&gt; no instance-side EIP choreography; &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-types.html" rel="noopener noreferrer"&gt;health-checked routing policies&lt;/a&gt; let AWS steer names.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; DNS time to live (TTL) and caching stretch failover and failback; client stacks behave inconsistently; not a drop-in substitute for “one stable IP, instant swing.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At All Quiet:&lt;/strong&gt; DNS steering can complement other designs; we don’t rely on it alone when our mental model is exactly one Elastic IP jumping between two known EC2 nodes. That is what Keepalived plus the API covers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubernetes / gateways (e.g. Amazon EKS)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros / cons (for anyone on Kubernetes):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upside:&lt;/strong&gt; HA via &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/" rel="noopener noreferrer"&gt;Services&lt;/a&gt;, &lt;a href="https://kubernetes.io/docs/concepts/services-networking/ingress/" rel="noopener noreferrer"&gt;Ingress&lt;/a&gt; / &lt;a href="https://gateway-api.sigs.k8s.io/" rel="noopener noreferrer"&gt;Gateway API&lt;/a&gt;, and cloud LB integration, which gives different primitives than a bare-metal pair.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; cluster operational tax; not every workload belongs there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At All Quiet:&lt;/strong&gt; this article describes a pair pattern centered on &lt;strong&gt;virtual machines (VMs)&lt;/strong&gt; because we still run meaningful tiers that way; where we use &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html" rel="noopener noreferrer"&gt;Amazon Elastic Kubernetes Service (EKS)&lt;/a&gt; or similar, ingress HA follows Kubernetes, not Keepalived on two fixed hosts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Elastic IP + Keepalived + EC2 API (this post)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros / cons (for anyone building like this):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upside:&lt;/strong&gt; one stable public address in DNS; relatively few moving AWS objects; full control over timers and failover scripts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; you own IAM, idempotent scripts, logging, and monitoring. Ambiguous states deserve runbooks and checks such as &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeAddresses.html" rel="noopener noreferrer"&gt;&lt;code&gt;DescribeAddresses&lt;/code&gt;&lt;/a&gt;. Compared with a managed load balancer, cutover is not instantaneous on abrupt failure: traffic keeps going to wherever the EIP is still associated until VRRP agrees on a new master, your health logic runs, and &lt;code&gt;notify_master&lt;/code&gt; finishes calling AWS. The gap depends on &lt;code&gt;advert_int&lt;/code&gt;, &lt;code&gt;vrrp_script&lt;/code&gt; intervals, preempt settings, and API behavior. Those knobs trade sensitivity against stability; sub‑millisecond failover is not what this pattern promises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At All Quiet:&lt;/strong&gt; this is what we actually implemented for &lt;strong&gt;specific self-managed load balancer tiers&lt;/strong&gt;: Ansible-deployed Keepalived, scripts on &lt;code&gt;notify_master&lt;/code&gt;, &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/home.html" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (CDK)&lt;/a&gt; and infrastructure as code (IaC) for the EIP and IAM. That is the same stack this article walks through at a pattern level.&lt;/p&gt;

&lt;h3&gt;
  
  
  How we pick among them
&lt;/h3&gt;

&lt;p&gt;Internally we ask: does this path’s &lt;strong&gt;service level objective (SLO) and budget&lt;/strong&gt; justify a managed LB? Do we need &lt;strong&gt;layer 7 (L7) features&lt;/strong&gt; only ALB gives us? Does &lt;strong&gt;scripted EIP failover&lt;/strong&gt; fit this path’s resilience expectations (see the EIP downside above)? If not, we promote the tier (ALB/NLB or another design); we don’t stretch EIP+Keepalived past where it fits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;We run self-managed LBs in some slices of our infra; we needed explicit public HA there, and EIP + Keepalived + EC2 API is our compact answer.&lt;/li&gt;
&lt;li&gt;VRRP decides who leads; &lt;code&gt;AssociateAddress&lt;/code&gt; decides where the EIP points.&lt;/li&gt;
&lt;li&gt;Managed ALB/NLB remain strong defaults when we want AWS to own HA at that layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;We touch incident paths every day; when ingress misbehaves, people notice fast. If you operate similar edges, use this framing to decide when the pattern fits and when to promote that tier to managed load balancing instead.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>keepalived</category>
      <category>devops</category>
      <category>redundancy</category>
    </item>
    <item>
      <title>Why We Built a MongoDB-Message Queue and Reinvented the Wheel</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Thu, 04 Jul 2024 04:33:19 +0000</pubDate>
      <link>https://dev.to/allquiet/why-we-built-a-mongodb-message-queue-and-reinvented-the-wheel-al3</link>
      <guid>https://dev.to/allquiet/why-we-built-a-mongodb-message-queue-and-reinvented-the-wheel-al3</guid>
      <description>&lt;p&gt;Hey👋&lt;/p&gt;

&lt;p&gt;I'm Mads Quist, founder of &lt;a href="https://allquiet.app?utm_source=DEV_post"&gt;All Quiet&lt;/a&gt;. We've implemented a home-grown message queue based on MongoDB, and I'm here to talk about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why we re-invented the wheel&lt;/li&gt;
&lt;li&gt;How we re-invented the wheel&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  1. Why we re-invented the wheel
&lt;/h1&gt;

&lt;p&gt;Why do we need message queuing?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://allquiet.app?utm_source=DEV_post"&gt;All Quiet&lt;/a&gt; is a modern incident management platform, similar to &lt;a href="https://www.pagerduty.com"&gt;PagerDuty&lt;/a&gt;. Our platform requires features like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sending a double-opt-in email asynchronously after a user registers&lt;/li&gt;
&lt;li&gt;Sending a reminder email 24 hours after registration&lt;/li&gt;
&lt;li&gt;Sending push notifications with Firebase Cloud Messaging (FCM), which can fail due to network or load problems. As push notifications are crucial to our app, we need to retry sending them if there's an issue.&lt;/li&gt;
&lt;li&gt;Accepting emails from outside our integration and processing them into incidents. This process can fail, so we wanted to decouple it and process each email payload on a queue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4t6m6vzaxdr9coh6tmv.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4t6m6vzaxdr9coh6tmv.jpeg" alt="Image description" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Our tech stack
&lt;/h2&gt;

&lt;p&gt;To understand our specific requirements, it's important to get some insights into our tech stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We run a monolithic web application based on .NET 7. The application runs in a Docker container.&lt;/li&gt;
&lt;li&gt;We run multiple containers in parallel.&lt;/li&gt;
&lt;li&gt;An HAProxy instance distributes HTTP requests equally to each container, ensuring a highly available setup.&lt;/li&gt;
&lt;li&gt;We use MongoDB as our underlying database, replicated across availability zones.&lt;/li&gt;
&lt;li&gt;All of the above components are hosted by AWS on generic EC2 VMs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why we re-invented the wheel
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;We desired a simple queuing mechanism that could run in multiple processes simultaneously while guaranteeing that each message was processed only once.&lt;/li&gt;
&lt;li&gt;We didn't need a pub/sub pattern.&lt;/li&gt;
&lt;li&gt;We didn't aim for a complex distributed system based on CQRS / event sourcing because, you know, the first rule of distributed systems is to not distribute.&lt;/li&gt;
&lt;li&gt;We wanted to keep things as simple as possible, following the philosophy of choosing "boring technology".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ultimately, it's about minimizing the number of moving parts in your infrastructure. We aim to build fantastic features for our excellent customers, and it's imperative to maintain our services reliably. Managing a single database system to achieve more than five nines of uptime is challenging enough. So why burden yourself with managing an additional HA RabbitMQ cluster?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not just use AWS SQS?
&lt;/h2&gt;

&lt;p&gt;Yeah… cloud solutions like AWS SQS, Google Cloud Tasks, or Azure Queue Storage are fantastic! However, they would have resulted in vendor lock-in. We simply aspire to be independent and cost-effective while still providing a scalable service to our clients.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1k5msd7vcdptz7zj7bc2.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1k5msd7vcdptz7zj7bc2.jpeg" alt="Image description" width="680" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  2. How we re-invented the wheel
&lt;/h1&gt;

&lt;p&gt;What is a message queue?&lt;/p&gt;

&lt;p&gt;A message queue is a system that stores messages. Producers put messages into the queue, and consumers later dequeue them for processing. This is incredibly beneficial for decoupling components, especially when processing messages is a resource-intensive task.&lt;/p&gt;

&lt;h2&gt;
  
  
  What characteristics should our queue show?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Utilizing MongoDB as our data storage&lt;/li&gt;
&lt;li&gt;Guaranteeing that each message is consumed only once&lt;/li&gt;
&lt;li&gt;Allowing multiple consumers to process messages simultaneously&lt;/li&gt;
&lt;li&gt;Ensuring that if message processing fails, retries are possible&lt;/li&gt;
&lt;li&gt;Enabling scheduling of message consumption for the future&lt;/li&gt;
&lt;li&gt;Not needing guaranteed ordering&lt;/li&gt;
&lt;li&gt;Ensuring high availability&lt;/li&gt;
&lt;li&gt;Ensuring messages and their states are durable and can withstand restarts or extended downtimes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MongoDB has significantly evolved over the years and can meet the criteria listed above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;In the sections that follow, I'll guide you through the MongoDB-specific implementation of our message queue. While you'll need a client library suitable for your preferred programming language, such as NodeJS, Go, or C# in the case of All Quiet, the concepts I'll share are platform agnostic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Queues
&lt;/h3&gt;

&lt;p&gt;Each queue you want to utilize is represented as a dedicated collection in your MongoDB database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Message Model
&lt;/h3&gt;

&lt;p&gt;Here's an example of a processed message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "_id" : NumberLong(638269014234217933),
    "Statuses" : [
        {
            "Status" : "Processed",
            "Timestamp" : ISODate("2023-08-06T06:50:23.753+0000"),
            "NextReevaluation" : null
        },
        {
            "Status" : "Processing",
            "Timestamp" : ISODate("2023-08-06T06:50:23.572+0000"),
            "NextReevaluation" : null
        },
        {
            "Status" : "Enqueued",
            "Timestamp" : ISODate("2023-08-06T06:50:23.421+0000"),
            "NextReevaluation" : null
        }
    ],
    "Payload" : {
        "YourData" : "abc123"
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s look at each property of the message.&lt;/p&gt;

&lt;h3&gt;
  
  
  _id
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;_id&lt;/code&gt; field is the canonical unique identifier property of MongoDB. Here, it contains a &lt;code&gt;NumberLong&lt;/code&gt;, not an &lt;code&gt;ObjectId&lt;/code&gt;. We need &lt;code&gt;NumberLong&lt;/code&gt; instead of &lt;code&gt;ObjectId&lt;/code&gt; because:&lt;/p&gt;

&lt;p&gt;While &lt;code&gt;ObjectId&lt;/code&gt; values should increase over time, they are not necessarily monotonic. This is because they:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and are generated by clients, which may have differing system clocks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In our C# implementation, we generate an &lt;code&gt;Id&lt;/code&gt; with millisecond precision and guaranteed ordering based on insertion time. Although we don't require strict processing order in a multi-consumer environment (similar to RabbitMQ), it's essential to maintain FIFO order when operating with just one consumer. Achieving this with &lt;code&gt;ObjectId&lt;/code&gt; is not feasible. If this isn't crucial for you, you can still use &lt;code&gt;ObjectId&lt;/code&gt;.&lt;/p&gt;
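
&lt;p&gt;The C# generator itself is not shown in this post. As a rough illustration only, here is a JavaScript sketch of one way to produce ids that increase strictly with insertion time inside a single process. The name &lt;code&gt;createIdGenerator&lt;/code&gt; and the layout (milliseconds multiplied by 10,000 plus a per-millisecond counter) are assumptions, and the sketch does not coordinate between processes, so two processes could still produce the same id.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Hypothetical sketch, not All Quiet's actual C# implementation.
// Returns a function that yields strictly increasing BigInt ids in one process.
function createIdGenerator(nowMs) {
  let lastMs = 0;
  let counter = 0;
  return function nextId() {
    // Never move backwards, even if the system clock does.
    let ms = Math.max(nowMs(), lastMs);
    counter = ms === lastMs ? counter + 1 : 0;
    if (counter === 10000) {
      // Counter range for this millisecond is used up: borrow the next one.
      ms += 1;
      counter = 0;
    }
    lastMs = ms;
    // Milliseconds in the high digits, per-millisecond counter in the low digits.
    return BigInt(ms) * 10000n + BigInt(counter);
  };
}

// Usage: const nextId = createIdGenerator(Date.now);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The resulting values fit into a 64-bit integer. With several producer processes, a duplicate &lt;code&gt;_id&lt;/code&gt; would make the insert fail with a duplicate key error, so retrying with a fresh id would be one way to handle that case.&lt;/p&gt;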

&lt;h3&gt;
  
  
  Statuses
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Statuses&lt;/code&gt; property consists of an array containing the message processing history. At index &lt;code&gt;0&lt;/code&gt;, you'll find the current status, which is crucial for indexing.&lt;/p&gt;

&lt;p&gt;The status object itself contains three properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Status&lt;/code&gt;: Can be "Enqueued", "Processing", "Processed", or "Failed".&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Timestamp&lt;/code&gt;: Records when the message entered this status.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;NextReevaluation&lt;/code&gt;: Records when the next evaluation should occur, which is essential for both retries and future scheduled executions.&lt;/li&gt;
&lt;/ul&gt;
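
&lt;p&gt;Because the current status always sits at index &lt;code&gt;0&lt;/code&gt;, queries can filter on that positional path, and an index can target it. The post does not list the actual index definition, so the following is only a sketch under the assumption that a compound index on the current status and its reevaluation time suits your queries. The name &lt;code&gt;queueIndexKeys&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Assumed index keys: current status first, then its reevaluation time.
const queueIndexKeys = {
  "Statuses.0.Status": 1,
  "Statuses.0.NextReevaluation": 1
};

// In the mongo shell:
// db.yourQueueCollection.createIndex(queueIndexKeys);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It is worth confirming with &lt;code&gt;explain()&lt;/code&gt; that your own queries actually use the index before relying on it.&lt;/p&gt;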

&lt;h3&gt;
  
  
  Payload
&lt;/h3&gt;

&lt;p&gt;This property contains the specific payload of your message.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enqueuing a message
&lt;/h3&gt;

&lt;p&gt;Adding a message is a straightforward insert operation into the collection with the status set to &lt;code&gt;"Enqueued"&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For immediate processing, set &lt;code&gt;NextReevaluation&lt;/code&gt; to null.&lt;/li&gt;
&lt;li&gt;For future processing, set &lt;code&gt;NextReevaluation&lt;/code&gt; to a timestamp in the future, when you want your message to be processed.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db.yourQueueCollection.insertOne({
    "_id" : NumberLong(638269014234217933),
    "Statuses" : [
        {
            "Status" : "Enqueued",
            "Timestamp" : ISODate("2023-08-06T06:50:23.421+0000"),
            "NextReevaluation" : null
        }
    ],
    "Payload" : {
        "YourData" : "abc123"
    }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dequeuing a message
&lt;/h3&gt;

&lt;p&gt;Dequeuing is slightly more complex but still relatively straightforward. It heavily relies on the concurrent atomic read and update capabilities of MongoDB.&lt;/p&gt;

&lt;p&gt;This essential feature of MongoDB ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each message is processed only once.&lt;/li&gt;
&lt;li&gt;Multiple consumers can safely process messages simultaneously.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db.yourQueueCollection.findAndModify({
   "query": {
      "$and": [
         {
            "Statuses.0.Status": "Enqueued"
         },
         {
            "Statuses.0.NextReevaluation": null
         }
      ]
   },
   "update": {
      "$push": {
         "Statuses": {
            "$each": [
               {
                  "Status": "Processing",
                  "Timestamp": ISODate("2023-08-06T06:50:23.800+0000"),
                  "NextReevaluation": null
               }
            ],
            "$position": 0
         }
      }
   }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So we read one message that is in the state &lt;code&gt;"Enqueued"&lt;/code&gt; and, in the same operation, modify it by setting the status &lt;code&gt;"Processing"&lt;/code&gt; at position &lt;code&gt;0&lt;/code&gt;. Since this operation is atomic, no other consumer can pick up the same message.&lt;/p&gt;

&lt;h3&gt;
  
  
  Marking a message as processed
&lt;/h3&gt;

&lt;p&gt;Once the processing of the message is complete, it's a simple matter of updating the message status to &lt;code&gt;"Processed"&lt;/code&gt; using the message’s &lt;code&gt;_id&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db.yourQueueCollection.findAndModify({
   "query": {
     "_id": NumberLong(638269014234217933)
   },
   "update": {
      "$push": {
         "Statuses": {
            "$each": [
               {
                  "Status": "Processed",
                  "Timestamp": ISODate("2023-08-06T06:50:24.100+0000"),
                  "NextReevaluation": null
               }
            ],
            "$position": 0
         }
      }
   }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Marking a message as failed
&lt;/h3&gt;

&lt;p&gt;If processing fails, we need to mark the message accordingly. Often, you might want to retry processing the message. This can be achieved by re-enqueuing the message. In many scenarios, it makes sense to reprocess the message after a specific delay, such as 10 seconds, depending on the nature of the processing failure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db.yourQueueCollection.findAndModify({
   "query": {
     "_id": NumberLong(638269014234217933)
   },
   "update": {
      "$push": {
         "Statuses": {
            "$each": [
               {
                  "Status": "Failed",
                  "Timestamp": ISODate("2023-08-06T06:50:24.100+0000"),
                  "NextReevaluation": ISODate("2023-08-06T06:50:34.100+0000")
               }
            ],
            "$position": 0
         }
      }
   }
});

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The dequeuing loop
&lt;/h3&gt;

&lt;p&gt;We've established how we can easily enqueue and dequeue items from our "queue," which is, in fact, simply a MongoDB collection. We can even "schedule" messages for the future by leveraging the &lt;code&gt;NextReevaluation&lt;/code&gt; field.&lt;/p&gt;

&lt;p&gt;What's missing is a way to dequeue continuously. Consumers need to execute the &lt;code&gt;findAndModify&lt;/code&gt; command in some kind of loop. The simplest approach is an endless loop that dequeues and processes one message per iteration. It is effective, but it puts considerable pressure on the database and the network.&lt;/p&gt;

&lt;p&gt;An alternative would be to introduce a delay, e.g., 100ms, between loop iterations. This will significantly reduce the load but will also decrease the speed of dequeuing.&lt;/p&gt;

&lt;p&gt;The solution to the problem is what MongoDB refers to as a &lt;a href="https://www.mongodb.com/docs/manual/changeStreams/"&gt;change stream&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  MongoDB Change Streams
&lt;/h3&gt;

&lt;p&gt;What are &lt;a href="https://www.mongodb.com/docs/manual/changeStreams/"&gt;change streams&lt;/a&gt;? I can’t explain it better than the guys at MongoDB:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Change streams allow applications to access real-time data changes […]. Applications can use change streams to subscribe to all data changes on a single collection […] and immediately react to them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Great! What we can do is listen to newly created documents in our queue collection, which effectively means listening to newly enqueued messages.&lt;/p&gt;

&lt;p&gt;This is dead simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
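&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Node.js driver sketch; `db` is a connected Db instance.
const queue = db.collection("yourQueueCollection");
// Only react to newly inserted (enqueued) documents.
const changeStream = queue.watch([{ "$match": { "operationType": "insert" } }]);
changeStream.on("change", async (changeEvent) =&amp;gt; {
  // Dequeue the message. The status filter ensures that only one consumer
  // claims it when several consumers receive the same event, and that
  // messages scheduled for the future are left alone.
  await queue.findOneAndUpdate(
    {
      "_id": changeEvent.documentKey._id,
      "Statuses.0.Status": "Enqueued",
      "Statuses.0.NextReevaluation": null
    },
    {
      "$push": {
        "Statuses": {
          "$each": [
            {
              "Status": "Processing",
              "Timestamp": new Date(),
              "NextReevaluation": null
            }
          ],
          "$position": 0
        }
      }
    }
  );
});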
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scheduled and Orphaned Messages
&lt;/h3&gt;

&lt;p&gt;The change stream approach, however, works for neither scheduled nor orphaned messages, because nothing changes in the collection at the moment they become due.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scheduled messages simply sit in the collection with the status &lt;code&gt;"Enqueued"&lt;/code&gt; and a &lt;code&gt;"NextReevaluation"&lt;/code&gt; field set to the future.&lt;/li&gt;
&lt;li&gt;Orphaned messages are those that were in the &lt;code&gt;"Processing"&lt;/code&gt; status when their consumer process died. They remain in the collection with the status &lt;code&gt;"Processing"&lt;/code&gt; but no consumer will ever change their status to &lt;code&gt;"Processed"&lt;/code&gt; or &lt;code&gt;"Failed"&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For these use cases, we need to revert to our simple loop. However, we can use a rather generous delay between iterations.&lt;/p&gt;
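
&lt;p&gt;The post does not show the query this slower loop runs, so here is a hedged sketch of what it could look like. It assumes that a message is due once its &lt;code&gt;NextReevaluation&lt;/code&gt; has passed, and that a message counts as orphaned once it has been in &lt;code&gt;"Processing"&lt;/code&gt; for longer than a timeout you choose. The helper name &lt;code&gt;buildReevaluationDequeue&lt;/code&gt; and the &lt;code&gt;orphanTimeoutMs&lt;/code&gt; parameter are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Hypothetical helper: builds the findAndModify arguments for the slow loop.
// orphanTimeoutMs is an assumed setting, not a value from this article.
function buildReevaluationDequeue(now, orphanTimeoutMs) {
  const orphanCutoff = new Date(now.getTime() - orphanTimeoutMs);
  return {
    "query": {
      "$or": [
        // Scheduled messages and failed messages whose retry time has arrived.
        {
          "Statuses.0.Status": { "$in": ["Enqueued", "Failed"] },
          "Statuses.0.NextReevaluation": { "$lte": now }
        },
        // Orphans: claimed by a consumer that never reported a result.
        {
          "Statuses.0.Status": "Processing",
          "Statuses.0.Timestamp": { "$lte": orphanCutoff }
        }
      ]
    },
    // Oldest message first.
    "sort": { "_id": 1 },
    "update": {
      "$push": {
        "Statuses": {
          "$each": [
            { "Status": "Processing", "Timestamp": now, "NextReevaluation": null }
          ],
          "$position": 0
        }
      }
    }
  };
}

// In the mongo shell, once per polling interval:
// db.yourQueueCollection.findAndModify(buildReevaluationDequeue(new Date(), 5 * 60 * 1000));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A date comparison does not match &lt;code&gt;null&lt;/code&gt;, so a &lt;code&gt;"Failed"&lt;/code&gt; message without a &lt;code&gt;NextReevaluation&lt;/code&gt; stays failed and is not retried. The timeout should be comfortably longer than your slowest expected processing time, otherwise a message that is still being worked on could be picked up a second time.&lt;/p&gt;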

&lt;h1&gt;
  
  
  Wrapping it up
&lt;/h1&gt;

&lt;p&gt;"Traditional" databases, like MySQL, PostgreSQL, or MongoDB (which I also view as traditional), are incredibly powerful today. If used correctly (ensure your indexes are optimized!), they are swift, scale impressively, and are cost-effective on traditional hosting platforms.&lt;/p&gt;

&lt;p&gt;Many use cases can be addressed using just a database and your preferred programming language. It's not always necessary to have the "right tool for the right job," meaning maintaining a diverse set of tools like Redis, Elasticsearch, RabbitMQ, etc. Often, the maintenance overhead isn't worth it.&lt;/p&gt;

&lt;p&gt;While the solution proposed might not match the performance of, for instance, RabbitMQ, it's usually sufficient and can scale to a point that would mark significant success for your startup.&lt;/p&gt;

&lt;p&gt;Software engineering is about navigating trade-offs. Choose yours wisely.&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>csharp</category>
      <category>dotnet</category>
      <category>eventdriven</category>
    </item>
  </channel>
</rss>
