<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mads Quist</title>
    <description>The latest articles on DEV Community by Mads Quist (@mads_quist).</description>
    <link>https://dev.to/mads_quist</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1704496%2Fb77175fb-ee33-4d49-bb51-f5067ca9654d.png</url>
      <title>DEV Community: Mads Quist</title>
      <link>https://dev.to/mads_quist</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mads_quist"/>
    <language>en</language>
    <item>
      <title>Why We Built Live Call Routing the Lean Way</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Thu, 25 Jun 2026 10:29:00 +0000</pubDate>
      <link>https://dev.to/mads_quist/why-we-built-live-call-routing-the-lean-way-4e9m</link>
      <guid>https://dev.to/mads_quist/why-we-built-live-call-routing-the-lean-way-4e9m</guid>
      <description>&lt;p&gt;When email is not enough, you need a human on the phone. Live Call Routing should not require a sales cycle or telephony markup. Here is how All Quiet built it the lean way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why We Built Live Call Routing the Lean Way
&lt;/h3&gt;

&lt;p&gt;Most on-call setups run on automated alerts, and that covers the majority of pages. It stops being enough when a production database goes down at 3:00 AM, or when a high-priority customer is in a full outage. A support email or chat message does not always cut it. Sometimes you need a human on the phone.&lt;/p&gt;

&lt;p&gt;If you have looked at &lt;a href="https://allquiet.app/glossary/what-is-live-call-routing" rel="noopener noreferrer"&gt;Live Call Routing&lt;/a&gt; in other incident tools, voice often sits behind enterprise pricing or a reseller markup. We wanted the opposite: a straight path from caller to on-call engineer, without a sales call in the middle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The All Quiet approach to voice&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No enterprise gating:&lt;/strong&gt; Live Call Routing is available on our &lt;a href="https://allquiet.app/pricing" rel="noopener noreferrer"&gt;Pro plan&lt;/a&gt;. No "call for quote" required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bring Your Own Provider (BYOP):&lt;/strong&gt; Connect your existing &lt;a href="https://allquiet.app/integrations/inbound/twilio" rel="noopener noreferrer"&gt;Twilio&lt;/a&gt; account via API. You keep your numbers and your provider relationship.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero markup:&lt;/strong&gt; You pay &lt;a href="https://allquiet.app/integrations/inbound/twilio" rel="noopener noreferrer"&gt;Twilio&lt;/a&gt; directly. We do not take a cut or mark up your telephony rates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dev-first config:&lt;/strong&gt; Manage your &lt;a href="https://allquiet.app/glossary/what-is-interactive-voice-response" rel="noopener noreferrer"&gt;Interactive Voice Response (IVR)&lt;/a&gt; via our UI or via &lt;a href="https://docs.allquiet.app/advanced/terraform" rel="noopener noreferrer"&gt;Terraform&lt;/a&gt; for infrastructure-as-code teams.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The "contact sales for pricing" wall
&lt;/h4&gt;

&lt;p&gt;Setting up a live hotline or an IVR menu with some VC-backed competitors usually ends the same way: "Contact Sales for Enterprise Pricing."&lt;/p&gt;

&lt;p&gt;For those vendors, voice routing is a lever to push smaller teams into annual contracts and, often, to resell telephony at a margin. You end up paying a middleman for infrastructure you could run yourself.&lt;/p&gt;

&lt;p&gt;We are revenue-funded, not growth-at-all-costs. If you need a &lt;a href="https://allquiet.app/glossary/what-is-a-duty-phone" rel="noopener noreferrer"&gt;duty phone&lt;/a&gt; for your infrastructure, you should not need a procurement cycle to get one.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why we chose Bring Your Own Provider (BYOP)
&lt;/h4&gt;

&lt;p&gt;When we designed call routing, we had two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Resell phone numbers and charge a premium on every minute.&lt;/li&gt;
&lt;li&gt;Build routing logic that plugs into the &lt;a href="https://allquiet.app/glossary/what-is-voice-over-ip" rel="noopener noreferrer"&gt;VoIP&lt;/a&gt; stack you already run.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We chose the second. All Quiet integrates with &lt;a href="https://allquiet.app/integrations/inbound/twilio" rel="noopener noreferrer"&gt;Twilio&lt;/a&gt; so you wire in your API keys, map your &lt;a href="https://allquiet.app/glossary/what-are-virtual-on-call-phone-numbers" rel="noopener noreferrer"&gt;virtual on-call numbers&lt;/a&gt;, and keep paying your provider's rates.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero telephony markup:&lt;/strong&gt; You pay Twilio directly. We handle &lt;a href="https://allquiet.app/glossary/what-is-inbound-call-routing" rel="noopener noreferrer"&gt;inbound call routing&lt;/a&gt; and on-call logic; we do not touch your minute billing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data stays with you:&lt;/strong&gt; &lt;a href="https://allquiet.app/glossary/what-is-call-logging" rel="noopener noreferrer"&gt;Call recordings&lt;/a&gt; and provider-side logs remain in your Twilio account. We run the routing and &lt;a href="https://allquiet.app/glossary/what-is-automated-incident-creation" rel="noopener noreferrer"&gt;incident creation&lt;/a&gt; so the right engineer gets the ring, without copying your voice data into a separate silo.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  SRE-first design: UI-friendly, Terraform-ready
&lt;/h3&gt;

&lt;p&gt;Enterprise phone systems are built for call centers. Menus go deep, labels use telecom jargon, and the admin UI assumes a dedicated phone team. We built for DevOps and SRE workflows instead.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. IVRs should not require a certification
&lt;/h4&gt;

&lt;p&gt;Our visual builder lets you map Press 1 to Overnight Squad in a few clicks. The point is to set up a bridge during a mid-level incident without opening a vendor manual. Less cognitive load when the pager is already loud.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Infrastructure as Code (yes, even for your phone tree)
&lt;/h4&gt;

&lt;p&gt;A good UI helps for quick changes. Most teams still want version control, review, and repeatability for anything that affects production response paths.&lt;/p&gt;

&lt;p&gt;All Quiet is API-first and Terraform-ready. You can manage Live Call Routing the same way you manage AWS or GCP resources. If you already treat &lt;a href="https://allquiet.app/blog/infrastructure-as-code-is-not-an-add-on-for-incident-management" rel="noopener noreferrer"&gt;on-call configuration as code&lt;/a&gt;, your phone tree can follow the same workflow, including &lt;a href="https://allquiet.app/glossary/what-is-an-escalation-path-for-phone-calls" rel="noopener noreferrer"&gt;escalation paths&lt;/a&gt; when the first responder does not answer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Available for everyone, not just enterprise
&lt;/h3&gt;

&lt;p&gt;We did not put Live Call Routing behind an Enterprise tier. It ships on Pro because a five-person startup handling a customer outage deserves the same voice path as a larger org.&lt;/p&gt;

&lt;p&gt;All Quiet is a lean incident platform for the people who actually run the systems. Without a massive sales org to feed, we can ship features that solve on-call problems instead of features that inflate contract size.&lt;/p&gt;

&lt;p&gt;Want to wire up your first dev-friendly hotline? &lt;a href="https://allquiet.app/signup/start-free-trial" rel="noopener noreferrer"&gt;Start a free trial&lt;/a&gt; and connect your &lt;a href="https://allquiet.app/integrations/inbound/twilio" rel="noopener noreferrer"&gt;Twilio integration&lt;/a&gt; in a few minutes.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>incidents</category>
    </item>
    <item>
      <title>On-Call is the daily business; Incident Management is a Philosophy</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Thu, 18 Jun 2026 10:27:00 +0000</pubDate>
      <link>https://dev.to/mads_quist/on-call-is-the-daily-business-incident-management-is-a-philosophy-4104</link>
      <guid>https://dev.to/mads_quist/on-call-is-the-daily-business-incident-management-is-a-philosophy-4104</guid>
      <description>&lt;h3&gt;
  
  
  If Your On-Call Strategy is Just "Make it Louder," We Need to Talk.
&lt;/h3&gt;

&lt;p&gt;What exactly is a "better pager"? Maybe it has a cleaner UI, a louder alert, a fancier dashboard? Or is it just the industry's equivalent of replacing a warning light with a brighter bulb and calling it innovation?&lt;/p&gt;

&lt;p&gt;The truth is that the pager was never the problem when it comes to &lt;a href="https://allquiet.app/incident-management" rel="noopener noreferrer"&gt;incident management software&lt;/a&gt;. Making it look better won't magically erase the chaos, and it certainly won't invite clarity. A coherent and structured system is the real golden goose; a philosophy that teams can follow, a new way of working that replaces the panic-inducing surprises with manageable, predictable events.&lt;/p&gt;

&lt;p&gt;In short: on-call is the "who," but incident management is the "how."&lt;/p&gt;

&lt;p&gt;On-call is straightforward: it's the schedule, the rotation, the person behind the phone when the alert screams that something's wrong. It's the human on the other end of the chaos whose dinner goes cold while they put out the fire.&lt;/p&gt;

&lt;p&gt;Incident management is everything that happens around that moment of panic: the structure that determines what's escalated, &lt;a href="https://allquiet.app/incident-response" rel="noopener noreferrer"&gt;how information flows between systems&lt;/a&gt;, who communicates with whom and how the team learns from what happened. It's the difference between "someone's been alerted" and "we know exactly how to respond to this."&lt;/p&gt;

&lt;p&gt;A healthy incident management philosophy answers questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What incidents are important enough to wake someone up?&lt;/li&gt;
&lt;li&gt;How do we make sure the right person gets the right alert?&lt;/li&gt;
&lt;li&gt;What information should the alert include?&lt;/li&gt;
&lt;li&gt;How do we communicate internally and externally?&lt;/li&gt;
&lt;li&gt;How do we learn from these events and prevent them in the future?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your system isn't answering these questions, then your pager is probably doing all the work... and that's usually when burnout happens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does philosophy matter for SRE or DevOps leaders?
&lt;/h3&gt;

&lt;p&gt;Here's something many leaders don't often say out loud:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Psychological safety is operational infrastructure, not an engineering luxury.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Teams need a clear incident management philosophy to follow, otherwise the emotional and cognitive load of engineers skyrockets. It usually manifests in one of two predictable (and equally damaging) ways.&lt;/p&gt;

&lt;h4&gt;
  
  
  Scenario A: Alert fatigue
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://www.darkreading.com/cyber-risk/56-of-large-companies-handle-1-000-security-alerts-each-day" rel="noopener noreferrer"&gt;Over half of large companies get 1000+ security alerts a day&lt;/a&gt;. A day. And 93% of them can't even be addressed on the same day.&lt;/p&gt;

&lt;p&gt;If your engineers are constantly bombarded with problems they physically can't solve, they'll either tune out or stop distinguishing between important and unimportant signals. Or, worst of all, they'll become completely numb to the noise (hello, burnout).&lt;/p&gt;

&lt;p&gt;The human brain wasn't meant to be crammed with as much information as it is today. No one can meaningfully respond to hundreds, let alone thousands, of alerts in an 8-hour workday, and they can't be expected to either. It'll only lead to exhausted engineers, missed incidents and a team that slowly loses trust in the alerting system (and their leaders).&lt;/p&gt;

&lt;p&gt;Engineers aren't falling asleep on the job because they're bored, but because they're exhausted. They can't be asked to do the impossible.&lt;/p&gt;

&lt;h4&gt;
  
  
  Scenario B: The needle-in-a-haystack
&lt;/h4&gt;

&lt;p&gt;Scenario B is almost the opposite of scenario A, yet just as harmful: engineers try to triage everything, all at once. They comb through every alert, every log line, every single anomaly and cross their tired fingers that they'll eventually catch the one that matters.&lt;/p&gt;

&lt;p&gt;But all this does is create a sense of failure. It perpetuates the idea that no matter how hard they work, they'll never keep up. The sheer volume of alerts means they're always behind, trying to stay afloat in a sea of noise without a life raft.&lt;/p&gt;

&lt;p&gt;And you don't have to be a genius to know where that ends up: they drown in the waves of problems they can't solve. It eats away at their confidence, motivation and psychological safety, making them feel incapable when, really, the system itself is unmanageable.&lt;/p&gt;

&lt;h4&gt;
  
  
  The real issue
&lt;/h4&gt;

&lt;p&gt;When all's said and done, the real problem is the system, not the people. And system problems need system thinking. Without a set of guiding principles, teams default to survival mode rather than logic. Survival mode isn't a long-term strategy and it's the quickest road to burnout, high turnover and operational chaos.&lt;/p&gt;

&lt;h3&gt;
  
  
  "I need an alert" vs "I need a system"
&lt;/h3&gt;

&lt;p&gt;The real deal: the mindset shift that separates resilient engineering organizations from those that are constantly fighting fires.&lt;/p&gt;

&lt;h4&gt;
  
  
  Surface-level fix
&lt;/h4&gt;

&lt;p&gt;"I need an alert" is the pager-centric mindset. It's the "solve the immediate symptom and deal with the outcome later" mentality that fails to address the underlying complexity of incident response. A simple pager can't solve:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prioritization: which issues matter most and why?&lt;/li&gt;
&lt;li&gt;Routing: who's best equipped to handle this?&lt;/li&gt;
&lt;li&gt;Context: what information does the responder need?&lt;/li&gt;
&lt;li&gt;Communication: who needs to be informed and why?&lt;/li&gt;
&lt;li&gt;Learning: what did we discover and how do we prevent it from happening again?&lt;/li&gt;
&lt;li&gt;Prevention: how do we strengthen the system long-term?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Relying on alerts alone is like seeing a dashboard light on your car and thinking, "Well, time to buy a new engine." Alerts tell you something happened, but not what, why or how to stop it from happening again.&lt;/p&gt;

&lt;h4&gt;
  
  
  Structural fix
&lt;/h4&gt;

&lt;p&gt;A real incident management system allows teams to respond effectively and sustainably by creating:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Focus: engineers see only what really matters.&lt;/li&gt;
&lt;li&gt;Continuity: incidents don't disappear into Slack threads.&lt;/li&gt;
&lt;li&gt;Predictability: everyone knows the drill and understands the playbook.&lt;/li&gt;
&lt;li&gt;Accountability: someone's responsible for handling a task without blame.&lt;/li&gt;
&lt;li&gt;Learning loops: incidents become learning opportunities instead of recurring nightmares.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is that golden moment where incident management stops being a "tool" and becomes a philosophy. It can shape culture, reduce stress and improve reliability (and your engineers will thank you for it).&lt;/p&gt;

&lt;h3&gt;
  
  
  How All Quiet helps teams build chaos-free philosophies
&lt;/h3&gt;

&lt;p&gt;Tools don't create philosophies, but they can reinforce them. All Quiet is built to support the kind of system modern engineering teams need; not by adding more noise, but by creating clarity.&lt;/p&gt;

&lt;h4&gt;
  
  
  Noise reduction that actually works
&lt;/h4&gt;

&lt;p&gt;Not every alert needs human attention; some resolve themselves, while others need three engineers sweating over them with an Olympic swimming pool of cappuccinos. Some alerts are even duplicates, and some simply aren't important at all. But how do you know which one is which when they all look the same at first glance?&lt;/p&gt;

&lt;p&gt;All Quiet knows. It helps teams &lt;a href="https://allquiet.app/smart-alert-handling" rel="noopener noreferrer"&gt;filter out the noise&lt;/a&gt; so engineers can focus on the important stuff. It's not just reducing the number of alerts, but allowing engineers to put their trust in the alerting system itself. If the engineer knows the alert is meaningful, they respond faster and more confidently.&lt;/p&gt;

&lt;h4&gt;
  
  
  Built-in learning
&lt;/h4&gt;

&lt;p&gt;Every incident is an opportunity to strengthen the system. All Quiet makes it easy to capture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What happened&lt;/li&gt;
&lt;li&gt;Why it happened&lt;/li&gt;
&lt;li&gt;How to prevent it in the future.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rather than feeding constant stress cycles, incidents become part of the organization's memory. That builds a culture of continuous improvement where incidents are fully understood, which ultimately makes them easier to prevent.&lt;/p&gt;

&lt;h4&gt;
  
  
  Routing based on actual knowledge
&lt;/h4&gt;

&lt;p&gt;When a real incident hits, All Quiet uses the alert's attributes, like the service, the component, the impact, to route it to the right person. So Susan in accounting won't suddenly be slapped with 31 alerts she has no idea what to do with; the right person will know exactly what to do. This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No more "everything goes to whoever's on-call"&lt;/li&gt;
&lt;li&gt;No more guessing who should handle what&lt;/li&gt;
&lt;li&gt;No more accidental routes (sorry, Susan)&lt;/li&gt;
&lt;li&gt;No more unnecessary escalations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The incident finds its home quickly, with the person who's best equipped to fix it, which shortens the resolution time in the process.&lt;/p&gt;
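
&lt;p&gt;To make the idea of attribute-based routing concrete, here is a minimal sketch written as plain Terraform expressions. It is purely illustrative: it is not All Quiet's routing engine or its configuration syntax, and the attribute values, team names and the &lt;code&gt;owners&lt;/code&gt; map are all hypothetical. It only shows the principle of looking up an owner from an alert's attributes, with a fallback when nothing matches.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;locals {
  # Hypothetical alert attributes, as they might arrive from a monitoring tool.
  alert = {
    service   = "payments-api"
    component = "database"
    impact    = "high"
  }

  # Hypothetical ownership map: service name to responsible team.
  owners = {
    "payments-api" = "payments-team"
    "checkout-web" = "frontend-team"
  }

  # Route by the alert's service; unknown services fall back to a default team.
  routed_team = lookup(local.owners, local.alert.service, "platform-team")
}

output "routed_team" {
  value = local.routed_team
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the values above, the alert for &lt;code&gt;payments-api&lt;/code&gt; resolves to &lt;code&gt;payments-team&lt;/code&gt;, while an alert for a service missing from the map would land with &lt;code&gt;platform-team&lt;/code&gt; instead of whoever happens to be on call.&lt;/p&gt;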

&lt;h4&gt;
  
  
  Communication that builds trust
&lt;/h4&gt;

&lt;p&gt;Clear communication is a must-have for any business, but it's one of the most overlooked parts of incident management. With All Quiet, teams communicate internally, so everyone knows who's handling what, and externally through &lt;a href="https://allquiet.app/status-pages" rel="noopener noreferrer"&gt;status pages&lt;/a&gt; and &lt;a href="https://allquiet.app/integrations" rel="noopener noreferrer"&gt;outbound updates&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Believe it or not, transparent communication increases trust. In fact, &lt;a href="https://hbr.org/2017/01/the-neuroscience-of-trust" rel="noopener noreferrer"&gt;employees in high-trust workplaces experience 74% less stress and 40% less burnout&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And customers love it too—they expect competence, not perfection. They want to know that when something breaks, the right person will pick it up and fix it rather than handing it off to someone else.&lt;/p&gt;

&lt;h4&gt;
  
  
  The result: a team that doesn't fear the pager
&lt;/h4&gt;

&lt;p&gt;At the end of the day, incident management isn't about louder alerts and shinier dashboards with all the bells and whistles. It's not even about who can stay awake the longest (cough DevOps engineers cough); it's about building a system that protects your people as much as your platform.&lt;/p&gt;

&lt;p&gt;Teams with clarity, structure and a shared philosophy feel like they can weather any storm, no matter how unpredictable. That way, engineers know what to do, leaders know what to expect and customers know they're in good hands.&lt;/p&gt;

&lt;p&gt;The end goal isn't fewer incidents, but less chaotic ones; not perfect uptime, but predictable and sustainable responses; not heroics, but healthy and confident teams who trust the system they're working with. Strong philosophies mean the pager is just another tool rather than the entire strategy. When the system supports the human behind it, on-call stops being something to fear and starts being something your team handles calmly and proudly.&lt;/p&gt;

&lt;p&gt;If your current approach feels like you're stranded on a desert island in the middle of the Atlantic, don't blame your engineers. Your system is asking too much and giving too little, but the right one (with the right tools to support it) can build an environment where incidents are manageable and your team can finally breathe again.&lt;/p&gt;

&lt;p&gt;A better pager won't get you there. A better system will. Make the choice easy and &lt;a href="https://meetings-eu1.hubspot.com/nkoeppl/allquiet-product-demo?uuid=70cac173-7385-4059-95cb-7bec57aa1baf" rel="noopener noreferrer"&gt;talk to us today&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>oncall</category>
      <category>incident</category>
    </item>
    <item>
      <title>Infrastructure as Code (IaC) is Not an Add-On</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Tue, 16 Jun 2026 10:17:00 +0000</pubDate>
      <link>https://dev.to/mads_quist/infrastructure-as-code-iac-is-not-an-add-on-3ehb</link>
      <guid>https://dev.to/mads_quist/infrastructure-as-code-iac-is-not-an-add-on-3ehb</guid>
      <description>&lt;p&gt;When rotations drift and no one remembers who changed what, the pager still works but trust erodes. Incident management belongs in Git, reviewed and applied like the rest of your infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  If It's Not in Git, It Doesn't Exist: Why IaC Isn't An Add-On for &lt;a href="https://allquiet.app/incident-management" rel="noopener noreferrer"&gt;Incident Management Platforms&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Engineering teams are no strangers to the unwanted moment of opening a config page, tilting their heads and saying, "Huh... that's not what I expected."&lt;/p&gt;

&lt;p&gt;Is it a rotation that doesn't match the team doc? An integration that looks a bit different from the one in staging? Maybe it's a schedule that was definitely updated last quarter but now looks a bit scant.&lt;/p&gt;

&lt;p&gt;Nothing's broken, no one's being paged unnecessarily, the incident management software is doing exactly what it's supposed to, but something's out of sync and no one can remember when or why it changed.&lt;/p&gt;

&lt;p&gt;Since these small friction points don't cause outages, they're more likely to go unseen. But that's exactly where the problems start. They accumulate. They're a sign that the configuration has slowly drifted away from whatever the team thought the source of truth was; which brings us to Infrastructure as Code.&lt;/p&gt;

&lt;h4&gt;
  
  
  What is Infrastructure as Code (IaC)?
&lt;/h4&gt;

&lt;p&gt;IaC is the idea that your infrastructure, and all the little operational details around it, should be defined in code, stored in Git, reviewed like every other change and applied consistently. It replaces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"I swear I updated that"&lt;/li&gt;
&lt;li&gt;"Who clicked this?"&lt;/li&gt;
&lt;li&gt;"Why does staging look different from prod?"&lt;/li&gt;
&lt;li&gt;"Wait, when did this change?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...with a single, reliable answer: Git history.&lt;/p&gt;

&lt;p&gt;And it's nothing fancy either. It's simply a better way to align humans and systems. It's discipline; it's the decision to treat operational configuration with the same rigor as application code. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No undocumented UI changes&lt;/li&gt;
&lt;li&gt;No relying on memory&lt;/li&gt;
&lt;li&gt;No tribal knowledge&lt;/li&gt;
&lt;li&gt;No "just tweak it real quick" edits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And as teams grow (and responsibilities spread across platform, SRE, DevOps and product engineering), a shared, reviewable, auditable source of truth becomes non-negotiable.&lt;/p&gt;

&lt;p&gt;Interestingly, IaC adoption often starts with infrastructure provisioning. Operational workflows like incident management follow later as teams mature, even though &lt;a href="https://allquiet.app/on-call/scheduling" rel="noopener noreferrer"&gt;schedules&lt;/a&gt;, rotations, integrations and everything else that keeps a team responsive benefit from IaC just as much as VPSs, clusters or load balancers.&lt;/p&gt;

&lt;p&gt;That's where tools like All Quiet (and its &lt;a href="https://docs.allquiet.app/advanced/terraform" rel="noopener noreferrer"&gt;Terraform provider&lt;/a&gt;) come in. They give you a clean, modern incident management experience, all while allowing you to manage it the same way you manage the rest of your infrastructure (which is predictably, I hope).&lt;/p&gt;

&lt;h3&gt;
  
  
  The real problem with manual rotations
&lt;/h3&gt;

&lt;p&gt;We all love incident management tools. They keep teams responsive, informed, and most importantly, sane. But when you rely on manual updates for scheduling, rotations and integrations, you introduce some classic DevOps villains.&lt;/p&gt;

&lt;h4&gt;
  
  
  Configuration drift
&lt;/h4&gt;

&lt;p&gt;Silent and sneaky, it waits until your head hits the pillow to show itself. It slowly erodes your confidence in what's deployed and is the main reason staging and production sometimes feel like distant cousins instead of twins.&lt;/p&gt;

&lt;h4&gt;
  
  
  Human error
&lt;/h4&gt;

&lt;p&gt;Not because engineers are careless; they're just busy (sometimes too busy). Manual updates rely on memory, timing and attention, all of which are finite resources that dwindle even faster during midnight crises.&lt;/p&gt;

&lt;h4&gt;
  
  
  Zero visibility
&lt;/h4&gt;

&lt;p&gt;"Who changed this? When? Why? How? Oh, it was me... right." If the answer requires Slack thread excavations, you've already lost.&lt;/p&gt;

&lt;h4&gt;
  
  
  Multi-team complexity
&lt;/h4&gt;

&lt;p&gt;The more teams you have, the more likely someone will accidentally summon chaos. Especially when each team has slightly different processes, naming conventions or expectations.&lt;/p&gt;

&lt;p&gt;Manual configuration isn't bad; it just doesn't scale. And incident management is one of the last places you want surprises.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform 101 (explained like you're a smart engineer who just wants the short version)
&lt;/h3&gt;

&lt;p&gt;Now we're getting to the good stuff.&lt;/p&gt;

&lt;p&gt;To keep things simple, Terraform is basically Git for your infrastructure's memory.&lt;/p&gt;

&lt;p&gt;You write down how you want your world to look, let's say your schedules, your &lt;a href="https://allquiet.app/integrations" rel="noopener noreferrer"&gt;integrations&lt;/a&gt; and your escalation paths, and Terraform waves a magic wand and makes it real. Sounds like pie in the sky but it's actually much simpler than it sounds. The workflow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write the configuration: Declare what you want, not the steps to get there.&lt;/li&gt;
&lt;li&gt;Plan the change: Terraform shows you exactly what will happen before anything happens.&lt;/li&gt;
&lt;li&gt;Review the diff: Humans get to sanity-check the machine.&lt;/li&gt;
&lt;li&gt;Apply with confidence: Terraform updates the real world to match your code.&lt;/li&gt;
&lt;li&gt;Audit forever: Every change lives in Git. Forever.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Easy peasy.&lt;/p&gt;

&lt;p&gt;And thanks to the All Quiet Terraform provider, the same lifecycle applies to your incident management setup. Suddenly your on-call world becomes code, versioned, documented, safe from Friday-afternoon edits.&lt;/p&gt;
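
&lt;p&gt;As a rough sketch of that lifecycle, the five steps might map onto a single file like the one below. It reuses the &lt;code&gt;allquiet_team&lt;/code&gt; resource and attribute names from the example later in this article; the team name, the time zone and the file name &lt;code&gt;teams.tf&lt;/code&gt; are made up for illustration, and the provider is assumed to be configured already.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# teams.tf (hypothetical file name)

# 1. Write: declare the desired state, not the steps to reach it.
resource "allquiet_team" "platform" {
  display_name = "Platform"      # illustrative value
  time_zone_id = "Europe/Berlin" # illustrative value
}

# 2. Plan:   terraform plan -out=tfplan
#            Shows what would change without changing anything.
# 3. Review: attach the plan output to the pull request so a human can check it.
# 4. Apply:  terraform apply tfplan
#            Applies exactly the plan that was reviewed.
# 5. Audit:  git log -- teams.tf
#            Lists every change to this file and who made it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Saving the plan to a file and applying that file is one way to make sure the change that was reviewed is the change that gets applied.&lt;/p&gt;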

&lt;p&gt;If you want Terraform explained in a more philosophical, AI-generated way, the provider is surprisingly patient at explaining that too (I say with experience).&lt;/p&gt;

&lt;h3&gt;
  
  
  All Quiet + Terraform: A match made in DevOps heaven
&lt;/h3&gt;

&lt;p&gt;Here's where it gets a bit more fun (bear with me, now).&lt;/p&gt;

&lt;p&gt;Rather than treating IaC like a bolt-on, All Quiet's Terraform provider treats it like royalty. Everything you configure, from schedules to &lt;a href="https://allquiet.app/on-call/rotations" rel="noopener noreferrer"&gt;rotations&lt;/a&gt; and everything in between, can live in Git. It can go through pull requests and follow the same DevOps lifecycle your infrastructure already does (cheers in DevOps).&lt;/p&gt;

&lt;p&gt;A few reasons the dynamic duo works so well:&lt;/p&gt;

&lt;h4&gt;
  
  
  No more mystery changes
&lt;/h4&gt;

&lt;p&gt;Every update has a commit, a diff and a human attached. In other words, accountability becomes second nature.&lt;/p&gt;

&lt;h4&gt;
  
  
  Centralized control
&lt;/h4&gt;

&lt;p&gt;With Terraform, changes to integrations and schedules go through your repository's review and CI permissions, so you control who can create them. You don't need to give every engineer admin access to your incident management tool.&lt;/p&gt;

&lt;h4&gt;
  
  
  Consistency across teams
&lt;/h4&gt;

&lt;p&gt;Everyone follows the same pattern, naming conventions and lifecycle, whether across two teams or 20.&lt;/p&gt;

&lt;h4&gt;
  
  
  Predictability
&lt;/h4&gt;

&lt;p&gt;Terraform doesn't forget to update the rota because it was hungry and the rugby starts at 7 p.m. (neither do your engineers... ideally).&lt;/p&gt;

&lt;p&gt;All Quiet gives you a clean, modern incident management experience, while Terraform keeps that experience consistent, scalable and drift-free.&lt;/p&gt;

&lt;h3&gt;
  
  
  What IaC unlocks for incident management
&lt;/h3&gt;

&lt;p&gt;IaC is much more than just a nice idea. It's a force multiplier that your future self (and on-call engineers) will thank you for. Here's why.&lt;/p&gt;

&lt;h4&gt;
  
  
  Predictability
&lt;/h4&gt;

&lt;p&gt;Git always knows what's deployed, which means you always know too.&lt;/p&gt;

&lt;h4&gt;
  
  
  Auditability
&lt;/h4&gt;

&lt;p&gt;No more detective work or moments of temporary amnesia when every change is fully documented.&lt;/p&gt;

&lt;h4&gt;
  
  
  Reproducibility
&lt;/h4&gt;

&lt;p&gt;Need a new team? A new rotation? Maybe a new integration? Copy, paste, apply. Done.&lt;/p&gt;
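
&lt;p&gt;"Copy, paste, apply" can even shrink to "add one entry, apply". The sketch below assumes the &lt;code&gt;allquiet_team&lt;/code&gt; resource and the &lt;code&gt;display_name&lt;/code&gt; and &lt;code&gt;time_zone_id&lt;/code&gt; attributes shown in the example in this article; the variable name &lt;code&gt;teams&lt;/code&gt; and its contents are hypothetical, so check the provider documentation for the actual schema before relying on it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical input: one map entry per team, keyed by display name.
variable "teams" {
  type = map(object({
    time_zone_id = string
  }))
  default = {
    "Backend"  = { time_zone_id = "Europe/Berlin" }
    "Frontend" = { time_zone_id = "America/Los_Angeles" }
  }
}

# One resource block creates a team for every entry in the map.
resource "allquiet_team" "this" {
  for_each = var.teams

  display_name = each.key
  time_zone_id = each.value.time_zone_id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Adding a third team then becomes a one-line change to the map, which keeps the pull request small and the naming pattern identical across teams.&lt;/p&gt;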

&lt;h4&gt;
  
  
  Governance without bureaucracy
&lt;/h4&gt;

&lt;p&gt;Centralized control without slowing teams down or creating bottlenecks.&lt;/p&gt;

&lt;h4&gt;
  
  
  Less cognitive load
&lt;/h4&gt;

&lt;p&gt;Your team doesn't have to remember how to "do the thing." They just write the code.&lt;/p&gt;

&lt;h4&gt;
  
  
  An example of managing on-call via code
&lt;/h4&gt;

&lt;p&gt;Here's a little taste of what managing on-call with Terraform might look like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_team"&lt;/span&gt; &lt;span class="s2"&gt;"my_team"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"My Team"&lt;/span&gt;
  &lt;span class="nx"&gt;time_zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"America/Los_Angeles"&lt;/span&gt;
  &lt;span class="nx"&gt;incident_engagement_report_settings&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;day_of_week&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"mon"&lt;/span&gt;
    &lt;span class="nx"&gt;time&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"09:00"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;labels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Product"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Services"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"Operations"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_schedule"&lt;/span&gt; &lt;span class="s2"&gt;"backend_team"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Backend Team On-Call"&lt;/span&gt;
  &lt;span class="nx"&gt;rotation&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;users&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"bob"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"charlie"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"1w"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Readable, reviewable, reproducible. And most importantly: no surprises.&lt;/p&gt;

&lt;h3&gt;
  
  
  The cultural shift to IaC as a first-class citizen
&lt;/h3&gt;

&lt;p&gt;The nuance that most IaC articles miss is that IaC isn't just tooling, but a mindset.&lt;/p&gt;

&lt;p&gt;Most organizations prioritize codifying infrastructure, CI/CD and observability long before incident management, even though it's one of the most critical and high-impact systems you have. Incident management deserves the same lifecycle and version control as every other system; maybe even more.&lt;/p&gt;

&lt;p&gt;Think of it this way: if you can spin up a Kubernetes cluster via Terraform but can't page the right person in the middle of the night, is it really worth it?&lt;/p&gt;

&lt;h3&gt;
  
  
  The future is declarative (and much less painful)
&lt;/h3&gt;

&lt;p&gt;Incident management isn't chaotic by nature; on the contrary, it's meant to reduce chaos by telling you what's not calm. It only becomes chaotic when the underlying configuration drifts, mutates or hides in a UI somewhere.&lt;/p&gt;
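
&lt;p&gt;To make "drift" concrete, here is a minimal Python sketch that compares a declared configuration with what is actually live. The dictionaries, their keys, and the &lt;code&gt;find_drift&lt;/code&gt; helper are hypothetical and only illustrate the idea; Terraform performs a comparable check for you when you run a plan.&lt;/p&gt;

```python
ABSENT = "(absent)"  # marker for a key that exists on only one side


def find_drift(declared: dict, live: dict) -> dict:
    """Return every key whose live value differs from the declared one."""
    drift = {}
    for key in sorted(set(declared) | set(live)):
        want = declared.get(key, ABSENT)
        have = live.get(key, ABSENT)
        if want != have:
            drift[key] = {"declared": want, "live": have}
    return drift


if __name__ == "__main__":
    # Hypothetical values: what is in Git vs. what someone changed in a UI.
    declared = {"rotation_length": "1w", "users": ["alice", "bob", "charlie"]}
    live = {"rotation_length": "2w", "users": ["alice", "bob", "charlie"]}
    print(find_drift(declared, live))
```

&lt;p&gt;An empty result means the live setup still matches what was reviewed and merged.&lt;/p&gt;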

&lt;p&gt;By treating incident management like code (especially with a provider built for the DevOps lifecycle), you calm the chaos. You invite consistency and visibility, both of which lead to more control.&lt;/p&gt;

&lt;p&gt;And with All Quiet + Terraform, your setup respects your workflow, your teams and your sleep schedule. Your future on-call engineers will never know the panic-inducing chaos you saved them from. Keep those futures safe and &lt;a href="https://meetings-eu1.hubspot.com/nkoeppl/allquiet-product-demo?uuid=70cac173-7385-4059-95cb-7bec57aa1baf" rel="noopener noreferrer"&gt;talk to us today&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>devops</category>
    </item>
    <item>
      <title>Top Opsgenie Alternatives and Migration Targets: How to Transition in 2026</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Tue, 09 Jun 2026 10:24:00 +0000</pubDate>
      <link>https://dev.to/mads_quist/top-opsgenie-alternatives-and-migration-targets-how-to-transition-in-2026-589a</link>
      <guid>https://dev.to/mads_quist/top-opsgenie-alternatives-and-migration-targets-how-to-transition-in-2026-589a</guid>
      <description>&lt;p&gt;Atlassian recently announced the official end-of-life for Opsgenie. Organizations must now prepare for a full service shutdown on April 5, 2027. To maintain reliable on-call schedules and incident response, teams need an effective Opsgenie migration strategy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flab81lmonzi1vk8jnoqi.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flab81lmonzi1vk8jnoqi.jpeg" alt="Opsgenie Migration Meme" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Quick Answer: Opsgenie End of Life (EOL) Facts&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Final Shutdown Date:&lt;/strong&gt; April 5, 2027. Support ends and Atlassian deletes all remaining data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New Subscription Cutoff:&lt;/strong&gt; June 4, 2025. No new trials or accounts after this date.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing Reality:&lt;/strong&gt; Many Opsgenie replacements cost more. With All Quiet, you typically gain modern features and reduce spend. &lt;a href="https://allquiet.app/customer-case-studies/uberspace-at-all-quiet" rel="noopener noreferrer"&gt;See the Uberspace customer story&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Official Atlassian Path:&lt;/strong&gt; Jira Service Management (JSM) serves as the migration destination for existing Atlassian customers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Opsgenie Migration Timeline: Key Dates
&lt;/h3&gt;

&lt;p&gt;Plan your budget and vendor selection according to these critical milestones.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Changes&lt;/th&gt;
&lt;th&gt;Action Item&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;June 4, 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sales end for new subscriptions.&lt;/td&gt;
&lt;td&gt;Finalize your vendor shortlist.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;October 2025&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Early shutdowns for some JSM users.&lt;/td&gt;
&lt;td&gt;Begin data export for integrated accounts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;April 17, 2026&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Potential read-only restrictions.&lt;/td&gt;
&lt;td&gt;Test your parallel alerting system.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;April 5, 2027&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Service shutdown and end of support.&lt;/td&gt;
&lt;td&gt;Complete all migration tasks.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Post-April 2027&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Atlassian deletes all customer data.&lt;/td&gt;
&lt;td&gt;Archive all audit logs and history.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
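
&lt;p&gt;If you want these milestones as a countdown in your planning docs, a small script is enough. The following is a minimal Python sketch: the dates come from the table above, while the &lt;code&gt;days_remaining&lt;/code&gt; helper and the example planning date are assumptions for illustration.&lt;/p&gt;

```python
from datetime import date

# Milestones that have an exact date in the timeline table above.
MILESTONES = {
    "Sales end for new subscriptions": date(2025, 6, 4),
    "Potential read-only restrictions": date(2026, 4, 17),
    "Service shutdown and end of support": date(2027, 4, 5),
}


def days_remaining(today: date) -> dict[str, int]:
    """Return days left per milestone; negative values are already past."""
    return {name: (when - today).days for name, when in MILESTONES.items()}


if __name__ == "__main__":
    # Hypothetical planning date; use date.today() in practice.
    for name, days in days_remaining(date(2026, 6, 9)).items():
        status = f"{days} days left" if days >= 0 else f"passed {-days} days ago"
        print(f"{name}: {status}")
```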

&lt;h3&gt;
  
  
  Top Opsgenie Alternatives for 2026
&lt;/h3&gt;

&lt;p&gt;Smart teams use this transition to rethink their incident management stack. Use this comparison to find the best fit for your organization.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;All Quiet&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Calm alerting, simple setup, clear pricing.&lt;/td&gt;
&lt;td&gt;Fewer deep-customization options than enterprise suites.&lt;/td&gt;
&lt;td&gt;Teams of all sizes looking for clarity and simplicity in their incident response workflows.&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Official Atlassian path, ticketing focus.&lt;/td&gt;
&lt;td&gt;High administrative overhead and costs.&lt;/td&gt;
&lt;td&gt;Atlassian-heavy enterprises.&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PagerDuty&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mature ecosystem, deep automation.&lt;/td&gt;
&lt;td&gt;Expensive and often noisy.&lt;/td&gt;
&lt;td&gt;Large scale enterprises.&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rootly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent Slack-first coordination.&lt;/td&gt;
&lt;td&gt;Requires a separate paging layer.&lt;/td&gt;
&lt;td&gt;Workflow-centric teams.&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;incident.io&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Great incident coordination, templates, timelines, and retros.&lt;/td&gt;
&lt;td&gt;Paging/on-call needs to be added as paid add-on.&lt;/td&gt;
&lt;td&gt;Product/engineering teams improving incident process maturity.&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  What to Look For When Replacing Opsgenie
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Migrate your intent, not your chaos.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Evaluate alternatives based on these critical factors to ensure your next tool provides a genuine upgrade:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Noise Control:&lt;/strong&gt; Look for grouping, deduplication, and precise routing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Usability:&lt;/strong&gt; Ensure schedules and overrides remain simple to manage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Integration:&lt;/strong&gt; Link alerts to resolutions within a single interface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Migration Surface:&lt;/strong&gt; Prioritize tools with API coverage and Terraform support.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Cost:&lt;/strong&gt; Consider admin time and cognitive load, not just the bill.&lt;/li&gt;
&lt;/ul&gt;
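
&lt;p&gt;To picture what grouping and deduplication mean in practice, here is a minimal Python sketch that collapses alerts sharing the same fingerprint into a single page. It assumes each alert is a dictionary with hypothetical &lt;code&gt;service&lt;/code&gt; and &lt;code&gt;check&lt;/code&gt; fields; real tools apply richer rules, so treat this as an illustration rather than any vendor's implementation.&lt;/p&gt;

```python
from collections import defaultdict


def group_alerts(alerts):
    """Group alerts that share a (service, check) fingerprint."""
    groups = defaultdict(list)
    for alert in alerts:
        fingerprint = (alert["service"], alert["check"])
        groups[fingerprint].append(alert)
    return dict(groups)


def pages_to_send(alerts):
    """Build one page per fingerprint instead of one page per alert."""
    return [
        {"service": service, "check": check, "count": len(items)}
        for (service, check), items in group_alerts(alerts).items()
    ]


if __name__ == "__main__":
    # Hypothetical burst: the same database check fires twice.
    burst = [
        {"service": "db", "check": "connections"},
        {"service": "db", "check": "connections"},
        {"service": "api", "check": "latency"},
    ]
    for page in pages_to_send(burst):
        print(page)
```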

&lt;h3&gt;
  
  
  Deep Dive: The Best Opsgenie Alternatives
&lt;/h3&gt;

&lt;h4&gt;
  
  
  All Quiet: The Modern Choice for Lean Teams
&lt;/h4&gt;

&lt;p&gt;Many teams find that legacy tools increase cognitive load during incidents. All Quiet takes a different approach. We designed the product to reduce noise, keep integrations tight, and make on-call schedules predictable again.&lt;/p&gt;

&lt;p&gt;Teams migrating to All Quiet benefit from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rapid Onboarding:&lt;/strong&gt; Set up your organization without needing a dedicated tool administrator.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified Flow:&lt;/strong&gt; Manage the full lifecycle from alert to status pages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Noise Suppression:&lt;/strong&gt; Use smart grouping to prevent alert fatigue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear Total Cost of Ownership (TCO):&lt;/strong&gt; Transparent pricing and a lean setup keep your TCO low.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://allquiet.app/opsgenie-alternative" rel="noopener noreferrer"&gt;Compare All Quiet vs Opsgenie&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Opsgenie Migration Resources
&lt;/h4&gt;

&lt;p&gt;If you're actively migrating, these posts go deeper on strategy and implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://allquiet.app/blog/migrating-from-opsgenie-to-all-quiet" rel="noopener noreferrer"&gt;Step-by-Step Migration Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://allquiet.app/blog/migrating-from-opsgenie-to-all-quiet" rel="noopener noreferrer"&gt;Terraform (IaC) Guide Part I&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Terraform (IaC) Guide Part II&lt;/li&gt;
&lt;li&gt;The SRE Perspective: All Quiet vs. ITSM Bloat&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Jira Service Management: The Official Path
&lt;/h4&gt;

&lt;p&gt;Atlassian is moving Opsgenie's features into Jira Service Management. This path works for organizations that prioritize ITSM processes and ticket-based operations. However, ticketing platforms often pull teams into heavy processes that can slow down incident response.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://allquiet.app/jira-service-management-alternative" rel="noopener noreferrer"&gt;Compare All Quiet vs Jira Service Management (JSM Premium)&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  PagerDuty: The Enterprise Standard
&lt;/h4&gt;

&lt;p&gt;PagerDuty offers a mature ecosystem with deep automation. It suits large organizations with massive scale. The primary challenge is complexity: without strict governance, teams often recreate the noise issues they intended to solve. In practice, a lot of teams only need a fraction of the platform (often ~20% of the features), but still have to pay the full price.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://allquiet.app/pagerduty-alternative" rel="noopener noreferrer"&gt;Compare All Quiet vs PagerDuty&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Specialized Coordination: Rootly and incident.io
&lt;/h4&gt;

&lt;p&gt;These tools focus on Slack-first coordination and post-incident hygiene. They excel at workflow maturity but treat on-call as an expensive paid add-on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://allquiet.app/rootly-alternative" rel="noopener noreferrer"&gt;Compare All Quiet vs Rootly&lt;/a&gt;&lt;br&gt;
&lt;a href="https://allquiet.app/incident-io-alternative" rel="noopener noreferrer"&gt;Compare All Quiet vs incident.io&lt;/a&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Practical Opsgenie Migration Checklist
&lt;/h4&gt;

&lt;p&gt;Treat your migration as a parallel-run project to minimize risk during the transition.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit:&lt;/strong&gt; Inventory all current teams, integrations, and routing rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export:&lt;/strong&gt; Save your on-call history and audit logs early.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model:&lt;/strong&gt; Choose between a ticket-first (ITSM) and an engineering-first (alerting) model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Run:&lt;/strong&gt; Route alerts to both Opsgenie and your new tool to verify configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate:&lt;/strong&gt; Run game days to test escalations and ownership.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cutover:&lt;/strong&gt; Switch integrations one at a time with clear rollback steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shutdown:&lt;/strong&gt; Confirm all data is archived before the 2027 deletion.&lt;/li&gt;
&lt;/ol&gt;
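
&lt;p&gt;For the parallel-run step, one way to verify configuration is to compare who each tool paged for the same alert. The following is a minimal Python sketch, assuming you can export a mapping of alert ID to paged responder from both tools; the function name and the data shape are hypothetical.&lt;/p&gt;

```python
def compare_parallel_run(old_tool_pages: dict, new_tool_pages: dict) -> dict:
    """Compare the responder paged per alert ID in the old and the new tool."""
    old_ids = set(old_tool_pages)
    new_ids = set(new_tool_pages)
    # Alerts the old tool paged on but the new tool never saw.
    missing = sorted(old_ids - new_ids)
    # Alerts both tools saw but routed to different people.
    mismatched = sorted(
        alert_id
        for alert_id in old_ids.intersection(new_ids)
        if old_tool_pages[alert_id] != new_tool_pages[alert_id]
    )
    return {"missing_in_new": missing, "different_responder": mismatched}


if __name__ == "__main__":
    # Hypothetical exports from one day of running both tools side by side.
    opsgenie = {"a1": "alice", "a2": "bob", "a3": "charlie"}
    new_tool = {"a1": "alice", "a2": "charlie"}
    print(compare_parallel_run(opsgenie, new_tool))
```

&lt;p&gt;Both lists being empty for a few consecutive days is a reasonable signal that routing and escalations were carried over correctly.&lt;/p&gt;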

&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;The Opsgenie EOL deadline is an opportunity to move toward a calmer incident response culture. If you need a replacement that is fast to deploy and designed to reduce cognitive load, All Quiet is built for this moment.&lt;/p&gt;

</description>
      <category>opsgenie</category>
      <category>incident</category>
      <category>oncall</category>
      <category>devops</category>
    </item>
    <item>
      <title>Top Incident Management Solutions: Best Incident Management Software in 2026</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Thu, 04 Jun 2026 10:25:00 +0000</pubDate>
      <link>https://dev.to/mads_quist/top-incident-management-solutions-best-incident-management-software-in-2026-3g6b</link>
      <guid>https://dev.to/mads_quist/top-incident-management-solutions-best-incident-management-software-in-2026-3g6b</guid>
      <description>&lt;h4&gt;
  
  
  Modern teams aren't struggling because they lack data, but because they're drowning in it. Here's how the top incident management tools in 2026 stack up.
&lt;/h4&gt;

&lt;p&gt;Every engineering team knows the moment when an incident hits and all sense of structure suddenly ceases to exist.&lt;/p&gt;

&lt;p&gt;Someone starts digging through logs, dashboards and old Slack threads in the hopes of uncovering the source of the problem, while someone else swears that everything was "fine yesterday." Meanwhile, your incident management tool is offering up everything but the one thing you actually need: clarity.&lt;/p&gt;

&lt;p&gt;And then, right in the middle of the world falling down around you, it hits you:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Modern teams aren't struggling because they lack data, but because they're drowning in it.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The tools built for yesterday's enterprise aren't the ones you need when you're in the middle of chaos. But the tools of today certainly are.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Old Guard vs the New Wave
&lt;/h3&gt;

&lt;p&gt;Incident management has split into two clear camps: the Old Guard and the New Wave. The former are powerful, feature-stuffed platforms built for large teams and scaling enterprises. They can do almost anything, if you have the budget, time, and patience to set them up.&lt;/p&gt;

&lt;p&gt;The latter take the opposite approach. They're leaner, calmer, and designed for how teams actually work today. They create order, not chaos, and reduce noise rather than amplify it. In other words, alerting tools that don't have you biting your nails to the quick mid-incident.&lt;/p&gt;

&lt;p&gt;Teams aren't chasing "more features" anymore. They just want something fast, simple, and drama-free.&lt;/p&gt;

&lt;p&gt;Let's take a closer look at the top incident management tools in 2026.&lt;/p&gt;

&lt;h3&gt;
  
  
  At a Glance
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;th&gt;Complexity&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;All Quiet&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple setup, integrated workflows, unlimited notifications, low TCO&lt;/td&gt;
&lt;td&gt;Not built for extreme enterprise customization&lt;/td&gt;
&lt;td&gt;Small–mid teams, startups, scaleups&lt;/td&gt;
&lt;td&gt;Standard: $4.99/user/mo · Pro: $9.99 · Enterprise: custom&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PagerDuty&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enterprise-grade workflows, deep integrations, reliable alerting&lt;/td&gt;
&lt;td&gt;Expensive, complex, noisy if not tuned&lt;/td&gt;
&lt;td&gt;Large enterprises, SRE-heavy orgs&lt;/td&gt;
&lt;td&gt;Free · Professional: $21/user/mo · Business: $41 · Enterprise: custom&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Opsgenie&lt;/strong&gt; &lt;br&gt;&lt;em&gt;(deprecated 2027)&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Strong inside Atlassian ecosystem, solid alerting&lt;/td&gt;
&lt;td&gt;Deprecation risk, Jira lock-in, migration required&lt;/td&gt;
&lt;td&gt;Jira-centric teams (short-term)&lt;/td&gt;
&lt;td&gt;Free · Essentials: $9.45/user/mo · Standard: $19.95 · Enterprise: $31.90&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ilert&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strong uptime monitoring, EU hosting, AI-assisted workflows&lt;/td&gt;
&lt;td&gt;Expensive, add-ons increase cost&lt;/td&gt;
&lt;td&gt;Monitoring-heavy teams, EU compliance&lt;/td&gt;
&lt;td&gt;Pro: €19/user/mo · Scale: €39 · Enterprise: €49&lt;/td&gt;
&lt;td&gt;Medium–High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zenduty&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Highly customizable workflows, strong Slack/Teams integration&lt;/td&gt;
&lt;td&gt;Complex setup, higher learning curve&lt;/td&gt;
&lt;td&gt;Large support orgs, multi-team setups&lt;/td&gt;
&lt;td&gt;Starter: $5/user/mo · Growth: $14 · Enterprise: custom&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
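
&lt;p&gt;Per-user prices are easier to compare once you multiply them out for your own team. Here is a minimal Python sketch that does the arithmetic for a subset of the USD list prices in the table above. It assumes the listed monthly price applies for all twelve months and ignores add-ons and discounts; the &lt;code&gt;annual_cost&lt;/code&gt; helper and the team size are hypothetical.&lt;/p&gt;

```python
from decimal import Decimal

# USD list prices per user per month, copied from the table above.
PRICES = {
    "All Quiet Standard": Decimal("4.99"),
    "All Quiet Pro": Decimal("9.99"),
    "PagerDuty Professional": Decimal("21"),
    "PagerDuty Business": Decimal("41"),
    "Zenduty Starter": Decimal("5"),
    "Zenduty Growth": Decimal("14"),
}


def annual_cost(plan: str, users: int) -> Decimal:
    """Licence cost for one year: monthly price x users x 12 months."""
    return PRICES[plan] * users * 12


if __name__ == "__main__":
    team_size = 10  # hypothetical on-call team
    for plan in PRICES:
        print(f"{plan}: ${annual_cost(plan, team_size)} per year")
```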

&lt;p&gt;Want a deeper, tool-by-tool comparison?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://allquiet.app/pagerduty-alternative" rel="noopener noreferrer"&gt;All Quiet vs PagerDuty&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://allquiet.app/opsgenie-alternative" rel="noopener noreferrer"&gt;All Quiet vs Opsgenie&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://allquiet.app/ilert-alternative" rel="noopener noreferrer"&gt;All Quiet vs ilert&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://allquiet.app/zenduty-alternative" rel="noopener noreferrer"&gt;All Quiet vs Zenduty&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  The New Wave of Incident Management Tools
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Lean, integrated and modern
&lt;/h4&gt;

&lt;p&gt;Before we look at the Old Guard, let's start off with a tool that's carving a new direction for incident response in 2026. It's simpler, calmer, and built for real teams; but more importantly, it's quieter.&lt;/p&gt;

&lt;h4&gt;
  
  
  All Quiet: The Modern Solution for Teams Seeking Less Noise and More Clarity
&lt;/h4&gt;

&lt;p&gt;It's not called All Quiet for fun. It's built to cut the nonsense and reduce noise. All Quiet doesn't like complexity, and it's not trying to be a control tower for every planet in the Milky Way. Its mission is simplicity: clear alerts, integrated workflows, and a calm on-call experience without the 200-setting scavenger hunt.&lt;/p&gt;

&lt;p&gt;And when it's the middle of the night, your brain is running at 40%, and your cat is judging your life choices, the lower cognitive load can make all the difference.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros vs Cons
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;All Quiet&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple, intuitive setup&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;No multi-day configuration needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrated workflows&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Alerts → triage → resolution in one flow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noise reduction&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Designed to prevent alert fatigue&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast onboarding&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;New engineers can use it immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transparent pricing&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;No surprise add-ons&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise-grade complexity&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Not built for extreme customization (by design)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Integrated Workflows
&lt;/h4&gt;

&lt;p&gt;Rather than splitting incident response into separate steps, All Quiet treats it as one continuous flow, from alert to resolution, so you don't have to switch between multiple tabs and tools.&lt;/p&gt;

&lt;p&gt;Here's what it looks like from start to finish:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alerts come in clean, without duplicates or noise&lt;/li&gt;
&lt;li&gt;On-call schedules and escalations are automatically connected&lt;/li&gt;
&lt;li&gt;The right people get notified instantly&lt;/li&gt;
&lt;li&gt;Everything is handled in the same place as the alert&lt;/li&gt;
&lt;li&gt;Resolution steps are tracked without manual admin&lt;/li&gt;
&lt;li&gt;Post-incident follow-ups are built into the workflow&lt;/li&gt;
&lt;li&gt;You go back to dreaming about effortless incident response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No integrations held together with superglue, no detective work, and no "where did this alert come from?" Just simple, smooth monitoring.&lt;/p&gt;

&lt;p&gt;And if you're currently paying for a monitoring tool, an alerting tool, an on-call tool, a collaboration tool, and a post-mortem tool… wouldn't one system that handles everything take the weight off?&lt;/p&gt;

&lt;h4&gt;
  
  
  Lower Cost of Ownership
&lt;/h4&gt;

&lt;p&gt;Subscription price matters, but what about the total cost of ownership? We're talking less time spent configuring, maintaining, and babysitting the tool, and more time spent actually shipping and building features. With All Quiet, setup takes hours, not weeks. There's no configuration marathon; onboarding is, quite simply, plug and play.&lt;/p&gt;

&lt;p&gt;And the noise reduction pays for itself: fewer alerts, fewer distractions, and far fewer sleepless nights. Plus, since workflows are fully integrated, you're not bouncing between ten different tools to resolve one incident.&lt;/p&gt;

&lt;p&gt;Best of all, All Quiet's pricing is predictable and budget-friendly. No hidden fees or add-ons once you get set up, and no budget creep. It's low-maintenance software that does what it's supposed to do.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pricing
&lt;/h4&gt;

&lt;p&gt;All Quiet keeps its pricing intentionally simple, with no hidden add-ons or "enterprise-only" features, and no surprise fees for SMS or phone alerts.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price (per user/month)&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Key Features&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Standard&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$4.99&lt;/td&gt;
&lt;td&gt;Small teams&lt;/td&gt;
&lt;td&gt;Unlimited users, incidents, integrations, SMS/phone/email/push alerts, all monitoring types, on-call schedules, escalation policies, mobile apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$9.99&lt;/td&gt;
&lt;td&gt;Multi-team orgs&lt;/td&gt;
&lt;td&gt;Everything in Standard plus: status pages, OIDC + SCIM, Terraform provider, public REST API, cross-team collaboration, advanced reporting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enterprise&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Larger orgs with compliance needs&lt;/td&gt;
&lt;td&gt;Everything in Pro plus: advanced auditing/logging, custom onboarding, dedicated success channel, custom billing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;All plans include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unlimited inbound &amp;amp; outbound integrations&lt;/li&gt;
&lt;li&gt;Unlimited monitors (HTTP, ping, heartbeat, cron)&lt;/li&gt;
&lt;li&gt;Unlimited teams&lt;/li&gt;
&lt;li&gt;Unlimited notifications (SMS, calls, email, push)&lt;/li&gt;
&lt;li&gt;Unlimited incidents&lt;/li&gt;
&lt;li&gt;Unlimited escalation policies&lt;/li&gt;
&lt;li&gt;iOS &amp;amp; Android apps with native DND-overrides&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Ideal for Small and Mid-Sized Teams
&lt;/h4&gt;

&lt;p&gt;All Quiet is built for teams that want a modern incident management tool that isn't bloated with complexity, and without the enterprise price or maintenance burden. It's perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Startups&lt;/li&gt;
&lt;li&gt;Scale-ups&lt;/li&gt;
&lt;li&gt;SaaS companies&lt;/li&gt;
&lt;li&gt;DevOps teams&lt;/li&gt;
&lt;li&gt;SRE teams with limited bandwidth&lt;/li&gt;
&lt;li&gt;Any team that wants clarity over complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It works so well because it's not trying to be everything for everyone. It knows exactly who it's built for, and it does that job exceptionally well.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Old Guard
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Powerful but heavyweight
&lt;/h4&gt;

&lt;p&gt;Despite the shift toward the New Wave, the Old Guard tools still dominate enterprise environments. And to be fair, they're powerful. They do what they promise, but they're more expensive, have steeper learning curves, and involve more maintenance than most small teams are equipped to deal with.&lt;/p&gt;

&lt;p&gt;Let's look closely at the four major players of the Old Guard and how they compare.&lt;/p&gt;

&lt;h4&gt;
  
  
  PagerDuty: The Enterprise Standard That's Overkill for Smaller Teams
&lt;/h4&gt;

&lt;p&gt;Perhaps the name that everyone knows, mostly because it's been around forever. Sure, it's full of useful features and reliable integrations, but for smaller teams, it can feel like trying to fly a 747 when all you really needed was a bike.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;PagerDuty&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple, intuitive setup&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Setup can take days or weeks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrated workflows&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Strong, but often requires tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noise reduction&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Can be noisy without careful configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast onboarding&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Steeper learning curve&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transparent pricing&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Add-ons increase cost quickly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise-grade complexity&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Built for large orgs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Pricing
&lt;/h4&gt;

&lt;p&gt;While PagerDuty has a free option, it's limited and is more of a "starter kit" than full-blown incident management software.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Price (per user/month)&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Limited features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Professional&lt;/td&gt;
&lt;td&gt;$21&lt;/td&gt;
&lt;td&gt;Basic on-call + alerting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Business&lt;/td&gt;
&lt;td&gt;$41&lt;/td&gt;
&lt;td&gt;Advanced workflows + analytics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Full automation + enterprise support&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Who It's Best For
&lt;/h4&gt;

&lt;p&gt;PagerDuty's target market is any organization where complexity is unavoidable. There are enough people to manage it, but they need direction and structure. It's best for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large enterprises&lt;/li&gt;
&lt;li&gt;Companies with complex, multi-team escalation paths&lt;/li&gt;
&lt;li&gt;Orgs that rely on niche integrations and SLEs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not ideal for lean teams, cost-sensitive orgs, or anyone who values simplicity over configurability.&lt;/p&gt;

&lt;h4&gt;
  
  
  Opsgenie: Great Inside Atlassian but Not Much Else (Plus Deprecation in 2027)
&lt;/h4&gt;

&lt;p&gt;Opsgenie is a classic for teams that live inside Jira. In the Atlassian ecosystem, everything flows naturally: alerts become Jira issues and workflows follow Jira logic. Everything works together in Atlassian's own little microverse.&lt;/p&gt;

&lt;p&gt;But outside that microverse, you're at sea without a life raft. And with deprecation looming, Opsgenie is no longer a long-term solution; it's more of a temporary stopgap.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros vs Cons
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Opsgenie&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple, intuitive setup&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Setup depends heavily on Jira structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrated workflows&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Strong inside Atlassian only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noise reduction&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Alert logic tied to Jira rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast onboarding&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Requires Jira familiarity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transparent pricing&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Bundled with Atlassian plans&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise-grade complexity&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Good for large Jira orgs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Pricing
&lt;/h4&gt;

&lt;p&gt;Opsgenie's pricing is wrapped up in Atlassian's ecosystem, which works very well if you're all-in on Jira, but not so much if you're not.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Price (per user/month)&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Basic alerting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Essentials&lt;/td&gt;
&lt;td&gt;$9.45&lt;/td&gt;
&lt;td&gt;Limited workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;$19.95&lt;/td&gt;
&lt;td&gt;Most features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;$31.90&lt;/td&gt;
&lt;td&gt;Advanced integrations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
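
&lt;p&gt;To see how per-seat pricing scales, here is a rough sketch that turns the list prices in the table above into an annual bill. The team size of 10 is an illustrative assumption, and the numbers ignore discounts, taxes, and annual-billing rates.&lt;/p&gt;

```python
# Annual cost of per-seat pricing, using the Opsgenie list prices from the table above.
# Prices are stored as integer cents to avoid floating-point rounding.
PRICE_CENTS_PER_USER_MONTH = {
    "Free": 0,
    "Essentials": 945,
    "Standard": 1995,
    "Enterprise": 3190,
}


def annual_cost_cents(tier: str, users: int) -> int:
    """Return the yearly bill in cents for `users` seats on `tier`."""
    return PRICE_CENTS_PER_USER_MONTH[tier] * users * 12


if __name__ == "__main__":
    team_size = 10  # hypothetical team size, for illustration only
    for tier in PRICE_CENTS_PER_USER_MONTH:
        dollars = annual_cost_cents(tier, team_size) / 100
        print(f"{tier}: ${dollars:,.2f} per year for {team_size} users")
```

&lt;p&gt;Because the bill is linear in headcount, every new on-call engineer adds the same yearly amount, which is why seat-based pricing hurts most when a team is growing.&lt;/p&gt;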

&lt;h4&gt;
  
  
  Lock-In Issues
&lt;/h4&gt;

&lt;p&gt;Opsgenie is glued to Jira. If Jira slows down, Opsgenie slows down. If Jira goes down, Opsgenie goes with it. And if you ever want to migrate, you'll be untangling a web of workflows for weeks.&lt;/p&gt;

&lt;p&gt;In a nutshell: if Jira's having a bad day, so are you.&lt;/p&gt;

&lt;h4&gt;
  
  
  Deprecation Implications
&lt;/h4&gt;

&lt;p&gt;The bigger issue for Opsgenie enthusiasts is its expiration date. Atlassian has confirmed that Opsgenie will be completely out of action by April 2027, with final sales in June 2025. No matter when you adopt it, you'll be forced to migrate whether you like it or not.&lt;/p&gt;

&lt;p&gt;If you're still with Opsgenie, now's the time to explore modern alternatives before the window becomes uncomfortably small.&lt;/p&gt;

&lt;h4&gt;
  
  
  Who It's Best For
&lt;/h4&gt;

&lt;p&gt;Opsgenie is ideal for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jira-centric teams&lt;/li&gt;
&lt;li&gt;Atlassian-locked enterprises&lt;/li&gt;
&lt;li&gt;Anyone looking for a short-term solution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If long-term stability is the goal, Opsgenie simply isn't built for the future.&lt;/p&gt;

&lt;h3&gt;
  
  
  ilert: Europe's Pricey but Strong Uptime-Focused Solution
&lt;/h3&gt;

&lt;p&gt;ilert is Europe's uptime-monitoring specialist: a solid all-in-one for monitoring, alerting, and on-call. It's reliable and feature-rich, but it's pricey. Add a few users or advanced features and the budget starts stretching in ways smaller teams won't love.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros vs Cons
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;ilert&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple, intuitive setup&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;More configuration required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrated workflows&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Monitoring + on-call combined&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noise reduction&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Can be noisy with many monitors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast onboarding&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;More complex than newer tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transparent pricing&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Higher cost tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise-grade complexity&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Strong monitoring depth&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Pricing
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Price (per user/month)&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pro&lt;/td&gt;
&lt;td&gt;€19&lt;/td&gt;
&lt;td&gt;Core features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scale&lt;/td&gt;
&lt;td&gt;€39&lt;/td&gt;
&lt;td&gt;Larger orgs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;€49&lt;/td&gt;
&lt;td&gt;Compliance + support&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Who It's Best For
&lt;/h4&gt;

&lt;p&gt;ilert is a great fit for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitoring-heavy teams&lt;/li&gt;
&lt;li&gt;EU-based companies with strict compliance needs&lt;/li&gt;
&lt;li&gt;Teams that want uptime + alerting in one tool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Its premium pricing makes it less ideal for cost-sensitive teams or anyone who prefers simplicity over deep monitoring features.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zenduty: The Complex Workflow Wizard
&lt;/h3&gt;

&lt;p&gt;Zenduty is the most customizable tool in the Old Guard. It's a workflow builder's dream and a minimalist's nightmare. If you love all the bells and whistles of conditional logic, intricate escalation paths, and hundreds of different levers and switches to play with, then Zenduty is the tool for you.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pros vs Cons
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Zenduty&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple, intuitive setup&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Steep learning curve&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrated workflows&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Highly customizable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Noise reduction&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Requires careful tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fast onboarding&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Complex interface&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Transparent pricing&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Clear tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise-grade complexity&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Very flexible&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Pricing
&lt;/h4&gt;

&lt;p&gt;Unlike the product itself, Zenduty's pricing is relatively straightforward.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Price (per user/month)&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Starter&lt;/td&gt;
&lt;td&gt;$5&lt;/td&gt;
&lt;td&gt;Basic alerting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Growth&lt;/td&gt;
&lt;td&gt;$14&lt;/td&gt;
&lt;td&gt;Most features&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Advanced workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  Who It's Best For
&lt;/h4&gt;

&lt;p&gt;Zenduty is perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large support orgs&lt;/li&gt;
&lt;li&gt;Teams with complex, multi-team workflows&lt;/li&gt;
&lt;li&gt;Companies that want deep customization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Small teams looking for simplicity, or anyone who doesn't enjoy building workflows for fun, should look elsewhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Incident management: the alert handling process that's actually about handling reality. Teams are smaller yet deal with more responsibility, systems are more complex but still tied to legacy processes, and nobody has the patience (or hours in the day) for any of it.&lt;/p&gt;

&lt;p&gt;The Old Guard still has its place in enterprises with enough resources to handle their weight. They're powerful and bursting with features, but you'll pay in complexity, cognitive load, and a whole lot of maintenance.&lt;/p&gt;

&lt;p&gt;The New Wave's approach is different. Fewer levers, less configuration, and only one tab, which means more clarity, peace, and easy workflows. All Quiet sits at the top of the tier list because it solves the problem the giants created by bringing everything together into one calm, integrated flow that teams can actually breathe in.&lt;/p&gt;

&lt;p&gt;Teams aren't asking "which tools have the most features?" anymore. They're asking "which tools keep my team sane?" The answer isn't complexity, but simplicity and silence.&lt;/p&gt;

&lt;p&gt;If you're ready to simplify your incident response and empower your team with the digital solution they've been missing, &lt;a href="https://meetings-eu1.hubspot.com/nkoeppl/allquiet-product-demo" rel="noopener noreferrer"&gt;talk to us today&lt;/a&gt;.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>PagerDuty’s 83% Stock Drop Since 2019 and What We Learned from It in 2026</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Tue, 02 Jun 2026 10:21:00 +0000</pubDate>
      <link>https://dev.to/mads_quist/pagerdutys-83-stock-drop-since-2019-and-what-we-learned-from-it-in-2026-51d0</link>
      <guid>https://dev.to/mads_quist/pagerdutys-83-stock-drop-since-2019-and-what-we-learned-from-it-in-2026-51d0</guid>
      <description>&lt;p&gt;There’s nothing like a good old fashioned budget review to remind you what you’re actually spending money on… and how much of it.&lt;/p&gt;

&lt;p&gt;PagerDuty is one of those tools that makes you do a double-take when you open the invoice. You ask yourself: “Is this the monthly or the annual figure?” while wondering if either would even be acceptable.&lt;/p&gt;

&lt;p&gt;This collective eyebrow raise isn’t because PagerDuty is bad software, but because the world around it changed while PagerDuty stayed the same.&lt;/p&gt;

&lt;p&gt;To understand what happened, you have to look at the bigger picture. Let’s see how the story unfolds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act I: Setting the scene
&lt;/h2&gt;

&lt;p&gt;PagerDuty IPO’d in 2019 and by 2021 it was trading around $34. It came onto the scene like the big brother of incident response, knowing it had nailed a problem that everyone else was either ignoring or still trying to solve. PagerDuty was reliable; the kind of tool you bought to show off that you had your operational act together.&lt;/p&gt;

&lt;p&gt;Fast-forward to 2025/2026 and it’s suffered a multi-year collapse with a &lt;a href="https://simplywall.st/stocks/us/software/nyse-pd/pagerduty/news/has-pagerduty-pd-become-a-potential-opportunity-after-prolon" rel="noopener noreferrer"&gt;73% decline over 5 years&lt;/a&gt;. For a while, the market was all about PagerDuty, but then it did what it normally does: it changed its mind.&lt;/p&gt;

&lt;p&gt;In November 2025, PagerDuty’s stock plummeted &lt;a href="https://www.ainvest.com/news/pagerduty-pd-sudden-24-stock-drop-assessing-overreaction-long-term-investment-implications-2511/" rel="noopener noreferrer"&gt;24% in a single day&lt;/a&gt;. Their team had waved a small victory flag while publishing their Q3 results with GAAP profitability for the second straight quarter, which was great news! Until they pulled back their revenue guidance because of customers cutting seats and watching their budgets.&lt;/p&gt;

&lt;p&gt;Sure, the financial housekeeping seemed cleaner, as their margins were improving on the surface and EPS was behaving as it should. But then the curtain was drawn: they lowered their growth expectations and analysts went straight for the jugular. Suddenly, everyone was wondering whether PagerDuty could even keep its revenue engine running at the pace Wall Street expected.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;If your &lt;a href="https://allquiet.app/on-call" rel="noopener noreferrer"&gt;on-call alerts&lt;/a&gt; dropped this hard, you’d file an incident report.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Act II: Enterprise pricing in a market that’s lost confidence
&lt;/h2&gt;

&lt;p&gt;I don’t want to sugar-coat it. PagerDuty’s pricing has always been aspirational. They’ve positioned themselves as enterprise-grade, while CTOs increasingly see them as enterprise-inflated. It’s the kind of pricing that assumes you have a huge team, an even bigger budget and an extremely forgiving CFO (or one who’s asleep on the job).&lt;/p&gt;

&lt;p&gt;This approach worked for years, though. Enterprise pricing was part of the deal for DevOps starter packs, but then the pendulum swung. Seat-based pricing is now under more pressure than ever before. &lt;a href="https://www.ainvest.com/news/pagerduty-pd-sudden-24-stock-drop-assessing-overreaction-long-term-investment-implications-2511/" rel="noopener noreferrer"&gt;Analysts even said&lt;/a&gt; that seat license compression and slowing ARR growth were key factors in PagerDuty’s revenue slowdown.&lt;/p&gt;

&lt;p&gt;In other words, customers were trimming their usage rather than expanding it. But why do that when you can let the system take care of the groundwork for you? A huge amount of alert handling is just repetition: checking if the alert is real, gathering context, matching it to other signals, rinse and repeat. But a solid system can filter out the junk and enrich alerts with the right information, then connect related events and trigger the fixes. You instantly cut down on noise, along with &lt;a href="https://www.secure.com/blog/cybersecurity/automated-security-investigations" rel="noopener noreferrer"&gt;70% of security investigations&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can’t replace your analyst, but you absolutely can clear the fog so they can focus on alerts that matter.&lt;/p&gt;
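
&lt;p&gt;The filter, enrich, and correlate steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's implementation: the &lt;code&gt;Alert&lt;/code&gt; fields, the severity levels, and the &lt;code&gt;OWNERS&lt;/code&gt; lookup are all hypothetical.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    service: str
    message: str
    severity: str  # hypothetical levels: "info", "warning", "critical"
    context: dict = field(default_factory=dict)


# Hypothetical ownership lookup used to enrich alerts with context.
OWNERS = {"checkout": "payments-team", "db": "platform-team"}


def is_actionable(alert: Alert) -> bool:
    """Filter step: drop purely informational alerts."""
    return alert.severity != "info"


def enrich(alert: Alert) -> Alert:
    """Enrich step: attach the owning team so responders have context."""
    alert.context["owner"] = OWNERS.get(alert.service, "unknown")
    return alert


def correlate(alerts: list[Alert]) -> dict[str, list[Alert]]:
    """Correlate step: group related alerts by service, one incident per service."""
    incidents: dict[str, list[Alert]] = {}
    for alert in alerts:
        incidents.setdefault(alert.service, []).append(alert)
    return incidents


def process(alerts: list[Alert]) -> dict[str, list[Alert]]:
    """Run the full pipeline: filter, then enrich, then correlate."""
    actionable = [enrich(a) for a in alerts if is_actionable(a)]
    return correlate(actionable)
```

&lt;p&gt;Grouping by service is the simplest possible correlation rule. Real systems usually also match on time windows or shared labels, but the shape of the pipeline stays the same.&lt;/p&gt;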

&lt;h2&gt;
  
  
  Act III: Do you really need the Cadillac?
&lt;/h2&gt;

&lt;p&gt;While PagerDuty was busy expanding its platform, &lt;a href="https://allquiet.app/blog/top-incident-management-solutions" rel="noopener noreferrer"&gt;modern alternatives&lt;/a&gt; were slowly creeping up on it with better open-source tools, more mature cloud providers and bootstrapped SaaS tools. Suddenly, you could get 80-90% of PagerDuty’s core functionality without paying enterprise prices.&lt;/p&gt;

&lt;p&gt;This wasn’t a symptom of PagerDuty being copied by competitors, but of incident response itself becoming a solved problem. Cloud-native alerting from AWS, GCP and Azure had matured dramatically, while open-source tools like Alertmanager kept getting better. They were offering alerting pipelines that, five years earlier, would’ve looked like something from Futurama.&lt;/p&gt;

&lt;p&gt;Incident response suddenly had flat pricing, transparent billing, no per-seat penalties and support teams that actually answered emails. Meanwhile, PagerDuty was still innovating. They rolled out &lt;a href="https://www.zacks.com/stock/news/2385360/pagerduty-declines-16-ytd-should-you-buy-the-stock-on-the-dip" rel="noopener noreferrer"&gt;AI-driven automation&lt;/a&gt; and workflow orchestrations, and even reported that 825 customers were spending $100k+ ARR in Q3 2024.&lt;/p&gt;

&lt;p&gt;But does innovation erase the pricing gap for lean teams?&lt;/p&gt;

&lt;p&gt;Big, fat no. And that’s when migrations start to happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act IV: The not-so-quiet migration
&lt;/h2&gt;

&lt;p&gt;If you want to really understand the shift in the market, go to the belly of the beast. Look at Slack channels, look at Reddit threads: go to the very places where engineers actually tell the truth. You’ll see common themes emerging in the DevOps community and messages like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“We switched and nothing broke.”&lt;/li&gt;
&lt;li&gt;“We were paying for features we never used.”&lt;/li&gt;
&lt;li&gt;“We were punished with higher prices for growing.”&lt;/li&gt;
&lt;li&gt;“We replaced it with a lightweight SaaS and cloud-native tool.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reality is that companies weren’t fighting back or even arguing with PagerDuty. They simply… moved on. They realized that they, in fact, didn’t need the Cadillac. They turned away from long sales cycles and a weak fit for SMBs, towards cheaper software that worked well and didn’t require a whole board meeting to approve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Act V: The lean, mean, incident response machines
&lt;/h2&gt;

&lt;p&gt;Call the news stations, everyone; the bootstrapped competitors are having their moment in the sun. Incident response is entering a new era and the winners aren’t the biggest, shiniest platforms with features pouring out of the box. They’re the ones offering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flat pricing&lt;/li&gt;
&lt;li&gt;Transparent billing&lt;/li&gt;
&lt;li&gt;Faster support&lt;/li&gt;
&lt;li&gt;No need to justify their existence every fiscal year&lt;/li&gt;
&lt;li&gt;Alignment with modern engineering teams&lt;/li&gt;
&lt;li&gt;No pressure to satisfy Wall Street.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PagerDuty’s &lt;a href="https://www.investing.com/news/company-news/pagerduty-stock-hits-52week-low-at-1394-usd-93CH-4104355" rel="noopener noreferrer"&gt;52-week low of $13.94&lt;/a&gt; in 2025, a 34.67% YoY drop, wasn’t a failure, but a sign of customer and investor uncertainty. It proved that the old model of “enterprise pricing for everyone” simply wasn’t the go-to anymore. And unless companies are willing to evolve in line with that reality, they’ll fade into the background while others thrive in the next decade.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final act: What this means for the future of incident response
&lt;/h2&gt;

&lt;p&gt;The incident response market is going through a big personality change. It’s moving away from enterprise-first, with its big platforms, big bundles and big invoices, towards developer-first. PagerDuty is a case study in what happens when pricing strategy diverges from customer value. If the bill keeps climbing but the value doesn’t, customers won’t bother complaining about it. They’ll just leave.&lt;/p&gt;

&lt;p&gt;And here’s the twist: even though multiple DCF analyses suggest PagerDuty is undervalued by 50-55%, investors still question its growth prospects. The future of incident response will be defined by tools that don’t overcharge or overcomplicate, and simply do what they say they’ll do without making you expense a limb.&lt;/p&gt;

&lt;p&gt;So are you paying for product or public market overhead?&lt;/p&gt;

&lt;p&gt;An &lt;a href="https://finance.yahoo.com/quote/PD/" rel="noopener noreferrer"&gt;82.8% stock drop&lt;/a&gt; isn’t something investors can shrug off. A fall that hard forces customers to rethink what they’re actually paying for: the product? The innovation? The reliability? Or just the cost of being a public company with slowing growth and rising pressure?&lt;/p&gt;

&lt;p&gt;And not just customers, but CTOs too. They’re evaluating whether they’re using the features they pay for and if the pricing model makes sense for how their teams work today. They want to know if the vendor is stable enough to bet their operations on and what the ROI is compared to the alternatives (or just building it themselves).&lt;/p&gt;

&lt;p&gt;Your on-call budget shouldn’t be keeping you up at night. You’ve got enough alerts for that. But if you want a quieter life with fewer interruptions, &lt;a href="https://allquiet.app/" rel="noopener noreferrer"&gt;talk to us today&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>infrastructure</category>
      <category>incident</category>
      <category>pagerduty</category>
    </item>
    <item>
      <title>Beyond Vibe-Coding</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Thu, 28 May 2026 10:59:00 +0000</pubDate>
      <link>https://dev.to/mads_quist/beyond-vibe-coding-k99</link>
      <guid>https://dev.to/mads_quist/beyond-vibe-coding-k99</guid>
      <description>&lt;h3&gt;
  
  
  Are You Paying for Reliability or Hype in the Age of Vibe-Coding and SaaS Subscriptions?
&lt;/h3&gt;

&lt;p&gt;In 2026, many engineering teams are coming to terms with an odd new reality: a growing percentage of their codebase wasn’t written by their own team… or even a human, for that matter. &lt;/p&gt;

&lt;p&gt;Rather than being pair-programmed, reviewed, tested, or even fully understood, it was AI-generated. Confidently, instantly, and sometimes a little chaotically, by something that simply “felt” like the code would work. In other words, it operates off vibes, not knowledge. &lt;/p&gt;

&lt;h4&gt;
  
  
  Welcome to the era of vibe-coding.
&lt;/h4&gt;

&lt;p&gt;Big companies like Microsoft and Google are already using AI to write 30% of their code. And it doesn’t exactly get the code wrong; it runs well, passes all the necessary tests, and even looks good enough that no one questions it. Until it behaves unexpectedly, that is. &lt;/p&gt;

&lt;p&gt;AI is making code shipping even easier than it ever was, but that doesn’t come for free: Day 2 operations are slowly becoming a nightmare, because when no one remembers writing the code, no one remembers how to fix it. &lt;/p&gt;

&lt;p&gt;It’s times like these when the conversation shifts from productivity to practicality, and more importantly, reliability. And that leads us to the question: is your incident management setup built to survive the new world you’re operating in?&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hidden Cost of Vibe-Coding for Day 2 Operations
&lt;/h3&gt;

&lt;p&gt;I know what you’re thinking: yes, you absolutely can, and in some cases maybe you should, generate a lot of working code quickly and efficiently with AI. It’s a fast, confident solution that’s great when you’re prototyping, but much less great when you’re trying to understand why a service is suddenly swallowing 40% more memory on Thursdays. &lt;/p&gt;

&lt;p&gt;In fact, AI-generated code produces 1.7x more issues than human-written code. And relying fully on it is where many teams seem to go wrong. They end up creating things too fast with very little specification and run into a mountain of problems before the new code is even out of the box. But the truth is very simple:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;AI has made writing code easier. But it hasn’t made owning that code any easier.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Instead, it’s introduced an onslaught of new problems for engineering managers and VPs, like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More edge-case failures&lt;/li&gt;
&lt;li&gt;More “no one touched this, why did it break?” moments&lt;/li&gt;
&lt;li&gt;More debugging sessions&lt;/li&gt;
&lt;li&gt;More operational load on teams who didn’t write, or understand, the original logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code is the foundation of how online services work. If the codebase becomes a black box, your incident management becomes the only reliable source of truth. &lt;/p&gt;

&lt;h3&gt;
  
  
  Why Your Incident Management Can’t Live on the Same Servers as Your App
&lt;/h3&gt;

&lt;p&gt;Enter: the build-vs-buy conversation. Plenty of teams have built their own alerting systems over the years and it’s easy to understand why. Engineers are good at building things and alerting seems simple enough. &lt;/p&gt;

&lt;p&gt;But the slightly uncomfortable reality is that if your alerting lives on the same infrastructure as your application, it’ll fail the same way your app fails. One single issue can cascade across multiple services, so when your platform goes down, your homegrown alerting goes with it. Identifying the root cause of the problem is like looking for a needle in a haystack, as any issues faced during development might not be reproducible in the production environment. &lt;/p&gt;

&lt;p&gt;Imagine using the foundation of your house to build a new roof. That’s what happens when your alerting and infrastructure co-exist in the same place. External, independent alerting isn’t a luxury, but a safety measure. If your cloud provider hiccups, your alerts hiccup with it. And if your AI-generated deployment throws a tantrum, your DIY alerting gets pulled into the meltdown. &lt;/p&gt;

&lt;p&gt;Separating the two means only having to deal with one catastrophe rather than two. It guarantees that when your platform is collapsing, your incident management (and engineering team) isn’t joining it. &lt;/p&gt;
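
&lt;p&gt;One common way to keep alerting independent is a heartbeat check, sometimes called a dead man's switch: the platform pings an external monitor on a schedule, and the monitor raises the alarm when the pings stop. The sketch below is a generic illustration, not a description of how All Quiet is implemented; the class name, method names, and timeout are hypothetical.&lt;/p&gt;

```python
import time
from typing import Optional


class HeartbeatMonitor:
    """Runs outside the monitored platform and flags it when heartbeats stop."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_beat: Optional[float] = None

    def record_heartbeat(self, now: Optional[float] = None) -> None:
        """Called whenever the platform checks in."""
        self.last_beat = time.monotonic() if now is None else now

    def is_silent(self, now: Optional[float] = None) -> bool:
        """True if no heartbeat has arrived within the timeout."""
        now = time.monotonic() if now is None else now
        if self.last_beat is None:
            return True  # never heard from the platform at all
        return now - self.last_beat > self.timeout_seconds
```

&lt;p&gt;The important design choice is where this runs. If the monitor shares servers, a cloud region, or a deployment pipeline with the app, an outage silences both at once, and silence is exactly the signal it is supposed to catch.&lt;/p&gt;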

&lt;h3&gt;
  
  
  The New Reality of Build vs Buy in 2026
&lt;/h3&gt;

&lt;p&gt;Engineering teams are perfectly capable of building their own alerting systems. They can even build databases, authentication systems, and CI/CD pipelines from scratch… but do they even want to? Are homegrown systems as reliable as plug-and-play? &lt;/p&gt;

&lt;p&gt;That’s where the build-vs-buy conversation gets more interesting. It’s more than just a cost comparison; it’s a question of control, scalability and integration. Building gives you bespoke tools that do exactly what you need them to do, and are operationally efficient. Buying, on the other hand, is standardized and easy; there are no maintenance costs, as those sit with the provider, and security is almost guaranteed.&lt;/p&gt;

&lt;p&gt;Plus, DIY alerting comes with a few unavoidable truths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It shares the same failure modes as your platform&lt;/li&gt;
&lt;li&gt;It depends on the same infrastructure&lt;/li&gt;
&lt;li&gt;It inherits the same outages&lt;/li&gt;
&lt;li&gt;It expands your operational surface area&lt;/li&gt;
&lt;li&gt;It becomes one more thing to maintain when maintenance is already hard enough&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Layer on the unpredictability of vibe-coded systems, where incidents are less predictable and harder to solve, and the last thing you want to deal with is an alerting system that disappears the moment you need it most.&lt;/p&gt;

&lt;p&gt;That’s where All Quiet steps in as the adult in the room. &lt;/p&gt;

&lt;h3&gt;
  
  
  Why All Quiet is Built for the World Vibe-Coding Created
&lt;/h3&gt;

&lt;p&gt;All Quiet isn’t trying to be a fancy, flashy tool. It does what it says on the tin: keeps alert management quiet. It’s not trying to reinvent incident response with buzzwords or magic tricks; it’s built around one very simple and boring, yet extremely important, principle:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Your alerting should stay independent when everything else goes offline.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Nice and simple. And that principle shapes everything about how All Quiet works:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It runs independently from your infrastructure, so your alerts don’t vanish just because your servers took a last-minute holiday. &lt;/li&gt;
&lt;li&gt;It has different failure modes, meaning your app can crash, your cluster can melt, your cloud can create a hurricane, and All Quiet doesn’t bat an eyelid. &lt;/li&gt;
&lt;li&gt;It doesn’t rely on your servers, cloud provider or deployment pipeline, so your alerting isn’t married to the same weak dependencies as your product.
&lt;/li&gt;
&lt;li&gt;It doesn’t care if your AI-generated code crashed your entire platform, because it won’t be affected by the fallout. &lt;/li&gt;
&lt;li&gt;It’s designed to be the one stable thing in a house that’s falling down, because it doesn’t panic while everything does. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AI-generated code means people don’t fully understand the very code they’re working with, so reliability becomes the only differentiator. You can have all the fancy dashboards and features you want, but they’re worth nothing if an incident creeps up and your code is hidden away in a treasure chest no one has the keys to. &lt;/p&gt;

&lt;p&gt;All Quiet is built to be reliable in every situation. When things are calm, it’s calm. When things are less calm, it’s calm. And when your entire system is swirling into a vortex of chaos and fire, it’s still calm. When vibe-coding has taken over the world, the last thing you want is an alerting system that runs solely on vibes too.&lt;/p&gt;

&lt;h3&gt;
  
  
  SaaS Cost Optimization in the Vibe-Coding Age
&lt;/h3&gt;

&lt;p&gt;There are thousands of SaaS tools out there, and hundreds of alerting tools. But there’s a new kind of bloat on the rise: tools that look great on paper but only exist because your AI-generated code keeps spitting out surprises. &lt;/p&gt;

&lt;p&gt;Engineering leaders are asking themselves smarter questions: What’s actually mission-critical? What’s just hype? What’s duplicating functionality? What’s costing us more than it’s saving us?&lt;/p&gt;

&lt;p&gt;But most importantly:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What’s the cost of downtime if our alerting fails with everything else?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The most expensive tool in your stack is the one that doesn’t work when you need it most. All Quiet isn’t about adding another subscription to your balance sheet, but removing uncertainty and wasted time from your operational risk. &lt;/p&gt;

&lt;h3&gt;
  
  
  The Future of Reliability Over Hype
&lt;/h3&gt;

&lt;p&gt;AI is like a self-cleaning oven. It’s getting smarter and better, which means vibe-coding will keep getting weirder. But engineering teams will keep shipping faster than ever. &lt;/p&gt;

&lt;p&gt;Despite that, the fundamentals won’t change: you still need to know when something breaks and you need to respond quickly. &lt;/p&gt;

&lt;p&gt;And you’ll always need an alerting system that doesn’t disappear the moment your platform does. Your team shouldn't be asking which tool has the most features, but which tool will still be standing when everything else fails. &lt;/p&gt;

&lt;p&gt;And the only answer is the one that doesn’t run on vibes too. It’s the one that does what it says it’ll do, and it does it independently of everything else. &lt;/p&gt;

&lt;p&gt;Which is exactly what All Quiet does. &lt;a href="https://allquiet.app/" rel="noopener noreferrer"&gt;Talk to us today to find out how. &lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>automation</category>
    </item>
    <item>
      <title>SaaS is dead, long live SaaS</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Tue, 26 May 2026 09:59:00 +0000</pubDate>
      <link>https://dev.to/mads_quist/saas-is-dead-long-live-saas-34dg</link>
      <guid>https://dev.to/mads_quist/saas-is-dead-long-live-saas-34dg</guid>
      <description>&lt;h3&gt;
  
  
  While vibe-coding does favor build over buy for some products, it won't replace mission-critical tools in the foreseeable future
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Here’s my take on what gets killed, what survives, and the build-vs-buy rule that actually works:
&lt;/h4&gt;

&lt;p&gt;I believe that AI does not kill SaaS. It eliminates the “thin wrapper”: software that offers little more than a polished UI for simple automations. Today, any developer can stitch these tools together in an afternoon.&lt;/p&gt;

&lt;p&gt;Still, SaaS is not disappearing; it is evolving. While products that sell mere convenience will face extinction, products that sell reliable outcomes for high-stakes problems will thrive. This guide covers what survives the shift toward vibe-coding and how to decide whether to build or buy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwloo4j67jhqfbz2s53jr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwloo4j67jhqfbz2s53jr.jpg" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The New Baseline: When "Good Enough" Becomes Free
&lt;/h3&gt;

&lt;p&gt;Vibe-coding is great: it allows us to ship functional software at unprecedented speeds. Large Language Models (LLMs) now draft code, connect APIs, and generate UI scaffolding automatically.&lt;/p&gt;

&lt;p&gt;This shift raises the bar for SaaS vendors. When basic functionality is cheap, the value of a paid tool must come from reliability, compliance, and long-term maintenance. High prices do not just signal quality. They create a financial incentive for teams to build their own "80% solution." Just look at PagerDuty, HubSpot, and other SaaS giants whose stock prices have dropped significantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  SaaS Categories Facing Extinction
&lt;/h3&gt;

&lt;p&gt;That said, I think vulnerable SaaS categories share three traits: a narrowly defined job-to-be-done, low failure costs, and easily verifiable outputs. Good examples are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workflow Wrappers: Basic CRUD (Create, Read, Update, Delete) tools for internal approvals or team dashboards.&lt;/li&gt;
&lt;li&gt;Low-Stakes Content: Tools for text summarization, formatting, or template generation.&lt;/li&gt;
&lt;li&gt;Basic Analytics: Simple data visualization layers that sit on top of a data warehouse.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So, how do SaaS players stay relevant? They need to be outstanding in the categories that are not vulnerable to vibe-coding.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Survivors: Infrastructure You Trust
&lt;/h3&gt;

&lt;p&gt;Based on the criteria above, my strong opinion is that the SaaS products that endure are those where the cost of being wrong is high. These are not features: they are mission-critical infrastructure.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Systems of Record and Compliance
&lt;/h4&gt;

&lt;p&gt;When audit logs, access controls, and data retention are legal requirements, a vibe-coded script is not an acceptable answer to an auditor. Specialized SaaS platforms provide the security posture and regulatory guarantees that custom-built scripts lack.&lt;/p&gt;

&lt;h4&gt;
  
  
  2. High-Availability Operational Tools
&lt;/h4&gt;

&lt;p&gt;If a system failure must wake an engineer at 2 a.m., the tool must be battle-tested. Incident management is the primary example. Reliability is the core feature.&lt;/p&gt;

&lt;p&gt;In this category, vibe-coding errors are not cosmetic. They are existential. Failing to page the right person during a critical outage results in lost revenue and broken trust. All Quiet and other incident response systems offer reliability that a prototype cannot guarantee. And these guarantees are mission-critical.&lt;/p&gt;

&lt;p&gt;There’s also an architectural reality: your incident management tool must keep working when you are down. If you self-host it on the same infrastructure, identity provider, or network dependencies as your primary systems, you will lose alerting precisely when you need it most. The whole point is to have a system that stays on high alert when you aren't, so it can tell you when something is broken. Running incident management as a truly independent system (even when self-hosted) adds meaningful overhead, which is one more reason teams often prefer buying a dedicated service built to stay up during your worst day.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Complex Integration Ecosystems
&lt;/h4&gt;

&lt;p&gt;Last but not least: integration ecosystems. Yes, AI can reduce some glue code and speed up building custom connectors. But the real pain is rarely the first implementation — it’s the ongoing reliability work: API changes, new auth flows, rate limits, subtle edge cases, and “it worked yesterday” incidents when a vendor ships an update. In practice, you usually can’t just delete integrations without deleting requirements. When a tool sits in the middle of many other tools, you’re effectively signing up to maintain that surface area for years. SaaS vendors sell the promise that these connections keep working without your team babysitting them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Build-vs-Buy Decision Matrix
&lt;/h2&gt;

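&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Build (Vibe-Coding)&lt;/th&gt;
&lt;th&gt;Buy (SaaS)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Risk of Failure&lt;/td&gt;
&lt;td&gt;Low (Embarrassing)&lt;/td&gt;
&lt;td&gt;High (Expensive/Existential)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ownership&lt;/td&gt;
&lt;td&gt;Internal Team&lt;/td&gt;
&lt;td&gt;Vendor (SLA-backed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verification&lt;/td&gt;
&lt;td&gt;Obvious and Simple&lt;/td&gt;
&lt;td&gt;Subtle and Critical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integrations&lt;/td&gt;
&lt;td&gt;1 or 2 Static APIs&lt;/td&gt;
&lt;td&gt;Many Evolving Vendors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;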

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8w08b5ww7ki77dpvc24e.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8w08b5ww7ki77dpvc24e.jpg" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Rule of Thumb
&lt;/h3&gt;

&lt;p&gt;Build if vibe-coding achieves your outcome with acceptable risk for your business. Buy if you need the product to be battle-tested and the operational burden of maintaining it would quietly exhaust your team.&lt;/p&gt;

&lt;p&gt;Modern SaaS wins when it removes long-term risk, not when it adds more features. You’re paying for reliability, security, compliance, and support, ensuring that the tool still works when your team is tired, busy, or in the middle of an outage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvlc9vdt2t4nsid4j2l2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvlc9vdt2t4nsid4j2l2.jpg" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AI makes building prototypes cheap. What doesn’t get cheap is owning the consequences: keeping it up, keeping it secure, keeping integrations working, and maintaining it for years. If you’re not willing to own that long-term responsibility, buy and focus your resources on building your core product.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Migrating from Opsgenie to All Quiet: A Full Terraform-First Guide</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Tue, 12 May 2026 15:33:44 +0000</pubDate>
      <link>https://dev.to/allquiet/migrating-from-opsgenie-to-all-quiet-a-full-terraform-first-guide-1i1o</link>
      <guid>https://dev.to/allquiet/migrating-from-opsgenie-to-all-quiet-a-full-terraform-first-guide-1i1o</guid>
      <description>&lt;p&gt;Originally published on 12 May 2026 on the &lt;a href="https://allquiet.app/blog/migrating-from-opsgenie-to-all-quiet-terraform-first-guide" rel="noopener noreferrer"&gt;All Quiet Tech Blog.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your Opsgenie config already lives in Terraform, you can migrate methodically instead of clicking through two consoles side by side. This guide translates users, teams, integrations, on-call schedules, escalations, and routing into All Quiet, complete with example HCL, a migration checklist, and tips for running both tools in parallel before you switch.&lt;/p&gt;

&lt;p&gt;With the recent changes in the Atlassian ecosystem, many SRE and DevOps teams are finding themselves at a crossroads: adapt to the increasing complexity of Jira Service Management (JSM) or move to a leaner, more focused incident management platform.&lt;/p&gt;

&lt;p&gt;At All Quiet, we believe incident management should stay close to the code. That's why our platform is built to be managed via Terraform from day one. In this guide, we'll walk through a complete technical migration from Opsgenie Terraform resources to All Quiet, resource by resource, with real HCL on both sides.&lt;/p&gt;

&lt;p&gt;If you are still weighing vendors before you change tooling, start with our overview of &lt;a href="https://allquiet.app/opsgenie-alternative" rel="noopener noreferrer"&gt;All Quiet as an Opsgenie alternative&lt;/a&gt;, then use this article for the Terraform resource mapping and cutover checklist.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a Terraform-First Migration?
&lt;/h2&gt;

&lt;p&gt;If you're already managing Opsgenie via Terraform, you have an advantage: your entire on-call configuration is already codified. Rather than clicking through two UIs in parallel, you can translate your &lt;code&gt;.tf&lt;/code&gt; files directly from one provider to the other, run &lt;code&gt;terraform plan&lt;/code&gt; on the result, and cut over with confidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Strategy: "Logic-First" Migration
&lt;/h2&gt;

&lt;p&gt;In Opsgenie, configuration is fragmented across six or more resource types: &lt;code&gt;opsgenie_user&lt;/code&gt;, &lt;code&gt;opsgenie_team&lt;/code&gt;, &lt;code&gt;opsgenie_api_integration&lt;/code&gt;, &lt;code&gt;opsgenie_schedule&lt;/code&gt;, &lt;code&gt;opsgenie_schedule_rotation&lt;/code&gt;, and &lt;code&gt;opsgenie_escalation&lt;/code&gt;. All Quiet centralizes this logic into fewer, more cohesive resources, most notably &lt;code&gt;allquiet_team_escalations&lt;/code&gt;, which unifies schedules, rotations, and escalation policies into a single resource that can't get out of sync.&lt;/p&gt;

&lt;p&gt;Here's the full resource mapping at a glance:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Opsgenie Resource&lt;/th&gt;
&lt;th&gt;All Quiet Resource&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_user&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_user&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Standalone identity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_team&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;allquiet_team&lt;/code&gt; + &lt;code&gt;allquiet_team_membership&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;One membership resource per user–team pair&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_api_integration&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_integration&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Team-owned, strongly typed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_schedule&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_team_escalations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Merged into unified resource&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_schedule_rotation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_team_escalations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rotations live inside escalation tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_escalation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_team_escalations&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rules become escalation tiers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;(routing within integration)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;allquiet_routing&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Explicit routing resource&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  1. Setting Up the Providers
&lt;/h2&gt;

&lt;p&gt;First, initialize your environment. You'll need your All Quiet API Key, which you can generate in Organization Settings &amp;gt; API Keys (requires Owner or Administrator role). See our &lt;a href="https://docs.allquiet.app/advanced/terraform" rel="noopener noreferrer"&gt;Terraform setup docs&lt;/a&gt; for the full walkthrough.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;terraform&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;required_providers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;allquiet&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;source&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AllQuietApp/allquiet"&lt;/span&gt;
      &lt;span class="nx"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;gt;= 3.0.0"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="s2"&gt;"allquiet"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;api_key&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;allquiet_api_key&lt;/span&gt;
  &lt;span class="nx"&gt;api_region&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"eu"&lt;/span&gt; &lt;span class="c1"&gt;# or "us" — must match your organization's data region&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;variable&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_api_key"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;sensitive&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
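
&lt;p&gt;While both tools run in parallel, the same configuration usually needs both providers declared. A minimal sketch of what that could look like, assuming your existing Opsgenie code uses the &lt;code&gt;opsgenie/opsgenie&lt;/code&gt; registry provider and a variable named &lt;code&gt;opsgenie_api_key&lt;/code&gt; (a hypothetical name; keep whatever your current configuration already uses). It is meant to replace the &lt;code&gt;terraform&lt;/code&gt; block above, not to sit next to it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;terraform {
  required_providers {
    allquiet = {
      source  = "AllQuietApp/allquiet"
      version = "&amp;gt;= 3.0.0"
    }
    # Assumption: the existing Opsgenie resources come from this provider
    opsgenie = {
      source = "opsgenie/opsgenie"
    }
  }
}

provider "opsgenie" {
  api_key = var.opsgenie_api_key # hypothetical variable name
}

variable "opsgenie_api_key" {
  type      = string
  sensitive = true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Keeping both providers in one root module lets a single &lt;code&gt;terraform plan&lt;/code&gt; show the old and new resources side by side until you remove the Opsgenie ones.&lt;/p&gt;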



&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; We recommend creating your organization with a shared admin account (e.g., &lt;a href="mailto:admin@company.com"&gt;admin@company.com&lt;/a&gt;) rather than a personal email. This way, every "real" on-call user can be provisioned via Terraform, and you won't have a chicken-and-egg problem with the account that created the org.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Teams, Users, and Memberships
&lt;/h2&gt;

&lt;p&gt;In Opsgenie, users are standalone resources with roles, and team membership is defined inline within the team. This creates tight coupling: changing a user's team membership means editing the team resource.&lt;/p&gt;

&lt;p&gt;All Quiet separates these concerns into three distinct resources: the team, the user identity, and the membership link between them. Each membership is its own resource (&lt;code&gt;allquiet_team_membership&lt;/code&gt;), one resource per user–team pair. This allows for cleaner state management: adding or removing a single member doesn't trigger a plan change on the team or on any other member.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Opsgenie Way:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_user"&lt;/span&gt; &lt;span class="s2"&gt;"sre_lead"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;username&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alex@company.com"&lt;/span&gt;
  &lt;span class="nx"&gt;full_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Alex Rivera"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
  &lt;span class="nx"&gt;timezone&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Europe/Berlin"&lt;/span&gt;
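&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Second user, referenced by the team's member block below&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_user"&lt;/span&gt; &lt;span class="s2"&gt;"backend_eng"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;username&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"jordan@company.com"&lt;/span&gt;
  &lt;span class="nx"&gt;full_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Jordan Lee"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;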

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_team"&lt;/span&gt; &lt;span class="s2"&gt;"devops"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DevOps"&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Core DevOps and SRE team"&lt;/span&gt;

  &lt;span class="nx"&gt;member&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sre_lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"admin"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;member&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backend_eng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The All Quiet Way:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. Define the team&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_team"&lt;/span&gt; &lt;span class="s2"&gt;"devops"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DevOps"&lt;/span&gt;
  &lt;span class="nx"&gt;time_zone_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Europe/Berlin"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Define user identities&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_user"&lt;/span&gt; &lt;span class="s2"&gt;"sre_lead"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;email&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"alex@company.com"&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Alex Rivera"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_user"&lt;/span&gt; &lt;span class="s2"&gt;"backend_eng"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;email&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"jordan@company.com"&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Jordan Lee"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Link each user to the team via a dedicated membership resource (one per user–team pair)&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_team_membership"&lt;/span&gt; &lt;span class="s2"&gt;"devops_sre_lead"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;team_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;user_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sre_lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Administrator"&lt;/span&gt; &lt;span class="c1"&gt;# "Administrator" or "Member"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_team_membership"&lt;/span&gt; &lt;span class="s2"&gt;"devops_backend_eng"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;team_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;user_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backend_eng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Member"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What changed:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Opsgenie's admin / user team roles map to All Quiet's Administrator / Member.&lt;/li&gt;
&lt;li&gt;Each membership is its own resource (&lt;code&gt;allquiet_team_membership&lt;/code&gt;), so adding or removing a single team member is a targeted change with no cascading diffs on the team or other members.&lt;/li&gt;
&lt;li&gt;Users provisioned via Terraform receive an invite to set their password. If you use SSO (OIDC, Google, or Microsoft), configure that first; see the &lt;a href="https://docs.allquiet.app/advanced/sso" rel="noopener noreferrer"&gt;SSO docs&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
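
&lt;p&gt;For larger teams, the one-resource-per-pair model combines well with Terraform's &lt;code&gt;for_each&lt;/code&gt;. The following is a minimal sketch rather than a prescribed layout: the &lt;code&gt;devops_members&lt;/code&gt; local and the &lt;code&gt;devops&lt;/code&gt; resource labels are hypothetical names, and it uses only the attributes shown in the example above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Hypothetical local: one entry per DevOps team member
locals {
  devops_members = {
    sre_lead = {
      email        = "alex@company.com"
      display_name = "Alex Rivera"
      role         = "Administrator"
    }
    backend_eng = {
      email        = "jordan@company.com"
      display_name = "Jordan Lee"
      role         = "Member"
    }
  }
}

# One user identity per map entry
resource "allquiet_user" "devops" {
  for_each     = local.devops_members
  email        = each.value.email
  display_name = each.value.display_name
}

# One membership per user-team pair, keyed by the same map key
resource "allquiet_team_membership" "devops" {
  for_each = local.devops_members
  team_id  = allquiet_team.devops.id
  user_id  = allquiet_user.devops[each.key].id
  role     = each.value.role
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each map key becomes its own resource instance in state, so adding a person is one new map entry and the plan shows only that user and that membership. Use either this pattern or the explicit blocks above, not both, since both would try to create the same users.&lt;/p&gt;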

&lt;h2&gt;
  
  
  3. Integrations: Team-Owned and Strongly Typed
&lt;/h2&gt;

&lt;p&gt;Opsgenie requires separate resources for API integrations and their subsequent notification or routing actions. The integration itself is often a loose endpoint that you wire to teams via responders blocks.&lt;/p&gt;

&lt;p&gt;All Quiet treats integrations as team-owned endpoints with strongly typed integration types, so there's no guesswork about payload format.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Opsgenie Way:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_api_integration"&lt;/span&gt; &lt;span class="s2"&gt;"grafana"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Grafana-Alerts"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Grafana"&lt;/span&gt;
  &lt;span class="nx"&gt;owner_team_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;enabled&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;allow_write_access&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;responders&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"team"&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The All Quiet Way:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_integration"&lt;/span&gt; &lt;span class="s2"&gt;"grafana"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;team_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Grafana Production"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Grafana"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it: no &lt;code&gt;responders&lt;/code&gt; block and no &lt;code&gt;allow_write_access&lt;/code&gt; flag. The integration belongs to the team, and the team's escalation policy handles notification logic.&lt;/p&gt;

&lt;p&gt;Common type mappings:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Opsgenie type&lt;/th&gt;
&lt;th&gt;All Quiet type&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Datadog&lt;/td&gt;
&lt;td&gt;Datadog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;td&gt;Prometheus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch&lt;/td&gt;
&lt;td&gt;AmazonCloudWatch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API / Webhook&lt;/td&gt;
&lt;td&gt;Webhook&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
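
&lt;p&gt;If you are translating many integrations, the table above can be captured once as a lookup map so the type strings are not retyped per resource. This is a sketch under stated assumptions: the local name &lt;code&gt;opsgenie_to_allquiet_type&lt;/code&gt; and the &lt;code&gt;cloudwatch&lt;/code&gt; resource are hypothetical, the entries are copied from the table, and the "API / Webhook" row is split into two keys for convenience.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;locals {
  # Key: Opsgenie type, value: All Quiet type (from the table above)
  opsgenie_to_allquiet_type = {
    Datadog    = "Datadog"
    Prometheus = "Prometheus"
    Grafana    = "Grafana"
    CloudWatch = "AmazonCloudWatch"
    API        = "Webhook"
    Webhook    = "Webhook"
  }
}

# Hypothetical example: a CloudWatch integration translated via the map
resource "allquiet_integration" "cloudwatch" {
  team_id      = allquiet_team.devops.id
  display_name = "CloudWatch Production"
  type         = local.opsgenie_to_allquiet_type["CloudWatch"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;An Opsgenie type that is missing from the map fails at plan time with an invalid index error, which is a useful early signal that a mapping still needs to be decided.&lt;/p&gt;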

&lt;p&gt;The full list of supported integration types is available at: &lt;a href="https://allquiet.app/api/public/v1/inbound-integration/types" rel="noopener noreferrer"&gt;https://allquiet.app/api/public/v1/inbound-integration/types&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Treat the integration type and team assignment as fixed once created: changing them may require destroying and re-creating the resource, which generates a new webhook URL, so plan these carefully. After &lt;code&gt;terraform apply&lt;/code&gt;, the new webhook URL will be available. For each supported inbound integration type, you can download a default Terraform snippet for payload mapping: use &lt;code&gt;https://allquiet.app/api/integrations/terraform/default/&amp;lt;Type&amp;gt;.tf&lt;/code&gt;, where &lt;code&gt;&amp;lt;Type&amp;gt;&lt;/code&gt; is the exact integration type identifier (see the inbound integration types list above). For example, the Datadog snippet is at &lt;a href="https://allquiet.app/api/integrations/terraform/default/Datadog.tf" rel="noopener noreferrer"&gt;https://allquiet.app/api/integrations/terraform/default/Datadog.tf&lt;/a&gt;; Grafana, Webhook, AmazonCloudWatch, and every other supported type follow the same URL pattern with their own type name.&lt;/p&gt;

&lt;p&gt;If you need to customize how payloads map to incidents (e.g., extracting severity from a specific JSON field), use the &lt;code&gt;allquiet_integration_mapping&lt;/code&gt; resource. If you don't define one, All Quiet uses sensible defaults for each integration type. The mapping supports JSONPath, XPath, regex, and static values. Every incident maps to three key attributes: Status (Open/Resolved), Severity (Minor/Warning/Critical), and an optional Title.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Migration tip:&lt;/strong&gt; You can run both Opsgenie and All Quiet integrations in parallel during the transition period. Point your monitoring tools at both webhook URLs until you're confident in the All Quiet setup.&lt;/p&gt;
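
&lt;p&gt;To wire a monitoring tool to both systems, you need the All Quiet webhook URL once &lt;code&gt;terraform apply&lt;/code&gt; has run. One way to surface it is a Terraform output. This sketch assumes the provider exposes the URL as a read-only attribute named &lt;code&gt;webhook_url&lt;/code&gt; on &lt;code&gt;allquiet_integration&lt;/code&gt;; that attribute name is an assumption to confirm in the provider documentation, and the output name is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;# Assumption: allquiet_integration exports a webhook_url attribute
output "grafana_allquiet_webhook_url" {
  description = "Add alongside the existing Opsgenie endpoint in Grafana"
  value       = allquiet_integration.grafana.webhook_url
  sensitive   = true # the URL works like a credential, so hide it in plan output
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After applying, &lt;code&gt;terraform output -raw grafana_allquiet_webhook_url&lt;/code&gt; prints the value so you can add it as a second contact point while the Opsgenie one stays in place.&lt;/p&gt;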

&lt;h2&gt;
  
  
  4. On-Call Schedules and Escalations: The Unified Resource
&lt;/h2&gt;

&lt;p&gt;This is the most significant architectural difference between the two providers, and the biggest win in your Terraform-first migration.&lt;/p&gt;

&lt;p&gt;In Opsgenie, on-call configuration is spread across three separate resources that reference each other by ID. In All Quiet, it's all one resource: &lt;code&gt;allquiet_team_escalations&lt;/code&gt;. This resource follows a clear hierarchy: Escalation Tiers → Schedules → Rotations, which mirrors how on-call actually works: you have layers of people to notify (tiers), each layer has time-based coverage windows (schedules), and people rotate through those windows (rotations).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Opsgenie Way (3 resources, fragile cross-references):
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_schedule"&lt;/span&gt; &lt;span class="s2"&gt;"devops_oncall"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DevOps On-Call"&lt;/span&gt;
  &lt;span class="nx"&gt;timezone&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Europe/Berlin"&lt;/span&gt;
  &lt;span class="nx"&gt;enabled&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="nx"&gt;owner_team_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_schedule_rotation"&lt;/span&gt; &lt;span class="s2"&gt;"devops_weekly"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;schedule_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_schedule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops_oncall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Weekly Rotation"&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"weekly"&lt;/span&gt;
  &lt;span class="nx"&gt;length&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
  &lt;span class="nx"&gt;start_date&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2024-01-01T09:00:00Z"&lt;/span&gt;

  &lt;span class="nx"&gt;participant&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sre_lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;participant&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backend_eng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"opsgenie_escalation"&lt;/span&gt; &lt;span class="s2"&gt;"devops_escalation"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DevOps Escalation"&lt;/span&gt;
  &lt;span class="nx"&gt;owner_team_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;rules&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"if-not-acked"&lt;/span&gt;
    &lt;span class="nx"&gt;notify_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt;
    &lt;span class="nx"&gt;delay&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

    &lt;span class="nx"&gt;recipient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"schedule"&lt;/span&gt;
      &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_schedule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops_oncall&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;rules&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"if-not-acked"&lt;/span&gt;
    &lt;span class="nx"&gt;notify_type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"default"&lt;/span&gt;
    &lt;span class="nx"&gt;delay&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;

    &lt;span class="nx"&gt;recipient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"user"&lt;/span&gt;
      &lt;span class="nx"&gt;id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;opsgenie_user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;repeat&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;wait_interval&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;
    &lt;span class="nx"&gt;count&lt;/span&gt;                  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="nx"&gt;reset_recipient_states&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's 3 resources, 50+ lines, with IDs threaded between them. Delete the schedule without updating the escalation and you get a dangling reference.&lt;/p&gt;

&lt;h3&gt;
  
  
  The All Quiet Way (1 resource, self-contained):
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_team_escalations"&lt;/span&gt; &lt;span class="s2"&gt;"devops_oncall"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;team_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;escalation_tiers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# TIER 1: On-call rotation — alert the person on duty&lt;/span&gt;
    &lt;span class="nx"&gt;repeats&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="nx"&gt;repeats_after_minutes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

    &lt;span class="nx"&gt;schedules&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"DevOps Weekly Rotation"&lt;/span&gt;

      &lt;span class="nx"&gt;rotation_settings&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;rotation_mode&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"auto"&lt;/span&gt;
        &lt;span class="nx"&gt;auto_rotation_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;         &lt;span class="c1"&gt;# One person on-call at a time&lt;/span&gt;
        &lt;span class="nx"&gt;repeats&lt;/span&gt;            &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"weekly"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="nx"&gt;rotations&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;members&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;team_membership_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team_membership&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops_sre_lead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;members&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;team_membership_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team_membership&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops_backend_eng&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;escalation_tiers&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;# TIER 2: If Tier 1 exhausts its repeats, escalate to the manager&lt;/span&gt;
    &lt;span class="nx"&gt;repeats&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="nx"&gt;schedules&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Manager Escalation"&lt;/span&gt;

      &lt;span class="nx"&gt;rotations&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;members&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;team_membership_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team_membership&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything that was spread across &lt;code&gt;opsgenie_schedule&lt;/code&gt;, &lt;code&gt;opsgenie_schedule_rotation&lt;/code&gt;, and &lt;code&gt;opsgenie_escalation&lt;/code&gt; is now a single &lt;code&gt;allquiet_team_escalations&lt;/code&gt; resource. Schedules and rotations live inside escalation tiers, so there's no way for them to become orphaned. Every person, whether they're in a rotating schedule or a single-person escalation target, is referenced via their &lt;code&gt;team_membership_id&lt;/code&gt;, which keeps the dependency graph clean.&lt;/p&gt;

&lt;p&gt;Key mapping:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Opsgenie concept&lt;/th&gt;
&lt;th&gt;All Quiet equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Escalation rule with delay&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;escalation_tiers&lt;/code&gt; with &lt;code&gt;repeats_after_minutes&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;opsgenie_schedule&lt;/code&gt; (time windows)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;schedules&lt;/code&gt; block within a tier, which defines on-call times (e.g., weekdays 08:00–18:00)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;opsgenie_schedule_rotation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;rotation_settings&lt;/code&gt; within a &lt;code&gt;schedules&lt;/code&gt; block, in auto or explicit mode&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rotation participants&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;rotations&lt;/code&gt; → &lt;code&gt;members&lt;/code&gt; → &lt;code&gt;team_membership_id&lt;/code&gt; (always via membership)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple schedules for follow-the-sun&lt;/td&gt;
&lt;td&gt;Multiple &lt;code&gt;schedules&lt;/code&gt; blocks in the same tier, each covering different hours/days&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;repeat&lt;/code&gt; block on escalation&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;repeats&lt;/code&gt; on the relevant tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recipient: schedule&lt;/td&gt;
&lt;td&gt;Tier with a &lt;code&gt;schedules&lt;/code&gt; block containing rotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recipient: user&lt;/td&gt;
&lt;td&gt;Tier with one schedule, one rotation, one member&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Advanced patterns:&lt;/strong&gt; Round-robin alerting distributes incidents evenly when multiple people are on-call simultaneously. On-call overrides, both personal and team-level, can be managed via the &lt;code&gt;allquiet_on_call_override&lt;/code&gt; Terraform resource without touching the escalation config. See the &lt;a href="https://docs.allquiet.app/essentials/escalations" rel="noopener noreferrer"&gt;escalation docs&lt;/a&gt; for the full set of options.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note on complexity:&lt;/strong&gt; Opsgenie's delay/repeat model and All Quiet's tier-level &lt;code&gt;repeats&lt;/code&gt;, &lt;code&gt;repeats_after_minutes&lt;/code&gt;, and &lt;code&gt;auto_escalation_after_minutes&lt;/code&gt; settings don't map one-to-one in every case. Simple escalations translate cleanly, but complex multi-rule Opsgenie policies may need case-by-case tuning. We recommend testing each escalation path with a synthetic incident before cutting over.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note on time restrictions:&lt;/strong&gt; Opsgenie's &lt;code&gt;time_restriction&lt;/code&gt; blocks on rotations (&lt;code&gt;time-of-day&lt;/code&gt; and &lt;code&gt;weekday-and-time-of-day&lt;/code&gt;) map to All Quiet's schedule on-call times. In All Quiet, each schedule defines its active hours and days directly (e.g., "Monday–Friday, 09:00–17:00"), which is more intuitive than Opsgenie's separate restriction blocks. Review each restriction during migration.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Routing: The Incident Traffic Controller
&lt;/h2&gt;

&lt;p&gt;In Opsgenie, routing logic is often buried inside the integration itself (via &lt;code&gt;responders&lt;/code&gt; blocks) or handled by notification policies. In All Quiet, routing is an explicit, first-class resource with its own rules engine. Each rule has three components: conditions (when to trigger), actions (what to do), and channels (how to notify).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"allquiet_routing"&lt;/span&gt; &lt;span class="s2"&gt;"prod_alerts"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;team_id&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;allquiet_team&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;devops&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;display_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Production Alert Routing"&lt;/span&gt;

  &lt;span class="nx"&gt;rules&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;# Mute Slack for test environment alerts, only send email&lt;/span&gt;
      &lt;span class="nx"&gt;conditions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;statuses&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Open"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;attributes&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Environment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;operator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"="&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Test"&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;change_severity&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Minor"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;channels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;notification_channels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Email"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;# For escalated critical incidents, also trigger the PagerDuty outbound webhook&lt;/span&gt;
      &lt;span class="nx"&gt;conditions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;statuses&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Open"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;severities&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Critical"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;intents&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Escalated"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;channels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;outbound_integrations&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;allquiet_outbound_integration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pagerduty_webhook&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



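&lt;p&gt;Valid &lt;code&gt;notification_channels&lt;/code&gt; values are &lt;code&gt;"Email"&lt;/code&gt;, &lt;code&gt;"Push"&lt;/code&gt;, &lt;code&gt;"SMS"&lt;/code&gt;, and &lt;code&gt;"VoiceCall"&lt;/code&gt;.&lt;/p&gt;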

&lt;p&gt;Routing conditions can filter on severity, status, specific integrations, incident intents (created, escalated, resolved), custom payload attributes, and even time-of-day restrictions. Actions include discarding incidents, changing severity, assigning to other teams (within an Organization), adding interactions, and delaying execution. This replaces Opsgenie's scattered notification policies with a single, auditable, version-controlled resource.&lt;/p&gt;
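&lt;p&gt;To make the conditions, actions, and channels split concrete, here is a minimal Python model of the two rules from the example above. It is an illustration only, not All Quiet's implementation: the dictionary layout, the &lt;code&gt;matches&lt;/code&gt; and &lt;code&gt;route&lt;/code&gt; helpers, and the choice to apply every matching rule in order are all assumptions made for this sketch. Check the routing docs for the real evaluation semantics.&lt;/p&gt;

```python
# Hypothetical in-memory mirror of the two HCL rules shown above.
RULES = [
    {
        "conditions": {
            "statuses": ["Open"],
            "attributes": [{"name": "Environment", "operator": "=", "value": "Test"}],
        },
        "actions": {"change_severity": "Minor"},
        "channels": {"notification_channels": ["Email"]},
    },
    {
        "conditions": {
            "statuses": ["Open"],
            "severities": ["Critical"],
            "intents": ["Escalated"],
        },
        "channels": {"outbound_integrations": ["pagerduty_webhook"]},
    },
]


def matches(conditions, incident):
    """Return True when every condition present in the rule accepts the incident."""
    pairs = (("statuses", "status"), ("severities", "severity"), ("intents", "intent"))
    for field, key in pairs:
        allowed = conditions.get(field)
        # A condition that is not set places no restriction on the incident.
        if allowed is not None and incident[key] not in allowed:
            return False
    for attr in conditions.get("attributes", []):
        # Only the "=" operator used in the example is modelled here.
        if attr["operator"] != "=":
            raise ValueError("unsupported operator: " + attr["operator"])
        if incident["attributes"].get(attr["name"]) != attr["value"]:
            return False
    return True


def route(rules, incident):
    """Apply every matching rule in order (an assumption of this sketch).

    Conditions are checked against the incident as it arrived; actions are
    applied to a copy. Returns the updated copy and the selected channels.
    """
    routed = dict(incident)
    channels = []
    for rule in rules:
        if not matches(rule.get("conditions", {}), incident):
            continue
        new_severity = rule.get("actions", {}).get("change_severity")
        if new_severity:
            routed["severity"] = new_severity
        channels.append(rule.get("channels", {}))
    return routed, channels


if __name__ == "__main__":
    test_alert = {
        "status": "Open",
        "severity": "Critical",
        "intent": "Created",
        "attributes": {"Environment": "Test"},
    }
    print(route(RULES, test_alert))
```

&lt;p&gt;In this model, an open incident from the Test environment is downgraded to Minor and goes to email only, while an escalated critical production incident keeps its severity and triggers the outbound integration.&lt;/p&gt;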

&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; For multi-team setups, you can create a "root team" that owns your integrations and uses routing rules to fan out incidents to the appropriate team based on payload attributes. See the &lt;a href="https://docs.allquiet.app/advanced/routing" rel="noopener noreferrer"&gt;routing docs&lt;/a&gt; for detailed examples.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration Checklist
&lt;/h2&gt;

&lt;p&gt;Here's the step-by-step order we recommend. The key dependency is that &lt;code&gt;allquiet_team_membership&lt;/code&gt; requires both the team and user to exist, and &lt;code&gt;allquiet_team_escalations&lt;/code&gt; references membership IDs, so teams, users, and memberships must all be in place before you build escalation tiers.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set up the All Quiet Organization and generate your API key.&lt;/li&gt;
&lt;li&gt;Create teams (&lt;code&gt;allquiet_team&lt;/code&gt;) and provision users (&lt;code&gt;allquiet_user&lt;/code&gt;). These two have no dependency on each other and can be created in any order or in parallel.&lt;/li&gt;
&lt;li&gt;Link users to teams (&lt;code&gt;allquiet_team_membership&lt;/code&gt;), one resource per user–team pair. This requires both the team and user to exist.&lt;/li&gt;
&lt;li&gt;Create integrations (&lt;code&gt;allquiet_integration&lt;/code&gt;) for each monitoring source. Note the new webhook URLs.&lt;/li&gt;
&lt;li&gt;Customize payload mappings (&lt;code&gt;allquiet_integration_mapping&lt;/code&gt;) if the defaults don't fit your payload structure.&lt;/li&gt;
&lt;li&gt;Configure notification preferences (&lt;code&gt;allquiet_user_incident_notification_settings&lt;/code&gt;). This controls how each user gets alerted (push, SMS, voice call, email) and with what delay.&lt;/li&gt;
&lt;li&gt;Build escalation policies (&lt;code&gt;allquiet_team_escalations&lt;/code&gt;) by merging your Opsgenie schedules, rotations, and escalation rules into unified tiers. Rotations reference &lt;code&gt;team_membership_id&lt;/code&gt;, so memberships must exist first.&lt;/li&gt;
&lt;li&gt;Set up routing rules (&lt;code&gt;allquiet_routing&lt;/code&gt;) for any advanced alert routing. If a rule references an outbound integration, create that integration (step 9) first.&lt;/li&gt;
&lt;li&gt;Set up outbound integrations (&lt;code&gt;allquiet_outbound_integration&lt;/code&gt;) for Slack, Microsoft Teams, or webhook notifications.&lt;/li&gt;
&lt;li&gt;Run both systems in parallel: point your monitoring tools at both the Opsgenie and All Quiet webhook URLs for a burn-in period. Trigger test incidents to verify the full notification chain.&lt;/li&gt;
&lt;li&gt;Cut over: update webhook URLs to point only to All Quiet, then run &lt;code&gt;terraform destroy&lt;/code&gt; on the Opsgenie resources.&lt;/li&gt;
&lt;/ol&gt;
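&lt;p&gt;The ordering constraints in this checklist form a small dependency graph. The sketch below uses Python's standard &lt;code&gt;graphlib&lt;/code&gt; module to group resource types into creation waves. The edges are taken from the references visible in the examples in this post (&lt;code&gt;team_id&lt;/code&gt;, &lt;code&gt;team_membership_id&lt;/code&gt;, and the outbound integration used in a routing rule); the &lt;code&gt;DEPENDS_ON&lt;/code&gt; table and the &lt;code&gt;creation_waves&lt;/code&gt; helper are illustrative names, and Terraform works out the same ordering on its own from resource references.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Resource type -> resource types it references (illustrative subset).
DEPENDS_ON = {
    "allquiet_team": set(),
    "allquiet_user": set(),
    "allquiet_outbound_integration": set(),
    "allquiet_team_membership": {"allquiet_team", "allquiet_user"},
    "allquiet_team_escalations": {"allquiet_team", "allquiet_team_membership"},
    "allquiet_routing": {"allquiet_team", "allquiet_outbound_integration"},
}


def creation_waves(depends_on):
    """Group resources into waves; everything in one wave can be created in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(creation_waves(DEPENDS_ON), start=1):
        print(number, ", ".join(wave))
```

&lt;p&gt;Under these assumptions, teams, users, and outbound integrations come first, memberships and routing follow, and escalation tiers come last because they reference membership IDs.&lt;/p&gt;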

&lt;h2&gt;
  
  
  Why Switch to All Quiet?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bootstrapped and independent.&lt;/strong&gt; We aren't beholden to private equity, venture capital firms, or enterprise conglomerates. We build for SREs, not for quarterly earnings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure as Code, natively.&lt;/strong&gt; Our Terraform provider isn't an afterthought; it's built to be the primary way you manage your on-call setup. Resources provisioned via Terraform are locked in the web app to prevent drift.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost efficiency.&lt;/strong&gt; Stop paying the "Atlassian Tax." All Quiet provides the same high-availability alerting at a fraction of the cost, with a transparent pricing model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less fragmentation, less drift.&lt;/strong&gt; Opsgenie spreads on-call logic across separate schedule, rotation, and escalation resources that reference each other by ID. All Quiet collapses that into a single &lt;code&gt;allquiet_team_escalations&lt;/code&gt; resource. Fewer cross-references mean fewer ways for your Terraform state to diverge from reality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ready to simplify your stack? Check out our &lt;a href="https://docs.allquiet.app/advanced/terraform" rel="noopener noreferrer"&gt;Terraform Provider documentation&lt;/a&gt; and start your migration today.&lt;/p&gt;

</description>
      <category>opsgenie</category>
      <category>terraform</category>
      <category>devops</category>
      <category>sre</category>
    </item>
    <item>
      <title>AWS Elastic IP failover with Keepalived: how we keep self-managed loadbalancers redundant</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Mon, 11 May 2026 16:28:24 +0000</pubDate>
      <link>https://dev.to/allquiet/aws-elastic-ip-failover-with-keepalived-how-we-keep-self-managed-loadbalancers-redundant-489i</link>
      <guid>https://dev.to/allquiet/aws-elastic-ip-failover-with-keepalived-how-we-keep-self-managed-loadbalancers-redundant-489i</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on 10 May 2026 on the &lt;a href="https://allquiet.app/blog/elastic-ip-failover-with-keepalived-aws-ec2" rel="noopener noreferrer"&gt;All Quiet Tech Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;At &lt;strong&gt;All Quiet&lt;/strong&gt; we build &lt;strong&gt;incident management&lt;/strong&gt;: alerting, on-call rotations, escalation, status pages, and integrations with the monitoring stacks teams already run. A meaningful slice of my job is keeping the boring edges boring, especially &lt;strong&gt;ingress&lt;/strong&gt;, when something breaks.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;parts of our stack&lt;/strong&gt; we &lt;strong&gt;run loadbalancers ourselves&lt;/strong&gt; on EC2 instead of putting every path behind an AWS-managed balancer. We do that in part to &lt;strong&gt;avoid leaning too hard on higher-level AWS abstractions&lt;/strong&gt; for those tiers: we still rely on EC2 for reliable virtual machines, and we keep the design close to &lt;strong&gt;portable building blocks&lt;/strong&gt; so we could run the same pattern in &lt;strong&gt;another data center or provider&lt;/strong&gt; without a ground-up redesign. Once we made that choice, we still had a plain &lt;strong&gt;high availability (HA)&lt;/strong&gt; problem for the &lt;strong&gt;active-passive&lt;/strong&gt; pair: &lt;strong&gt;keep the public edge redundant.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For those tiers we use a &lt;strong&gt;small pattern&lt;/strong&gt;: a stable &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-ip-addresses-eip.html" rel="noopener noreferrer"&gt;Elastic IP&lt;/a&gt; (EIP), the address we publish in DNS and the stand-in for a &lt;strong&gt;floating IP&lt;/strong&gt; on a traditional network; &lt;a href="https://www.keepalived.org/" rel="noopener noreferrer"&gt;Keepalived&lt;/a&gt; running &lt;a href="https://www.rfc-editor.org/rfc/rfc5798" rel="noopener noreferrer"&gt;Virtual Router Redundancy Protocol (VRRP)&lt;/a&gt; between peers; and the &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/Welcome.html" rel="noopener noreferrer"&gt;EC2 API&lt;/a&gt;, mainly &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssignPrivateIpAddresses.html" rel="noopener noreferrer"&gt;&lt;code&gt;AssignPrivateIpAddresses&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssociateAddress.html" rel="noopener noreferrer"&gt;&lt;code&gt;AssociateAddress&lt;/code&gt;&lt;/a&gt;, to &lt;strong&gt;move&lt;/strong&gt; that EIP when mastership changes. We wire this with &lt;a href="https://docs.ansible.com/ansible/latest/getting_started/index.html" rel="noopener noreferrer"&gt;Ansible&lt;/a&gt; and the &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/home.html" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (CDK)&lt;/a&gt; in our infrastructure repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem in AWS terms
&lt;/h2&gt;

&lt;p&gt;I grew up with patterns where a “floating IP” moves at layer 2 (L2) with gratuitous Address Resolution Protocol (ARP). &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html" rel="noopener noreferrer"&gt;Amazon Virtual Private Cloud (VPC)&lt;/a&gt; doesn’t work like your favorite rack fabric: public routing for Elastic IPs is enforced by AWS’s control plane, tied to a specific Elastic Network Interface (ENI) and private address on an instance.&lt;/p&gt;

&lt;p&gt;So we split responsibilities deliberately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Between our servers&lt;/strong&gt;, we use Keepalived / VRRP, almost always &lt;strong&gt;unicast&lt;/strong&gt;, to decide which node is primary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Against AWS&lt;/strong&gt;, we run a script on &lt;code&gt;notify_master&lt;/code&gt; that calls the command-line interface (CLI) or API so the EIP actually attaches to the winner.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we did only VRRP virtual-address tricks without &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssociateAddress.html" rel="noopener noreferrer"&gt;&lt;code&gt;AssociateAddress&lt;/code&gt;&lt;/a&gt;, we would not fix customer-visible public routing for that EIP. If we did only API moves without Keepalived, we’d lack a clean distributed agreement story on the pair. &lt;strong&gt;We need both layers.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture at a glance
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    Elastic IP (stable in DNS)
                              │
                              ▼
              ┌───────────────────────────────┐
              │  EC2: EIP associated here     │
              │  (AssociateAddress, etc.)     │
              └───────────────────────────────┘
                              │
                  Our LB tier (e.g. HAProxy / nginx)
                              │
                           backends
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On both nodes we run Keepalived with unicast peers, priorities, and a &lt;code&gt;vrrp_script&lt;/code&gt; that reflects whether our LB process is actually alive (&lt;code&gt;systemctl&lt;/code&gt;, &lt;code&gt;curl&lt;/code&gt; to localhost, or whatever probe matches reality). When a node becomes MASTER, &lt;code&gt;notify_master&lt;/code&gt; runs our failover shell script: ensure a &lt;strong&gt;secondary private IP&lt;/strong&gt;, then associate the allocation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation sketch
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Kernel:&lt;/strong&gt; we often enable &lt;code&gt;ip_forward&lt;/code&gt; / &lt;code&gt;ip_nonlocal_bind&lt;/code&gt; where our &lt;a href="http://www.haproxy.org/" rel="noopener noreferrer"&gt;HAProxy&lt;/a&gt; or &lt;a href="https://nginx.org/en/docs/" rel="noopener noreferrer"&gt;nginx&lt;/a&gt; layout needs it. We validate per role, not globally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security groups:&lt;/strong&gt; &lt;a href="https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml" rel="noopener noreferrer"&gt;protocol 112&lt;/a&gt; (Virtual Router Redundancy Protocol, VRRP) allowed between peers, not the open internet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keepalived:&lt;/strong&gt; &lt;code&gt;notify_master&lt;/code&gt; logs to a rotated file; credentials via an &lt;a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html" rel="noopener noreferrer"&gt;Identity and Access Management (IAM) instance profile&lt;/a&gt; where we can.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instance identity:&lt;/strong&gt; in production we fetch &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html" rel="noopener noreferrer"&gt;instance metadata&lt;/a&gt; using &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html" rel="noopener noreferrer"&gt;Instance Metadata Service version 2 (IMDSv2)&lt;/a&gt;.&lt;/p&gt;
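&lt;p&gt;As one way to picture the security group point above, this small Python helper builds the ingress permission for VRRP in the shape that boto3's &lt;code&gt;authorize_security_group_ingress&lt;/code&gt; accepts in its &lt;code&gt;IpPermissions&lt;/code&gt; list. The helper name and the group ID are placeholders for this sketch, and it only builds the dictionary; we manage the real rule in our infrastructure code rather than with a script like this.&lt;/p&gt;

```python
def vrrp_ingress_permission(peer_security_group_id):
    """Build one IpPermissions entry that allows VRRP from a single security group.

    VRRP is IP protocol 112. The source is a security group rather than a
    CIDR range, so only instances in that group (the LB peers) can send it.
    """
    return {
        "IpProtocol": "112",
        "UserIdGroupPairs": [
            {
                "GroupId": peer_security_group_id,
                "Description": "VRRP between load balancer peers",
            }
        ],
    }


if __name__ == "__main__":
    # "sg-REPLACE_ME" is a placeholder, not a real group ID.
    print(vrrp_ingress_permission("sg-REPLACE_ME"))
```

&lt;p&gt;When both peers share one security group, passing that group's own ID gives a self-referencing rule: the peers can exchange adverts, and nothing outside the group can.&lt;/p&gt;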

&lt;p&gt;Example Keepalived skeleton (placeholders only, not a drop-in):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;global_defs&lt;/span&gt; {
    &lt;span class="n"&gt;enable_script_security&lt;/span&gt;
    &lt;span class="n"&gt;script_user&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;
    &lt;span class="n"&gt;vrrp_startup_delay&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
}

&lt;span class="n"&gt;vrrp_script&lt;/span&gt; &lt;span class="n"&gt;check_service&lt;/span&gt; {
    &lt;span class="n"&gt;script&lt;/span&gt; &lt;span class="s2"&gt;"/usr/bin/systemctl is-active --quiet nginx"&lt;/span&gt;
    &lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
    &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
}

&lt;span class="n"&gt;vrrp_instance&lt;/span&gt; &lt;span class="n"&gt;VI_1&lt;/span&gt; {
    &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="n"&gt;MASTER&lt;/span&gt;
    &lt;span class="n"&gt;interface&lt;/span&gt; &lt;span class="n"&gt;eth0&lt;/span&gt;
    &lt;span class="n"&gt;unicast_src_ip&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;10&lt;/span&gt;
    &lt;span class="n"&gt;unicast_peer&lt;/span&gt; {
        &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;11&lt;/span&gt;
    }
    &lt;span class="n"&gt;virtual_router_id&lt;/span&gt; &lt;span class="m"&gt;51&lt;/span&gt;
    &lt;span class="n"&gt;priority&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;
    &lt;span class="n"&gt;advert_int&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;track_script&lt;/span&gt; {
        &lt;span class="n"&gt;check_service&lt;/span&gt;
    }
    &lt;span class="n"&gt;notify_master&lt;/span&gt; &lt;span class="s2"&gt;"/etc/keepalived/aws-failover.sh &amp;gt;&amp;gt; /var/log/keepalived/aws-failover.log"&lt;/span&gt;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
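&lt;p&gt;One detail of the skeleton above that is easy to misread is how &lt;code&gt;weight&lt;/code&gt; on a &lt;code&gt;vrrp_script&lt;/code&gt; interacts with &lt;code&gt;priority&lt;/code&gt;. The Python model below is a simplified sketch of Keepalived's documented rule: a positive weight is added while the check succeeds, and a negative weight is subtracted while it fails. The node names and the peer priorities are hypothetical; tie-breaking, &lt;code&gt;rise&lt;/code&gt;/&lt;code&gt;fall&lt;/code&gt; counters, and the FAULT state used for a weight of 0 are not modelled.&lt;/p&gt;

```python
def effective_priority(base, weight, check_ok):
    """Priority a node advertises under a tracked script with a non-zero weight."""
    if check_ok:
        return base + max(weight, 0)  # positive weight is a bonus while healthy
    return base + min(weight, 0)      # negative weight is a penalty while failing


def elect_master(nodes):
    """Return the node with the highest effective priority (ties not modelled)."""
    return max(nodes, key=lambda name: effective_priority(**nodes[name]))


if __name__ == "__main__":
    # lb-a mirrors the skeleton (priority 200, weight 2); lb-b is a hypothetical peer.
    healthy = {
        "lb-a": {"base": 200, "weight": 2, "check_ok": True},
        "lb-b": {"base": 199, "weight": 2, "check_ok": True},
    }
    print(elect_master(healthy))

    # nginx stops on lb-a: it loses its bonus and the peer takes over.
    degraded = dict(healthy, **{"lb-a": {"base": 200, "weight": 2, "check_ok": False}})
    print(elect_master(degraded))

    # With a wide priority gap, a failed check on lb-a changes nothing.
    wide_gap = dict(degraded, **{"lb-b": {"base": 150, "weight": 2, "check_ok": True}})
    print(elect_master(wide_gap))
```

&lt;p&gt;The takeaway from the model: with a positive weight, a failed check only moves mastership if the gap between the two base priorities is smaller than the weight. If you keep a wide gap, a negative weight larger than the gap has the same effect.&lt;/p&gt;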



&lt;p&gt;Example failover script shape (replace IDs and IPs; use IMDSv2 in production):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-euo&lt;/span&gt; pipefail

&lt;span class="nv"&gt;ALLOCATION_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eipalloc-REPLACE_ME"&lt;/span&gt;
&lt;span class="nv"&gt;PRIVATE_IP_SECONDARY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"10.0.0.50"&lt;/span&gt;
&lt;span class="nv"&gt;INTERFACE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eth0"&lt;/span&gt;

&lt;span class="nv"&gt;INSTANCE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-sf&lt;/span&gt; http://169.254.169.254/latest/meta-data/instance-id&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

ip addr add &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PRIVATE_IP_SECONDARY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/32"&lt;/span&gt; dev &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;INTERFACE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true

&lt;/span&gt;&lt;span class="nv"&gt;NI_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 describe-instances &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--instance-ids&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;INSTANCE_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Reservations[0].Instances[0].NetworkInterfaces[0].NetworkInterfaceId'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Take over the secondary private IP even if the peer's interface still holds it&lt;/span&gt;
aws ec2 assign-private-ip-addresses &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network-interface-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NI_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--private-ip-addresses&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PRIVATE_IP_SECONDARY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-reassignment&lt;/span&gt;

aws ec2 associate-address &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allocation-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ALLOCATION_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--instance-id&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;INSTANCE_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--private-ip-address&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PRIVATE_IP_SECONDARY&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tradeoffs of managing the edge ourselves
&lt;/h2&gt;

&lt;p&gt;When we &lt;strong&gt;self-manage&lt;/strong&gt; load balancer tiers instead of defaulting to AWS-managed front doors, we still need to evaluate the usual architectures: application or network load balancers, DNS failover, Kubernetes ingress, or the Elastic IP + Keepalived pattern this post describes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Application Load Balancer (ALB)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros / cons (for anyone choosing ALB):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upside:&lt;/strong&gt; AWS-managed HA, &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html" rel="noopener noreferrer"&gt;health checks&lt;/a&gt;, Transport Layer Security (TLS) with &lt;a href="https://docs.aws.amazon.com/acm/latest/userguide/acm-overview.html" rel="noopener noreferrer"&gt;AWS Certificate Manager (ACM)&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-chapter.html" rel="noopener noreferrer"&gt;AWS Web Application Firewall (WAF)&lt;/a&gt;, and a clear scaling story for HTTP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; cost at scale, less hands-on control over every packet and knob than raw EC2.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At All Quiet:&lt;/strong&gt; we rely on managed load balancing for paths where we want AWS to own HA end-to-end, including customer-facing HTTP. We treat ALB-class tooling as the default when we do not want to operate the edge ourselves.&lt;/p&gt;

&lt;h3&gt;
  
  
  Network Load Balancer (NLB)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros / cons (for anyone choosing NLB):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upside:&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html" rel="noopener noreferrer"&gt;TCP/UDP transparency&lt;/a&gt;, static IPs per Availability Zone (AZ), low listener overhead compared to full layer 7 (L7).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; fewer HTTP-specific features than ALB; still another billable and operated AWS component.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At All Quiet:&lt;/strong&gt; when we need AWS-managed HA but not full layer 7 (L7) termination at the edge, an NLB fits better than an ALB. We don’t replace every self-managed tier with an NLB, but it sits on the same “managed edge” side of the spectrum as ALB.&lt;/p&gt;

&lt;h3&gt;
  
  
  DNS failover (Route 53)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros / cons (for anyone using DNS failover):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upside:&lt;/strong&gt; no instance-side EIP choreography; &lt;a href="https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover-types.html" rel="noopener noreferrer"&gt;health-checked routing policies&lt;/a&gt; let AWS steer names.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; DNS time to live (TTL) and caching stretch failover and failback; client stacks behave inconsistently; not a drop-in substitute for “one stable IP, instant swing.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At All Quiet:&lt;/strong&gt; DNS steering can complement other designs; we don’t rely on it alone when our mental model is exactly one Elastic IP jumping between two known EC2 nodes. That is what Keepalived plus the API covers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubernetes / gateways (e.g. Amazon EKS)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros / cons (for anyone on Kubernetes):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upside:&lt;/strong&gt; HA via &lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/" rel="noopener noreferrer"&gt;Services&lt;/a&gt;, &lt;a href="https://kubernetes.io/docs/concepts/services-networking/ingress/" rel="noopener noreferrer"&gt;Ingress&lt;/a&gt; / &lt;a href="https://gateway-api.sigs.k8s.io/" rel="noopener noreferrer"&gt;Gateway API&lt;/a&gt;, and cloud LB integration, which gives different primitives than a bare-metal pair.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; cluster operational tax; not every workload belongs there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At All Quiet:&lt;/strong&gt; this article describes a pair pattern centered on &lt;strong&gt;virtual machines (VMs)&lt;/strong&gt; because we still run meaningful tiers that way; where we use &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/what-is-eks.html" rel="noopener noreferrer"&gt;Amazon Elastic Kubernetes Service (EKS)&lt;/a&gt; or similar, ingress HA follows Kubernetes, not Keepalived on two fixed hosts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Elastic IP + Keepalived + EC2 API (this post)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pros / cons (for anyone building like this):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upside:&lt;/strong&gt; one stable public address in DNS; relatively few moving AWS objects; full control over timers and failover scripts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downside:&lt;/strong&gt; you own IAM, idempotent scripts, logging, and monitoring. Ambiguous states deserve runbooks and checks such as &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeAddresses.html" rel="noopener noreferrer"&gt;&lt;code&gt;DescribeAddresses&lt;/code&gt;&lt;/a&gt;. Compared with a managed load balancer, cutover is not instantaneous on abrupt failure: traffic keeps going to wherever the EIP is still associated until VRRP agrees on a new master, your health logic runs, and &lt;code&gt;notify_master&lt;/code&gt; finishes calling AWS. The gap depends on &lt;code&gt;advert_int&lt;/code&gt;, &lt;code&gt;vrrp_script&lt;/code&gt; intervals, preempt settings, and API behavior. Those knobs trade sensitivity against stability; sub-millisecond failover is not what this pattern promises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At All Quiet:&lt;/strong&gt; this is what we actually implemented for &lt;strong&gt;specific self-managed load balancer tiers&lt;/strong&gt;: Ansible-deployed Keepalived, scripts on &lt;code&gt;notify_master&lt;/code&gt;, &lt;a href="https://docs.aws.amazon.com/cdk/v2/guide/home.html" rel="noopener noreferrer"&gt;AWS Cloud Development Kit (CDK)&lt;/a&gt; and infrastructure as code (IaC) for the EIP and IAM. That is the same stack this article walks through at a pattern level.&lt;/p&gt;
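&lt;p&gt;To get a feel for the gap on abrupt failure, a back-of-the-envelope estimate helps. The sketch below uses the VRRPv2 timers from RFC 3768, where a backup declares the master down after 3 × &lt;code&gt;advert_int&lt;/code&gt; plus a priority-based skew time. The backup priority of 100 and the 2 seconds budgeted for the AWS API calls are assumptions for illustration, not measured values.&lt;/p&gt;

```javascript
// Rough estimate of how long the EIP keeps pointing at a dead node.
// Timer formulas follow VRRPv2 (RFC 3768); every input value is an assumption.

// Skew time in seconds: backups with a higher priority time out slightly sooner.
function skewTimeSeconds(backupPriority) {
  return (256 - backupPriority) / 256;
}

// How long a backup waits without advertisements before it takes over.
function masterDownIntervalSeconds(advertInt, backupPriority) {
  return 3 * advertInt + skewTimeSeconds(backupPriority);
}

// Abrupt failure: VRRP detection plus the AWS calls made by notify_master.
function abruptFailoverSeconds({ advertInt, backupPriority, apiSeconds }) {
  return masterDownIntervalSeconds(advertInt, backupPriority) + apiSeconds;
}

const estimate = abruptFailoverSeconds({
  advertInt: 1,        // advert_int from the config above
  backupPriority: 100, // hypothetical priority of the backup node
  apiSeconds: 2,       // hypothetical budget for the EC2 API calls
});
console.log(estimate); // 5.609375
```

&lt;p&gt;Lowering &lt;code&gt;advert_int&lt;/code&gt; shrinks the first term, but it also makes the pair more sensitive to short network hiccups, which is the sensitivity versus stability tradeoff mentioned above.&lt;/p&gt;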

&lt;h3&gt;
  
  
  How we pick among them
&lt;/h3&gt;

&lt;p&gt;Internally we ask: does this path’s &lt;strong&gt;service level objective (SLO) and budget&lt;/strong&gt; justify a managed LB? Do we need &lt;strong&gt;layer 7 (L7) features&lt;/strong&gt; only ALB gives us? Does &lt;strong&gt;scripted EIP failover&lt;/strong&gt; fit this path’s resilience expectations (see the EIP downside above)? If scripted failover does not fit, we promote the tier to ALB/NLB or another design; we don’t stretch EIP+Keepalived past where it fits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;We run self-managed LBs in some slices of our infra; we needed explicit public HA there, and EIP + Keepalived + EC2 API is our compact answer.&lt;/li&gt;
&lt;li&gt;VRRP decides who leads; &lt;code&gt;AssociateAddress&lt;/code&gt; decides where the EIP points.&lt;/li&gt;
&lt;li&gt;Managed ALB/NLB remain strong defaults when we want AWS to own HA at that layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;We touch incident paths every day; when ingress misbehaves, people notice fast. If you operate similar edges, use this framing to decide when the pattern fits and when to promote that tier to managed load balancing instead.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>keepalived</category>
      <category>devops</category>
      <category>redundancy</category>
    </item>
    <item>
      <title>Running our test environment fully on Nanos</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Thu, 18 Sep 2025 16:19:15 +0000</pubDate>
      <link>https://dev.to/mads_quist/running-our-test-environment-fully-on-nanos-12f9</link>
      <guid>https://dev.to/mads_quist/running-our-test-environment-fully-on-nanos-12f9</guid>
      <description>&lt;p&gt;We've probably all had this dream of waking up in the morning and finding ourselves with 10,000 new users on our platform. (At least as a Co-Founder &amp;amp; CTO, that's the dream) 😁&lt;/p&gt;

&lt;p&gt;So we asked ourselves: can we run a dedicated environment of our platform entirely on nanos?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Spoiler:&lt;/strong&gt; it worked, and it taught us a lot about where our real inefficiencies were hiding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here are some of these learnings:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The Reality of Scaling:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When your startup does grow and you start scaling your infrastructure, what really happens? Often it turns out that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Important indexes are missing, leading to CPU spikes.&lt;/li&gt;
&lt;li&gt;Query results are loaded into memory without paging, causing out-of-memory errors.&lt;/li&gt;
&lt;li&gt;Other inefficiencies appear that raw cloud scaling alone cannot solve.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result is often not a sleek, hyper-scalable system but an extremely expensive piece of cloud infrastructure.&lt;/p&gt;

&lt;p&gt;The best part about scaling: &lt;strong&gt;Downsizing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;What's great about the cloud is not just that it scales up. It also lets you scale down.&lt;/p&gt;

&lt;p&gt;Recently, we ran a load test we called &lt;strong&gt;nano testing&lt;/strong&gt;. The idea was simple: take our infrastructure and run it on AWS EC2 nano instances. That is as small as it gets on AWS.&lt;/p&gt;

&lt;p&gt;A nano instance has 512MB RAM and 2 virtual CPUs. That reminds me of my gaming PC from the year 2000. For comparison, your smartphone today probably has 10–20 times more RAM. We then dumped in about four times the amount of test data we normally process.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can It Run on Nanos?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The fun begins when you try to make your platform run decently on nanos. It does not have to be blazing fast. The key question is: does it run at all?&lt;/p&gt;

&lt;p&gt;If some API queries take a few seconds, or some asynchronous jobs and emails are processed with a bit of lag, that is perfectly fine. If your system can handle this, it means you have built something resilient.&lt;/p&gt;

&lt;p&gt;Running on nanos lets you pinpoint weaknesses in your platform very clearly. Fixes are often low-hanging fruit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add batch processing here.&lt;/li&gt;
&lt;li&gt;Add an index there.&lt;/li&gt;
&lt;li&gt;Add throttling in another place.&lt;/li&gt;
&lt;/ul&gt;
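&lt;p&gt;As an illustration of the first fix, here is a minimal batching sketch. It is not our production code: &lt;code&gt;inBatches&lt;/code&gt; is a hypothetical helper that walks any iterable (for example a database cursor) and hands out fixed-size chunks, so only one chunk has to sit in memory at a time.&lt;/p&gt;

```javascript
// Yield fixed-size batches from any iterable so only one batch is held in memory.
function* inBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || 1 > batchSize) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let batch = [];
  for (const item of items) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  // Hand out the final, possibly smaller batch.
  if (batch.length !== 0) {
    yield batch;
  }
}

// Hypothetical usage: seven records processed three at a time.
const sizes = [];
for (const batch of inBatches([1, 2, 3, 4, 5, 6, 7], 3)) {
  sizes.push(batch.length);
}
console.log(sizes); // [ 3, 3, 1 ]
```

&lt;p&gt;On a 512 MB machine the batch size becomes a visible knob: too large and you are back to out-of-memory errors, too small and you pay for extra round trips.&lt;/p&gt;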

&lt;p&gt;&lt;strong&gt;Our Approach at All Quiet&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At All Quiet, we now run a dedicated testing environment entirely on nanos. This allows us to observe scaling issues early and fix them before they turn into real problems.&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>testing</category>
      <category>aws</category>
    </item>
    <item>
      <title>Why We Built a MongoDB-Message Queue and Reinvented the Wheel</title>
      <dc:creator>Mads Quist</dc:creator>
      <pubDate>Thu, 04 Jul 2024 04:33:19 +0000</pubDate>
      <link>https://dev.to/allquiet/why-we-built-a-mongodb-message-queue-and-reinvented-the-wheel-al3</link>
      <guid>https://dev.to/allquiet/why-we-built-a-mongodb-message-queue-and-reinvented-the-wheel-al3</guid>
      <description>&lt;p&gt;Hey👋&lt;/p&gt;

&lt;p&gt;I'm Mads Quist, founder of &lt;a href="https://allquiet.app?utm_source=DEV_post"&gt;All Quiet&lt;/a&gt;. We've implemented a home-grown message queue based on MongoDB and I'm here to talk about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why we re-invented the wheel&lt;/li&gt;
&lt;li&gt;How we re-invented the wheel&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  1. Why we re-invented the wheel
&lt;/h1&gt;

&lt;p&gt;Why do we need message queuing?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://allquiet.app?utm_source=DEV_post"&gt;All Quiet &lt;/a&gt; is a modern incident management platform, similar to &lt;a href="https://www.pagerduty.com"&gt;PagerDuty&lt;/a&gt;. Our platform requires features like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sending a double-opt-in email asynchronously after a user registers&lt;/li&gt;
&lt;li&gt;Sending a reminder email 24 hours after registration&lt;/li&gt;
&lt;li&gt;Sending push notifications with Firebase Cloud Messaging (FCM), which can fail due to network or load problems. As push notifications are crucial to our app, we need to retry sending them if there's an issue.&lt;/li&gt;
&lt;li&gt;Accepting emails from outside our integration and processing them into incidents. This process can fail, so we wanted to decouple it and process each email payload on a queue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4t6m6vzaxdr9coh6tmv.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4t6m6vzaxdr9coh6tmv.jpeg" alt="Image description" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Our tech stack
&lt;/h2&gt;

&lt;p&gt;To understand our specific requirements, it's important to get some insights into our tech stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We run a monolithic web application based on .NET 7.
The .NET application runs in a Docker container.&lt;/li&gt;
&lt;li&gt;We run multiple containers in parallel.&lt;/li&gt;
&lt;li&gt;An HAProxy instance distributes HTTP requests equally to each container, ensuring a highly available setup.&lt;/li&gt;
&lt;li&gt;We use MongoDB as our underlying database, replicated across availability zones.&lt;/li&gt;
&lt;li&gt;All of the above components are hosted by AWS on generic EC2 VMs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why we re-invented the wheel
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;We desired a simple queuing mechanism that could run in multiple processes simultaneously while guaranteeing that each message was processed only once.&lt;/li&gt;
&lt;li&gt;We didn't need a pub/sub pattern.&lt;/li&gt;
&lt;li&gt;We didn't aim for a complex distributed system based on CQRS / event sourcing because, you know, the first rule of distributed systems is to not distribute.&lt;/li&gt;
&lt;li&gt;We wanted to keep things as simple as possible, following the philosophy of choosing "boring technology".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ultimately, it's about minimizing the number of moving parts in your infrastructure. We aim to build fantastic features for our excellent customers, and it's imperative to maintain our services reliably. Managing a single database system to achieve more than five nines of uptime is challenging enough. So why burden yourself with managing an additional HA RabbitMQ cluster?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why not just use AWS SQS?
&lt;/h2&gt;

&lt;p&gt;Yeah… cloud solutions like AWS SQS, Google Cloud Tasks, or Azure Queue Storage are fantastic! However, they would have resulted in vendor lock-in. We simply aspire to be independent and cost-effective while still providing a scalable service to our clients.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1k5msd7vcdptz7zj7bc2.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1k5msd7vcdptz7zj7bc2.jpeg" alt="Image description" width="680" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  2. How we re-invented the wheel
&lt;/h1&gt;

&lt;p&gt;What is a message queue?&lt;/p&gt;

&lt;p&gt;A message queue is a system that stores messages. Producers put messages into the queue, and consumers later dequeue them for processing. This is incredibly beneficial for decoupling components, especially when processing messages is a resource-intensive task.&lt;/p&gt;

&lt;h2&gt;
  
  
  What characteristics should our queue show?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Utilizing MongoDB as our data storage&lt;/li&gt;
&lt;li&gt;Guaranteeing that each message is consumed only once&lt;/li&gt;
&lt;li&gt;Allowing multiple consumers to process messages simultaneously&lt;/li&gt;
&lt;li&gt;Ensuring that if message processing fails, retries are possible&lt;/li&gt;
&lt;li&gt;Enabling scheduling of message consumption for the future&lt;/li&gt;
&lt;li&gt;Not needing guaranteed ordering&lt;/li&gt;
&lt;li&gt;Ensuring high availability&lt;/li&gt;
&lt;li&gt;Ensuring messages and their states are durable and can withstand restarts or extended downtimes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MongoDB has significantly evolved over the years and can meet the criteria listed above.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;In the sections that follow, I'll guide you through the MongoDB-specific implementation of our message queue. While you'll need a client library suitable for your preferred programming language, such as Node.js, Go, or C# in the case of All Quiet, the concepts I'll share are platform agnostic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Queues
&lt;/h3&gt;

&lt;p&gt;Each queue you want to utilize is represented as a dedicated collection in your MongoDB database.&lt;/p&gt;

&lt;h3&gt;
  
  
  Message model
&lt;/h3&gt;

&lt;p&gt;Here's an example of a processed message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "_id" : NumberLong(638269014234217933),
    "Statuses" : [
        {
            "Status" : "Processed",
            "Timestamp" : ISODate("2023-08-06T06:50:23.753+0000"),
            "NextReevaluation" : null
        },
        {
            "Status" : "Processing",
            "Timestamp" : ISODate("2023-08-06T06:50:23.572+0000"),
            "NextReevaluation" : null
        },
        {
            "Status" : "Enqueued",
            "Timestamp" : ISODate("2023-08-06T06:50:23.421+0000"),
            "NextReevaluation" : null
        }
    ],
    "Payload" : {
        "YourData" : "abc123"
    }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s look at each property of the message.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;_id&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;_id&lt;/code&gt; field is MongoDB's canonical unique identifier. Here, it contains a &lt;code&gt;NumberLong&lt;/code&gt;, not an &lt;code&gt;ObjectId&lt;/code&gt;. We need &lt;code&gt;NumberLong&lt;/code&gt; instead of &lt;code&gt;ObjectId&lt;/code&gt; because:&lt;/p&gt;

&lt;p&gt;While &lt;code&gt;ObjectId&lt;/code&gt; values should increase over time, they are not necessarily monotonic. This is because they:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and are generated by clients, which may have differing system clocks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In our C# implementation, we generate an &lt;code&gt;Id&lt;/code&gt; with millisecond precision and guaranteed ordering based on insertion time. Although we don't require strict processing order in a multi-consumer environment (similar to RabbitMQ), it's essential to maintain FIFO order when operating with just one consumer. Achieving this with &lt;code&gt;ObjectId&lt;/code&gt; is not feasible. If this isn't crucial for you, you can still use &lt;code&gt;ObjectId&lt;/code&gt;.&lt;/p&gt;
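&lt;p&gt;Our C# generator is not shown here, so the following is only a sketch of the idea in JavaScript. It assumes the id is a .NET-style tick count (100-nanosecond units), which is what the sample &lt;code&gt;_id&lt;/code&gt; above appears to be. The names &lt;code&gt;ticksFromUnixMillis&lt;/code&gt; and &lt;code&gt;createIdGenerator&lt;/code&gt; are made up for this example, and the ordering guarantee only holds within a single process.&lt;/p&gt;

```javascript
// .NET-style ticks (100 ns units since 0001-01-01) at the Unix epoch.
const TICKS_AT_UNIX_EPOCH = 621355968000000000n;
const TICKS_PER_MILLISECOND = 10000n;

// Convert Unix milliseconds into a tick count.
function ticksFromUnixMillis(unixMillis) {
  return BigInt(unixMillis) * TICKS_PER_MILLISECOND + TICKS_AT_UNIX_EPOCH;
}

// Returns a function whose ids strictly increase within this process, even when
// several ids are requested in the same millisecond or the clock steps backwards.
function createIdGenerator(nowMillis = () => Date.now()) {
  let last = 0n;
  return function nextId() {
    const candidate = ticksFromUnixMillis(nowMillis());
    last = candidate > last ? candidate : last + 1n;
    return last;
  };
}

const nextId = createIdGenerator();
const firstId = nextId();
const secondId = nextId();
console.log(secondId > firstId); // true
```

&lt;p&gt;Before inserting, the BigInt has to be converted to your driver's 64-bit integer type. With several producers, ids from different processes can still interleave by a few milliseconds, which is fine as long as you don't need strict ordering across producers.&lt;/p&gt;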

&lt;h3&gt;
  
  
  Statuses
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Statuses&lt;/code&gt; property consists of an array containing the message processing history. At index &lt;code&gt;0&lt;/code&gt;, you'll find the current status, which is crucial for indexing.&lt;/p&gt;

&lt;p&gt;The status object itself contains three properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Status&lt;/code&gt;: Can be "Enqueued", "Processing", "Processed", or "Failed".&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Timestamp&lt;/code&gt;: Records when this status was set.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;NextReevaluation&lt;/code&gt;: Records when the next evaluation should occur, which is essential for both retries and future scheduled executions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Payload
&lt;/h3&gt;

&lt;p&gt;This property contains the specific payload of your message.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enqueuing a message
&lt;/h3&gt;

&lt;p&gt;Adding a message is a straightforward insert operation into the collection with the status set to &lt;code&gt;"Enqueued"&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For immediate processing, set &lt;code&gt;NextReevaluation&lt;/code&gt; to null.&lt;/li&gt;
&lt;li&gt;For future processing, set &lt;code&gt;NextReevaluation&lt;/code&gt; to a timestamp in the future, when you want your message to be processed.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db.yourQueueCollection.insert({
    "_id" : NumberLong(638269014234217933),
    "Statuses" : [
        {
            "Status" : "Enqueued",
            "Timestamp" : ISODate("2023-08-06T06:50:23.421+0000"),
            "NextReevaluation" : null
        }
    ],
    "Payload" : {
        "YourData" : "abc123"
    }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dequeuing a message
&lt;/h3&gt;

&lt;p&gt;Dequeuing is slightly more complex but still relatively straightforward. It heavily relies on the concurrent atomic read and update capabilities of MongoDB.&lt;/p&gt;

&lt;p&gt;This essential feature of MongoDB ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each message is processed only once.&lt;/li&gt;
&lt;li&gt;Multiple consumers can safely process messages simultaneously.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db.yourQueueCollection.findAndModify({
   "query": {
      "$and": [
         {
            "Statuses.0.Status": "Enqueued"
         },
         {
            "Statuses.0.NextReevaluation": null
         }
      ]
   },
   "update": {
      "$push": {
         "Statuses": {
            "$each": [
               {
                  "Status": "Processing",
                  "Timestamp": ISODate("2023-08-06T06:50:23.800+0000"),
                  "NextReevaluation": null
               }
            ],
            "$position": 0
         }
      }
   }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So we read one message that is in the &lt;code&gt;"Enqueued"&lt;/code&gt; state and, in the same operation, modify it by pushing the status &lt;code&gt;"Processing"&lt;/code&gt; at position &lt;code&gt;0&lt;/code&gt;. Because this operation is atomic, no other consumer can pick up the same message.&lt;/p&gt;
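&lt;p&gt;To make the claim step easier to picture, here is a small in-memory model of it. It is not a replacement for &lt;code&gt;findAndModify&lt;/code&gt;: a plain array stands in for the collection, and the atomicity across processes comes from MongoDB, not from this code. Picking the lowest &lt;code&gt;_id&lt;/code&gt; first is an assumption added here to mimic FIFO; the query above does not specify a sort.&lt;/p&gt;

```javascript
// In-memory model of the claim step. It mirrors the filter on Statuses[0]
// and the $push with "$position": 0 from the findAndModify call above.
function claimNext(messages, now) {
  const candidate = messages
    .filter((m) => m.Statuses[0].Status === "Enqueued")
    .filter((m) => m.Statuses[0].NextReevaluation === null)
    .sort((a, b) => a._id - b._id)[0]; // assumption: oldest id first
  if (candidate === undefined) {
    return null; // nothing left to claim
  }
  // The newest status always goes to index 0.
  candidate.Statuses.unshift({ Status: "Processing", Timestamp: now, NextReevaluation: null });
  return candidate;
}

// Helper for the demo: a freshly enqueued message with a small numeric id.
function enqueuedMessage(id) {
  return {
    _id: id,
    Statuses: [{ Status: "Enqueued", Timestamp: new Date(), NextReevaluation: null }],
    Payload: { YourData: "abc123" },
  };
}

const queue = [enqueuedMessage(2), enqueuedMessage(1)];
const claimedByA = claimNext(queue, new Date());
const claimedByB = claimNext(queue, new Date());
const claimedByC = claimNext(queue, new Date());
console.log(claimedByA._id, claimedByB._id, claimedByC); // 1 2 null
```

&lt;p&gt;Once a message has &lt;code&gt;"Processing"&lt;/code&gt; at index &lt;code&gt;0&lt;/code&gt;, it no longer matches the filter, which is exactly why a second consumer cannot claim it.&lt;/p&gt;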

&lt;h3&gt;
  
  
  Marking a message as processed
&lt;/h3&gt;

&lt;p&gt;Once the processing of the message is complete, it's a simple matter of updating the message status to &lt;code&gt;"Processed"&lt;/code&gt; using the message’s &lt;code&gt;_id&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db.yourQueueCollection.findAndModify({
   "query": {
     "_id": NumberLong(638269014234217933)
   },
   "update": {
      "$push": {
         "Statuses": {
            "$each": [
               {
                  "Status": "Processed",
                  "Timestamp": ISODate("2023-08-06T06:50:24.100+0000"),
                  "NextReevaluation": null
               }
            ],
            "$position": 0
         }
      }
   }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Marking a message as failed
&lt;/h3&gt;

&lt;p&gt;If processing fails, we need to mark the message accordingly. Often, you might want to retry processing the message. This can be achieved by re-enqueuing the message. In many scenarios, it makes sense to reprocess the message after a specific delay, such as 10 seconds, depending on the nature of the processing failure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db.yourQueueCollection.findAndModify({
   "query": {
     "_id": NumberLong(638269014234217933)
   },
   "update": {
      "$push": {
         "Statuses": {
            "$each": [
               {
                  "Status": "Failed",
                  "Timestamp": ISODate("2023-08-06T06:50:24.100+0000"),
                  "NextReevaluation": ISODate("2023-08-06T06:50:34.100+0000")
               }
            ],
            "$position": 0
         }
      }
   }
});

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
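&lt;p&gt;How long to wait before a retry is up to you. One common option, shown here as a sketch and not as what All Quiet necessarily does, is exponential backoff derived from the status history. The helper names, the 10-second base delay, and the 10-minute cap are assumptions for illustration.&lt;/p&gt;

```javascript
// Count how often a message has already failed, based on its status history.
function failedAttempts(message) {
  return message.Statuses.filter((s) => s.Status === "Failed").length;
}

// Exponential backoff: 10 s, 20 s, 40 s, ... capped at maxDelayMs.
function nextReevaluation(message, now, baseDelayMs = 10000, maxDelayMs = 600000) {
  const attempt = failedAttempts(message) + 1; // includes the failure being recorded now
  const delayMs = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
  return new Date(now.getTime() + delayMs);
}

// Build the status entry that would be pushed at position 0.
function failedStatus(message, now) {
  return {
    Status: "Failed",
    Timestamp: now,
    NextReevaluation: nextReevaluation(message, now),
  };
}

const failedAt = new Date("2023-08-06T06:50:24.100Z");
const firstFailure = failedStatus({ Statuses: [{ Status: "Processing" }] }, failedAt);
console.log(firstFailure.NextReevaluation.toISOString()); // 2023-08-06T06:50:34.100Z
```

&lt;p&gt;Because the history lives in the document itself, no separate retry counter is needed. You can also use the count to stop retrying after a fixed number of failures.&lt;/p&gt;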



&lt;h3&gt;
  
  
  The dequeuing loop
&lt;/h3&gt;

&lt;p&gt;We've established how we can easily enqueue and dequeue items from our "queue," which is, in fact, simply a MongoDB collection. We can even "schedule" messages for the future by leveraging the &lt;code&gt;NextReevaluation&lt;/code&gt; field.&lt;/p&gt;

&lt;p&gt;What's missing is how we dequeue regularly. Consumers need to execute the &lt;code&gt;findAndModify&lt;/code&gt; command in some kind of loop. The simplest approach is an endless loop in which we dequeue and process one message after another. It works, but it puts considerable pressure on the database and the network.&lt;/p&gt;

&lt;p&gt;An alternative would be to introduce a delay, e.g., 100ms, between loop iterations. This will significantly reduce the load but will also decrease the speed of dequeuing.&lt;/p&gt;

&lt;p&gt;The solution to the problem is what MongoDB refers to as a &lt;a href="https://www.mongodb.com/docs/manual/changeStreams/"&gt;change stream&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  MongoDB Change Streams
&lt;/h3&gt;

&lt;p&gt;What are &lt;a href="https://www.mongodb.com/docs/manual/changeStreams/"&gt;change streams&lt;/a&gt;? I can’t explain it better than the guys at MongoDB:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Change streams allow applications to access real-time data changes […]. Applications can use change streams to subscribe to all data changes on a single collection […] and immediately react to them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Great! What we can do is listen to newly created documents in our queue collection, which effectively means listening to newly enqueued messages.&lt;/p&gt;

&lt;p&gt;This is dead simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const changeStream = db.yourQueueCollection.watch();
changeStream.on('insert', changeEvent =&amp;gt; {
  // Dequeue the message
  db.yourQueueCollection.findAndModify({
    "query": changeEvent.documentKey._id,
    "update": {
      "$push": {
         "Statuses": {
            "$each": [
               {
                  "Status": "Processing",
                  "Timestamp": ISODate("2023-08-06T06:50:24.100+0000"),
                  "NextReevaluation": null
               }
            ],
            "$position": 0
         }
      }
   }
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scheduled and Orphaned Messages
&lt;/h3&gt;

&lt;p&gt;The change stream approach, however, works for neither scheduled nor orphaned messages, because there is no change event we can listen to.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scheduled messages simply sit in the collection with the status &lt;code&gt;"Enqueued"&lt;/code&gt; and a &lt;code&gt;"NextReevaluation"&lt;/code&gt; field set to the future.&lt;/li&gt;
&lt;li&gt;Orphaned messages are those that were in the &lt;code&gt;"Processing"&lt;/code&gt; status when their consumer process died. They remain in the collection with the status &lt;code&gt;"Processing"&lt;/code&gt; but no consumer will ever change their status to &lt;code&gt;"Processed"&lt;/code&gt; or &lt;code&gt;"Failed"&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For these use cases, we need to revert to our simple loop. However, we can use a rather generous delay between iterations.&lt;/p&gt;
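&lt;p&gt;What that slow loop looks for can be expressed as a single filter. The sketch below only builds the filter document; &lt;code&gt;buildSweepFilter&lt;/code&gt; is a hypothetical helper, and the 5-minute processing timeout is an assumed value that has to be longer than your slowest legitimate job. Treating &lt;code&gt;"Failed"&lt;/code&gt; messages with a due &lt;code&gt;NextReevaluation&lt;/code&gt; as retry candidates is also an assumption.&lt;/p&gt;

```javascript
// Build a MongoDB filter for messages the change stream will never announce.
function buildSweepFilter(now, processingTimeoutMs) {
  const orphanedBefore = new Date(now.getTime() - processingTimeoutMs);
  return {
    $or: [
      // Scheduled messages and failed messages whose retry time has come.
      {
        "Statuses.0.Status": { $in: ["Enqueued", "Failed"] },
        "Statuses.0.NextReevaluation": { $lte: now },
      },
      // Messages stuck in "Processing" for too long: their consumer probably died.
      {
        "Statuses.0.Status": "Processing",
        "Statuses.0.Timestamp": { $lte: orphanedBefore },
      },
    ],
  };
}

const sweepFilter = buildSweepFilter(new Date("2023-08-06T07:00:00.000Z"), 5 * 60 * 1000);
console.log(JSON.stringify(sweepFilter, null, 2));
```

&lt;p&gt;A consumer can pass this filter to the same atomic claim operation used for regular dequeuing, so a message found by the sweep is still picked up by only one consumer.&lt;/p&gt;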

&lt;h1&gt;
  
  
  Wrapping it up
&lt;/h1&gt;

&lt;p&gt;"Traditional" databases, like MySQL, PostgreSQL, or MongoDB (which I also view as traditional), are incredibly powerful today. If used correctly (ensure your indexes are optimized!), they are swift, scale impressively, and are cost-effective on traditional hosting platforms.&lt;/p&gt;

&lt;p&gt;Many use cases can be addressed using just a database and your preferred programming language. It's not always necessary to have the "right tool for the right job," meaning maintaining a diverse set of tools like Redis, Elasticsearch, RabbitMQ, etc. Often, the maintenance overhead isn't worth it.&lt;/p&gt;

&lt;p&gt;While the solution proposed might not match the performance of, for instance, RabbitMQ, it's usually sufficient and can scale to a point that would mark significant success for your startup.&lt;/p&gt;

&lt;p&gt;Software engineering is about navigating trade-offs. Choose yours wisely.&lt;/p&gt;

</description>
      <category>mongodb</category>
      <category>csharp</category>
      <category>dotnet</category>
      <category>eventdriven</category>
    </item>
  </channel>
</rss>
