<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mithun Shanbhag</title>
    <description>The latest articles on DEV Community by Mithun Shanbhag (@mithunshanbhag).</description>
    <link>https://dev.to/mithunshanbhag</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F420695%2F6c81c610-3d0a-4ebe-87a1-f0a65ee17ebf.jpeg</url>
      <title>DEV Community: Mithun Shanbhag</title>
      <link>https://dev.to/mithunshanbhag</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mithunshanbhag"/>
    <language>en</language>
    <item>
      <title>Introducing the CloudSkew Professional Plan</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Sat, 26 Dec 2020 12:08:44 +0000</pubDate>
      <link>https://dev.to/cloudskew/introducing-the-cloudskew-professional-plan-56og</link>
      <guid>https://dev.to/cloudskew/introducing-the-cloudskew-professional-plan-56og</guid>
      <description>&lt;p&gt;&lt;a href="https://www.cloudskew.com"&gt;CloudSkew&lt;/a&gt; now introduces a professional plan for cloud architects. Subscribe to create unlimited architecture diagrams &amp;amp; reusable templates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tBer_cFJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/pages/pricing/pricing-table-01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tBer_cFJ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/pages/pricing/pricing-table-01.png" alt="cloudskew professional plan"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bonus: All early adopters &amp;amp; power users have been upgraded/grandparented for free for the next 6 months!&lt;/p&gt;

&lt;p&gt;Please refer to our &lt;a href="https://www.cloudskew.com/docs/pricing.html"&gt;pricing page&lt;/a&gt;, &lt;a href="https://www.cloudskew.com/docs/frequently-asked-questions.html"&gt;FAQs page&lt;/a&gt;, &lt;a href="https://www.cloudskew.com/about/terms-of-service.html"&gt;terms of service&lt;/a&gt; and &lt;a href="https://www.cloudskew.com/about/privacy-policy.html"&gt;privacy policy&lt;/a&gt; for more details.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>aws</category>
      <category>kubernetes</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>CloudSkew featured at Azure Community Conference</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Sat, 26 Dec 2020 11:14:40 +0000</pubDate>
      <link>https://dev.to/cloudskew/cloudskew-featured-at-azure-community-conference-li2</link>
      <guid>https://dev.to/cloudskew/cloudskew-featured-at-azure-community-conference-li2</guid>
      <description>&lt;p&gt;A big thanks to the &lt;a href="https://www.azconf.dev/"&gt;Azure Community Conference&lt;/a&gt; for featuring CloudSkew. Here's the recording in which &lt;a href="https://twitter.com/mithunshanbhag"&gt;Mithun Shanbhag&lt;/a&gt; (CloudSkew creator) talks about the internals of &lt;a href="https://www.cloudskew.com"&gt;cloudskew.com&lt;/a&gt; and how it was built on top of Azure.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/isRLHQZbs08"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>azure</category>
      <category>community</category>
    </item>
    <item>
      <title>CloudSkew crosses 20K user signups</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Sat, 05 Sep 2020 10:58:14 +0000</pubDate>
      <link>https://dev.to/cloudskew/cloudskew-crosses-20k-user-signups-5ak9</link>
      <guid>https://dev.to/cloudskew/cloudskew-crosses-20k-user-signups-5ak9</guid>
      <description>&lt;p&gt;&lt;a href="https://www.cloudskew.com"&gt;cloudskew.com&lt;/a&gt; is acquiring users at a steady clip and just crossed 20K user signups this week.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--QvN_BIWU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/38-milestone-20k-user-signups.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--QvN_BIWU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/38-milestone-20k-user-signups.png" alt="20K user signups"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>milestones</category>
    </item>
    <item>
      <title>CloudSkew featured at Cloud Community Days conference</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Mon, 20 Jul 2020 20:33:04 +0000</pubDate>
      <link>https://dev.to/cloudskew/cloudskew-featured-at-cloud-community-days-conference-5f4e</link>
      <guid>https://dev.to/cloudskew/cloudskew-featured-at-cloud-community-days-conference-5f4e</guid>
      <description>&lt;p&gt;A big thanks to the &lt;a href="https://ccdays.konfhub.com/"&gt;Cloud Community Days&lt;/a&gt; conference for featuring CloudSkew. Here's the recording in which &lt;a href="https://twitter.com/mithunshanbhag"&gt;Mithun Shanbhag&lt;/a&gt; (CloudSkew creator) talks about the internals of &lt;a href="https://www.cloudskew.com"&gt;cloudskew.com&lt;/a&gt; and how it was built on top of Azure (skip to the &lt;a href="https://youtu.be/_dZwMidN9wY?t=13742"&gt;3:49:00 mark&lt;/a&gt; for the start of the talk).&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/_dZwMidN9wY"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>azure</category>
      <category>community</category>
    </item>
    <item>
      <title>CloudSkew featured on the Cloud Lunch &amp; Learn show</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Mon, 20 Jul 2020 20:05:04 +0000</pubDate>
      <link>https://dev.to/cloudskew/cloudskew-appears-on-the-cloud-lunch-learn-show-1445</link>
      <guid>https://dev.to/cloudskew/cloudskew-appears-on-the-cloud-lunch-learn-show-1445</guid>
      <description>&lt;p&gt;A big thanks to the Cloud Lunch &amp;amp; Learn show for featuring CloudSkew last week. Here's the recording in which &lt;a href="https://twitter.com/mithunshanbhag"&gt;Mithun Shanbhag&lt;/a&gt; (CloudSkew creator) and &lt;a href="https://twitter.com/HmsBarona"&gt;Hugo Barona&lt;/a&gt; (host) talk about the internals of &lt;a href="https://www.cloudskew.com"&gt;cloudskew.com&lt;/a&gt; and how it was built on top of Azure.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/aZyPlXiXsqY"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Please do subscribe to Cloud Lunch &amp;amp; Learn's &lt;a href="https://www.youtube.com/channel/UCHZeZzSlTtmfgPozIq8J2Kw"&gt;YouTube channel&lt;/a&gt; for great Azure content!&lt;/p&gt;

</description>
      <category>azure</category>
      <category>architecture</category>
      <category>podcast</category>
      <category>community</category>
    </item>
    <item>
      <title>How Cloudskew.com was built on Azure</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Tue, 14 Jul 2020 04:47:17 +0000</pubDate>
      <link>https://dev.to/cloudskew/how-i-built-cloudskew-com-on-azure-2170</link>
      <guid>https://dev.to/cloudskew/how-i-built-cloudskew-com-on-azure-2170</guid>
      <description>&lt;h1&gt;
  
  
  CloudSkew Architecture
&lt;/h1&gt;

&lt;p&gt;CloudSkew is a free online diagram editor for sketching cloud architecture diagrams (&lt;a href="https://www.youtube.com/watch?v=d-lIrtaFUe0"&gt;see a quick demo video&lt;/a&gt;). Icons for AWS, Azure, GCP, Kubernetes, Alibaba Cloud, Oracle Cloud (OCI) etc. are already preloaded in the app. All diagrams are securely saved in the cloud. Here are some &lt;a href="//./../docs/samples.md"&gt;sample diagrams&lt;/a&gt; created with CloudSkew. The full list of CloudSkew's features &amp;amp; capabilities can be seen &lt;a href="//../docs/features.md"&gt;here&lt;/a&gt;. Currently, the product is in public preview.&lt;/p&gt;

&lt;p&gt;In this document, we'll do a deep-dive on CloudSkew's building blocks while also discussing the lessons learnt and the key decisions &amp;amp; trade-offs made &lt;em&gt;(this living document will be frequently updated as the architecture evolves)&lt;/em&gt;. The diagram below represents the overall architecture of CloudSkew.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Foso-Ru5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/cloudskew-architecture.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Foso-Ru5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/cloudskew-architecture.png" alt="cloudskew architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;small&gt;&lt;b&gt;CloudSkew Architecture&lt;/b&gt;&lt;/small&gt;&lt;/p&gt;

&lt;p&gt;CloudSkew's infrastructure has been built on top of various Azure services, snapped together like Lego blocks. Let's now take a look at the individual pieces.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This article is a part of the #FestiveTechCalendar2020 &lt;a href="https://festivetechcalendar.com/"&gt;(https://festivetechcalendar.com/)&lt;/a&gt;, #AppliedCloudStoriesContest &lt;a href="https://www.cloudstories.dev"&gt;(aka.ms/applied-cloud-stories)&lt;/a&gt; and #AzureDevStories &lt;a href="http://konf.me/ds"&gt;(http://konf.me/ds)&lt;/a&gt; initiatives.&lt;/p&gt;

&lt;p&gt;The video below recaps how CloudSkew was built. You can skip reading the rest of the article and watch this video instead! &lt;iframe width="710" height="399" src="https://www.youtube.com/embed/Q8xriCUFmaA"&gt;
&lt;/iframe&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Apps
&lt;/h2&gt;

&lt;p&gt;At its core, CloudSkew's front-end consists of two web apps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="///README.md"&gt;The landing page&lt;/a&gt; is a static VuePress site, with all pages authored in Markdown. The default VuePress theme is used without any customization, although we're loading some marketplace plugins for image zoom, Google Analytics, sitemap generation etc. All images on this site are loaded from a CDN. We chose VuePress as the static site generator (SSG) mainly for its simplicity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://app.cloudskew.com"&gt;The diagram editor&lt;/a&gt; is an Angular 8 SPA written in TypeScript &lt;em&gt;(more details on the internals of this app will be shared in future articles)&lt;/em&gt;. To access the app, users are required to log in using their GitHub or LinkedIn credentials. This app also loads all its static assets from a CDN, while relying on the back-end web APIs for fetching dynamic content. The choice of Angular as the front-end framework was mainly driven by our familiarity with it from prior projects.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Web APIs
&lt;/h2&gt;

&lt;p&gt;The back-end consists of two web API apps, both authored using ASP.NET Core 3.1:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;CloudSkew APIs&lt;/strong&gt; facilitate CRUD operations over diagrams, diagram templates and user profiles.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;DiagramHelper APIs&lt;/strong&gt; are required for printing or exporting (as PNG/JPG) diagrams. These APIs are isolated in a separate app because their memory footprint is higher, which causes the process to recycle more often.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using ASP.NET Core's &lt;a href="https://docs.microsoft.com/en-us/aspnet/core/fundamentals/middleware/?view=aspnetcore-3.1"&gt;middleware&lt;/a&gt;, we ensure that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JWT authentication is enforced. Use of &lt;a href="https://docs.microsoft.com/en-us/aspnet/core/security/authorization/policies?view=aspnetcore-3.1"&gt;policy-based authorization&lt;/a&gt; for RBAC ensures that claims mapping to user permissions are present in the JWT.&lt;/li&gt;
&lt;li&gt;Only the diagram editor (front-end app) can invoke these APIs (&lt;a href="https://docs.microsoft.com/en-us/aspnet/core/security/cors?view=aspnetcore-3.1"&gt;CORS settings&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Brotli &lt;a href="https://docs.microsoft.com/en-us/aspnet/core/performance/response-compression?view=aspnetcore-3.1"&gt;response compression&lt;/a&gt; is enabled for reducing payload sizes.&lt;/li&gt;
&lt;/ul&gt;
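
&lt;p&gt;As a rough illustration of the policy check described above, the sketch below (in Python rather than ASP.NET Core, purely for brevity) tests whether a decoded JWT payload carries every permission a policy requires. The &lt;code&gt;permissions&lt;/code&gt; claim name, the permission strings and the helper name are assumptions made for this example, not CloudSkew's actual configuration.&lt;/p&gt;

```python
from typing import Any, Dict, Iterable


def satisfies_policy(claims: Dict[str, Any], required: Iterable[str]) -> bool:
    """Return True when every required permission appears in the token's claims.

    `claims` is the already-verified JWT payload; signature and expiry checks
    are assumed to have happened earlier in the middleware pipeline.
    """
    granted = set(claims.get("permissions", []))  # hypothetical claim name
    return set(required).issubset(granted)


# Hypothetical payload for a signed-in user who may read and update diagrams.
token_claims = {"sub": "github|12345", "permissions": ["read:diagrams", "update:diagrams"]}

can_update = satisfies_policy(token_claims, ["update:diagrams"])
can_delete = satisfies_policy(token_claims, ["delete:diagrams"])
```

&lt;p&gt;In this example the update policy passes and the delete policy fails, because the token carries no delete permission.&lt;/p&gt;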

&lt;p&gt;The web APIs are stateless and operate under the assumption that they can be restarted/redeployed at any time. There are no sticky sessions or affinities and no in-memory state; all state is persisted to DBs using &lt;a href="https://docs.microsoft.com/en-us/ef/core/"&gt;EF Core&lt;/a&gt; (an ORM).&lt;/p&gt;

&lt;p&gt;Separate DTO/REST and DBContext/SQL models are maintained for all entities, with &lt;a href="https://automapper.org/"&gt;AutoMapper&lt;/a&gt; rules being used for conversions between the two.&lt;/p&gt;
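
&lt;p&gt;The idea of keeping two models per entity can be sketched as follows. This is a Python stand-in for the AutoMapper rules, and every field name here is hypothetical; only the &lt;code&gt;Diagram&lt;/code&gt; entity itself comes from the article.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class DiagramRow:
    """Persistence model (hypothetical columns)."""
    id: int
    owner_id: str
    title: str
    content_json: str


@dataclass
class DiagramDto:
    """Shape exposed over REST; omits the internal owner_id column."""
    id: int
    title: str
    content: str


def to_dto(row: DiagramRow) -> DiagramDto:
    """Map a database row to the REST representation."""
    return DiagramDto(id=row.id, title=row.title, content=row.content_json)


def apply_dto(dto: DiagramDto, row: DiagramRow) -> DiagramRow:
    """Copy client-editable fields onto an existing row; ownership never changes."""
    return DiagramRow(id=row.id, owner_id=row.owner_id, title=dto.title, content_json=dto.content)
```

&lt;p&gt;Keeping the mapping explicit means internal columns such as the owner never leak into API responses and cannot be overwritten by a client payload.&lt;/p&gt;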

&lt;h2&gt;
  
  
  Identity, AuthN &amp;amp; AuthZ
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://auth0.com/"&gt;Auth0&lt;/a&gt; is used as the (OIDC compliant) identity platform for CloudSkew. Users can log in via GitHub or LinkedIn; the handshake with these identity providers is managed by Auth0 itself. Using implicit flow, ID and access tokens (JWTs) are granted to the diagram editor app. The &lt;a href="https://auth0.com/docs/libraries/auth0js/v9"&gt;Auth0.JS SDK&lt;/a&gt; makes all this really trivial to implement. All calls to the back-end web APIs use the access token as the bearer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--N0qUVF8U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/auth0-social-connections.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--N0qUVF8U--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/auth0-social-connections.png" alt="auth0 social connections"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Auth0 creates &amp;amp; maintains the user profiles for all signed-up users. Authorization/RBAC is managed by assigning &lt;a href="https://auth0.com/docs/authorization/concepts/rbac"&gt;Auth0 roles&lt;/a&gt; to these user profiles. Each role contains a collection of permissions that can be assigned to the users (they show up as custom claims in the JWTs).&lt;/p&gt;

&lt;p&gt;Auth0 &lt;a href="https://auth0.com/docs/rules"&gt;rules&lt;/a&gt; are used to inject custom claims in the JWT and whitelist/blacklist users.&lt;/p&gt;

&lt;h2&gt;
  
  
  Databases
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/sql-database/sql-database-technical-overview"&gt;SQL Azure&lt;/a&gt; is used for persisting user data; primarily three entities: &lt;code&gt;Diagram&lt;/code&gt;, &lt;code&gt;DiagramTemplate&lt;/code&gt; and &lt;code&gt;UserProfile&lt;/code&gt;. User credentials are not stored in CloudSkew's database (that part is handled by Auth0). User contact details like emails are MD5 hashed.&lt;/p&gt;

&lt;p&gt;Because of CloudSkew's auto-save feature, updates to the &lt;code&gt;Diagram&lt;/code&gt; table happen very frequently. Some steps have been taken to optimize this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://rxjs-dev.firebaseapp.com/api/operators/debounceTime"&gt;Debouncing&lt;/a&gt; the auto-save requests from the diagram editor UI to the Web API.&lt;/li&gt;
&lt;li&gt;Use of a queue for load-leveling the update requests (see the Queue-Based Load Leveling section for details).&lt;/li&gt;
&lt;/ul&gt;
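
&lt;p&gt;To make the debouncing step concrete, here is a simplified offline model of it in Python. The editor itself uses the RxJS operator linked above; the function name, the millisecond timestamps and the one-second quiet period below are assumptions chosen for illustration only.&lt;/p&gt;

```python
from typing import List, Tuple


def debounce(events: List[Tuple[int, str]], quiet_ms: int) -> List[Tuple[int, str]]:
    """Return the (emit_time_ms, payload) pairs that survive debouncing.

    `events` must be sorted by timestamp. An event is forwarded only if no
    newer event arrives within `quiet_ms` of it.
    """
    emitted = []
    for index, (timestamp, payload) in enumerate(events):
        is_last = index == len(events) - 1
        if is_last or events[index + 1][0] - timestamp >= quiet_ms:
            emitted.append((timestamp + quiet_ms, payload))
    return emitted


# Five edits arriving in two bursts (timestamps in milliseconds).
edits = [(0, "v1"), (200, "v2"), (400, "v3"), (3000, "v4"), (3100, "v5")]
saves = debounce(edits, quiet_ms=1000)
```

&lt;p&gt;Here the five edits collapse into two save requests, carrying only the latest state of each burst (&lt;code&gt;v3&lt;/code&gt; and &lt;code&gt;v5&lt;/code&gt;).&lt;/p&gt;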

&lt;p&gt;For the preview version, the SQL Azure SKU being used in production is &lt;code&gt;Standard/S0 with 20 DTUs (single database)&lt;/code&gt;. Currently, the DB is only available in one region. Auto-failover groups &amp;amp; active geo-replication (read-replicas) are not being used at present.&lt;/p&gt;

&lt;p&gt;SQL Azure's &lt;a href="https://docs.microsoft.com/en-us/azure/sql-database/sql-database-automated-backups?tabs=single-database"&gt;built-in geo-redundant DB backups&lt;/a&gt; offer weekly full DB backups, differential DB backups every 12 hours and transaction log backups every 5 - 10 minutes. SQL Azure internally stores the backups in RA-GRS storage for 7 days. RTO is 12 hrs and RPO is 1 hr. Perhaps less than ideal, but we'll look to improve matters here once CloudSkew's usage grows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pCzHvqqR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/sql-azure-pitr-backups.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pCzHvqqR--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/sql-azure-pitr-backups.png" alt="sql azure pitr backups"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/cosmos-db/introduction"&gt;Azure CosmosDB&lt;/a&gt;'s usage is purely experimental at this point, mainly for the analysis of anonymized, read-only user data in &lt;a href="https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction"&gt;graph format over gremlin APIs&lt;/a&gt; &lt;em&gt;(more details on this will be shared in a future article)&lt;/em&gt;. Technically speaking, this database can be removed without any impact to user-facing features.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hosting &amp;amp; Storage
&lt;/h2&gt;

&lt;p&gt;Two &lt;a href="https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-overview"&gt;Azure Storage Accounts&lt;/a&gt; are provisioned for hosting the front-end apps: landing page &amp;amp; diagram editor. The apps are served via the &lt;code&gt;$web&lt;/code&gt; blob containers for static sites.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--bj88YzjB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-storage-static-website.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--bj88YzjB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-storage-static-website.png" alt="azure storage static website"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two more storage accounts are provisioned for serving the static content (mostly icon SVGs) and user-uploaded images (PNG, JPG files) as blobs.&lt;/p&gt;

&lt;p&gt;Two &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/containers/app-service-linux-intro"&gt;Azure App Services on Linux&lt;/a&gt; are also provisioned for hosting the containerized back-end web APIs. Both app services share the same &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/overview-hosting-plans"&gt;App Service Plan&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For CloudSkew's preview version we're using the &lt;code&gt;B1 (100 ACU, 1.75 GB Mem)&lt;/code&gt; plan which unfortunately does not include automatic horizontal scale-outs (i.e. scale-outs have to be done manually).&lt;/li&gt;
&lt;li&gt;Managed Identity is enabled for both app services, required for accessing the Key Vault.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Always On&lt;/code&gt; settings have been enabled.&lt;/li&gt;
&lt;li&gt;An &lt;a href="https://docs.microsoft.com/en-in/azure/container-registry/container-registry-intro"&gt;Azure Container Registry&lt;/a&gt; is also provisioned. The deployment pipeline packages the API apps as docker images and pushes to the container registry. The app services pull from it (using webhook notifications).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1o02WoRS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-container-registry.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1o02WoRS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-container-registry.png" alt="azure container registry"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Caching &amp;amp; Compression
&lt;/h2&gt;

&lt;p&gt;An &lt;a href="https://docs.microsoft.com/en-us/azure/cdn/cdn-overview"&gt;Azure CDN profile&lt;/a&gt; is provisioned with four endpoints, the first two using the hosted front-end apps (landing page &amp;amp; diagram editor) as origins and the other two pointing to the storage accounts (for icon SVGs &amp;amp; user-uploaded images).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8_Z8HJQh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-cdn-profile-endpoints.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8_Z8HJQh--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-cdn-profile-endpoints.png" alt="azure cdn profile endpoints"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In addition to caching at global POPs, &lt;a href="https://docs.microsoft.com/en-us/azure/cdn/cdn-improve-performance"&gt;content compression at POPs&lt;/a&gt; is also enabled.&lt;/p&gt;

&lt;h2&gt;
  
  
  Subdomains &amp;amp; DNS records
&lt;/h2&gt;

&lt;p&gt;All CDN endpoints have &lt;code&gt;&amp;lt;subdomain&amp;gt;.cloudskew.com&lt;/code&gt; custom domain hostnames enabled on them. This is facilitated by using &lt;a href="https://docs.microsoft.com/en-in/azure/dns/dns-overview"&gt;Azure DNS&lt;/a&gt; to create CNAME records that map &lt;code&gt;&amp;lt;subdomain&amp;gt;.cloudskew.com&lt;/code&gt; to their CDN endpoint counterparts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JfdvQ6YN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-dns-cname-records.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JfdvQ6YN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-dns-cname-records.png" alt="azure dns cname records"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  HTTPS &amp;amp; TLS Certificates
&lt;/h2&gt;

&lt;p&gt;Custom domain HTTPS is enabled and the TLS certificates are &lt;a href="https://docs.microsoft.com/en-us/azure/cdn/cdn-custom-ssl?tabs=option-1-default-enable-https-with-a-cdn-managed-certificate"&gt;managed by Azure CDN itself&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;HTTP-to-HTTPS redirection is also enforced via &lt;a href="https://docs.microsoft.com/en-us/azure/cdn/cdn-standard-rules-engine"&gt;CDN rules&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Externalized Configuration &amp;amp; Self-Bootstrapping
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-in/azure/key-vault/general/overview"&gt;Azure Key Vault&lt;/a&gt; is used as a secure, external, central key-value store. This helps decouple back-end web API apps from their configuration settings &lt;em&gt;(passwords, connection strings, endpoint urls, IP addresses, hostnames etc)&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The web API apps have &lt;a href="https://docs.microsoft.com/en-us/azure/key-vault/general/managed-identity"&gt;managed identities&lt;/a&gt; which are RBAC'ed for Key Vault access.&lt;/p&gt;

&lt;p&gt;The web API apps self-bootstrap by reading their configuration settings from the Key Vault at startup. The handshake with the Key Vault is facilitated using the &lt;a href="https://docs.microsoft.com/en-us/aspnet/core/security/key-vault-configuration?view=aspnetcore-3.1"&gt;Key Vault Configuration Provider&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nf862o0n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/key-vault-config-provider.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nf862o0n--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/key-vault-config-provider.png" alt="azure key vault config provider"&gt;&lt;/a&gt;&lt;/p&gt;
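
&lt;p&gt;The self-bootstrapping behaviour can be approximated by the sketch below, where a plain dictionary stands in for the Key Vault. The setting names and values are hypothetical; the point is only that the app reads a fixed set of keys at startup and fails fast if any is missing.&lt;/p&gt;

```python
from typing import Dict, Mapping

# Hypothetical setting names; the real keys are not listed in this article.
REQUIRED_KEYS = ("SqlConnectionString", "ServiceBusConnectionString")


def bootstrap(vault: Mapping[str, str]) -> Dict[str, str]:
    """Read every required setting once at startup; fail fast if any is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in vault]
    if missing:
        raise RuntimeError("missing settings: " + ", ".join(missing))
    return {key: vault[key] for key in REQUIRED_KEYS}


# Stand-in for a per-environment vault: same key names, environment-specific values.
test_vault = {
    "SqlConnectionString": "Server=test-sql;Database=app",
    "ServiceBusConnectionString": "Endpoint=sb://test-bus",
    "UnrelatedSecret": "ignored",
}
settings = bootstrap(test_vault)
```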

&lt;h2&gt;
  
  
  Queue-Based Load Leveling
&lt;/h2&gt;

&lt;p&gt;Even after debouncing calls to the API, the volume of PUT (UPDATE) requests generated by the auto-save feature causes the SQL Azure DB's &lt;a href="https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers-dtu"&gt;DTU consumption&lt;/a&gt; to spike, resulting in service degradation. To smooth out these bursts of requests, an &lt;a href="https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-messaging-overview"&gt;Azure Service Bus&lt;/a&gt; is used as an intermediate buffer. Instead of writing directly to the DB, the web API queues all PUT requests onto the service bus, to be drained asynchronously later.&lt;/p&gt;

&lt;p&gt;An &lt;a href="https://docs.microsoft.com/en-us/azure/azure-functions/functions-overview"&gt;Azure Function app&lt;/a&gt; is responsible for serially dequeueing the brokered messages off the bus using the &lt;a href="https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-service-bus-trigger?tabs=csharp"&gt;service bus trigger&lt;/a&gt;. Once the function receives a &lt;a href="https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-performance-improvements?tabs=net-standard-sdk#receive-mode"&gt;peek-locked&lt;/a&gt; message, it commits the PUT (UPDATE) to the SQL Azure DB. If the function fails to process a message, the message is automatically pushed onto the service bus' &lt;a href="https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-dead-letter-queues"&gt;dead-letter queue&lt;/a&gt;. An Azure Monitor alert is triggered when this happens.&lt;/p&gt;

&lt;p&gt;The Azure Function app shares the same app service plan as the back-end web APIs (i.e. it uses the &lt;a href="https://docs.microsoft.com/en-us/azure/azure-functions/functions-scale#app-service-plan"&gt;dedicated app service plan&lt;/a&gt; instead of the regular consumption plan).&lt;/p&gt;

&lt;p&gt;Overall, this &lt;a href="https://docs.microsoft.com/en-us/azure/architecture/patterns/queue-based-load-leveling"&gt;queue-based load-leveling pattern&lt;/a&gt; has helped plateau the load on the SQL Azure DB.&lt;/p&gt;
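
&lt;p&gt;The flow described in this section can be modelled with an in-memory queue, as in the Python sketch below. It is not the Service Bus or Azure Functions API: the retry budget, the message shape and the function names are assumptions, and a real broker handles redelivery and dead-lettering itself.&lt;/p&gt;

```python
from collections import deque
from typing import Callable, Deque, Dict, List

MAX_DELIVERIES = 3  # assumed retry budget before a message is dead-lettered


def drain(queue: Deque[Dict], save: Callable[[Dict], None], dead_letters: List[Dict]) -> int:
    """Serially process queued updates and return how many were saved."""
    saved = 0
    while queue:
        message = queue.popleft()
        try:
            save(message["body"])
            saved += 1
        except Exception:
            message["deliveries"] += 1
            if message["deliveries"] >= MAX_DELIVERIES:
                dead_letters.append(message)  # this is where an alert would fire
            else:
                queue.append(message)  # make the message available for a retry
    return saved


database: Dict[int, str] = {}


def save_diagram(body: Dict) -> None:
    """Stand-in for the database UPDATE; rejects malformed payloads."""
    if body["content"] is None:
        raise ValueError("malformed update")
    database[body["id"]] = body["content"]


# The web API only enqueues; nothing touches the database until the drain runs.
pending = deque(
    {"body": body, "deliveries": 0}
    for body in [
        {"id": 1, "content": "v1"},
        {"id": 2, "content": None},
        {"id": 1, "content": "v2"},
    ]
)
dead: List[Dict] = []
saved_count = drain(pending, save_diagram, dead)
```

&lt;p&gt;In this run two updates are saved, diagram 1 ends up holding its latest content, and the malformed update lands in the dead-letter list after three failed attempts.&lt;/p&gt;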

&lt;h2&gt;
  
  
  APM
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview"&gt;Application Insights SDK&lt;/a&gt; is used by the diagram editor (front-end &lt;a href="https://devblogs.microsoft.com/premier-developer/angular-how-to-add-application-insights-to-an-angular-spa/"&gt;Angular SPA&lt;/a&gt;) to get some user insights.&lt;/p&gt;

&lt;p&gt;For example, we're interested in tracking the names of icons that users couldn't find in the icon palette (via the icon search box). This helps us add these frequently searched icons to the palette later on.&lt;/p&gt;

&lt;p&gt;App Insights' &lt;a href="https://docs.microsoft.com/en-us/azure/azure-monitor/app/api-custom-events-metrics"&gt;custom events&lt;/a&gt; help us log such information. &lt;a href="https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/samples"&gt;KQL queries&lt;/a&gt; are used to mine the aggregated data.&lt;/p&gt;
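
&lt;p&gt;The kind of aggregation we run over these events looks roughly like the sketch below. It is written in Python instead of KQL so it can run standalone, and the event name &lt;code&gt;IconSearch&lt;/code&gt; and its properties are hypothetical stand-ins for the real telemetry schema.&lt;/p&gt;

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_missing_icons(events: Iterable[Dict], limit: int) -> List[Tuple[str, int]]:
    """Count searches that returned no icons, normalising case and whitespace."""
    misses = Counter(
        event["query"].strip().lower()
        for event in events
        if event["name"] == "IconSearch" and event["resultCount"] == 0
    )
    return misses.most_common(limit)


# Hypothetical custom events as they might be exported from telemetry.
logged_events = [
    {"name": "IconSearch", "query": "Snowflake", "resultCount": 0},
    {"name": "IconSearch", "query": "snowflake ", "resultCount": 0},
    {"name": "IconSearch", "query": "Kafka", "resultCount": 0},
    {"name": "IconSearch", "query": "S3", "resultCount": 12},
    {"name": "DiagramExport", "query": "", "resultCount": 0},
]
most_wanted = top_missing_icons(logged_events, limit=2)
```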

&lt;p&gt;The App Insights SDK is also used for &lt;a href="https://docs.microsoft.com/en-us/azure/azure-monitor/app/api-custom-events-metrics#tracktrace"&gt;logging traces&lt;/a&gt;. The log verbosity is configured via app config (externalized config using Azure Key Vault).&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure Monitoring
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/azure-portal/azure-portal-dashboards"&gt;Azure Portal Dashboards&lt;/a&gt; are used to visualize metrics from the various Azure resources deployed by CloudSkew.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--8qixpgA8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-portal-dashboard.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8qixpgA8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-portal-dashboard.png" alt="azure portal dashboards"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Incident Management
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/azure-monitor/overview"&gt;Azure Monitor's&lt;/a&gt; &lt;a href="https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alerts-metric-overview"&gt;metric-based alerts&lt;/a&gt; are being used to get incident notifications over email &amp;amp; Slack. Some examples of conditions that trigger alerts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[Sev 0] 5xx errors in the web APIs required for printing/exporting diagrams.&lt;/li&gt;
&lt;li&gt;[Sev 1] 5xx errors in other CloudSkew web APIs.&lt;/li&gt;
&lt;li&gt;[Sev 1] Any messages in the Service Bus' dead-letter queue.&lt;/li&gt;
&lt;li&gt;[Sev 2] Response time of web APIs crossing specified thresholds.&lt;/li&gt;
&lt;li&gt;[Sev 2] Spikes in DTU consumption in SQL Azure DBs.&lt;/li&gt;
&lt;li&gt;[Sev 3] Spikes in E2E latency for blob storage requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Metrics are evaluated/sampled at a 15-minute frequency with 1-hour aggregation windows.&lt;/p&gt;
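
&lt;p&gt;How such a rule behaves can be sketched in a few lines of Python. This is only a model of the evaluation: averaging as the aggregation type, the 75% DTU threshold and the sample values are all assumptions made for the example.&lt;/p&gt;

```python
from typing import List, Tuple

WINDOW_MINUTES = 60  # aggregation window stated above


def should_alert(samples: List[Tuple[int, float]], now: int, threshold: float) -> bool:
    """Fire when the mean of the samples in the trailing window exceeds the threshold.

    `samples` holds (minute, value) pairs and `now` is the evaluation minute.
    """
    window = [value for minute, value in samples if minute > now - WINDOW_MINUTES and now >= minute]
    if not window:
        return False
    return sum(window) / len(window) > threshold


# Hypothetical DTU percentages sampled every 15 minutes.
dtu_samples = [(0, 20.0), (15, 30.0), (30, 90.0), (45, 95.0), (60, 98.0)]

fires_at_30 = should_alert(dtu_samples, now=30, threshold=75.0)
fires_at_60 = should_alert(dtu_samples, now=60, threshold=75.0)
```

&lt;p&gt;With these numbers the rule stays quiet at minute 30 and fires at minute 60, once the one-hour average has climbed past the threshold. A long window like this smooths out short spikes at the cost of slower detection.&lt;/p&gt;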

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--x0i9Ws2p--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-monitor-metric-alerts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--x0i9Ws2p--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-monitor-metric-alerts.png" alt="azure monitor metric alerts"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Currently, 100% of the incoming metrics are sampled. Over time, as usage grows, we'll start filtering out outliers at P99.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Resource Provisioning
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.terraform.io/docs/index.html"&gt;Terraform&lt;/a&gt; scripts are used to provision all of the Azure resources &amp;amp; services shown in the architecture diagram (storage accounts, app services, CDN, DNS zone, container registry, functions, SQL server, service bus etc). Use of Terraform allows us to easily achieve parity in dev, test &amp;amp; prod environments. Although these three environments are mostly identical clones of each other, there are some minor differences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Across the dev, test and prod environments, the app configuration data stored in the Key Vaults will have the same key names but different values. This helps apps to bootstrap accordingly.&lt;/li&gt;
&lt;li&gt;The dev environments are ephemeral: they are created on demand and disposed of when not in use.&lt;/li&gt;
&lt;li&gt;For cost reasons, smaller resource SKUs are used in dev &amp;amp; test environments (e.g. Basic/B 5 DTUs SQL Azure in test environment as compared to Standard/S0 20 DTU in production).
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;The Auth0 tenant has been set up manually since there are no Terraform providers for it. However, it looks like it might be possible to automate the provisioning using &lt;a href="https://auth0.com/docs/extensions/deploy-cli"&gt;Auth0's Deploy CLI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;CloudSkew's provisioning scripts are being migrated from Terraform to &lt;a href="https://www.pulumi.com/"&gt;Pulumi&lt;/a&gt;. This article will be updated as soon as the migration is complete.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Continuous Integration
&lt;/h2&gt;

&lt;p&gt;The source code is split across multiple private &lt;a href="https://docs.microsoft.com/en-us/azure/devops/repos/get-started/what-is-repos?view=azure-devops"&gt;Azure Repos&lt;/a&gt;. The &lt;em&gt;"one repository per app"&lt;/em&gt; rule of thumb is enforced here. An app is deployed to dev, test &amp;amp; prod environments from the same repo.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Q5MdgCCo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-repos-multiple-repos.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Q5MdgCCo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-repos-multiple-repos.png" alt="multiple azure repos"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feature development &amp;amp; bug fixes happen in private/feature branches which are ultimately merged into master branches via pull requests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/get-started/what-is-azure-pipelines?view=azure-devops"&gt;Azure Pipelines&lt;/a&gt; are used for continuous integration: checkins are built, unit tested, packaged and deployed to the test environment. CI pipelines are automatically triggered both on pull request creation as well as checkins to master branches.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TExU7cSS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-pipelines-bdt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TExU7cSS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-pipelines-bdt.png" alt="azure pipelines continuous integration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The pipelines are &lt;a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/yaml-schema?view=azure-devops&amp;amp;tabs=schema%2Cparameter-schema"&gt;authored in YAML&lt;/a&gt; and executed on &lt;a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops"&gt;Microsoft-hosted Ubuntu agents&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Azure Pipelines' &lt;a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/tasks/?view=azure-devops"&gt;built-in tasks&lt;/a&gt; are heavily leveraged for deploying changes to Azure app services, functions, storage accounts, container registry etc. Access to Azure resources is authorized via &lt;a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&amp;amp;tabs=yaml"&gt;service connections&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Bx4JbxzK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-pipelines-continuous-integration.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Bx4JbxzK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-pipelines-continuous-integration.png" alt="azure pipelines continuous integration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment &amp;amp; Release
&lt;/h2&gt;

&lt;p&gt;The deployment &amp;amp; release process is very simple at the moment (blue-green deployments, canary deployments and feature flags are not being used). Checkins that pass the CI process become eligible for release to the production environment.&lt;/p&gt;

&lt;p&gt;Azure Pipelines &lt;a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/process/deployment-jobs?view=azure-devops"&gt;deployment jobs&lt;/a&gt; are used to target releases at the production environment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EgSpeDA_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-pipelines-deployment-jobs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EgSpeDA_--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-pipelines-deployment-jobs.png" alt="azure pipelines deployment jobs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/devops/pipelines/process/approvals?view=azure-devops&amp;amp;tabs=check-pass"&gt;Manual approvals&lt;/a&gt; are used to authorize the releases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1jfymk2g--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-pipelines-manual-approval.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1jfymk2g--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://assets.cloudskew.com/assets/pages/cloudskew-architecture/azure-pipelines-manual-approval.png" alt="azure pipelines manual approval"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Future Architectural Changes
&lt;/h2&gt;

&lt;p&gt;As more &lt;a href="//./../docs/features.md#planned-features"&gt;features are added&lt;/a&gt; and as usage grows, some architectural enhancements will have to be considered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HA with multi-regional deployments and using &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-overview"&gt;Traffic Manager&lt;/a&gt; for routing traffic.&lt;/li&gt;
&lt;li&gt;Move to a higher App Service SKU to avail of slot swapping, horizontal auto-scaling etc.&lt;/li&gt;
&lt;li&gt;Use of caching in the back-end (&lt;a href="https://azure.microsoft.com/en-in/services/cache/"&gt;Azure Cache for Redis&lt;/a&gt;, ASP.NET's &lt;a href="https://docs.microsoft.com/en-us/aspnet/core/performance/caching/memory?view=aspnetcore-3.1"&gt;IMemoryCache&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Changes to the deployment &amp;amp; release model with blue-green deployments and adoption of feature flags etc.&lt;/li&gt;
&lt;li&gt;PowerBI/Grafana dashboard for tracking business KPIs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Again, any of these enhancements will ultimately be need-driven.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Notes
&lt;/h2&gt;

&lt;p&gt;CloudSkew is in the very early stages of development and there are some simple rules of thumb it abides by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preferring PaaS/serverless over IaaS&lt;/strong&gt;: Pay as you go, no server management overhead &lt;em&gt;(aside: this is also why K8s clusters are not in the picture yet)&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preferring microservices over monoliths&lt;/strong&gt;: Individual lego blocks can be independently deployed &amp;amp; scaled up/out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always keeping the infrastructure stable&lt;/strong&gt;: Everything infra-related is automated: from provisioning to scaling to monitoring. An "it just works" infra helps maintain the core focus on user-facing features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Releasing Frequently&lt;/strong&gt;: The goal is to rapidly go from idea -&amp;gt; development -&amp;gt; deployment -&amp;gt; release. Having ultra-simple CI, deployment &amp;amp; release processes goes a long way in helping achieve that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No premature optimization&lt;/strong&gt;: All changes for making things more "efficient" are done just-in-time and have to be need-driven &lt;em&gt;(e.g. Redis cache is currently not required at the back-end since API response times are within acceptable thresholds)&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When CloudSkew reaches critical mass in the future, this playbook will of course have to be modified.&lt;/p&gt;

&lt;p&gt;Please feel free to &lt;a href="//mailto:support@cloudskew.com"&gt;email us&lt;/a&gt; in case you have any questions, comments or suggestions regarding this article. Happy Diagramming!&lt;/p&gt;

</description>
      <category>azure</category>
      <category>architecture</category>
      <category>microservices</category>
      <category>serverless</category>
    </item>
    <item>
      <title>High Availability in Azure: App Service, Function Apps</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Fri, 10 Jul 2020 21:00:43 +0000</pubDate>
      <link>https://dev.to/cloudskew/high-availability-in-azure-app-service-function-apps-149f</link>
      <guid>https://dev.to/cloudskew/high-availability-in-azure-app-service-function-apps-149f</guid>
      <description>&lt;h2&gt;
  
  
  Azure App Service Apps (web apps)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9qVYpEA8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/13-azure-app-service.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9qVYpEA8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/13-azure-app-service.png" alt="azure app service" width="205" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/overview-hosting-plans"&gt;Azure App Service Plan&lt;/a&gt; is pinned to a specific &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#region"&gt;Azure Region&lt;/a&gt;. Any &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/overview"&gt;App Service Apps&lt;/a&gt; created in the App Service Plan will be provisioned in that same region. If your app needs additional redundancies in other regions or &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#geography"&gt;geographies&lt;/a&gt;, you'll have to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Provision them yourself (you'll need to create new App Service Plans in those regions, if they don't already exist).&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-traffic-management-2af0"&gt;Azure Traffic Manager&lt;/a&gt; to route traffic to all available redundancies (you can only specify one App Service endpoint per region in a Traffic Manager profile). &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/web-sites-traffic-manager#app-service-and-traffic-manager-profiles"&gt;More details here&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TBu_ycE1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/14-azure-app-service-redundancy.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TBu_ycE1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/14-azure-app-service-redundancy.jpg" alt="azure app service redundancy" width="723" height="464"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://azure.microsoft.com/en-in/support/legal/sla/app-service/v1_4/"&gt;SLA for Azure App Services&lt;/a&gt; guarantees a 99.95% uptime for each regional deployment.&lt;/p&gt;
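
&lt;p&gt;As a rough back-of-the-envelope illustration of why extra regional redundancies help, the sketch below combines the per-region figure into a composite availability. It assumes that regional outages are statistically independent (a simplification that real incidents do not always respect) and that a traffic router with its own availability sits in front. The function names and the 99.99% router figure passed in are assumptions for illustration only.&lt;/p&gt;

```python
def combined_availability(regional_sla, regions):
    """Probability that at least one of `regions` independent deployments is up."""
    return 1 - (1 - regional_sla) ** regions


def end_to_end_availability(regional_sla, regions, router_sla):
    """Availability when a traffic router sits in front of the regional deployments."""
    return router_sla * combined_availability(regional_sla, regions)


# Two regions at 99.95% each, fronted by a router assumed to be at 99.99%.
print(combined_availability(0.9995, 2))
print(end_to_end_availability(0.9995, 2, 0.9999))
```

&lt;p&gt;Under these assumptions the router, not the app, becomes the limiting factor once a second region is added.&lt;/p&gt;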

&lt;h2&gt;
  
  
  Azure Function Apps
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XPeD-lPi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/15-azure-functions.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XPeD-lPi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/15-azure-functions.png" alt="azure functions" width="205" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/azure-functions/functions-overview"&gt;Azure Function Apps&lt;/a&gt; too have regional deployments. If you're using the &lt;a href="https://docs.microsoft.com/en-us/azure/azure-functions/functions-scale#consumption-plan"&gt;consumption plan&lt;/a&gt;, then you explicitly specify the region. If on the &lt;a href="https://docs.microsoft.com/en-us/azure/azure-functions/functions-scale#app-service-plan"&gt;App Service Plan&lt;/a&gt;, then the region is the same as that of the App Service Plan.&lt;/p&gt;

&lt;p&gt;Similar to App Services above, any additional redundancies will have to be explicitly created and traffic to these will have to be routed via Azure Traffic Manager.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/app-service-web-app/multi-region"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2A8sTkVA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/16-azure-functions-redundancy.jpg" alt="azure functions redundancy" width="700" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://azure.microsoft.com/en-us/support/legal/sla/functions/v1_1/"&gt;SLA for Azure Functions&lt;/a&gt; guarantees a 99.95% uptime for each regional deployment (for both app service plan and consumption plan).&lt;/p&gt;

&lt;h2&gt;
  
  
  Miscellaneous
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Horizontally scaled instances
&lt;/h3&gt;

&lt;p&gt;As &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#what-about-vm-scale-sets"&gt;I've previously mentioned&lt;/a&gt;, horizontal auto-scaling exists to address performance concerns rather than high-availability concerns.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;App Service Apps&lt;/strong&gt;&lt;/em&gt;: When horizontal auto-scaling is enabled on a parent App Service Plan, additional instances are created, and each instance hosts all App Service Apps contained in the parent App Service Plan. All instances are created in the same WebSpace. The App Service's integrated load-balancer (non-accessible) manages the traffic. Note that all scaled out instances of an app will still have the same endpoint URL.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Function Apps&lt;/strong&gt;&lt;/em&gt;: Based on a combination of factors (trigger types, rate of incoming requests, language/runtime and perhaps the &lt;a href="https://github.com/Azure/azure-functions-host/wiki/Host-Health-Monitor"&gt;host health-monitor stats&lt;/a&gt;), the &lt;a href="https://docs.microsoft.com/en-in/azure/azure-functions/functions-scale#runtime-scaling"&gt;scale controller&lt;/a&gt; will create additional instances of an Azure Function App (max limit of 200 instances). Note that the scaling unit is the Function App (host) itself and not individual functions.  &lt;/p&gt;

&lt;p&gt;Bonus reading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read more about the &lt;a href="https://docs.microsoft.com/en-in/azure/azure-subscription-service-limits#app-service-limits"&gt;scaling limits imposed on App Service Apps&lt;/a&gt; based on &lt;a href="https://azure.microsoft.com/en-us/pricing/details/app-service/windows/"&gt;pricing tiers&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Read more about &lt;a href="https://stackoverflow.com/a/49651618"&gt;ARR affinity&lt;/a&gt; and &lt;a href="https://azure.microsoft.com/en-in/blog/disabling-arrs-instance-affinity-in-windows-azure-web-sites/"&gt;ARRAffinity cookies&lt;/a&gt; for scaled out instances.&lt;/li&gt;
&lt;li&gt;You can now enable &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/manage-scale-per-app"&gt;per-app horizontal scaling&lt;/a&gt;. More details &lt;a href="https://markheath.net/post/per-app-scaling-app-service"&gt;in this blog post&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Read more about the &lt;a href="https://docs.microsoft.com/en-in/azure/azure-functions/functions-scale#understanding-scaling-behaviors"&gt;scaling behavior&lt;/a&gt; of Function Apps.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The "Always On" setting
&lt;/h3&gt;

&lt;p&gt;If you have an App Service App or a Function App associated with an App Service Plan in the production or isolated tier, then you should consider enabling the "always on" setting. This ensures that your app is always running and never unloaded (default behavior is to deactivate/unload idle apps to conserve resources).&lt;/p&gt;

&lt;p&gt;Notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This setting is not available for App Service Apps in dev/test tier.&lt;/li&gt;
&lt;li&gt;Idle Function Apps in the consumption plan will be subject to &lt;a href="https://blogs.msdn.microsoft.com/appserviceteam/2018/02/07/understanding-serverless-cold-start/"&gt;cold start latency&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--UxyW5wCW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/17-azure-app-service-always-on.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--UxyW5wCW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/17-azure-app-service-always-on.jpg" alt="azure app service always on" width="434" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloning and Moving App Service Apps
&lt;/h3&gt;

&lt;p&gt;Using Azure PowerShell, it is possible to &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/app-service-web-app-cloning"&gt;create clones of an existing App Service App&lt;/a&gt; within the same region or in a new region. Please note that there are some &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/app-service-web-app-cloning#current-restrictions"&gt;caveats/restrictions&lt;/a&gt; though.&lt;/p&gt;

&lt;p&gt;You can also &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/app-service-plan-manage#move-an-app-to-another-app-service-plan"&gt;move an App Service App to another App Service plan&lt;/a&gt; as long as both the source plan and the destination plan are within the same WebSpace.&lt;/p&gt;

&lt;p&gt;FWIW, I've never tried this out myself.&lt;/p&gt;

&lt;p&gt;And yes, like any other Azure Resource, App Service Plans and App Service Apps can be moved between resource groups.&lt;/p&gt;

&lt;h3&gt;
  
  
  WebSpaces
&lt;/h3&gt;

&lt;p&gt;WebSpaces are units of deployment for Azure App Service Plans. An &lt;a href="https://docs.microsoft.com/en-us/azure/app-service/app-service-plan-manage#move-an-app-to-another-app-service-plan"&gt;App Service Plan's WebSpace&lt;/a&gt; is identified by the combination of its resource group and the region in which it is deployed. Any additional App Service Plans deployed to the same resource group + region combination get assigned to the same WebSpace. See &lt;a href="https://github.com/projectkudu/kudu/wiki/ResourceGroup-VS.-WebSpace"&gt;more details here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To see the WebSpace associated with an App Service App or App Service Plan, navigate to that resource in the Azure Resource Explorer (via the &lt;a href="https://portal.azure.com/#blade/HubsExtension/ArmExplorerBlade"&gt;Azure Portal&lt;/a&gt; or via the &lt;a href="https://resources.azure.com/"&gt;website&lt;/a&gt;) and see the &lt;code&gt;WebSpace&lt;/code&gt; and &lt;code&gt;SelfLink&lt;/code&gt; properties.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>highavailability</category>
    </item>
    <item>
      <title>High Availability in Azure: Traffic Management</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Fri, 10 Jul 2020 20:59:53 +0000</pubDate>
      <link>https://dev.to/cloudskew/high-availability-in-azure-traffic-management-2af0</link>
      <guid>https://dev.to/cloudskew/high-availability-in-azure-traffic-management-2af0</guid>
      <description>&lt;h2&gt;
  
  
  Azure Traffic Manager
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WfH202iH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/08-azure-traffic-manager.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WfH202iH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/08-azure-traffic-manager.png" alt="azure traffic manager" width="256" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/"&gt;Azure Traffic Manager&lt;/a&gt; routes a client's DNS query to an appropriate service endpoint, selected based on a combination of factors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;traffic routing methods (user selected)&lt;/li&gt;
&lt;li&gt;health of the endpoints (user configured probing/monitoring rules)&lt;/li&gt;
&lt;li&gt;latency tables (internally maintained map of ip address ranges to regions)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-how-it-works#how-clients-connect-using-traffic-manager"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--V3iJ29v2--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/10-azure-traffic-manager-dns.jpg" alt="azure traffic manager internals" width="743" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some scenarios that can be addressed with Azure Traffic Manager are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;always routing to primary endpoint (with failover to secondary when primary endpoint's health degrades).&lt;/li&gt;
&lt;li&gt;always routing to endpoint with lowest latency.&lt;/li&gt;
&lt;li&gt;always routing to specific regional endpoint for data sovereignty compliance.&lt;/li&gt;
&lt;li&gt;enabling blue/green deployments with weighted routing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;a href="https://azure.microsoft.com/en-in/support/legal/sla/traffic-manager/v1_0/"&gt;Azure Traffic Manager SLA&lt;/a&gt; is 99.99%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things it is not (or does not do)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;not a gateway or a proxy. Traffic between the client and the service endpoint does not pass through the traffic manager. Once the traffic manager points a client to a service endpoint, the client communicates with the endpoint directly.&lt;/li&gt;
&lt;li&gt;not a layer-7 (application level) solution.&lt;/li&gt;
&lt;li&gt;not a DNS server.&lt;/li&gt;
&lt;li&gt;not a WAF.&lt;/li&gt;
&lt;li&gt;does not offer TLS termination / SSL offload.&lt;/li&gt;
&lt;li&gt;does not offer sticky sessions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The holy trinity
&lt;/h2&gt;

&lt;p&gt;Azure Traffic Manager is used in conjunction with &lt;a href="https://docs.microsoft.com/en-us/azure/application-gateway/"&gt;Azure Application Gateways&lt;/a&gt; and &lt;a href="https://docs.microsoft.com/en-us/azure/load-balancer/"&gt;Azure Load Balancers&lt;/a&gt;. Here is a &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-load-balancing-azure"&gt;nice article&lt;/a&gt; that explains how the trio complement each other.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-load-balancing-azure#scenario"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--2dIYtYZS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/09-azure-load-balancing-options.jpg" alt="azure load balancing options" width="798" height="490"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Traffic routing methods
&lt;/h2&gt;

&lt;p&gt;The official docs capture all the &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-routing-methods"&gt;traffic routing methods&lt;/a&gt; in great detail. However, let me provide a quick recap below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pS3ADV3C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/11-azure-traffic-manager-routing-methods.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pS3ADV3C--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/11-azure-traffic-manager-routing-methods.jpg" alt="azure traffic manager routing methods" width="585" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  performance routing
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;(&lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-routing-methods#performance-traffic-routing-method"&gt;official docs&lt;/a&gt; | &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/tutorial-traffic-manager-improve-website-response"&gt;tutorial&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Use this when you need to route traffic to a service endpoint with the lowest network latency (as measured from the client IP address).&lt;/p&gt;

&lt;p&gt;The Azure Traffic Manager maintains an internal "latency table", that maps the latencies of IP address ranges to various Azure Regions. Upon an incoming recursive DNS request, it looks up the client's IP address and detects the IP address range that it falls under. For that address range, it picks up an available service endpoint from an Azure region with the lowest possible latency. If multiple service endpoints are detected within the same Azure region, then the Azure Traffic Manager distributes traffic evenly across them.&lt;/p&gt;
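
&lt;p&gt;The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Traffic Manager's actual implementation: the latency table, region names, endpoint names and the round-robin rotation used to spread traffic within a region are all made-up assumptions.&lt;/p&gt;

```python
import ipaddress
import itertools

# Hypothetical latency table: client IP range mapped to per-region latency in ms.
LATENCY_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): {"westeurope": 20, "eastus": 95},
    ipaddress.ip_network("198.51.100.0/24"): {"westeurope": 110, "eastus": 15},
}

# Hypothetical available endpoints per region.
ENDPOINTS = {
    "westeurope": ["app-weu-1", "app-weu-2"],
    "eastus": ["app-eus-1"],
}

# One round-robin iterator per region spreads traffic evenly within that region.
_rotations = {region: itertools.cycle(eps) for region, eps in ENDPOINTS.items()}


def pick_endpoint(client_ip):
    """Return an endpoint from the lowest-latency region for this client IP."""
    address = ipaddress.ip_address(client_ip)
    for network, latencies in LATENCY_TABLE.items():
        if address in network:
            # Only consider regions that actually have an endpoint deployed.
            candidates = {r: ms for r, ms in latencies.items() if r in ENDPOINTS}
            if not candidates:
                return None
            best_region = min(candidates, key=candidates.get)
            return next(_rotations[best_region])
    return None  # no latency data for this address range
```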

&lt;h3&gt;
  
  
  priority routing
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;(&lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-routing-methods#priority-traffic-routing-method"&gt;official docs&lt;/a&gt; | &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-configure-priority-routing-method"&gt;tutorial&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Use this when you want to route all traffic to a primary endpoint (with a secondary on standby).&lt;/p&gt;

&lt;p&gt;All service endpoints are assigned a priority number (value between 1 and 1000 with 1 being highest priority and 1000 being lowest). The primary gets assigned the highest priority (i.e. lowest number) and as a result all traffic gets routed to it. If the primary's health degrades, all traffic gets routed to the secondary, which has the next highest priority. Manual fail-overs can be initiated by bumping the secondary to higher priority.&lt;/p&gt;
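
&lt;p&gt;In code, the selection rule boils down to "lowest priority number among the healthy endpoints". The Python sketch below illustrates this; the &lt;code&gt;Endpoint&lt;/code&gt; class and &lt;code&gt;route_by_priority&lt;/code&gt; function are hypothetical names, and health is reduced to a simple boolean.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    name: str
    priority: int        # 1 (highest) to 1000 (lowest), as described above
    healthy: bool = True


def route_by_priority(endpoints):
    """Return the healthy endpoint with the lowest priority number, if any."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


primary = Endpoint("primary", priority=1)
secondary = Endpoint("secondary", priority=2)

print(route_by_priority([primary, secondary]).name)
primary.healthy = False  # simulate degraded health on the primary
print(route_by_priority([primary, secondary]).name)
```

&lt;p&gt;A manual fail-over corresponds to swapping the two priority values rather than touching the health flag.&lt;/p&gt;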

&lt;h3&gt;
  
  
  weighted routing
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;(&lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-routing-methods#weighted-traffic-routing-method"&gt;official docs&lt;/a&gt; | &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/tutorial-traffic-manager-weighted-endpoint-routing"&gt;tutorial&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Use this when you need to do staggered roll-outs or blue/green deployments.&lt;/p&gt;

&lt;p&gt;All service endpoints are assigned a weight (value between 1 and 1000, 1 being lowest weight and 1000 being highest). The traffic manager distributes traffic across the available service endpoints in proportion to their assigned weights.&lt;/p&gt;

&lt;p&gt;Note: Weighted routing is different from priority routing mentioned above. In priority routing, only the highest priority endpoint is selected and others are ignored (until the highest priority endpoint's health degrades). With weighted routing, the traffic manager does route traffic to all endpoints, but uses the assigned weights to choose a specific endpoint on each incoming request.&lt;/p&gt;
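
&lt;p&gt;A weighted random choice captures the behaviour described in the note. The Python sketch below is illustrative only: the &lt;code&gt;route_by_weight&lt;/code&gt; function, the endpoint names and the 900/100 split are hypothetical, and the random generator is seeded purely to make the run repeatable.&lt;/p&gt;

```python
import random
from collections import Counter


def route_by_weight(weights, rng=random):
    """Pick one endpoint name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical blue/green split: roughly 90% of lookups should go to blue.
weights = {"blue": 900, "green": 100}
rng = random.Random(42)  # seeded so the sketch is repeatable
tally = Counter(route_by_weight(weights, rng) for _ in range(10_000))
print(tally)
```

&lt;p&gt;Shifting the weights gradually from blue to green is what turns this into a staggered roll-out.&lt;/p&gt;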

&lt;h3&gt;
  
  
  geographic routing
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;(&lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-routing-methods#geographic-traffic-routing-method"&gt;official docs&lt;/a&gt; | &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-configure-geographic-routing-method"&gt;tutorial&lt;/a&gt; | &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-faqs#traffic-manager-geographic-traffic-routing-method"&gt;faqs&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Use this when you need to geo-fence your users to specific regions/geographies (for data sovereignty reasons etc).&lt;/p&gt;

&lt;p&gt;Per configuration, client requests will get serviced by endpoints from the specified region (this may or may not be the endpoint with lowest latency). Regional endpoints can be assigned at the following granularities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;world (highest granularity)&lt;/li&gt;
&lt;li&gt;regional grouping (roughly the same as &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#geography"&gt;Azure Geographies&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;country&lt;/li&gt;
&lt;li&gt;state (lowest granularity, only available for USA, Canada and Australia as of the time of writing this blog post).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lookup always starts at the lowest granularity and proceeds towards the highest; the first match found is returned.&lt;/p&gt;
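
&lt;p&gt;The lookup order can be sketched as a simple chain of dictionary probes, from state up to world. Everything in the Python snippet below (the &lt;code&gt;GEO_MAP&lt;/code&gt; contents, the location codes and the endpoint names) is a made-up assumption used only to show the order of evaluation.&lt;/p&gt;

```python
# Hypothetical mapping of geographic levels to endpoints.
GEO_MAP = {
    ("state", "US-WA"): "app-westus",
    ("country", "US"): "app-eastus",
    ("regional_grouping", "Europe"): "app-westeurope",
    ("world", "WORLD"): "app-fallback",
}


def route_by_geography(state, country, regional_grouping):
    """Check state, then country, then regional grouping, then world."""
    lookup_order = [
        ("state", state),
        ("country", country),
        ("regional_grouping", regional_grouping),
        ("world", "WORLD"),
    ]
    for key in lookup_order:
        if key in GEO_MAP:
            return GEO_MAP[key]  # first match wins
    return None  # nothing mapped, not even a world-level catch-all
```

&lt;p&gt;For example, a client in Washington state matches at the state level, while a client in New York falls through to the country-level entry.&lt;/p&gt;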

&lt;h3&gt;
  
  
  subnet routing
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;(&lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-routing-methods#subnet-traffic-routing-method"&gt;official docs&lt;/a&gt; | &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/tutorial-traffic-manager-subnet-routing"&gt;tutorial&lt;/a&gt; | &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-faqs#traffic-manager-subnet-traffic-routing-method"&gt;faqs&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Use this when you need to map specific client IP address ranges to specific service endpoints.&lt;/p&gt;
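
&lt;p&gt;Conceptually this is a range lookup on the source address. The Python sketch below uses the standard &lt;code&gt;ipaddress&lt;/code&gt; module; the subnet map, the endpoint names and the catch-all &lt;code&gt;FALLBACK_ENDPOINT&lt;/code&gt; are hypothetical and only illustrate the idea.&lt;/p&gt;

```python
import ipaddress

# Hypothetical subnet-to-endpoint mapping; the ranges are documentation addresses.
SUBNET_MAP = [
    (ipaddress.ip_network("203.0.113.0/25"), "app-internal-test"),
    (ipaddress.ip_network("198.51.100.0/24"), "app-partner"),
]
FALLBACK_ENDPOINT = "app-public"  # assumed catch-all for unmapped addresses


def route_by_subnet(client_ip):
    """Return the endpoint mapped to the subnet containing client_ip."""
    address = ipaddress.ip_address(client_ip)
    for network, endpoint in SUBNET_MAP:
        if address in network:
            return endpoint
    return FALLBACK_ENDPOINT
```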

&lt;h3&gt;
  
  
  multivalue routing
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;(&lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-routing-methods#multivalue-traffic-routing-method"&gt;official docs&lt;/a&gt; | &lt;a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-faqs#traffic-manager-multivalue-traffic-routing-method"&gt;faqs&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Just mentioning it for completeness' sake; I haven't actually used it yet.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>highavailability</category>
    </item>
    <item>
      <title>High Availability in Azure: Storage Redundancies</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Fri, 10 Jul 2020 20:58:05 +0000</pubDate>
      <link>https://dev.to/cloudskew/high-availability-in-azure-storage-redundancies-3n9f</link>
      <guid>https://dev.to/cloudskew/high-availability-in-azure-storage-redundancies-3n9f</guid>
      <description>&lt;h2&gt;
  
  
  Azure Storage Account
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xkaAvi7v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/05-azure-storage-account.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xkaAvi7v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/05-azure-storage-account.jpg" alt="azure storage account" width="205" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Azure, the following entities are backed by Azure storage accounts: &lt;a href="https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-overview"&gt;blobs&lt;/a&gt;, &lt;a href="https://docs.microsoft.com/en-us/azure/storage/files/storage-files-introduction"&gt;file shares&lt;/a&gt;, &lt;a href="https://docs.microsoft.com/en-us/azure/storage/queues/storage-queues-introduction"&gt;queues&lt;/a&gt;, &lt;a href="https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-overview"&gt;NoSQL tables&lt;/a&gt;, &lt;a href="https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction"&gt;Data Lake Storage (gen2)&lt;/a&gt; and unmanaged disks. In this blog post, we'll go over the &lt;a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy"&gt;various redundancy options&lt;/a&gt; available for these storage accounts. We'll compare &amp;amp; contrast them based on the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replication latency: How long before all replicas are fully in sync?&lt;/li&gt;
&lt;li&gt;Disaster scenarios: Are you looking at partial data loss or fully unrecoverable data? How easy (or difficult) is it to get back on track once things have hit rock bottom?&lt;/li&gt;
&lt;li&gt;SLAs: How many 9s?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hopefully this blog post will serve as a cheat-sheet and help you choose the right Azure storage redundancy options for your use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  LRS (locally-redundant storage)
&lt;/h2&gt;

&lt;p&gt;With &lt;a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-lrs?toc=%2fazure%2fstorage%2fblobs%2ftoc.json"&gt;LRS&lt;/a&gt;, your data is replicated thrice across multiple &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#fault-domain-physical-server-rack"&gt;fault domains&lt;/a&gt; &amp;amp; &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#update-domain"&gt;update domains&lt;/a&gt; within a single storage scale unit (all within a single datacenter). Note that all three replicas are addressed by a single endpoint (i.e. you can't target individual replicas for read/write operations).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication latency&lt;/strong&gt;: None. Data is written synchronously to all three replicas on every write request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disaster scenarios&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;disaster type&lt;/th&gt;
&lt;th&gt;service interruption?&lt;/th&gt;
&lt;th&gt;data loss?&lt;/th&gt;
&lt;th&gt;recovery possible?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;hardware failure in &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#fault-domain-physical-server-rack"&gt;physical rack&lt;/a&gt;/node&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;td&gt;NO&lt;sup&gt;1&lt;/sup&gt;
&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#datacenter"&gt;datacenter&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;NO&lt;sup&gt;2&lt;/sup&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#availability-zone"&gt;availability zone&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#region"&gt;regional&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#geography"&gt;geographic&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;worldwide disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;Since the replicas are spread across multiple fault domains.&lt;/li&gt;
&lt;li&gt;Assuming all three replicas within the storage scale unit are affected, your data is permanently lost &amp;amp; unrecoverable.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;SLAs&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;object durability (over a given year) &amp;gt;= 99.999999999% (11 nines)&lt;/li&gt;
&lt;li&gt;read requests (hot tier) &amp;gt;= 99.9% (3 nines)&lt;/li&gt;
&lt;li&gt;read requests (cool tier) &amp;gt;= 99% (2 nines)&lt;/li&gt;
&lt;li&gt;write requests (hot tier) &amp;gt;= 99.9% (3 nines)&lt;/li&gt;
&lt;li&gt;write requests (cool tier) &amp;gt;= 99% (2 nines)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  ZRS (zone-redundant storage)
&lt;/h2&gt;

&lt;p&gt;With &lt;a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-zrs?toc=%2fazure%2fstorage%2fblobs%2ftoc.json"&gt;ZRS&lt;/a&gt;, your data is replicated across three availability zones within the same region (please note that currently not all regions support availability zones). As in the earlier case with LRS, all three replicas are addressed by a single endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication latency&lt;/strong&gt;: Very low. Data is written synchronously to all three replicas on every write request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disaster scenarios&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;disaster type&lt;/th&gt;
&lt;th&gt;service interruption?&lt;/th&gt;
&lt;th&gt;data loss?&lt;/th&gt;
&lt;th&gt;recovery possible?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;hardware failure in &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#fault-domain-physical-server-rack"&gt;physical rack&lt;/a&gt;/node&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;td&gt;NO&lt;sup&gt;1&lt;/sup&gt;
&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#datacenter"&gt;datacenter&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#availability-zone"&gt;availability zone&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;YES&lt;sup&gt;2&lt;/sup&gt;
&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#region"&gt;regional&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;NO&lt;sup&gt;3&lt;/sup&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#geography"&gt;geographic&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;worldwide disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;Only one replica will be affected, since the replicas are spread across different availability zones.&lt;/li&gt;
&lt;li&gt;Temporary service interruption until Azure finishes its DNS updates (&lt;a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-zrs?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#what-happens-when-a-zone-becomes-unavailable"&gt;the official docs&lt;/a&gt; do not say how long these updates take). To mitigate this, it's best to use &lt;a href="https://docs.microsoft.com/en-us/azure/architecture/best-practices/transient-faults"&gt;transient fault-handling patterns&lt;/a&gt; (&lt;a href="https://docs.microsoft.com/en-us/azure/architecture/patterns/retry"&gt;retries&lt;/a&gt; with &lt;a href="https://docs.microsoft.com/en-us/dotnet/api/microsoft.windowsazure.storage.retrypolicies.iextendedretrypolicy?view=azure-dotnet"&gt;back-offs&lt;/a&gt; and &lt;a href="https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker"&gt;circuit breakers&lt;/a&gt;) for all reads/writes on the storage account. More details can be found &lt;a href="https://docs.microsoft.com/en-us/azure/architecture/best-practices/retry-service-specific#azure-storage"&gt;here&lt;/a&gt; and &lt;a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-designing-ha-apps-with-ragrs?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#handling-retries"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Assuming all three replicas across the availability zones are affected, your data is permanently lost &amp;amp; unrecoverable.&lt;/li&gt;
&lt;/ol&gt;
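
&lt;p&gt;The retry-with-back-off idea from footnote 2 can be sketched roughly as follows. This is a minimal, SDK-agnostic illustration, not the Azure storage client's own retry policy: &lt;code&gt;TransientStorageError&lt;/code&gt;, &lt;code&gt;retry_with_backoff&lt;/code&gt; and all the default values are hypothetical names and numbers chosen for this example.&lt;/p&gt;

```python
import random
import time


class TransientStorageError(Exception):
    """Hypothetical stand-in for a transient failure (e.g. a timeout during a zone outage)."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call operation(); on a transient error, wait and retry with exponential back-off."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientStorageError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller (or a circuit breaker) decide
            # The delay doubles on every attempt and is capped at max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little random jitter keeps many clients from retrying in lock-step
            sleep(delay + random.uniform(0, delay / 10))
```

&lt;p&gt;In practice you would wrap each read/write call in something like this (or, more likely, configure the retry options that your storage SDK already ships with) and keep &lt;code&gt;max_attempts&lt;/code&gt; small so that a prolonged outage fails fast.&lt;/p&gt;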

&lt;p&gt;&lt;strong&gt;SLAs&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;object durability (over a given year) &amp;gt;= 99.9999999999% (12 nines)&lt;/li&gt;
&lt;li&gt;read requests (hot tier) &amp;gt;= 99.9% (3 nines)&lt;/li&gt;
&lt;li&gt;read requests (cool tier) &amp;gt;= 99% (2 nines)&lt;/li&gt;
&lt;li&gt;write requests (hot tier) &amp;gt;= 99.9% (3 nines)&lt;/li&gt;
&lt;li&gt;write requests (cool tier) &amp;gt;= 99% (2 nines)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  GRS (geo-redundant storage)
&lt;/h2&gt;

&lt;p&gt;With &lt;a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy-grs?toc=%2fazure%2fstorage%2fblobs%2ftoc.json"&gt;GRS&lt;/a&gt;, your data is replicated across two &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#paired-regions"&gt;paired-regions&lt;/a&gt; (within the same Azure geography) in a primary region + secondary region setup. This ensures that one regional replica will be available in the event of a regional disaster.&lt;/p&gt;

&lt;p&gt;The primary &amp;amp; secondary regions are addressed by separate endpoints. The secondary endpoint is normally inaccessible. However, in case of a &lt;a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#understand-the-account-failover-process"&gt;fail-over&lt;/a&gt;, the secondary is promoted to primary and read + write access is enabled for this endpoint. Fail-overs are automatically initiated by Azure in the event of a regional disaster. Azure is also introducing &lt;a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-disaster-recovery-guidance?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#initiate-an-account-failover"&gt;user-initiated fail-overs&lt;/a&gt;, which are in preview as of the time of writing this post.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: Both GRS (geo-redundant storage) and RA-GRS (read-access geo-redundant storage) are misnomers. They don't create redundant copies across Azure geographies, only across paired-regions within the same Azure geography.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HwC2XC4G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/06-azure-storage-grs.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HwC2XC4G--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/06-azure-storage-grs.jpg" alt="azure storage GRS" width="464" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication latency&lt;/strong&gt;: Your data is first replicated synchronously within the primary region via LRS. The data is then replicated asynchronously to the secondary region (eventually consistent). Within the secondary region, it is replicated synchronously using LRS. The &lt;a href="https://azure.microsoft.com/en-us/support/legal/sla/storage/v1_3/"&gt;official SLA for Azure storage&lt;/a&gt; does not make any guarantees about the time needed for geo-replication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disaster scenarios&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;disaster type&lt;/th&gt;
&lt;th&gt;service interruption?&lt;/th&gt;
&lt;th&gt;data loss?&lt;/th&gt;
&lt;th&gt;recovery possible?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;hardware failure in &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#fault-domain-physical-server-rack"&gt;physical rack&lt;/a&gt;/node&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#datacenter"&gt;datacenter&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;YES&lt;sup&gt;1&lt;/sup&gt;
&lt;/td&gt;
&lt;td&gt;POSSIBLE&lt;sup&gt;2&lt;/sup&gt;
&lt;/td&gt;
&lt;td&gt;YES&lt;sup&gt;3&lt;/sup&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#availability-zone"&gt;availability zone&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#region"&gt;regional&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#geography"&gt;geographic&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;worldwide disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;Within the primary &amp;amp; secondary regions themselves, the data is replicated via LRS. In the event of a datacenter disaster in the primary region, it is possible that all replicas within the storage scale unit are affected, leaving the primary endpoint both inaccessible &amp;amp; unrecoverable. Although the secondary region holds replica data, its endpoint (and therefore the data) remains inaccessible until a fail-over has been initiated and completed.&lt;/li&gt;
&lt;li&gt;With GRS, the replication from primary to secondary regions is asynchronous. In the event of the primary being destroyed before it has completely replicated the data to secondary, the secondary will have a stale copy and un-replicated writes will be permanently lost.&lt;/li&gt;
&lt;li&gt;Only once a fail-over has completed does the secondary endpoint become the new primary, accessible for read + write operations, with LRS replication.&lt;/li&gt;
&lt;/ol&gt;
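
&lt;p&gt;To make footnote 2 concrete, here is a small sketch of how you might estimate which writes are at risk if the primary is lost. It assumes you can obtain the account's last sync time (Azure exposes a "Last Sync Time" property for geo-redundant accounts) and that you track your own write timestamps; the function name and sample timestamps are hypothetical.&lt;/p&gt;

```python
from datetime import datetime, timezone


def writes_at_risk(write_times, last_sync_time):
    """Return the writes made after last_sync_time.

    Writes at or before the last sync time have reached the secondary;
    later writes may not have been geo-replicated yet and could be lost
    if the primary is destroyed before replication catches up.
    """
    return [t for t in write_times if t > last_sync_time]


# Illustrative timestamps only
last_sync = datetime(2020, 7, 10, 12, 0, tzinfo=timezone.utc)
writes = [
    datetime(2020, 7, 10, 11, 55, tzinfo=timezone.utc),
    datetime(2020, 7, 10, 12, 3, tzinfo=timezone.utc),
    datetime(2020, 7, 10, 12, 7, tzinfo=timezone.utc),
]
at_risk = writes_at_risk(writes, last_sync)
print(len(at_risk))  # 2 of the 3 writes happened after the last sync
```

&lt;p&gt;The gap between "now" and the last sync time is effectively your recovery point objective (RPO) at that moment, and the SLA makes no promise about how large it can get.&lt;/p&gt;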

&lt;p&gt;&lt;strong&gt;SLAs&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;object durability (over a given year) &amp;gt;= 99.99999999999999% (16 nines)&lt;/li&gt;
&lt;li&gt;read requests (hot tier) &amp;gt;= 99.9% (3 nines)&lt;/li&gt;
&lt;li&gt;read requests (cool tier) &amp;gt;= 99% (2 nines)&lt;/li&gt;
&lt;li&gt;write requests (hot tier) &amp;gt;= 99.9% (3 nines)&lt;/li&gt;
&lt;li&gt;write requests (cool tier) &amp;gt;= 99% (2 nines)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  RA-GRS (read-access geo-redundant storage)
&lt;/h2&gt;

&lt;p&gt;Same as GRS, but you always have read-only access to the secondary replica.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication latency&lt;/strong&gt;: Same as GRS.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disaster scenarios&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;disaster type&lt;/th&gt;
&lt;th&gt;service interruption?&lt;/th&gt;
&lt;th&gt;data loss?&lt;/th&gt;
&lt;th&gt;recovery possible?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;hardware failure in &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#fault-domain-physical-server-rack"&gt;physical rack&lt;/a&gt;/node&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#datacenter"&gt;datacenter&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;YES&lt;sup&gt;1&lt;/sup&gt;
&lt;/td&gt;
&lt;td&gt;POSSIBLE&lt;sup&gt;2&lt;/sup&gt;
&lt;/td&gt;
&lt;td&gt;YES&lt;sup&gt;3&lt;/sup&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#availability-zone"&gt;availability zone&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#region"&gt;regional&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#geography"&gt;geographic&lt;/a&gt; disaster&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;YES&lt;/td&gt;
&lt;td&gt;NO&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;worldwide disaster&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ol&gt;
&lt;li&gt;If the primary replica is destroyed, the secondary region still offers read-only access, even without a fail-over being initiated (unlike GRS, where the secondary is inaccessible until a fail-over has completed).&lt;/li&gt;
&lt;li&gt;In the event of the primary being destroyed before it has completely replicated the data to secondary, the secondary will have a stale copy and un-replicated writes will be permanently lost (same as GRS).&lt;/li&gt;
&lt;li&gt;Prior to fail-over, the secondary will have read-access. After fail-over, the secondary becomes the new primary, with read + write access and LRS replication.&lt;/li&gt;
&lt;/ol&gt;
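
&lt;p&gt;Since RA-GRS keeps the secondary readable, a client can fall back to it when the primary is unreachable. Below is a rough sketch of that pattern. The secondary endpoint is the account name with a &lt;code&gt;-secondary&lt;/code&gt; suffix; everything else (the &lt;code&gt;fetch&lt;/code&gt; callable, the function names, treating &lt;code&gt;ConnectionError&lt;/code&gt; as the failure signal) is an assumption made for illustration.&lt;/p&gt;

```python
def blob_endpoints(account_name):
    """Return (primary, secondary) blob endpoints for an RA-GRS account."""
    primary = f"https://{account_name}.blob.core.windows.net"
    secondary = f"https://{account_name}-secondary.blob.core.windows.net"
    return primary, secondary


def read_with_fallback(fetch, account_name, blob_path):
    """Read from the primary; if it is unreachable, read from the secondary.

    fetch is a hypothetical callable that takes a URL and returns the blob
    content, raising ConnectionError when the endpoint cannot be reached.
    """
    primary, secondary = blob_endpoints(account_name)
    try:
        return fetch(f"{primary}/{blob_path}")
    except ConnectionError:
        # The secondary is read-only and eventually consistent,
        # so the data returned here may be slightly stale.
        return fetch(f"{secondary}/{blob_path}")
```

&lt;p&gt;Note that this only helps with reads. Writes still need the primary (or a completed fail-over), so the application has to tolerate a read-only mode in the meantime.&lt;/p&gt;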

&lt;p&gt;&lt;strong&gt;SLAs&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;object durability (over a given year) &amp;gt;= 99.99999999999999% (16 nines)&lt;/li&gt;
&lt;li&gt;read requests (hot tier) &amp;gt;= 99.99% (4 nines)&lt;/li&gt;
&lt;li&gt;read requests (cool tier) &amp;gt;= 99.9% (3 nines)&lt;/li&gt;
&lt;li&gt;write requests (hot tier) &amp;gt;= 99.9% (3 nines)&lt;/li&gt;
&lt;li&gt;write requests (cool tier) &amp;gt;= 99% (2 nines)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>azure</category>
      <category>highavailability</category>
    </item>
    <item>
      <title>High Availability in Azure: Availability Zones</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Fri, 10 Jul 2020 20:53:44 +0000</pubDate>
      <link>https://dev.to/cloudskew/high-availability-in-azure-availability-zones-3h49</link>
      <guid>https://dev.to/cloudskew/high-availability-in-azure-availability-zones-3h49</guid>
      <description>&lt;h2&gt;
  
  
  Azure Availability Zones
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--lNfwtgOi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/02-azure-availability-zones.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lNfwtgOi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/02-azure-availability-zones.jpg" alt="azure availability zones" width="285" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#availability-zone"&gt;opening post of this blog series&lt;/a&gt; we talked about availability zones and how resources can be classified as zone-redundant, zonal (zone-specific) or non-zonal (regional). If you haven't seen that post, please take a minute to do so.&lt;/p&gt;

&lt;p&gt;Availability zones exist to shield your resources against a datacenter-level disaster.&lt;/p&gt;

&lt;p&gt;As of the time of writing this blog post, &lt;a href="https://docs.microsoft.com/en-us/azure/availability-zones/az-overview#regions-that-support-availability-zones"&gt;only a few Azure regions support availability zones&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Availability zones are free (you're only charged for the VMs and resources placed in the availability zones).&lt;/p&gt;

&lt;h2&gt;
  
  
  Supported Azure Resources
&lt;/h2&gt;

&lt;p&gt;Only a few Azure resource types support availability zones. We highlight some of the important ones below; the complete list is &lt;a href="https://docs.microsoft.com/en-us/azure/availability-zones/az-overview#regions-that-support-availability-zones"&gt;available here&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Virtual Machines&lt;/strong&gt;: During creation, a VM can be configured as zonal. Its managed disk and public IP address (standard sku only) are then automatically placed in that same zone.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Managed Disks&lt;/strong&gt;: During creation, a managed disk can be configured as zonal or non-zonal. &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/snapshot-copy-managed-disk"&gt;Snapshots&lt;/a&gt; of any managed disk (zonal or otherwise) can be persisted to zone-redundant storage.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Public IPs&lt;/strong&gt;: During creation, a Public IP address (standard sku only) can be configured as zone-redundant (default) or zonal. Public IPs with basic sku are non-zonal.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Storage Accounts&lt;/strong&gt;: With zone-redundant storage, your data is replicated across three availability zones within the same region. We already covered ZRS storage in &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-storage-redundancies-3n9f/#zrs-zone-redundant-storage"&gt;part 5 of this blog series&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Load Balancers (standard sku only)&lt;/strong&gt;: During creation, load balancers (standard sku only) can be configured as zone-redundant or zonal. Load balancers with basic sku are non-zonal.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Availability Sets vs Availability Zones
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/architecture/resiliency/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sR5OzjtU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/24-azure-avset-vs-avzone.jpg" alt="availability zone vs availability set" width="755" height="232"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Availability sets provide redundancies within a datacenter, while availability zones provide redundancies within a region. The former shields you against hardware failures in a physical rack, while the latter shields you against a datacenter-level disaster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SLA for VMs in availability zones is predictably higher (99.99% uptime guarantee) than that of VMs in availability sets (99.95% uptime guarantee). &lt;a href="https://azure.microsoft.com/en-in/support/legal/sla/virtual-machines/v1_8/"&gt;Full SLA details here&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;With an availability set, all VMs in it must belong to the same VNET and the same resource group. An availability zone imposes no such restrictions (zonal VMs can belong to any VNET and any resource group within the region).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When placing a VM in an availability set, you cannot specify its placement (fault domain, update domain, etc.). When placing a VM in an availability zone, however, you have to specify its zone.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
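
&lt;p&gt;To put the two uptime guarantees in perspective, the following sketch converts an SLA percentage into the maximum downtime it allows. It assumes a 365-day year purely for illustration (the actual SLA is evaluated per billing month), and the function name is our own.&lt;/p&gt;

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, assuming a non-leap year


def max_downtime_hours_per_year(sla_percent):
    """Maximum yearly downtime (in hours) permitted by an uptime SLA."""
    return (100 - sla_percent) / 100 * HOURS_PER_YEAR


availability_set = max_downtime_hours_per_year(99.95)
availability_zone = max_downtime_hours_per_year(99.99)

print(round(availability_set, 2))        # 4.38 hours per year
print(round(availability_zone * 60, 1))  # 52.6 minutes per year
```

&lt;p&gt;In other words, moving from 99.95% to 99.99% cuts the permitted downtime to a fifth.&lt;/p&gt;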

&lt;h2&gt;
  
  
  Caveats, restrictions, gotchas &amp;amp; tidbits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Zonal resources, once created, cannot be moved to other availability zones within the region. It is however possible to &lt;a href="https://docs.microsoft.com/en-us/azure/site-recovery/move-azure-vms-avset-azone"&gt;use Azure Site Recovery to move non-zonal VMs to availability zones in another region&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The VMs in an availability zone need not be identical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;To ensure redundancies in all tiers of your n-tier application, each tier should ideally be spread across multiple availability zones.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Some additional caveats with zonal VMs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A zonal VM can only attach to a public IP address that is zone-redundant or zonal (i.e. standard sku only. Basic skus don't have zonal support).&lt;/li&gt;
&lt;li&gt;A zonal VM can only attach a managed disk from the same availability zone. A non-zonal VM can however attach any managed disk from the same region, irrespective of whether it's zonal or not.&lt;/li&gt;
&lt;li&gt;It's not possible for a zonal VM to use unmanaged disks.
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_UPCcwVF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/23-azure-availability-zone-managed-disk.jpg" alt="vm in availability zone must use managed disks" width="575" height="116"&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pro tip: &lt;a href="https://docs.microsoft.com/en-us/azure/load-balancer/tutorial-load-balancer-standard-public-zone-redundant-portal"&gt;Pair zonal VMs with a zone-redundant load balancer (standard sku)&lt;/a&gt; for traffic equi-distribution amongst the VMs in that availability zone. All the zonal VMs must be connected to the same VNET.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>azure</category>
      <category>highavailability</category>
    </item>
    <item>
      <title>High Availability in Azure: Availability Sets</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Fri, 10 Jul 2020 20:49:48 +0000</pubDate>
      <link>https://dev.to/cloudskew/high-availability-in-azure-availability-sets-5367</link>
      <guid>https://dev.to/cloudskew/high-availability-in-azure-availability-sets-5367</guid>
      <description>&lt;h2&gt;
  
  
  Azure Availability Sets
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ATyTx9KG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/19-azure-availability-set.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ATyTx9KG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/19-azure-availability-set.png" alt="azure availability sets" width="205" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We've already discussed the concepts of &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/regions-and-availability#fault-domains"&gt;fault domains&lt;/a&gt;, &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/regions-and-availability#update-domains"&gt;update domains&lt;/a&gt; and &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/regions-and-availability#availability-sets"&gt;availability sets&lt;/a&gt; in the &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9/#fault-domain-physical-server-rack"&gt;first post of this series&lt;/a&gt;. Visually, you can represent an availability set as a table, as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;FD0&lt;/th&gt;
&lt;th&gt;FD1&lt;/th&gt;
&lt;th&gt;FD2&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UD0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VM1&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;VM6&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UD1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;VM2&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UD2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;VM3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UD3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VM4&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UD4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;VM5&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No two VMs in an availability set share the same fault &amp;amp; update domain. This ensures that there will be at least one available VM in the event of a planned maintenance (where an entire update domain is affected) or hardware failure (where an entire fault domain is affected). The &lt;a href="https://azure.microsoft.com/en-in/support/legal/sla/virtual-machines/v1_8/"&gt;SLA for Azure VMs&lt;/a&gt; guarantees that if an availability set has two or more VMs, then at least one VM will be available 99.95% of the time.&lt;/p&gt;

&lt;p&gt;Availability sets are free (you're only charged for the VMs and resources placed in the availability sets).&lt;/p&gt;

&lt;h2&gt;
  
  
  Caveats, restrictions, gotchas &amp;amp; tidbits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A VM can only be placed in an availability set at the time of creation; an existing VM can't be moved into one afterwards. It's also &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/change-availability-set"&gt;not possible to change an existing VM's availability set&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An availability set forces all its associated VMs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Be in the same resource group and region (technically, they all reside in the same datacenter).&lt;/li&gt;
&lt;li&gt;Have their network interfaces associated with the same VNET.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For HA, a VM can be placed in an availability set or in an availability zone. But NOT both. The former offers HA within a datacenter, the latter offers HA within a region.&lt;br&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--u_DU3NRw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/20-azure-avset-vs-avzone.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--u_DU3NRw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/20-azure-avset-vs-avzone.jpg" alt="availability set vs availability zone" width="487" height="116"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The VMs in an availability set need not be identical, but there are hardware size constraints. Use the &lt;a href="https://docs.microsoft.com/en-us/powershell/module/az.compute/get-azvmsize?view=azps-1.6.0"&gt;Get-AzVmSize&lt;/a&gt; PowerShell cmdlet to list all the VM sizes available for a particular availability set (&lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/tutorial-availability-sets#check-for-available-vm-sizes"&gt;more details&lt;/a&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;

&lt;p&gt;For an availability set with (say) 3 FDs and 5 UDs, the &lt;a href="https://blogs.msdn.microsoft.com/plankytronixx/2015/05/01/azure-exam-prep-fault-domains-and-update-domains/"&gt;placement of the VMs will generally be as follows&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1st VM: FD0, UD0&lt;/li&gt;
&lt;li&gt;2nd VM: FD1, UD1&lt;/li&gt;
&lt;li&gt;3rd VM: FD2, UD2&lt;/li&gt;
&lt;li&gt;4th VM: FD0, UD3&lt;/li&gt;
&lt;li&gt;5th VM: FD1, UD4&lt;/li&gt;
&lt;li&gt;6th VM: FD2, UD0&lt;/li&gt;
&lt;li&gt;and so on...&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Generally an &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/manage-availability#combine-a-load-balancer-with-availability-sets"&gt;availability set is paired with a load balancer&lt;/a&gt; to distribute traffic evenly among the VMs in that availability set.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pro tip: To ensure redundancies in all tiers of your n-tier application, &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/manage-availability#configure-each-application-tier-into-separate-availability-sets"&gt;each tier should be placed in a separate availability set&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Pro tip: Use managed disks &amp;amp; managed availability sets for higher availability. Read more below.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
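
&lt;p&gt;The round-robin placement listed above for 3 FDs and 5 UDs can be sketched in a few lines of Python. This is only a model of the general pattern, not Azure's actual allocator: the function name and its parameters are made up for illustration, and real placements may differ.&lt;/p&gt;

```python
def place_vms(vm_count, fault_domains=3, update_domains=5):
    """Return a (fault_domain, update_domain) pair for each VM, in creation order.

    Assumption: both domain indexes simply cycle as VMs are added.
    """
    return [(i % fault_domains, i % update_domains) for i in range(vm_count)]


placements = place_vms(6)
for n, (fd, ud) in enumerate(placements, start=1):
    print(f"VM {n}: FD{fd}, UD{ud}")
```

&lt;p&gt;Because the FD and UD counters wrap independently, the 6th VM lands back on UD0 while sitting on FD2.&lt;/p&gt;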

&lt;h2&gt;
  
  
  Managed disks and managed availability sets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The issue with unmanaged disks in an availability set
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/manage-availability?#use-managed-disks-for-vms-in-an-availability-set"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--haQf-RJg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/21-azure-av-set-unmanaged-disks.jpg" alt="unmanaged availability set" width="689" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The storage accounts associated with unmanaged disks in an availability set are all placed in a single storage scale unit (stamp), which then becomes a single point of failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits of managed disks
&lt;/h3&gt;

&lt;p&gt;With &lt;a href="https://docs.microsoft.com/en-gb/azure/virtual-machines/windows/managed-disks-overview"&gt;Azure managed disks&lt;/a&gt;, you no longer have to explicitly provision storage accounts to back your disks. Managed disks provide a convenient abstraction over storage accounts, blob containers and page blobs. Internally, managed disks use &lt;a href="https://dev.to/cloudskew/high-availability-in-azure-storage-redundancies-3n9f/#lrs-locally-redundant-storage"&gt;LRS storage&lt;/a&gt; (3 redundant copies within a storage scale unit inside a single datacenter).&lt;/p&gt;

&lt;h3&gt;
  
  
  Managed disks go in managed availability sets
&lt;/h3&gt;

&lt;p&gt;If you plan to use managed disks, please ensure you select the "aligned" option while creating the availability set. This effectively creates a managed availability set.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9JTtRjhY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/20-azure-managed-availability-set.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9JTtRjhY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/20-azure-managed-availability-set.jpg" alt="creating managed availability set" width="576" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To &lt;a href="https://docs.microsoft.com/en-gb/azure/virtual-machines/windows/migrate-to-managed-disks"&gt;migrate VMs in an existing availability set to managed disks&lt;/a&gt;, the availability set itself needs to be &lt;a href="https://docs.microsoft.com/en-gb/azure/virtual-machines/windows/convert-unmanaged-to-managed-disks"&gt;converted to a managed availability set&lt;/a&gt;. This can be done via the Azure portal or via the &lt;a href="https://docs.microsoft.com/en-us/powershell/module/az.compute/update-azavailabilityset?view=azps-1.6.0"&gt;Update-AzAvailabilitySet&lt;/a&gt; PowerShell cmdlet. Once converted, only VMs with managed disks can be added to the availability set (existing VMs with unmanaged disks in the availability set will continue to operate as before).&lt;/p&gt;

&lt;p&gt;Please note that the &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/manage-availability#number-of-fault-domains-per-region"&gt;max number of managed FDs will depend on the availability set's region&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managed availability sets get it right
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/manage-availability?#use-managed-disks-for-vms-in-an-availability-set"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TJErz4IO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/22-azure-av-set-managed-disks.jpg" alt="managed availability set" width="731" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The managed disks in an availability set are placed in multiple storage scale units (stamps), aligned with VM FDs, avoiding a single point of failure. If a storage scale unit fails, only the VMs with managed disks in that storage scale unit will fail (other VMs will be unaffected). This increases the overall availability of the VMs in that availability set.&lt;/p&gt;
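
&lt;p&gt;To make the difference concrete, here is a toy model of a storage stamp failure. It assumes, purely for illustration, one stamp per fault domain in the managed case and a single shared stamp (stamp 0) in the unmanaged case; the function and variable names are hypothetical.&lt;/p&gt;

```python
def vms_lost_on_stamp_failure(vm_fault_domains, failed_stamp, aligned):
    """Return the indexes of VMs whose disks sit on the failed storage stamp.

    aligned=True models a managed availability set (assumed: one stamp per FD).
    aligned=False models unmanaged disks (assumed: every disk on stamp 0).
    """
    lost = []
    for vm, fd in enumerate(vm_fault_domains):
        stamp = fd if aligned else 0
        if stamp == failed_stamp:
            lost.append(vm)
    return lost


vm_fds = [0, 1, 2, 0, 1, 2]  # six VMs spread over three fault domains
print(vms_lost_on_stamp_failure(vm_fds, failed_stamp=0, aligned=False))
print(vms_lost_on_stamp_failure(vm_fds, failed_stamp=0, aligned=True))
```

&lt;p&gt;In this model the unmanaged case loses all six VMs when stamp 0 fails, while the aligned case loses only the two VMs in FD0.&lt;/p&gt;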

</description>
      <category>azure</category>
      <category>highavailability</category>
    </item>
    <item>
      <title>High Availability in Azure: The basics</title>
      <dc:creator>Mithun Shanbhag</dc:creator>
      <pubDate>Fri, 10 Jul 2020 13:56:05 +0000</pubDate>
      <link>https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9</link>
      <guid>https://dev.to/cloudskew/high-availability-in-azure-the-basics-10g9</guid>
      <description>&lt;h2&gt;
  
  
  High availability what now?
&lt;/h2&gt;

&lt;p&gt;In order to understand high availability in Azure, we first need to dig into some underlying Azure concepts. To explain these, I've cobbled together a diagram (it's not 100% accurate, but it does make it simpler to explain things).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://assets.cloudskew.com/assets/blog/images/04-azure-global-infra.jpg"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9wij_dvG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/04-azure-global-infra.jpg" alt="azure global infrastructure" width="625" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Geography
&lt;/h3&gt;

&lt;p&gt;A geography is the "highest-level" entity; it exists to meet &lt;a href="https://azuredatacentermap.azurewebsites.net/"&gt;data residency&lt;/a&gt;, compliance and sovereignty requirements. Currently there are &lt;a href="https://azure.microsoft.com/en-us/global-infrastructure/geographies/"&gt;4 Azure geographies&lt;/a&gt; - Americas, Europe, Asia Pacific and Middle East + Africa. An Azure geography contains two or more Azure regions within it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Region
&lt;/h3&gt;

&lt;p&gt;At the time of writing this post, there are &lt;a href="https://azure.microsoft.com/en-us/global-infrastructure/regions/"&gt;53 Azure regions&lt;/a&gt; (with 8 more announced) spread across 4 Azure geographies. Each Azure region contains an interconnected set of datacenters (all datacenters within an Azure region are connected via a dedicated regional low-latency network).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://azure.microsoft.com/en-us/global-infrastructure/regions/"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rv9XWCdy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/01-azure-regions.jpg" alt="azure regions" width="395" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/availability-zones/az-overview#regions-that-support-availability-zones"&gt;Some Azure regions&lt;/a&gt; support availability zones (each such region contains 3 or more availability zones).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/availability-zones/az-overview"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--lNfwtgOi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/02-azure-availability-zones.jpg" alt="azure availability zones" width="285" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Paired Regions
&lt;/h3&gt;

&lt;p&gt;It is recommended that your redundancies span a set of &lt;a href="https://docs.microsoft.com/en-us/azure/best-practices-availability-paired-regions"&gt;paired regions&lt;/a&gt; in order to meet data residency &amp;amp; compliance requirements even during planned platform maintenance &amp;amp; outages. Azure ensures that during planned platform maintenance, only one region in each pair is updated at a time. Also, during multi-regional outages, Azure ensures that at least one region in each pair is prioritized for recovery.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/best-practices-availability-paired-regions"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ro3WHbLn--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://assets.cloudskew.com/assets/blog/images/03-azure-paired-regions.jpg" alt="azure paired regions" width="337" height="175"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Availability Zone
&lt;/h3&gt;

&lt;p&gt;An &lt;a href="https://docs.microsoft.com/en-us/azure/availability-zones/az-overview"&gt;availability zone&lt;/a&gt; consists of one or more datacenters. Each availability zone has its own autonomous, independent infrastructure for power, cooling, and networking.&lt;/p&gt;

&lt;p&gt;The Azure resources that support availability zones are &lt;a href="https://docs.microsoft.com/en-us/azure/availability-zones/az-overview#services-that-support-availability-zones"&gt;listed here&lt;/a&gt;. Please note that these Azure resources can be categorized as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;zone-specific (zonal) resources&lt;/strong&gt;&lt;/em&gt;: Azure ensures that the resources are contained within a specific availability zone. VMs, managed disks and IP addresses fall in this category.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;zone-redundant resources&lt;/strong&gt;&lt;/em&gt;: Azure automatically replicates the resources across multiple availability zones. Zone-redundant storage accounts and SQL databases fall in this category.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;&lt;strong&gt;non-zonal (regional) resources&lt;/strong&gt;&lt;/em&gt;: Azure resources that do not support availability zones.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll talk about availability zones in detail in a future blog post in this series.&lt;/p&gt;

&lt;h3&gt;
  
  
  Datacenter
&lt;/h3&gt;

&lt;p&gt;You can watch one of &lt;a href="https://twitter.com/markrussinovich"&gt;Mark Russinovich&lt;/a&gt;'s excellent presentations (&lt;a href="https://www.youtube.com/watch?v=D8hMu4jJAwo"&gt;link1&lt;/a&gt;, &lt;a href="https://www.youtube.com/watch?v=m7I8ANssACk"&gt;link2&lt;/a&gt;, &lt;a href="https://www.youtube.com/watch?v=t3Vo37V9oU8"&gt;link3&lt;/a&gt; and &lt;a href="https://youtu.be/S2zguwKvlQk"&gt;link4&lt;/a&gt;) to peek into what an Azure datacenter consists of. You can also take a &lt;a href="https://cloud-platform-assets.azurewebsites.net/datacenter/index.html"&gt;virtual tour&lt;/a&gt; of an Azure datacenter.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fault domain (physical server rack)
&lt;/h3&gt;

&lt;p&gt;A single physical rack is considered a fault domain, since all servers in that rack share common points of failure (a common power source and a common network switch).&lt;/p&gt;

&lt;h3&gt;
  
  
  Update domain
&lt;/h3&gt;

&lt;p&gt;An update domain is a logical grouping of machines that Azure upgrades/patches simultaneously during planned platform maintenance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Availability Set
&lt;/h3&gt;

&lt;p&gt;It's always a bad idea to run a production workload on a single VM. Best to provision multiple VMs in an &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/regions-and-availability#availability-sets"&gt;availability set&lt;/a&gt;, which is a logical grouping of VMs within a datacenter across multiple &lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machines/windows/regions-and-availability#fault-domains"&gt;fault &amp;amp; update domains&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When you create multiple VMs within an availability set, Azure distributes them across these fault &amp;amp; update domains. This ensures that at least one VM remains running during planned platform maintenance (only one update domain in an availability set is patched at a time) or when a server rack suffers a hardware failure, network outage or power supply issue.&lt;/p&gt;
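
&lt;p&gt;As a rough illustration of why this works, the sketch below takes a hypothetical set of VM placements and checks which VMs stay up when a single update domain is being patched or a single fault domain goes down. The placements and the function name are assumptions made up for this example, not output from Azure.&lt;/p&gt;

```python
def running_vms(placements, down_fd=None, down_ud=None):
    """Return the indexes of VMs that stay up when one fault or update domain is down."""
    return [
        vm
        for vm, (fd, ud) in enumerate(placements)
        if fd != down_fd and ud != down_ud
    ]


# Hypothetical placements: one (fault_domain, update_domain) pair per VM
placements = [(0, 0), (1, 1), (2, 2), (0, 3), (1, 4)]

print(running_vms(placements, down_ud=0))  # update domain 0 is being patched
print(running_vms(placements, down_fd=1))  # the rack behind fault domain 1 fails
```

&lt;p&gt;In both scenarios some VMs keep running, because no single domain holds every VM.&lt;/p&gt;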

&lt;p&gt;My next blog post will explore availability sets for VMs in detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preemptive FAQs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What about VM scale sets?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview"&gt;VM scale sets&lt;/a&gt; exist for horizontal scaling under load. In &lt;em&gt;&lt;strong&gt;my&lt;/strong&gt;&lt;/em&gt; opinion, they have almost nothing to do with redundancies for high availability. So I'll be excluding them from this particular blog series. Perhaps I'll address them in a future series on horizontal &amp;amp; vertical scaling for Azure resources.&lt;/p&gt;

&lt;p&gt;Aside: Horizontal scaling &amp;amp; high availability address slightly different issues (performance &amp;amp; reliability respectively). The former adds instances under load to keep the service performant. The latter adds redundant instances (irrespective of load) to prevent service disruption during outages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will I address high availability on Azure's government cloud?
&lt;/h3&gt;

&lt;p&gt;No. I know very little about Azure's government cloud. You're welcome to read the &lt;a href="https://docs.microsoft.com/en-in/azure/azure-government/"&gt;documentation&lt;/a&gt; yourself.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>highavailability</category>
      <category>ha</category>
    </item>
  </channel>
</rss>
