Kaspar Von Grünberg

Originally published at humanitec.com

Oops, we're multi-cloud. A hitchhiker's guide to surviving.

Over the last few years, enterprises have adopted multicloud strategies in an effort to increase flexibility and choice and to reduce vendor lock-in. According to Flexera's 2020 State of the Cloud Report, 93 percent of enterprises have a multicloud strategy. In a recent Gartner survey of public cloud users, 81 percent of respondents said they are working with two or more providers. Multicloud makes so many things more complicated that you need a damn good reason to justify it. At Humanitec, we see hundreds of ops and platform teams a year, and I am often surprised that there are several valid reasons to go multi-cloud. I also observe that the teams that succeed are those that take the remodelling of workflows and tooling setups seriously.

What is multicloud computing?

Put simply, multicloud means an application, or several parts of it, runs on different cloud providers. These may be public or private but typically include at least one public provider. It may mean data storage or specific services run on one cloud provider and others on another, or your entire setup runs on different cloud providers in parallel. This is distinct from hybrid cloud, where some components run on-premises while other parts of your application run in the cloud.

Why adopt the multicloud approach?

If you don’t have a very specific reason, I usually recommend staying well away from multi-cloud. As the author and enterprise architect Gregor Hohpe puts it: “Excessive complexity is nature's punishment for organizations that are unable to make decisions.” Multicloud significantly increases the complexity around developer workflows, staffing, tooling, and security. The core risk is that it adds redundancy to your workflows: if you were already struggling to manage dozens of deployment scripts in different versions for one target infrastructure, all of this doubles as you go multi-cloud.

Often multicloud happens involuntarily. Legacy is a common culprit: generations of teams chose particular vendors based on what they needed at a particular time. As people left, new people came on board and added more vendors based on personal preferences and skill sets, without ever retiring the legacy cloud solutions. This can also happen when a company is acquired by another with its own vendor preferences.

At Humanitec, we analyze and work with hundreds of platform teams and get a close-up view of their operational setups. It feels counterintuitive, but over the last two years I have come across several valid and compelling reasons to adopt multicloud:

Avoid cloud-vendor lock-in

Companies may opt for a multicloud approach to avoid the risk of cloud vendor lock-in. If your cloud-based applications depend on proprietary capabilities specific to your cloud platform (Amazon Kinesis, for example), this can leave you in a state of vendor lock-in: you are beholden to product changes and price increases without recourse. Having data locked into a single provider also increases the risk if something goes wrong. Further, using only one cloud provider could prove constraining as a company grows. However, as Gregor Hohpe asserts:

“Many enterprises are fascinated with the idea of portable multicloud deployments and come up with ever more elaborate and complex (and expensive) plans that'll ostensibly keep them free of cloud provider lock-in. However, most of these approaches negate the very reason you'd want to go to the cloud: low friction and the ability to use hosted services like storage or databases.”
Feature and pricing optimization

Critically, using multiple vendors provides the opportunity to pick the best bits of various cloud providers. Every cloud vendor is different when it comes to features and pricing. One may excel at integration with specific technologies, one is better at hosting VMs, another may offer better support, and yet another may be cheaper. A company may choose a more expensive cloud provider because it offers greater security for sensitive data while using another for less critical data.

Further, while workloads can be built to be vendor neutral, some are better served from specific cloud platforms. Apps that use APIs native to AWS, such as Alexa skills, are best served by Amazon Web Services, for example.

Compliance and risk minimization

Risk minimization and compliance requirements are common in mission-critical sectors like public infrastructure, healthcare, and banking. Healthcare providers may be required under HIPAA regulations to store patient data on-premises or in a private cloud, for example. Such industries may also opt to run parallel structures on different cloud providers: if one of the major cloud vendors were to fail, in a disaster recovery case they could simply continue running on another provider.

Other examples can be found in the banking sector. Compliance there requires, for example, that systems cannot be subject to price shocks. An alternative must therefore be available in the hypothetical event that a cloud provider suddenly increases its prices sharply.

Legal requirements and geographical enforcement

Sometimes business activity in a particular country requires a multicloud approach. While you typically run your architecture by default on one big US vendor (like AWS, Azure, or GCP), to operate your app in China or Russia you’d be required to run on a Chinese or Russian vendor.

Edge computing

Distributed models such as edge computing are a common way to support mission-critical IoT applications that use predictive analytics, such as autonomous vehicles, smart city technology, industrial manufacturing, and remote monitoring of sites like offshore oil refineries. Edge computing is also used to pre-process data such as video and mobile data locally. It is useful in scenarios that require low latency with little to no lag and downtime. While edge computing does the heavy lifting of collecting, processing, and analyzing data at the edge, the data that goes to the cloud for deeper and historical processing is still significant in size (data from a connected car alone is about 6TB) and can be more manageable in a designated cloud.

Challenges with multicloud setups

As mentioned before, such multicloud environments can create an abundance of unanticipated challenges.

As Kelsey Hightower notes, it may significantly compromise your effectiveness as an organization (https://twitter.com/kelseyhightower/status/1164203419822772224).

Skilling up and the shortage of cloud specialists

Developers who work with the same cloud service provider over time gain deep domain knowledge of its specific tools, processes, and configurations. This knowledge is hard-baked into a company’s skills and accomplishments and generates significant value. Shifting to multiple vendors shatters this expertise, as developers now have to contend with upskilling, relearning, and certification. How transferable is the existing skillset? What is the cost (time, money, and developer frustration) of upskilling? Do you need to hire new staff with specialist knowledge? I find that even with simpler things like identity and access management there are huge differences between providers. Ever tried applying what you learned about IAM on GCP to AWS? Good luck.
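To make the difference concrete, here is a minimal sketch of the "same" task, granting a user read-only access, on AWS versus GCP. It assumes the boto3 and google-api-python-client libraries with credentials already configured; the user and project names are hypothetical.

```python
import boto3
from googleapiclient import discovery

# AWS: IAM is account-scoped; you attach a managed policy to a user.
iam = boto3.client("iam")
iam.attach_user_policy(
    UserName="data-analyst",  # hypothetical user
    PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",
)

# GCP: IAM is resource-scoped; you read-modify-write the project's
# policy, adding a member to a role binding.
crm = discovery.build("cloudresourcemanager", "v1")
project = "my-project-id"  # hypothetical project
policy = crm.projects().getIamPolicy(resource=project, body={}).execute()
policy.setdefault("bindings", []).append(
    {"role": "roles/viewer", "members": ["user:analyst@example.com"]}
)
crm.projects().setIamPolicy(
    resource=project, body={"policy": policy}
).execute()
```

The mental models don't transfer: AWS attaches policies to identities, while GCP binds identities to roles on resources.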

The lack of a single interface

Multicloud setups are hard to manage because it’s hard to keep track of what is running where. This includes network connectivity, workload allocation across clouds, data transmission rates, storage, archiving, backup, and disaster recovery. It can be hard to integrate information from different sources and gain an understanding of the movements within and across multiple clouds. Each cloud provider may have its own dashboard with specific APIs and its own interface rules, procedures, and authentication protocols. There’s also the challenge of migrating and accessing data.

Management of multiple delivery and deployment processes

A multicloud setup requires multiple deployment pipelines, which adds complexity. From managing config files to database provisioning to deployment pipelines: with every cloud provider, the amount of work increases substantially, diverting developers from other essential tasks. More complexity can also increase the workload on ops teams. More tooling and more layers of abstraction also mean a greater risk of misconfiguration.

Integration, portability, and interoperability

The integration of multiple cloud vendors with your existing application and databases can be challenging. Clouds typically differ when it comes to APIs, functions, and containerization. Are you able to migrate components to more than one cloud without having to make major modifications in each system? How does interoperability flow between your public and private clouds?

Cloud sprawl

Techopedia defines cloud sprawl as the uncontrolled proliferation of an organization’s cloud presence. It happens when an organization inadequately controls, monitors, and manages its different cloud instances, resulting in numerous individual instances that are forgotten but continue to use up resources and incur costs, since most organizations pay for public cloud services. No vendor is going to tell you you’re running on too many machines when the money keeps coming in.

As well as unnecessary costs, cloud sprawl brings the risk of data integrity and security problems. Unmanaged or unmonitored workloads may be running in QA or dev/test environments using real production data, posing potential attack vectors for hackers.

Security

As alluded to with cloud sprawl, one of the biggest challenges you’re likely to face in a multicloud environment is security. Security protocols may differ between clouds and may require extensive customization. It’s essential to be able to see the security posture of the whole multicloud environment at any time, to prevent cyber attacks and respond to security vulnerabilities.

Siloed vendor features and services

While cloud providers have their similarities, each also offers siloed features and services designed to make its product more compelling than its competitors’. A multicloud environment becomes more complex when a lack of equivalent features creates difficulties, such as being unable to replicate an architecture across providers.

How to survive multicloud challenges

Standardize to the lowest common denominator

The more things vary between clouds, the worse it gets for you. Look out for whatever helps you standardize layers, so you have to worry less about switching context between clouds. There are certain scenarios where this is impossible; the authentication differences mentioned above are an example. You just have to deal with that, and it will require at least one colleague specializing per provider.

But as you move higher up the stack, there are strategies you can use. For instance, provisioning resources with IaC scripts or Terragrunt is one of them. In my opinion, making sure workloads are strictly containerized is almost a must: no containers, no multi-cloud. Next, make sure you use the managed Kubernetes offerings (and only those). Yes, K8s can be a beast, but it equalizes configurations and practices across the board, and there are great operating layers (hint: such as Humanitec) that can help you manage this.
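As a minimal sketch of what that buys you: once workloads are containerized and every provider just exposes a managed Kubernetes cluster, deploying everywhere is a loop over kubeconfig contexts. The context names and manifest file below are hypothetical, and this is a bare-bones illustration rather than a production pipeline.

```python
import subprocess

# One kubeconfig context per managed cluster (names are hypothetical).
CLUSTERS = {
    "aws": "eks-prod",    # Amazon EKS
    "gcp": "gke-prod",    # Google GKE
    "azure": "aks-prod",  # Azure AKS
}

def deploy(manifest: str) -> None:
    """Apply the same provider-agnostic manifest to every cluster."""
    for provider, context in CLUSTERS.items():
        print(f"Deploying to {provider} ({context}) ...")
        subprocess.run(
            ["kubectl", "--context", context, "apply", "-f", manifest],
            check=True,
        )

deploy("deployment.yaml")
```

The moment your manifest contains a provider-specific annotation or storage class, this loop breaks, which is exactly why standardizing to the common denominator matters.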

Define a single source of truth for configurations

Dealing with configurations is sufficiently hard on one cloud already, especially if you work with unstructured scripts or don’t have your shit together when it comes to versioning. I wrote a piece on this recently if you want to dive in. But if you streamline this so that you don’t use unstructured scripts but instead define a baseline template that works for all clouds (with Kubernetes as the common denominator), this gets a lot easier. If a developer wants to change anything, she applies changes to this template through a CLI, API, or UI, and at deployment time you create manifests for each deploy. You then route the manifests to a specified cluster and simply save the information deploy = manifest + target cluster. This way you have a defined, auditable structure and history of all deploys across clouds. Maybe you also want to build a dashboard that shows which state of which config is applied to which cluster? This will make things a lot easier.
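Here is a rough sketch of that idea in code. The baseline template, the render step, and the log format are illustrative assumptions, not a specific tool's API; the point is that every deploy reduces to a rendered manifest plus a target cluster, recorded in an auditable history.

```python
import datetime
import hashlib
import json

# Baseline template shared by all clouds (workload values are hypothetical).
BASELINE = {
    "image": "registry.example.com/shop-api",
    "replicas": 2,
    "env": {},
}

def render_manifest(overrides: dict) -> dict:
    """Merge a developer's changes into the baseline template."""
    manifest = {**BASELINE, **overrides}
    manifest["env"] = {**BASELINE["env"], **overrides.get("env", {})}
    return manifest

def record_deploy(manifest: dict, target_cluster: str,
                  log_file: str = "deploys.jsonl") -> None:
    """Append an auditable record: deploy = manifest + target cluster."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "manifest": manifest,
        "manifest_hash": hashlib.sha256(
            json.dumps(manifest, sort_keys=True).encode()
        ).hexdigest(),
        "target_cluster": target_cluster,
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(entry) + "\n")

# A developer's change, rendered and routed to one cluster.
manifest = render_manifest({"replicas": 3, "env": {"FEATURE_X": "on"}})
record_deploy(manifest, target_cluster="gke-prod")
```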

Abstract multi-cloud entirely from application developers

If you think multi-cloud is only draining for the ops team, you are wrong. Multi-cloud heavily disrupts the workflows of your application development team too. Waiting times for new environments, colleagues, or pieces of infrastructure increase significantly. I wrote a piece on how those little minutes pile up. What I see a lot is teams responding with endless training slots to get their frontend developers up to speed on Helm charts. The truth is they don’t give a shit. They are benchmarked against their ability to write TypeScript, and YAML doesn’t cut it. Use the config management approach outlined above, get them a slick UI or CLI, and let the only touchpoint they have with the clouds be specifying which environment type they want to spin up on which cloud vendor (see the sketch below). Don’t do this, and your ops team will be the extended help-desk, drowning in TicketOps tasks.
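To illustrate what that single touchpoint could look like, here is a minimal sketch of a developer-facing CLI. The command name, flags, and dispatch step are hypothetical; in a real setup the call would hand off to the config management machinery described above.

```python
import argparse

def spin_up(env_type: str, cloud: str) -> None:
    # Placeholder: a real implementation would render the baseline config
    # for the chosen cloud and trigger the deployment pipeline.
    print(f"Creating '{env_type}' environment on {cloud} ...")

def main() -> None:
    parser = argparse.ArgumentParser(
        prog="envctl",  # hypothetical tool name
        description="Spin up an environment without touching cloud consoles.",
    )
    parser.add_argument("env_type", choices=["dev", "staging", "prod-like"])
    parser.add_argument("--cloud", choices=["aws", "gcp", "azure"],
                        default="aws")
    args = parser.parse_args()
    spin_up(args.env_type, args.cloud)

if __name__ == "__main__":
    main()
```

From the developer's point of view, `envctl staging --cloud gcp` is the entire multi-cloud story; everything else stays with the platform team.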

Look at Internal Developer Platforms

Because they do pretty much all of the above out of the box. They standardize configs, help ops teams orchestrate infrastructure, log deploys so every developer can roll back, provide an RBAC layer on top of all clouds to manage permissions, and help you manage environments and environment variables across clouds, workloads, and applications. The community page around Internal Developer Platforms provides a good overview of what this actually is.
