Most technology companies start with a single cloud provider. Over time, they adopt that provider's cloud-native functionality. This is expected and makes complete sense: moving towards cloud-native architectures brings convenience and, possibly, cost efficiency.
But if you are not careful, you may get locked in to that cloud provider. Many teams would like at least the option to switch providers or, more commonly, to run one part of their product or one of their environments on another cloud. The reasons vary. Your customer may have a preferred cloud hosting clause. You might want to expand to a region dominated by other cloud providers. Other reasons could be data-haven laws, pricing, or a particular service on another cloud that makes things much easier for you.
On the other hand, some technology teams value this optionality so much that they forgo cloud-native functionality and operate just as they would in a traditional data center. This adds significantly to their tooling needs and makes the infrastructure expensive. After all, the cloud was never meant to be run like a traditional data center!
Having the best of both worlds isn't very hard. One needs to follow some design guidelines on both the Dev and Ops sides. These are best practices even if you choose to stay on a particular cloud forever.
Dev Guidelines and Best Practices:
It all boils down to following simple decision models. We like to think of them as "blue cloud" and "grey cloud". A "blue cloud" is essentially every cloud-native feature that has an easy alternative in most other cloud providers or in open source, requiring no code changes or only minimal ones. A "grey cloud" is the exact opposite. This generally leads to the three types of choices below:
Protocol-compliant cloud-native resources: As an example, AWS Aurora is a cloud-native, MySQL-compatible database from AWS, and it is absolutely fine to use. Applications don't need to know whether they are talking to an Aurora instance or a vanilla MySQL hosted on a bare EC2 machine. This is "blue cloud", but it shifts to the grey zone if you start to make assumptions based on specific Aurora features: for example, assuming that an Aurora read replica lags less than standard MySQL replication and designing your applications around that latency. That does not hold for vanilla MySQL and may have no equivalent across open-source software or other cloud providers.
Reliable and popular cloud components: Some services offered by cloud providers are distinctive and widely popular, like S3. Most other cloud providers have a look-alike of such services, like Azure Blob Storage or Google Cloud Storage. They offer similar or identical features behind different APIs. They are absolutely worth using, but the dependency needs to be managed. The solution is fairly simple: do not tie your code deeply to one provider's API; instead, build utility layers for the functionality you need. For such widely popular services, cloud-agnostic wrappers and SDKs are common as well, e.g., MinIO. This shifts the service back to the "blue zone".
Niche cloud features: Each cloud has unique capabilities that can bring down your development time significantly. For example, S3 Select lets you query the objects that you already store in S3. You can use such a feature by wrapping it in a microservice or a functional abstraction, so that you can at least write another cloud-specific implementation if it comes to that. This keeps the change localized instead of spreading it across your whole codebase.
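To make the utility-layer and functional-abstraction ideas concrete, here is a minimal Python sketch. Every name in it (`ObjectStore`, `InMemoryStore`, `select_rows`) is hypothetical and chosen only for illustration. The in-memory backend stands in for a real provider SDK, and the query helper is a portable fallback, not how any provider implements its own select feature.

```python
import csv
import io
from typing import Callable, Dict, List, Protocol


class ObjectStore(Protocol):
    """Hypothetical utility-layer interface; application code depends only on this."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for this sketch. A real backend would wrap one provider's SDK."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def select_rows(
    store: ObjectStore,
    key: str,
    predicate: Callable[[Dict[str, str]], bool],
) -> List[Dict[str, str]]:
    """Portable fallback for a select-style query over a CSV object.

    It downloads the whole object and filters locally. A provider-specific
    implementation could push the filter down to the storage service instead,
    and callers of this function would not need to change.
    """
    text = store.get(key).decode("utf-8")
    return [row for row in csv.DictReader(io.StringIO(text)) if predicate(row)]


store = InMemoryStore()
store.put("orders.csv", b"id,region\n1,eu\n2,us\n3,eu\n")
eu_orders = select_rows(store, "orders.csv", lambda row: row["region"] == "eu")
print([row["id"] for row in eu_orders])
```

Because the rest of the codebase would only call `select_rows` and the `ObjectStore` methods, moving to another cloud would mean writing one new backend class rather than touching every caller.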
Ops Guidelines and Best Practices:
In the cases above, we took care of the Dev part. How about Ops? An Ops setup generally includes backup, recovery, code delivery, observability, security, and high availability (HA). Instead of building all of these in a completely cloud-agnostic way, it is prudent to use some of the cloud-native capabilities of each cloud. This reduces the burden of building everything from scratch without errors. A few tips while you build your Ops toolchain:
There should be a central repository of policies that is agnostic of the implementation. The implementation may choose the most reliable method available in each cloud or deployment. For example, a disaster recovery policy should specify which backups to keep and how frequently to take them, regardless of whether the implementation is Aurora MySQL or a self-hosted MySQL on a Linux server.
You should provide a uniform developer experience even if your environments contain a mix of self-hosted and cloud-native components. For example, metrics should be pulled from everywhere and collated in a single source of truth, so that dashboards and alerts are created in a uniform way.
Kubernetes is the first step towards being cloud-native and cloud-agnostic at the same time. However, the code delivery workflows should not change even if the underlying Kubernetes clusters are managed by a different cloud provider on each cloud.
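The idea of a central, implementation-agnostic policy can be sketched as follows. This is only an illustration under stated assumptions: `BackupPolicy`, the two implementation classes, the resource name `orders-db`, and the environment names are all hypothetical, and the implementations return a description of what they would do instead of calling any real cloud or database API.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class BackupPolicy:
    """Implementation-agnostic policy: how often to back up and how long to keep it."""

    frequency_hours: int
    retention_days: int


class BackupImplementation(Protocol):
    def plan(self, policy: BackupPolicy) -> str: ...


class ManagedDatabaseBackup:
    """Hypothetical: would configure the provider's native snapshot feature."""

    def plan(self, policy: BackupPolicy) -> str:
        return (
            f"managed snapshots every {policy.frequency_hours}h, "
            f"keep {policy.retention_days}d"
        )


class SelfHostedBackup:
    """Hypothetical: would schedule a dump job on the database server."""

    def plan(self, policy: BackupPolicy) -> str:
        return (
            f"scheduled dump every {policy.frequency_hours}h, "
            f"prune after {policy.retention_days}d"
        )


# Central policy repository: one entry per resource, no cloud details.
POLICIES: Dict[str, BackupPolicy] = {
    "orders-db": BackupPolicy(frequency_hours=6, retention_days=30),
}

# Each environment picks the most reliable method available to it.
IMPLEMENTATIONS: Dict[str, BackupImplementation] = {
    "cloud-a": ManagedDatabaseBackup(),
    "datacenter": SelfHostedBackup(),
}


def plan_backups(resource: str, environment: str) -> str:
    """Apply the same policy through whichever implementation the environment uses."""
    return IMPLEMENTATIONS[environment].plan(POLICIES[resource])


print(plan_backups("orders-db", "cloud-a"))
print(plan_backups("orders-db", "datacenter"))
```

The policy is defined once and stays the same across environments; only the implementation that interprets it differs.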
Being cloud-agnostic doesn't mean being multi-cloud. It doesn't mean migrating to another cloud at will, either. It simply means that you could host a particular environment of yours in another cloud within a reasonable time, say weeks, not years. This gives you the necessary optionality for the future without investing in tooling for other clouds. You don't need to sacrifice cloud-native functionality either; you just need to manage the abstractions well.
We at Facets.cloud are building on the above principles to provide you with the necessary tooling to achieve the best of both worlds. Do write to us to learn more or collaborate!
This post was originally posted here.