Optimizing Data Management on AWS - Part 1

#awscloud #datamanagement #awswellarchitected #aws

Introduction

In the cloud, a good data management plan is essential for several reasons. Cloud costs are typically tied to usage, so poor data handling can drive up costs through unnecessary or outdated data. Further, since cloud data might span several regions, it is essential to comply with different regulations and uphold strict security standards, including measures such as encryption and access control.

This blog touches upon five critical areas you should look at with regard to data management. This part highlights the first two; Part 2 covers the remaining three. For more exhaustive details, please refer to the AWS Well-Architected Framework whitepaper.

Do not assume Uniform Data Storage and Access Patterns

Too often, we assume that all data can be managed uniformly in the same storage type. For multiple workloads of any sizeable complexity, this is unlikely to be the case. Every workload has its own data storage and access requirements, so assuming that all workloads have similar patterns, or using a single storage tier for all of them, can lead to inefficiencies. The sooner these patterns are recognized, the better. Recognizing and catering to them reduces the resources required to meet business needs and thereby improves the overall efficiency of the cloud workload. To address this issue, regularly evaluate your data characteristics and access patterns, and plan to migrate data to the storage type that best aligns with them. Also, understand that this is not a one-time activity but an exercise that needs to be repeated regularly.

For a comprehensive evaluation, the decision guides that AWS provides for storage and database services are a good starting point.
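The regular evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not an AWS recommendation: the `DatasetStats` type, the `suggest_storage_class` function, and every threshold below are assumptions made for the example. The returned strings are Amazon S3 storage class names, used here only as labels; the sketch makes no AWS calls.

```python
from dataclasses import dataclass


@dataclass
class DatasetStats:
    """Observed access pattern for one dataset (hypothetical structure)."""
    name: str
    days_since_last_access: int
    reads_per_month: int


def suggest_storage_class(stats: DatasetStats) -> str:
    """Map an access pattern to a storage tier.

    All thresholds are illustrative assumptions; tune them to your own
    workload, retrieval-time needs, and cost model.
    """
    # Frequently read or recently touched data stays in the hot tier.
    if stats.reads_per_month >= 30 or stats.days_since_last_access < 30:
        return "STANDARD"
    # Occasionally read data can move to an infrequent-access tier.
    if stats.days_since_last_access < 90:
        return "STANDARD_IA"
    # Rarely read data that still needs fast retrieval.
    if stats.days_since_last_access < 365:
        return "GLACIER_IR"
    # Data untouched for a year or more is an archive candidate.
    return "DEEP_ARCHIVE"


if __name__ == "__main__":
    inventory = [
        DatasetStats("clickstream-raw", days_since_last_access=2, reads_per_month=400),
        DatasetStats("monthly-reports", days_since_last_access=45, reads_per_month=3),
        DatasetStats("audit-logs-2019", days_since_last_access=800, reads_per_month=0),
    ]
    for ds in inventory:
        print(f"{ds.name}: {suggest_storage_class(ds)}")
```

Running a check like this on a schedule, rather than once, reflects the point that access patterns drift over time and the right tier for a dataset today may not be the right one next quarter.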

Have a solid data classification strategy

Data classification is the process of categorizing data based on its sensitivity and criticality. A common mistake organizations make is not identifying the types of data they process and store according to sensitivity and importance. This is a serious oversight: without a classification strategy, security controls may be inappropriate for the data they protect, which can in turn lead to compliance, regulatory, or legal issues.
With a proper data classification policy, organizations can determine the most performant and cost-optimized storage tier for their data (and even the most energy-efficient one, if sustainability is a key driver). To come up with a data classification strategy, conduct an inventory of the various data types and then determine their criticality. Also audit the environment periodically, not just for untagged and unclassified data but also to re-evaluate earlier classifications and see whether they need to change as business conditions change.

AWS Data classification guide
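The periodic audit described in this section can be sketched as follows. This is a hedged illustration only: the tag keys `data-classification` and `classification-reviewed`, the four classification levels, the `Resource` type, and the 180-day review window are all assumptions for the example, not an AWS convention. The resource list is plain in-memory data; in practice it would come from your own inventory or tagging tooling.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical classification levels; use the ones your policy defines.
ALLOWED_LEVELS = {"public", "internal", "confidential", "restricted"}


@dataclass
class Resource:
    """A storage resource and its tags (hypothetical structure)."""
    name: str
    tags: dict = field(default_factory=dict)


def audit_classification(resources, today, max_age_days=180):
    """Return {resource name: finding} for resources that need attention."""
    findings = {}
    for res in resources:
        level = res.tags.get("data-classification")
        reviewed = res.tags.get("classification-reviewed")
        if level is None:
            # No classification tag at all.
            findings[res.name] = "unclassified"
        elif level not in ALLOWED_LEVELS:
            # Tag present but not one of the levels the policy allows.
            findings[res.name] = "invalid-level"
        elif reviewed is None or (today - date.fromisoformat(reviewed)).days > max_age_days:
            # Classified, but never reviewed or the review is stale.
            findings[res.name] = "review-due"
    return findings


if __name__ == "__main__":
    resources = [
        Resource("customer-db-backups", {"data-classification": "restricted",
                                         "classification-reviewed": "2024-05-01"}),
        Resource("marketing-assets", {"data-classification": "public",
                                      "classification-reviewed": "2023-01-15"}),
        Resource("tmp-exports", {}),
    ]
    for name, finding in audit_classification(resources, today=date(2024, 6, 1)).items():
        print(f"{name}: {finding}")
```

Resources that pass every check are left out of the result, so the output is a work list: items to classify, items with a tag value to correct, and items whose classification should be revisited.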

A few other strategies are covered in Part 2 of this blog.