DEV Community

Warda Liaqat


Big Data in Cloud Computing - AWS


This is an introductory article about Big Data and its uses in cloud computing, especially on AWS.

Introduction

Big Data and cloud computing have both become mainstream. The two serve different purposes: Big Data is the content, while cloud computing is the infrastructure that stores and processes it. Used together, they let organizations analyze large datasets without owning the hardware to do so.

Big Data & Cloud Computing Explained

To understand why the two technologies are often bundled together, we need a basic understanding of what each one is.
Big Data refers to high-volume, high-velocity, high-variety information assets that demand cost-effective, innovative forms of information processing.

The most straightforward definition of Big Data is a large volume of data: think terabytes, petabytes, or even more. The data can be structured or unstructured, and it can be so extensive that traditional database and software techniques cannot process it.
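
To get a feel for why a single machine struggles at this scale, here is a rough back-of-the-envelope sketch. The 200 MB/s read throughput and the 100-machine cluster are illustrative assumptions, not measurements, and the function name is made up for this example.

```python
# Back-of-the-envelope estimate: how long does one full pass over a dataset take?
# ASSUMPTIONS (illustrative only): 200 MB/s sustained read throughput per machine,
# perfect parallel scaling, decimal units (1 PB = 10**15 bytes).

PETABYTE = 10**15
THROUGHPUT_BYTES_PER_SEC = 200 * 10**6


def scan_time_days(dataset_bytes: int, machines: int = 1) -> float:
    """Days needed to read the dataset once, split evenly across machines."""
    seconds = dataset_bytes / (THROUGHPUT_BYTES_PER_SEC * machines)
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    print(f"1 machine:    {scan_time_days(PETABYTE):.1f} days")
    print(f"100 machines: {scan_time_days(PETABYTE, machines=100):.2f} days")
```

Under these assumptions, a single machine needs roughly 58 days just to read one petabyte once, while 100 machines working in parallel need about 14 hours. That gap is what pushes Big Data workloads toward distributed systems.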

As for cloud computing, in the simplest terms it means storing and accessing data, files, and programs over the Internet instead of on a local computer’s hard drive. The cloud is a metaphor for the Internet.

Now that we have covered the basics of Big Data and cloud computing, let's look at both in more detail.

We need to understand the five V's of Big Data, which together define it.
They are as follows:
Volume: The huge amount of data
Variety: The different formats of data coming from various sources
Value: The useful insight that can be extracted from the data
Velocity: The high speed at which data accumulates
Veracity: The trustworthiness of the data, given its inconsistencies and uncertainty
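
As a small illustration, four of these V's can be turned into rough indicators for a batch of records. This is only a sketch: the record layout (`source`, `format`, `payload`, `ts`) and the rule that an empty payload counts as invalid are hypothetical choices made for the example. Value is left out because it depends on the question being asked of the data.

```python
# Rough, illustrative indicators for four of the five V's on a tiny sample batch.
# ASSUMPTIONS: the record layout ("source", "format", "payload", "ts") and the
# rule "an empty payload is invalid" are hypothetical, chosen for this example.


def profile_batch(records):
    """Return simple volume/variety/velocity/veracity indicators for a batch."""
    if not records:
        raise ValueError("records must not be empty")
    timestamps = [r["ts"] for r in records]
    elapsed = (max(timestamps) - min(timestamps)) or 1  # avoid division by zero
    invalid = sum(1 for r in records if not r["payload"])
    return {
        "volume_bytes": sum(len(r["payload"].encode("utf-8")) for r in records),
        "variety_formats": sorted({r["format"] for r in records}),
        "velocity_records_per_sec": len(records) / elapsed,
        "veracity_invalid_ratio": invalid / len(records),
    }


# Timestamps are in seconds since the start of the batch.
sample = [
    {"source": "web", "format": "json", "payload": '{"click": 1}', "ts": 0},
    {"source": "app", "format": "csv", "payload": "42,ok", "ts": 1},
    {"source": "iot", "format": "json", "payload": "", "ts": 2},
    {"source": "web", "format": "text", "payload": "GET /home", "ts": 4},
]

if __name__ == "__main__":
    for name, value in profile_batch(sample).items():
        print(f"{name}: {value}")
```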

Big Data services from major cloud service providers:

A sampling of available big data services from the top three providers includes the following:

AWS

  • Amazon EMR (formerly Amazon Elastic MapReduce)
  • AWS Deep Learning AMIs
  • Amazon SageMaker

Microsoft Azure

  • Azure HDInsight
  • Azure Analysis Services
  • Azure Databricks

Google Cloud

  • Google BigQuery
  • Google Cloud Dataproc
  • Google Cloud AutoML

Steps involved in Big Data analysis:

  1. Data ingestion: collecting raw data (e.g., from logs or mobile devices)
  2. Data storage: keeping the raw data somewhere durable and scalable
  3. Data preprocessing: converting raw data into a consumable form
  4. Data visualization: presenting the results as charts and dashboards
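
The four steps listed above can be sketched as a toy, in-memory pipeline. Everything here is an assumption made for illustration: the `<device> <status>` log format, the function names, and the text bar chart standing in for a real dashboard.

```python
# A toy, in-memory version of the four steps listed above.
# ASSUMPTIONS: the log line format "<device> <status>" and all function names
# are hypothetical; a real pipeline would use durable storage and a charting tool.
from collections import Counter

RAW_LOGS = ["phone 200", "phone 500", "tablet 200", "bad-line", "phone 200"]


def ingest(lines):
    """Step 1: collect raw lines from a source (here, a plain list)."""
    return list(lines)


def store(raw, storage):
    """Step 2: keep the raw data unchanged so it can be reprocessed later."""
    storage.extend(raw)
    return storage


def preprocess(storage):
    """Step 3: turn raw lines into structured records, dropping malformed ones."""
    records = []
    for line in storage:
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            records.append({"device": parts[0], "status": int(parts[1])})
    return records


def visualize(records):
    """Step 4: render a text bar chart of requests per device."""
    counts = Counter(r["device"] for r in records)
    return "\n".join(f"{device:<8}{'#' * n}" for device, n in sorted(counts.items()))


if __name__ == "__main__":
    stored = store(ingest(RAW_LOGS), [])
    print(visualize(preprocess(stored)))
```

Keeping the raw data untouched in the storage step and cleaning it only in preprocessing means the cleaning rules can change later without collecting the data again.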

Services provided by AWS for Big Data analytics:

  1. Data ingestion
    • Amazon Kinesis, AWS Snowball
  2. Data storage
    • Amazon S3, Amazon Redshift
  3. Data preprocessing
    • Amazon EMR, AWS Glue
  4. Data visualization
    • Amazon QuickSight
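
As a minimal sketch of how two of these services might be called from Python, the helpers below send one event to a Kinesis data stream and archive a batch in S3 using boto3's `put_record` and `put_object` calls. The stream name `clickstream-demo`, the bucket `my-demo-data-lake`, the object key, and the event fields are all hypothetical. The `main` function is not called automatically, because it needs boto3 installed, AWS credentials configured, and an existing stream and bucket.

```python
# Sketch: push one event to a stream and archive a batch to object storage.
# ASSUMPTIONS: the stream name "clickstream-demo", the bucket "my-demo-data-lake",
# the key layout and the event fields are hypothetical placeholders.
import json


def send_event(kinesis_client, stream_name, event):
    """Ingestion: write a single JSON event to a Kinesis data stream."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        PartitionKey=str(event["device_id"]),
    )


def archive_batch(s3_client, bucket, key, events):
    """Storage: save a batch as newline-delimited JSON in one S3 object."""
    body = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    return s3_client.put_object(Bucket=bucket, Key=key, Body=body)


def main():
    """Call manually once boto3, credentials, the stream and the bucket exist."""
    import boto3  # imported here so the helpers above work without AWS access

    events = [{"device_id": 1, "action": "click"}, {"device_id": 2, "action": "view"}]
    send_event(boto3.client("kinesis"), "clickstream-demo", events[0])
    archive_batch(boto3.client("s3"), "my-demo-data-lake", "raw/events.jsonl", events)
```

Passing the client in as an argument keeps the helpers easy to try out with a stand-in object before pointing them at a real AWS account.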

Conclusion

Cloud computing gives enterprises a cost-effective and flexible way to store and analyze the vast volumes of information we call Big Data. Thanks to Big Data and cloud computing, starting an IT company is now much easier than it used to be. However, the success of cloud-based big data analytics depends on many factors. An important one is a reliable cloud provider with extensive expertise and robust services.

I hope this helped clarify the basic concepts of Big Data and its uses in the cloud. Look out for more detailed articles on Big Data and the cloud.
