In this article, we will look at three cloud services that data engineers commonly use:
- AWS Glue - serverless data integration service that helps to discover, prepare, and integrate data from multiple sources for analytics
- AWS Lake Formation - helps to easily create secure data lakes
- AWS Athena - interactive query service for analyzing data in Amazon S3 using standard SQL queries
AWS Glue
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. It provides the capabilities needed for data integration in a single service, so data can be made ready for analysis more quickly.
AWS Glue provides both visual and code-based interfaces to make data integration easier. Data integration is the process of preparing and combining data for analytics, machine learning, and application development. It can involve multiple tasks, such as extracting data from various sources, cleaning, normalizing, and combining it, and organizing it in data warehouses and data lakes. Glue can automatically generate the code that runs the data transformation and loading processes. It also helps to run and manage ETL (extract, transform, load) jobs, even thousands of them.
The key benefits of using AWS Glue are:
- Faster data integration
- Automated data integration at scale
- Serverless
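As a rough illustration of the "discover" and "run ETL jobs" parts, the sketch below uses boto3, the AWS SDK for Python. It assumes a Glue crawler is used for discovery, which this article does not cover in detail, and every name in it (crawler name, role ARN, database, bucket, job name) is a hypothetical placeholder. The request builder is plain Python; the functions that call AWS need boto3 installed and valid credentials.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for the Glue create_crawler call.

    All values are supplied by the caller; nothing here is specific
    to a real account.
    """
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must start with s3://")
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler and start it (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


def start_etl_job(job_name, arguments=None):
    """Start an existing Glue job and return its run id."""
    import boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(JobName=job_name, Arguments=arguments or {})
    return response["JobRunId"]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    request = build_crawler_request(
        name="demo-crawler",
        role_arn="arn:aws:iam::123456789012:role/DemoGlueRole",
        database="demo_db",
        s3_path="s3://demo-bucket/raw/",
    )
    print(request)
```

Keeping the request as a plain dictionary makes it easy to review or unit-test before anything is sent to AWS.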
AWS Lake Formation
A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. A data lake has a flat architecture, which enables it to store unstructured, semi-structured, or structured data collected from various sources across the organization. Because of this architecture, data lakes scale massively and do not require knowing the data volume in advance.
The benefits of having data lakes are:
- Support all data types
- Suitable for all users
- Easily adapt to changes
- Provide faster insights
- Easily scalable
- Data is stored in raw form and processed only when it is needed
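The last point, storing data raw and processing it only when needed, is often called schema-on-read. The following is a minimal sketch of the idea in plain Python, not a description of how any AWS service implements it. The sample records and the `read_with_schema` helper are made up for illustration.

```python
import json

# Raw records kept exactly as they arrived, including a malformed line.
RAW_EVENTS = [
    '{"user": "a", "amount": "12.50", "ts": "2024-01-05"}',
    '{"user": "b", "amount": "3", "note": "gift"}',
    "not json at all",
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    `schema` maps field names to cast functions. Lines that are not
    JSON objects are skipped; missing or uncastable fields become None.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None
        yield row


if __name__ == "__main__":
    # Two consumers can read the same raw data with different schemas.
    for row in read_with_schema(RAW_EVENTS, {"user": str, "amount": float}):
        print(row)
```

Because the raw lines are never rewritten, a different consumer can later read the same data with a different schema, for example one that also includes `ts`.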
AWS Lake Formation is a cloud service that makes it easy to set up a secure data lake. Lake Formation helps us collect and catalog data from databases, move the data into an Amazon S3 data lake, clean and classify it using machine learning algorithms, and secure access to sensitive data.
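To make "securing access to sensitive data" more concrete, here is a hedged sketch of a column-level grant using the boto3 `lakeformation` client. It assumes the table is already registered in the data catalog. The principal ARN, database, table, and column names are hypothetical. Only the columns listed in the grant become readable for that principal, so sensitive columns can simply be left out.

```python
def build_column_grant(principal_arn, database, table, columns):
    """Return keyword arguments for Lake Formation grant_permissions.

    Grants SELECT on the listed columns only.
    """
    columns = list(columns)
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": columns,
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(grant):
    """Send the grant to Lake Formation (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    boto3.client("lakeformation").grant_permissions(**grant)


if __name__ == "__main__":
    # Hypothetical analyst role that may see order data but not customer emails.
    grant = build_column_grant(
        principal_arn="arn:aws:iam::123456789012:role/DemoAnalystRole",
        database="demo_db",
        table="orders",
        columns=["order_id", "amount"],
    )
    print(grant)
```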
AWS Athena
AWS Athena (officially Amazon Athena) is an interactive query service that lets us analyze data stored in Amazon S3 using standard SQL. Athena is easy to use, serverless, and fast, and we pay only for the queries we run.
To get started, we define the schema for our data in Amazon S3 and can then query it using standard SQL. Athena can often return results within seconds. Because it queries the data in place, there is no need for complex ETL jobs to prepare the data for analysis. Athena can also run multiple queries simultaneously.
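The two steps described here, defining a schema over files in S3 and then querying them, could look roughly like this with boto3. This is a sketch under assumptions: the data is comma-separated text, and the database, table, columns, and bucket names are hypothetical. The DDL builder is plain Python, while `run_query` needs boto3 and AWS credentials.

```python
import time


def build_create_table_ddl(database, table, columns, s3_location):
    """Build a CREATE EXTERNAL TABLE statement for comma-separated files.

    `columns` is a list of (name, sql_type) pairs.
    """
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {cols}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def run_query(sql, database, output_location, poll_seconds=1.0):
    """Run a query, wait for it to finish, and return the result rows."""
    import boto3  # imported lazily so the builder above works without it

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"query {status['State']}: {reason}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]


if __name__ == "__main__":
    # Hypothetical table definition, printed rather than executed.
    print(build_create_table_ddl(
        "demo_db",
        "sales",
        [("order_id", "string"), ("amount", "double")],
        "s3://demo-bucket/sales/",
    ))
```

Once the table exists, an ordinary statement such as `SELECT order_id, amount FROM sales LIMIT 10` can be passed to `run_query` in the same way as the DDL.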
The key features of AWS Athena are:
- SQL-based tool
- Serverless
- Fast and optimized
- Cost-effective
- Durability and availability of the data
- Security