AWS Glue

#aws #awschallenge #awsdatalake

AWS Glue is a serverless data integration service that helps us easily discover, prepare, and combine data from multiple sources for analytics, machine learning, and application development. There is no infrastructure to manage: AWS takes care of everything, including configuration, provisioning, and lifecycle management.

Features of AWS Glue

Data Discovery: Automatically identify data structures and schemas.
ETL (Extract, Transform, Load): Easily create and manage data pipelines.
Data Catalog: A centralized metadata repository that stores table definitions and schemas for our data.
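To get a feel for what "identify data structures and schemas" means, here is a toy, pure-Python sketch that infers a column-to-type schema from newline-delimited JSON records, similar in spirit to the table definition a crawler writes into the Data Catalog. This is an illustration only, not how Glue's built-in classifiers actually work: the type names are simplified Hive-style types, and the merge rules (widen bigint + double to double, otherwise fall back to string) are assumptions chosen for the example.

```python
import json
from typing import Any, Dict, List, Optional


def infer_type(value: Any) -> Optional[str]:
    """Map one JSON value to a simplified Hive-style type name (None for null)."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Use the first non-null element to type the array (simplification)
        elem = next((infer_type(v) for v in value if v is not None), None)
        return f"array<{elem or 'string'}>"
    if isinstance(value, dict):
        fields = ",".join(f"{k}:{infer_type(v) or 'string'}" for k, v in value.items())
        return f"struct<{fields}>"
    return "string"


def merge_types(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Combine the types seen for one column across different records."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    if {a, b} == {"bigint", "double"}:
        return "double"  # widen integers to floating point
    return "string"  # fall back to string on any other conflict


def infer_schema(json_lines: List[str]) -> Dict[str, str]:
    """Infer a column -> type mapping from newline-delimited JSON records."""
    schema: Dict[str, Optional[str]] = {}
    for line in json_lines:
        record = json.loads(line)
        for column, value in record.items():
            schema[column] = merge_types(schema.get(column), infer_type(value))
    # Columns that were only ever null default to string
    return {col: (typ or "string") for col, typ in schema.items()}


rows = [
    '{"order_id": 1, "amount": 19.99, "customer": "ana"}',
    '{"order_id": 2, "amount": 5, "customer": "raj", "tags": ["new"]}',
]
print(infer_schema(rows))
# {'order_id': 'bigint', 'amount': 'double', 'customer': 'string', 'tags': 'array<string>'}
```

Note how a column that appears only in some records (tags) still ends up in the schema, and how mixed integer and decimal values are widened to a single type. Real crawlers deal with the same kinds of questions, just with far more formats and rules.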

How It Works:

AWS Glue supports both structured and semi-structured data from services like Amazon S3, RDS, Redshift, DynamoDB, and JDBC-compliant databases, and can connect to more than 70 diverse data sources in total. AWS Glue crawlers scan these data sources, discover their schemas, and create tables in the Glue Data Catalog. After that, we can start querying those tables, for example with Amazon Athena.
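The crawl-then-query flow above can be sketched with boto3, the AWS SDK for Python. This is a minimal sketch under stated assumptions: the crawler name, IAM role ARN, database name, S3 path, and Athena output location used below are hypothetical placeholders, the role is assumed to already have read access to the bucket, and the Data Catalog database is assumed to exist. The request builder runs without AWS access; only crawl_then_query actually calls AWS and needs credentials.

```python
import time
from typing import Any, Dict


def build_crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> Dict[str, Any]:
    """Keyword arguments for the Glue CreateCrawler API targeting one S3 prefix."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler assumes to read the data
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def crawl_then_query(request: Dict[str, Any], sql: str, athena_output: str,
                     poll_seconds: int = 15) -> str:
    """Create and run the crawler, then submit an Athena query (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3 installed

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])

    # Crawling is asynchronous: wait until the crawler returns to the READY state.
    time.sleep(poll_seconds)
    while glue.get_crawler(Name=request["Name"])["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": request["DatabaseName"]},
        ResultConfiguration={"OutputLocation": athena_output},
    )
    return response["QueryExecutionId"]


# Hypothetical placeholder values, replace with your own resources.
request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
)
print(request["Targets"])
```

With real resources in place, calling crawl_then_query(request, "SELECT * FROM sales LIMIT 10", "s3://example-bucket/athena-results/") would return an Athena query execution ID. The table name "sales" here is an assumption: crawlers usually derive table names from the S3 folder layout, so check the Data Catalog for the name the crawler actually created.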