DEV Community

Nidhi Thakore

How I Used AWS Glue and Athena for Serverless Data Analytics

As someone who loves building data pipelines, I’ve always been fascinated by how serverless architectures simplify analytics.

Recently, I worked on a project where I built a fully serverless data analytics pipeline using AWS Glue and Amazon Athena: no servers or EC2 instances to manage, no clusters, and no headaches.

In this blog, I’ll take you through how I used these two AWS powerhouses to go from raw S3 data → cleaned data → analytical insights — all without managing a single server.

Step 1: Store Raw Data in Amazon S3
I started with raw e-commerce transaction data — product sales, customers, and timestamps — stored in S3.
my-ecommerce-analytics/
    raw/
        sales01.csv
        sales02.csv
        customers.csv
    transformed/
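If you'd rather script the upload than use the console, here is a minimal sketch with boto3. It assumes `my-ecommerce-analytics` is the bucket name (the post shows it as the top-level folder) and that the CSVs sit in a local folder; `raw_key` and `upload_raw_files` are hypothetical helper names.

```python
from pathlib import Path

# Assumption: the top-level folder in the layout above is the bucket name.
BUCKET = "my-ecommerce-analytics"
RAW_PREFIX = "raw/"


def raw_key(local_path: str) -> str:
    """Map a local file such as data/sales01.csv to the S3 key raw/sales01.csv."""
    return RAW_PREFIX + Path(local_path).name


def upload_raw_files(paths: list[str], bucket: str = BUCKET) -> list[str]:
    """Upload each local CSV to s3://<bucket>/raw/ and return the keys written."""
    import boto3  # imported here so raw_key() works without boto3 installed

    s3 = boto3.client("s3")
    keys = []
    for path in paths:
        key = raw_key(path)
        s3.upload_file(path, bucket, key)  # arguments: Filename, Bucket, Key
        keys.append(key)
    return keys
```

Calling `upload_raw_files(["data/sales01.csv", "data/sales02.csv", "data/customers.csv"])` would place all three files under `raw/`, leaving `transformed/` for the cleaned output.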

Step 2: Crawling Data Using AWS Glue
Next, I created an AWS Glue crawler and pointed it at my S3 bucket.

What’s amazing about Glue crawlers is that they automatically infer the schema, including column names and data types, and create tables in the AWS Glue Data Catalog.

After running the crawler, I had two tables, sales_data and customers_data, in a database called ecommerce_analytics.

You can also schedule crawlers to run daily or hourly — perfect for continuously updated S3 data.
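The same crawler can also be created from code. This sketch builds the keyword arguments for boto3's `create_crawler` call, including an optional schedule in Glue's cron syntax. The crawler name, IAM role ARN, and schedule below are placeholders chosen for illustration, not values from the project.

```python
def crawler_config(name, role_arn, database, s3_path, schedule=None):
    """Build keyword arguments for glue.create_crawler()."""
    config = {
        "Name": name,
        "Role": role_arn,  # IAM role the crawler assumes to read S3 and write the catalog
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule:
        # Glue cron fields: minutes hours day-of-month month day-of-week year (UTC)
        config["Schedule"] = schedule
    return config


def create_and_start_crawler(config):
    """Register the crawler and start a first run (requires boto3 and AWS credentials)."""
    import boto3

    glue = boto3.client("glue")
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])


# Hypothetical values; replace with your own role and bucket.
DAILY_CRAWLER = crawler_config(
    name="ecommerce-raw-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="ecommerce_analytics",
    s3_path="s3://my-ecommerce-analytics/raw/",
    schedule="cron(0 2 * * ? *)",  # every day at 02:00 UTC
)
```

For an hourly crawl, `cron(0 * * * ? *)` fires at the top of every hour.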

Step 3: Exploring Data with Amazon Athena
With the Glue Data Catalog ready, I moved to Amazon Athena, which lets you run SQL queries directly on data in S3 without loading it into a database first. There, I explored my sales data, aggregated revenue numbers, and filtered out invalid or duplicate records.

You can write your own queries to perform these operations — it feels just like using a normal SQL database.
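Athena queries can also be run programmatically. Below is a minimal sketch using boto3. It assumes a results location such as `s3://my-ecommerce-analytics/athena-results/` and columns named `order_id`, `product_id`, `quantity`, and `price` in `sales_data`; the post doesn't list the real column names, so treat the SQL as illustrative.

```python
import time

# Illustrative queries; column names are assumptions about the crawled sales_data table.
# The WHERE clause skips rows that are obviously invalid.
REVENUE_BY_PRODUCT_SQL = """
SELECT product_id, SUM(quantity * price) AS revenue
FROM sales_data
WHERE quantity > 0 AND price IS NOT NULL
GROUP BY product_id
ORDER BY revenue DESC
"""

DUPLICATES_SQL = """
SELECT order_id, COUNT(*) AS copies
FROM sales_data
GROUP BY order_id
HAVING COUNT(*) > 1
"""


def rows_to_dicts(result_set):
    """Turn a GetQueryResults ResultSet into dicts; for SELECTs the first row is the header."""
    rows = result_set["Rows"]
    if not rows:
        return []
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    # NULL values come back without a VarCharValue key, so .get() yields None.
    return [dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"]))) for row in rows[1:]]


def run_athena_query(sql, database, output_location, poll_seconds=1.0):
    """Start a query, poll until it finishes, and return the first page of results."""
    import boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} {status['State']}: {status.get('StateChangeReason', '')}")
    # get_query_results is paginated; this sketch reads only the first page.
    return rows_to_dicts(athena.get_query_results(QueryExecutionId=query_id)["ResultSet"])
```

`run_athena_query(REVENUE_BY_PRODUCT_SQL, "ecommerce_analytics", "s3://my-ecommerce-analytics/athena-results/")` would return a list of dicts keyed by column name, with every value as the string Athena returns.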

Step 4: Transforming Data with AWS Glue Jobs
Raw data is rarely perfect, so I used AWS Glue ETL jobs to clean and transform it. I created a Glue job in Python that removes duplicates, standardizes timestamp formats, and joins the sales data with customer information.
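The three cleaning steps are easy to reason about outside Spark. The stdlib-only sketch below mirrors them on lists of dicts (the shape `csv.DictReader` yields) so the logic can be tested locally; it is not Glue's API. The column names `order_id`, `customer_id`, and `timestamp`, the use of `order_id` as the deduplication key, and the accepted timestamp formats are all assumptions.

```python
from datetime import datetime

# Assumed source formats; the post doesn't say which variants appeared in the raw CSVs.
# "%d/%m/%Y" assumes day-first dates.
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%Y-%m-%dT%H:%M:%S")


def standardize_timestamp(value: str) -> str:
    """Parse any accepted format and re-emit it as 'YYYY-MM-DD HH:MM:SS'."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp: {value!r}")


def remove_duplicates(rows, key="order_id"):
    """Keep the first row seen for each key value, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def join_customers(sales, customers, key="customer_id"):
    """Inner join: attach customer fields to each sale; sales without a customer are dropped."""
    by_id = {customer[key]: customer for customer in customers}
    joined = []
    for sale in sales:
        customer = by_id.get(sale[key])
        if customer is not None:
            joined.append({**customer, **sale})  # sale fields win on name clashes
    return joined


def transform(sales, customers):
    """Deduplicate, standardize timestamps, then join with customers."""
    deduped = remove_duplicates(sales)
    cleaned = [{**row, "timestamp": standardize_timestamp(row["timestamp"])} for row in deduped]
    return join_customers(cleaned, customers)
```

Feeding it `list(csv.DictReader(f))` for each file runs the same three steps on a small sample, which is a cheap way to check cleaning rules before running them in Glue.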

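Once transformed, I stored the cleaned data back in S3, this time in Parquet format. Because Parquet is columnar (and typically compressed), Athena reads only the columns a query needs, which makes future queries faster and more cost-efficient.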
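A rough way to see the cost effect: Athena bills by bytes scanned, rounded up to the nearest megabyte with a 10 MB minimum per query, and a columnar format lets it skip columns a query doesn't reference. The sketch below models that. The $5 per TB default is an assumption to check against current pricing for your Region, and treating 1 TB as 1024⁴ bytes is also an assumption.

```python
import math

BYTES_PER_MB = 1024 * 1024
MB_PER_TB = 1024 * 1024  # assumption: binary units for TB


def athena_query_cost(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Estimate the charge for one query: round up to whole MB, 10 MB minimum."""
    billed_mb = max(math.ceil(bytes_scanned / BYTES_PER_MB), 10)
    return billed_mb * price_per_tb / MB_PER_TB


def columnar_scan_bytes(column_bytes: dict[str, int], selected: list[str]) -> int:
    """Model a columnar read: only the selected columns' bytes are scanned."""
    return sum(column_bytes[name] for name in selected)
```

As a made-up illustration: if a table held 1 TB spread evenly over 10 columns, a query touching 2 of them would scan about 0.2 TB from Parquet (roughly $1 at the assumed price) versus the full 1 TB (about $5) from CSV, before counting any compression savings.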

If you’re implementing this, you can write your own ETL logic inside Glue Studio or the Glue Job editor.
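For reference, a script in the Glue job editor might look roughly like this PySpark sketch, which reads both crawled tables from the Data Catalog, applies the three cleaning steps, and writes Parquet. The column names, the single timestamp pattern, and the `--output_path` job parameter are assumptions. The `awsglue` modules exist only inside the Glue runtime, so the imports live inside `main()`, and the script runs only when Glue passes its internal `--JOB_NAME` argument.

```python
import sys

SOURCE_DATABASE = "ecommerce_analytics"


def main():
    # These modules are provided by the AWS Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from pyspark.sql import functions as F

    # --output_path is a hypothetical job parameter, e.g. s3://my-ecommerce-analytics/transformed/
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "output_path"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    sales = glue_context.create_dynamic_frame.from_catalog(
        database=SOURCE_DATABASE, table_name="sales_data"
    ).toDF()
    customers = glue_context.create_dynamic_frame.from_catalog(
        database=SOURCE_DATABASE, table_name="customers_data"
    ).toDF()

    cleaned = (
        sales.dropDuplicates(["order_id"])  # assumed unique order key
        .withColumn("order_ts", F.to_timestamp("timestamp", "yyyy-MM-dd HH:mm:ss"))  # assumed pattern
        .join(customers, on="customer_id", how="inner")
    )

    cleaned.write.mode("overwrite").parquet(args["output_path"])
    job.commit()


# Glue passes --JOB_NAME on every job run; the guard keeps local runs of this file inert.
if __name__ == "__main__" and "--JOB_NAME" in sys.argv:
    main()
```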

Step 5: Querying Transformed Data with Athena
After transformation, I returned to Athena to query the cleaned data. This is where you can perform your own analytical queries like finding top-selling products, analyzing sales patterns, or identifying high-value customers.

Athena makes this easy: you write standard SQL, and it reads the data directly from S3.
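Here are a few starting points for those analyses, written as Athena SQL held in Python strings. They assume the transformed Parquet data is cataloged as a table called `sales_cleaned` with `product_id`, `customer_id`, `order_id`, `quantity`, `price`, and `order_ts` columns; all of those names are hypothetical, as is the 1000 threshold for a high-value customer.

```python
def top_products_sql(table: str = "sales_cleaned", limit: int = 10) -> str:
    """Top-selling products by units sold. `table` must be a trusted identifier."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT product_id, SUM(quantity) AS units_sold, SUM(quantity * price) AS revenue "
        f"FROM {table} GROUP BY product_id ORDER BY units_sold DESC LIMIT {limit}"
    )


# Sales pattern by hour of day; hour() is a Trino/Presto function available in Athena.
SALES_BY_HOUR_SQL = """
SELECT hour(order_ts) AS hour_of_day, COUNT(DISTINCT order_id) AS orders
FROM sales_cleaned
GROUP BY hour(order_ts)
ORDER BY hour_of_day
"""

# High-value customers: lifetime spend above an illustrative threshold.
HIGH_VALUE_CUSTOMERS_SQL = """
SELECT customer_id, SUM(quantity * price) AS lifetime_value
FROM sales_cleaned
GROUP BY customer_id
HAVING SUM(quantity * price) > 1000
ORDER BY lifetime_value DESC
"""
```

Only `limit` is validated because it is the one value likely to come from user input; the table name is interpolated directly, so it should never come from untrusted sources.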

If you’re exploring AWS as a student, data engineer, or cloud enthusiast, I highly recommend trying this with your own dataset. Querying data that sits in S3 in seconds, without spinning up a single machine, is the best way to understand the real power of serverless analytics.
