DEV Community

Discussion on: How I reduced $10000 Monthly AWS Glue Bill to $400 using Airflow

Akash Singh

Thanks for your feedback! You’ve raised some great points, and I’d love to clarify a few things.

On the surface, Glue and Airflow serve different purposes: Glue is primarily an ETL service, whereas Airflow is an orchestration tool. Our use case involved heavy ETL processing with AWS Glue using Spark, which led to unexpectedly high costs. The $10K bill wasn’t from orchestration alone; it came from running Glue jobs at scale. We moved to Airflow to gain more control over execution, to improve cost efficiency by running the work on Airflow workers, and to reduce cloud vendor lock-in.

Regarding MWAA, it's a great managed solution, but at our scale, self-hosted Airflow provided better flexibility and cost savings. MWAA's pricing (around $0.50/hour plus metadata storage, logging, and network costs) can add up quickly, especially when managing hundreds of DAGs. A self-managed setup also gave us more control over instance types, autoscaling, and optimizations that MWAA abstracts away.

For DAG deployment, I did explore the S3 approach, but we ran into a lot of issues setting it up. Could you point me to the right documentation for this? Perhaps we were doing something wrong. Either way, whether you push the DAGs to S3 by hand or write a simple CI pipeline to do it for you is a matter of preference.
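For anyone following along, here is a minimal sketch of what the "simple CI pipeline" option could look like in Python. It is not the setup described in the blog post: the bucket name, the `dags` prefix, and the function names are all hypothetical, and it assumes `boto3` plus AWS credentials are available in the CI job.

```python
from pathlib import Path


def plan_dag_uploads(dag_dir, prefix="dags"):
    """Map every .py file under dag_dir to the S3 key it should be stored at."""
    root = Path(dag_dir)
    plan = []
    for path in sorted(root.rglob("*.py")):
        # Preserve sub-folders so imports between DAG helper modules keep working.
        key = f"{prefix}/{path.relative_to(root).as_posix()}"
        plan.append((path, key))
    return plan


def sync_dags(s3_client, bucket, dag_dir, prefix="dags"):
    """Upload each planned file and return the keys that were written."""
    uploaded = []
    for path, key in plan_dag_uploads(dag_dir, prefix):
        # upload_file(Filename, Bucket, Key) is the standard boto3 S3 client call.
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded


# Usage in a CI step (hypothetical bucket and folder names):
#   import boto3
#   sync_dags(boto3.client("s3"), "my-airflow-bucket", "dags")
```

Passing the client in as an argument keeps the planning logic testable without AWS access; a real pipeline would probably also want to delete keys for DAG files that were removed from the repository.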

Maurice Borgmeier

I don't understand: how does gaining more control over how you schedule your Glue jobs reduce your costs unless you change something about the Glue jobs themselves?

  • Are you running them less frequently?
  • Did you change the logic to be more efficient?
  • Did you supply fewer DPUs?
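The three questions above map onto the three factors in a back-of-the-envelope Glue cost estimate. The sketch below assumes a flat rate of $0.44 per DPU-hour (the published rate in some regions; check your own) and ignores per-second rounding and minimum billing durations. All workload numbers are hypothetical and not taken from the original post.

```python
def monthly_glue_cost(runs_per_month, hours_per_run, dpus, price_per_dpu_hour=0.44):
    """Rough monthly cost: runs x runtime x DPUs x price per DPU-hour."""
    return runs_per_month * hours_per_run * dpus * price_per_dpu_hour


# Hypothetical baseline: an hourly job, 30 minutes per run, on 10 DPUs.
baseline = monthly_glue_cost(runs_per_month=720, hours_per_run=0.5, dpus=10)

# Each lever scales the bill linearly.
less_frequent = monthly_glue_cost(runs_per_month=360, hours_per_run=0.5, dpus=10)
more_efficient = monthly_glue_cost(runs_per_month=720, hours_per_run=0.25, dpus=10)
fewer_dpus = monthly_glue_cost(runs_per_month=720, hours_per_run=0.5, dpus=5)

print(f"baseline:        ${baseline:,.2f}")
print(f"half the runs:   ${less_frequent:,.2f}")
print(f"half the time:   ${more_efficient:,.2f}")
print(f"half the DPUs:   ${fewer_dpus:,.2f}")
```

Since every factor enters the product linearly, changing only how the jobs are scheduled leaves the estimate untouched unless one of these inputs also changes.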

Airflow is a great orchestrator and using MWAA seems like a much more painless setup, especially when you take into account future debugging / maintenance / operations expenses.

What are your costs for the self-managed Airflow + the Glue Jobs it's triggering?

Akash Singh

@mauricebrg, the updated cost is just the compute for running the Airflow ECS services, approximately $400-$500 per month. MWAA is a more painless setup, yes, but our self-managed Airflow needs little maintenance now that it is set up.

Again, our Airflow is not triggering any Glue jobs. Instead, we wrote Airflow DAGs that mimic our Glue jobs and run them on Airflow workers using Celery. Please read the blog post for further details!
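To make the idea concrete, here is a rough sketch of a DAG that does a light transformation directly on a worker instead of calling Glue. It is an illustration only, not one of our actual jobs: the DAG id, the sample rows, and the function names are hypothetical, and it assumes Airflow 2.4 or later (for the `schedule` argument) with the Celery executor configured.

```python
from collections import defaultdict
from datetime import datetime


def total_amount_by_customer(rows):
    """A light transformation: drop incomplete rows, then sum amount per customer."""
    totals = defaultdict(float)
    for row in rows:
        if row.get("amount") is None:
            continue  # skip incomplete records instead of failing the whole run
        totals[row["customer_id"]] += float(row["amount"])
    return dict(totals)


def run_job():
    # Placeholder extract step: a real task would read from S3 or a database here.
    rows = [
        {"customer_id": "c1", "amount": "10.5"},
        {"customer_id": "c2", "amount": None},
        {"customer_id": "c1", "amount": "4.5"},
    ]
    return total_amount_by_customer(rows)


def build_dag():
    """Wire the job into a DAG. Airflow is imported lazily so the logic above runs without it."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="glue_job_replacement_example",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="transform",
            python_callable=run_job,
            queue="default",  # Celery queue whose workers pick up this task
        )
    return dag
```

In a real DAG file you would assign `dag = build_dag()` at module level so the scheduler can discover it. Keeping the transformation a plain function means it can be unit-tested without an Airflow installation.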

Maurice Borgmeier

That means your savings come from changing how you do ETL. IMO the more interesting story is how you replaced Glue jobs that ran Spark code with DAGs running on containers.

Wahid M

You still haven’t provided the details of your Glue jobs or of the Airflow DAGs replacing them. And Airflow ain’t a replacement for Glue.

Akash Singh

@wahid_m_1da6cebefa1750714 I can't share the exact details of the Glue jobs due to security concerns, ofc, but the transformations were on the lighter side.

And yes, we were able to completely replace our Glue jobs with our Airflow setup using Airflow workers. Please refer to the blog post for details \oo/