Maksim Bober
Lessons learned: Migrating Spark App to AWS EMR

Some background: the startup I was working at got acquired, and suddenly all of my programming effort went into migrating our ML infrastructure to the parent company's infrastructure. I didn't know anything about AWS or EMR at the time, so I had to learn on the job 💪

Get ready to work

So here are my newbie thoughts on the good, the ugly, and the bad sides of AWS EMR.

Good

  • AWS manages the Spark cluster, so you can spend your time delivering value instead of maintaining the cluster. ❤️
  • Reads and writes from S3 work out of the box. Data input and output can live in S3, which makes it very easy to share data with teammates. ❤️ ❤️ ❤️
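
To illustrate the S3 point, here is a minimal PySpark sketch that reads from and writes to `s3://` paths directly. The bucket name (`my-team-bucket`), the prefixes, the Parquet format, and the `date` column are all hypothetical placeholders, not details from my actual project.

```python
def s3_uri(bucket: str, key: str) -> str:
    """Build an s3:// URI from a bucket name and a key prefix."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def run_job(spark, input_uri: str, output_uri: str) -> None:
    """Read Parquet from S3, count rows per date, and write the result back to S3."""
    events = spark.read.parquet(input_uri)
    # Assumes the input has a "date" column (hypothetical schema).
    daily_counts = events.groupBy("date").count()
    daily_counts.write.mode("overwrite").parquet(output_uri)


def main() -> None:
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("s3-example").getOrCreate()
    run_job(
        spark,
        s3_uri("my-team-bucket", "input/events"),         # hypothetical input location
        s3_uri("my-team-bucket", "output/daily_counts"),  # hypothetical output location
    )
    spark.stop()
```

In the script you submit to the cluster, you would call `main()` at the bottom. A teammate can then point their own job at the same output prefix.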

Ugly

  • Logs...
    • To get the logs of your application, you need to jump through a bunch of hoops. Live logs of a running application on the cluster? Very, very tricky to do. 😭😭
    • You can get the logs after the application has finished running by SSHing into the EMR master node and running yarn logs -applicationId <application id>.
    • You can also wait ~5 minutes and get them from the S3 bucket where EMR saves these logs 😭
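
The `yarn logs` route above can be wrapped in a small helper so you don't retype it every time. This is only a sketch: the function names and the output file path are my own placeholders, and it assumes it runs on the EMR master node where the `yarn` CLI is on the `PATH`.

```python
import subprocess
from typing import List


def yarn_logs_command(application_id: str) -> List[str]:
    """Build the argument list for `yarn logs -applicationId <id>`."""
    # YARN application ids look like application_<cluster timestamp>_<sequence>.
    if not application_id.startswith("application_"):
        raise ValueError(f"unexpected YARN application id: {application_id!r}")
    return ["yarn", "logs", "-applicationId", application_id]


def fetch_logs(application_id: str, out_path: str) -> None:
    """Dump the logs of a finished application into a local file."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if the yarn CLI exits non-zero.
        subprocess.run(yarn_logs_command(application_id), stdout=out, check=True)
```

Writing everything to one file makes it easier to search the output for your own application's traceback.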

Bad

  • Fine-tuning hole...

    • Something that you could run with 4 nodes in your own Spark cluster could take 30 nodes with autoscaling in EMR.
  • No Spark standalone mode, no nice debugging...

    • EMR does not support Spark standalone mode. So you write your PySpark app and submit it to the EMR cluster as a job. Once it fails, you need to dig through an ENORMOUS stack trace of YARN, Spark, and your application to find the error. No more pleasant debugging experience from the comfort of your IDE 😭😭
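
For reference, submitting a PySpark app to a running cluster as a job can look roughly like the sketch below. It uses boto3's `add_job_flow_steps` with `command-runner.jar`; the step name, script location, cluster id, and arguments are hypothetical, and this is one possible way to submit a job, not necessarily the setup I used.

```python
from typing import Dict, List


def spark_step(name: str, script_uri: str, args: List[str]) -> Dict:
    """Describe an EMR step that runs spark-submit on a PySpark script stored in S3."""
    return {
        "Name": name,
        # Keep the cluster alive if the step fails, so the logs can still be inspected.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *args],
        },
    }


def submit(cluster_id: str, step: Dict) -> str:
    """Add the step to a running cluster and return its step id."""
    import boto3  # imported lazily so spark_step() works without boto3 installed

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```

A call such as `submit("j-XXXXXXXXXXXXX", spark_step("nightly-job", "s3://my-team-bucket/jobs/app.py", []))` (placeholder ids) returns a step id that you can use to look up the step's status and logs.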

Conclusion

Overall, AWS EMR still requires maintenance effort, and it's much more involved than a simple web app. You need to understand Spark's inner workings to fine-tune it, to get at the logs quickly, and to keep its compute autoscaling from bankrupting you in the process. (You could disable autoscaling, but then you would need to spend more time tuning your resources.)
