DEV Community

Discussion on: SQL-based INSERTS, DELETES and UPSERTS in S3 using AWS Glue 3.0 and Delta Lake

Mark Lambert

Thank you for the article. We need to do fast UPSERTs in an ETL pipeline, just as this article describes. I am using Glue 2.0 with Hudi in a PoC that seems to be giving us the performance we need. Delta was on my radar, and when I saw that the Glue 3.0 announcement listed a lot of improvements for Delta but made no mention of Hudi, it made me think we should have looked at Delta first. Do you have any experience with Hudi to compare with the Delta experience you describe in this article?

Kyle Escosia

I actually want to try out Hudi, because I'm still evaluating whether to use it or Delta Lake for our future workloads. I'm in the same boat as you: I was reluctant to try out Delta Lake while AWS Glue only supported Spark 2.4, but then Glue 3.0 came, and with it, support for the latest Delta Lake package.

Others think that Delta Lake is too "databricks-y", if that's a word lol. I'm not sure what they mean by that (perhaps the runtime?). But so far, I haven't encountered any problems with it, because AWS supports Delta Lake as much as it does Hudi.