Top Data Picks for AWS re:Invent 2019

Helen Anderson on November 08, 2019

AWS re:Invent 2019 is only a few weeks away and promises to be a huge event with more sessions on Databases and Analytics than ever before. Since ...
 

I'm looking forward to the Athena analytics workshop (ANT222) on Wednesday! We've been interested in it for a while, but I haven't had the time to dig in yet.

Other than that, I'm waitlisted for a talk on best practices for large geospatial datasets, and I had signed up for a session on data lakes in the oil & gas industry, but that seems to have disappeared.

 

I've noticed a few of my Builders Sessions have switched to 'waitlisted' since I signed up. I guess the good news is that sessions are being added all the time, so there will be plenty of options.

 

I'd recommend checking the agenda often while you're there, as a lot of sessions are rearranged on the fly: very popular sessions are often re-run on a different day, and sometimes presenters can't make it and have to cancel.

 

Thanks for putting this together!

While I'm not attending re:Invent, I'm really looking forward to catching ANT307 on YouTube after the fact. Managing a Presto cluster is no fun, but the last time I evaluated Athena as a replacement, it had a bunch of undocumented limitations that kept it from being a real alternative. Fingers crossed that's changed!

 

The main limitation I've found with Athena is that it's still based on Presto 0.172 (from April 2017), so it's missing a bunch of functions and enhancements from subsequent releases. I've also found that it's not too hard to run into scalability issues where very large queries intermittently fail due to lack of resources.

On the other hand, it's incredibly simple to set up, much easier than running a standalone Presto cluster, and a lot cheaper than a decent Redshift cluster!
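If you do hit those intermittent resource failures, one common workaround is to resubmit the query with a backoff. Here's a minimal sketch, assuming boto3's Athena client (`start_query_execution` and `get_query_execution`). The helper names `run_with_retry` and `is_resource_error` are hypothetical, and so is the assumption that resource failures mention "resources" in the `StateChangeReason` text; this isn't official AWS guidance.

```python
import time

# Terminal states reported by Athena's GetQueryExecution API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def is_resource_error(reason):
    # Assumption: resource-exhaustion failures mention "resources" in the
    # StateChangeReason (e.g. "Query exhausted resources at this scale factor").
    return "resources" in (reason or "").lower()


def run_with_retry(client, query, database, output_location,
                   max_attempts=3, poll_seconds=1.0, backoff_seconds=5.0,
                   sleep=time.sleep):
    """Submit an Athena query, resubmitting it if it fails for lack of resources.

    `client` is expected to behave like boto3.client("athena").
    Returns the QueryExecutionId of the successful run.
    """
    for attempt in range(1, max_attempts + 1):
        response = client.start_query_execution(
            QueryString=query,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )
        query_id = response["QueryExecutionId"]

        # Poll until the query reaches a terminal state.
        while True:
            status = client.get_query_execution(
                QueryExecutionId=query_id)["QueryExecution"]["Status"]
            if status["State"] in TERMINAL_STATES:
                break
            sleep(poll_seconds)

        if status["State"] == "SUCCEEDED":
            return query_id

        # Give up on non-resource errors (e.g. syntax errors) or the last attempt.
        reason = status.get("StateChangeReason", "")
        if not is_resource_error(reason) or attempt == max_attempts:
            raise RuntimeError(f"Query {query_id} {status['State']}: {reason}")

        # Exponential backoff before resubmitting: 5s, 10s, 20s, ...
        sleep(backoff_seconds * 2 ** (attempt - 1))
```

A call would look like `run_with_retry(boto3.client("athena"), sql, "my_database", "s3://my-bucket/athena-results/")`, where the database and bucket names are placeholders. The `sleep` parameter is injectable so the retry logic can be tested without waiting. Retrying only on resource errors avoids burning time (and scan costs) on queries that will never succeed.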

 

There are so many sessions on offer I'm really glad they are putting them up on YouTube as well.

Athena is top of my list for a second look. I've tried it in tutorials but am curious to see how things have changed in the last year.

 
 

I thought of you when I saw that pop up! It's a Session, so it will be up on YouTube for you.

 

I'm already trying to connect with the Solutions Architect from that talk. One of the people who may appreciate my Amazóns

 

Great list! Do you have a similar list for last year's re:Invent?

 

Sure do!

Deep Dive and Best Practices for Amazon Redshift (ANT401-R1)
Great technical overview of Redshift and AWS-recommended best practices for table design, ETL processes, cluster sizing and workload management.

Using Performance Insights to Optimize Database Performance (DAT402)
Interesting behind-the-scenes technical discussion of how Performance Insights populates its console dashboard. Performance Insights is used to monitor the performance of RDS databases, including Aurora. Understanding how it works helps us make better use of the tool to identify when performance issues are occurring and how to resolve them.

Effective Data Lakes: Challenges and Design Patterns (ANT316)
A presentation of an example data lake architecture, plus answers to a number of frequently asked questions from real-world customers.

Technology Trends: Data Lakes and Analytics (ANT205)
Anurag Gupta, VP for AWS Analytic and Transactional Database Services, talks about some of the key trends he sees in data and analytics, and discusses the AWS products relevant to those trends.

Big Data Analytics Architectural Patterns & Best Practices (ANT201-R1)
Discusses best practices for loading, storing, processing and analysing big data on AWS. Offers some advice on choosing the right technology for different scenarios.

Data Lake Implementation: Processing & Querying Data in Place (STG204-R1)
High-level discussion of the different technologies available for analysing data stored as files in S3. Includes a discussion of S3 Select.

Best Practices to Secure Data Lake on AWS (ANT327)
Looks at the security options for data held or accessed through various AWS technologies such as Redshift, Glue, Amazon EMR clusters and S3.

Serverless State Management & Orchestration for Modern Apps (API302)
Using Step Functions to orchestrate processes on AWS.

 

I highly recommend the ANT401-R1 session on Redshift best practices if you're working with Redshift.

 

That's a great lineup of sessions, Helen. Looking forward to reading about your re:Invent 2019 experiences!

 

Thanks Nathan! Only a few weeks away so it's getting exciting :D
