On April 13, the journey of the new AWS Data Analytics Specialty certification officially began, following the beta phase held in December 2019 / January 2020. The beta coincided in time with the AWS Database Specialty Beta, which forced me to choose between the two. In the end, I decided to take the Database Specialty, as I had recently taken the AWS Big Data exam.
The “Beta exam” experience is very different from the “standard” one: 85 questions and 4 hours long - that is, 20 more questions and one more hour - a really intense experience. I recommend taking a 5-minute break - breaks are allowed in the test centers - since after the third hour it is very difficult to stay focused.
The certification is the new version of AWS Big Data Specialty, an exam that will be withdrawn in June 2020. I will not go into much depth on the differences; suffice it to say that the Machine Learning domain has been eliminated, while the remaining domains have been expanded and updated in depth. But beware: Machine Learning and IoT still appear integrated into the other domains, so it is necessary to know them at least at an architectural level.
Prerequisites and recommendations
I will not repeat the information that is already available on the AWS website; instead, I am going to give my personal recommendations and observations, as I consider the Learning Path that AWS suggests to be somewhat light for the current level of the exam.
- AWS experience at the architectural level. The exam is largely focused on advanced solution architecture - the 5 pillars - and to a lesser extent on development, which is present mainly in services such as Kinesis and Glue. I recommend holding the AWS Architect Solutions Pro certification or alternatively the AWS Architect Associate + AWS Security Specialty.
- Advanced AWS security experience. It is a complete domain of the exam, and it also appears - cross domain - in many questions. If you hold the AWS Architect Solutions Pro, general security knowledge may be sufficient - not the specific certification knowledge for each service. Otherwise, the AWS Security Specialty is a good option, or equivalent knowledge in certain services - that I will indicate later on.
- Analytics knowledge. If you lack it, I'd recommend studying books such as “Data Analytics with Hadoop” - O’Reilly 2016, or taking the courses indicated in the AWS Learning Path. Likewise, do labs or pet projects to gain some practical experience.
- Knowledge of the Hadoop ecosystem. Connected to the previous point. High-level and architectural knowledge of the ecosystem is a must: Hive, Presto, Pig, …
- Knowledge of Machine Learning and IoT in the AWS ecosystem. SageMaker and the core IoT services at the architectural level.
The questions follow the style of other certifications such as AWS Pro Architect or the Security or Databases Specialty. They are all “scenario based”, and most of them are long and complex. You are not going to find many simple questions. Certainly, between 5% and 10% of the questions were “easy”, but all in a “scenario” format.
Let's look at an example taken from the AWS sample questions:
I'd classify this question as "intermediate" in difficulty. If you have taken the Architect PRO, or a specialty such as Security or Big Data, you will know what I am talking about. Certainly, the level of the questions is much higher and deeper than in the previous version of the exam.
I'd recommend taking the new specialty directly, as the old one contains questions about already deprecated services - or outdated information.
Services to know in depth
AWS Kinesis - in its three modalities, Data Streams, Firehose and Analytics. Architecture, dimensioning, configuration, integration with other services, security, troubleshooting, metrics, optimization and development. Questions of various levels, some of them very complex and of great depth.
AWS Glue - in depth, for ETL and data discovery - an integral part of the exam. Questions of different levels - I did not find them to be the most difficult.
AWS Redshift - architecture, design, dimensioning, integration, security, ETL, backups … a large number of questions and some of them very complex.
AWS EMR / Spark - architecture, sizing, configuration, performance, integration with other services, security, integration with the Hadoop ecosystem - very important, but not as important as the previous three services. Very complex questions that require advanced and transversal knowledge of all domains and the Hadoop ecosystem: Hive, HBase, Presto, Sqoop, Pig …
Security - KMS encryption, AWS CloudHSM, Federation, Active Directory, IAM, Policies, Roles etc … in general and for each service in particular. Questions that cut across the other domains and are highly difficult.
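To give a flavor of the "dimensioning" reasoning that the Kinesis questions expect, here is a minimal sketch of a shard estimate for a Kinesis Data Streams workload. It assumes the per-shard limits of a provisioned stream (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads shared among consumers without enhanced fan-out). The function name and the workload figures are hypothetical, chosen only for illustration.

```python
import math

# Assumed per-shard limits for a provisioned Kinesis Data Stream.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
SHARD_READ_MB_PER_SEC = 2.0  # shared by all consumers without enhanced fan-out


def estimate_shards(avg_record_kb, records_per_sec, consumers=1):
    """Return the minimum shard count for a hypothetical workload.

    The stream must satisfy three constraints at once: write bandwidth,
    write record rate, and aggregate read bandwidth of all consumers.
    """
    ingress_mb = avg_record_kb * records_per_sec / 1024
    egress_mb = ingress_mb * consumers
    return max(
        1,
        math.ceil(ingress_mb / SHARD_WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC),
        math.ceil(egress_mb / SHARD_READ_MB_PER_SEC),
    )


if __name__ == "__main__":
    # Illustrative workload: 2 KB records, 3,000 records/s, 2 consumers.
    print(estimate_shards(2, 3000, consumers=2))
```

In this sketch the binding constraint changes with the workload: many small records are limited by the record rate, while large records or several consumers are limited by bandwidth. Scenario questions often turn on spotting which limit applies.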
Very important services to consider
- AWS S3 - core service base (storage, security, rules) and new features like AWS S3 Select. It appears consistently across all certifications, so I'd assume it is already known in depth, except for the new features.
- AWS Athena - architecture, configuration, integration, performance, use cases. It appears consistently and as an alternative to other services.
- AWS Managed Kafka - alternative to Kinesis, architecture, configuration, dimensioning, performance, integration, use cases.
- AWS Quicksight - subscription formats, service features, different ways of viewing, use cases. Alternative to other services.
- AWS Elasticsearch and Kibana (ELK) - architecture, configuration, dimensioning, performance, integration, use cases. Alternative to other services.
- AWS Lambda - architecture, integration, use cases.
- AWS StepFunctions - architecture, integration, use cases.
- AWS DMS - architecture, integration, use cases.
- AWS DataPipeline - architecture, integration, use cases.
- AWS Networking - basic network architectures and knowledge: VPC, security groups, Direct Connect, VPN, Regions, Zones … network configuration of each particular service.
- AWS DynamoDB, ElastiCache - architecture, integration, use case knowledge. These services, which appeared very prominently in the previous version of the exam, have much less weight in the current one.
- AWS CloudWatch, Events, Logs - architecture, configuration, integration, use case knowledge.
- AWS RDS and Aurora - architecture, configuration, integration, use case knowledge.
- EC2, Autoscaling - knowledge of architecture, integration, use cases.
- SQS, SNS - knowledge of architecture, integration, use cases.
- AWS Cloudformation - knowledge of architecture, use cases, devops.
- SageMaker and AWS IoT Core - knowledge of architecture, integration, use cases.
- AWS Certification Website.
- Example questions.
- Readiness Course - a must, packed with information and resources - including a 20 question test.
- AWS Whitepapers - Big Data Analytics Options on AWS.
- AWS FAQs for every service - especially for Kinesis, Glue, Redshift, EMR.
- AWS Big Data Blog
- Practice Exam - a must, quite challenging and very representative of the actual exam.
Is it worth it then?
Let's see :)
AWS Data Analytics Specialty is a complex, difficult and expensive (300 euros) certification that requires a very significant investment of time - even with experience in analytics and AWS. Therefore, it is not a decision to be taken lightly.
In my personal case, I found it very worthwhile, since I have been working on several projects of that kind - fast data, IoT - on AWS in recent times. It was also the only certification that I needed to complete the full set of thirteen certifications - if Big Data is included.
Certifications are a good way not only to validate knowledge externally, but also to collect updated information, validate good practices and consolidate knowledge with real (or almost real) practical cases.
For those interested in the analytics field or who have professional experience in it, and who want to make the leap to the cloud, my recommendation is to first obtain an AWS Architect-type certification - preferably PRO - and optionally the Security specialty or equivalent knowledge, at least in the services that I have mentioned in previous points.
For those who already have AWS certifications, but no professional experience in the specific field, it may be a good way to start, but it will not be an easy or short path. I recommend doing labs or pet projects to gain the experience necessary to pass the exam.
So is it worth it? Absolutely, but not as a first certification. It is especially aimed at people with advanced knowledge of AWS architecture who want to delve deeper into analytics in the cloud.
Good luck to you all!