DEV Community

Emilia Nenova
Pass the AWS Certified Data Analytics - Specialty (DAS-C01) Exam 2023

In this blog post, I will explain how to prepare for the AWS Data Analytics Specialty exam. It is advisable to already have an Associate certification so that you are familiar with the main AWS services.

The exam covers the following domains:
Domain 1: Collection - 18%
Domain 2: Storage & Data Management - 22%
Domain 3: Processing - 24%
Domain 4: Analysis and Visualization - 18%
Domain 5: Security - 18%

I passed the exam in March 2023, and here are my personal steps for preparing:

  1. Read the official AWS exam guide. It describes what kind of knowledge, topics, and services will be covered. I discovered that Kinesis Video Streams is not included in the exam, although it briefly appeared in a course and in practice exams.
  2. I recommend watching a video course on a platform like A Cloud Guru, Cloud Academy, or Whizlabs, especially if you have limited experience working with AWS. For me, writing notes during the course and taking screenshots of the presentations was helpful, so I could go over them later.
  3. Take note of the AWS services to put your primary focus on: Glue, the Kinesis family, Redshift, QuickSight, OpenSearch, Athena, EMR, and S3. Additionally, note the services that are still relevant but less important, such as Lake Formation, Amazon MSK, DMS, and DataSync.
  4. The next step is to get a deep understanding of the "top-level" services and a more general understanding of the "second-level" services. Courses usually can't cover every available feature and all of the best practices, so go through the documentation and FAQs for each service and write down any additional information you find significant. Pay attention to how the service integrates with other services, how it handles encryption, logging, user access control, and sharing between accounts/regions, and what its suitable use cases are. Ask yourself when you would choose that service over another, taking into account the following:
    • cost
    • how easy it is to set up and maintain
    • is it (near) real-time, or is a delay acceptable?

    For example, you can compare Kinesis vs. Amazon MSK or SQS FIFO - comparison tables for these pairs are easy to find online. Also consider common errors and CloudWatch metrics, and how to solve the issue or improve performance. Here are a few examples:
      • OpenSearch - JVMMemoryPressure
      • Kinesis - ProvisionedThroughputExceededException
      • Kinesis Data Analytics - MillisBehindLatest
      • EMR - YarnMemoryAvailablePercentage
  5. On YouTube, you can find great AWS Tech talks with best practices for specific services, which I recommend you to watch. I also recommend checking out the Johnny Chivers channel, which contains many videos on all Data Analytics services with hands-on examples.
  6. The last step is solving practice exams to test your knowledge - I recommend the Tutorials Dojo practice tests, as they were the closest to the actual exam and contained great explanations for each question. I also found 25 free questions from Whizlabs and a YouTube video with practice questions useful.
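
The throughput comparisons in step 4 often reduce to a few published per-shard quotas: a Kinesis Data Streams shard accepts up to 1 MB/s and 1,000 records/s of writes, and exceeding either limit is what surfaces as ProvisionedThroughputExceededException. A minimal sizing sketch from those two numbers:

```python
import math

# Kinesis Data Streams per-shard write quotas (documented service limits):
# 1 MB/s of data and 1,000 records/s.
SHARD_MB_PER_S = 1
SHARD_RECORDS_PER_S = 1000

def shards_needed(write_mb_per_s: float, records_per_s: float) -> int:
    """Return the minimum shard count for a given ingest rate."""
    by_throughput = math.ceil(write_mb_per_s / SHARD_MB_PER_S)
    by_records = math.ceil(records_per_s / SHARD_RECORDS_PER_S)
    # The stream must satisfy both limits, so take the larger requirement.
    return max(by_throughput, by_records, 1)

# A producer writing 5,000 small records/s at ~2.5 MB/s total is
# record-bound: it needs 5 shards, not 3.
print(shards_needed(2.5, 5000))  # -> 5
```

Note the max(): a stream of many tiny records can be record-bound long before it is byte-bound, which is easy to miss when sizing only by megabytes.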

Here are my notes on what to definitely research for each service:

Redshift
  • Distribution styles
  • Redshift Spectrum
  • GRANT / REVOKE statements
  • KMS / HSM encryption
  • Classic, elastic resize, concurrency scaling
  • COPY command (sources, syntax; the number of files should be a multiple of the number of slices; keep the files roughly the same size, between 1 MB and 1 GB after compression; using a manifest file)
  • UNLOAD Command
  • Audit logging
  • Materialized views
  • Snapshots
  • Data API
  • WLM
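
The COPY guidance above (file count a multiple of the slice count, files of roughly 1 MB-1 GB compressed) can be turned into a quick planning sketch. The slice count and size thresholds here are inputs you would look up for your own cluster:

```python
import math

def copy_file_plan(total_compressed_mb: float, slices: int,
                   min_mb: float = 1.0, max_mb: float = 1024.0) -> int:
    """Pick a file count that is a multiple of the slice count and keeps
    each compressed file between min_mb and max_mb."""
    multiple = 1
    # Grow the multiple until per-file size drops under the 1 GB guideline.
    while total_compressed_mb / (slices * multiple) > max_mb:
        multiple += 1
    n_files = slices * multiple
    if total_compressed_mb / n_files < min_mb:
        # Data set too small for the size guideline; still give every
        # slice one file so the load stays parallel.
        n_files = slices
    return n_files

# 10 GB compressed on a 4-slice cluster: 4 files of ~2.5 GB would exceed
# the guideline, so split into 12 files of ~850 MB each.
print(copy_file_plan(10 * 1024, 4))  # -> 12
```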
Glue
  • Crawlers
  • Jobs (DPUs, types, transformations, bookmarks)
  • Triggers
  • Integrations
  • File format conversion
Athena
  • Workgroups
  • Partitioning
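
The partitioning bullet is mostly about laying out S3 keys so queries can prune partitions instead of scanning the whole table. A small sketch of building Hive-style year=/month=/day= prefixes (the bucket and table names are made up):

```python
from datetime import date

def partition_prefix(bucket: str, table: str, d: date) -> str:
    """Build a Hive-style partition prefix so a query engine can skip
    objects outside the requested date range."""
    return (f"s3://{bucket}/{table}/"
            f"year={d.year}/month={d.month:02d}/day={d.day:02d}/")

print(partition_prefix("my-data-lake", "events", date(2023, 3, 14)))
# -> s3://my-data-lake/events/year=2023/month=03/day=14/
```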
QuickSight
  • Pay attention to what is available only in the Enterprise edition
  • When to use different types of charts
  • Embedding in a website or app
  • Data encryption (at rest and in transit)
  • Connecting to resources in a private subnet
  • Authentication options
  • Permission to access only specific tables or rows
  • Refreshing data
Kinesis
  • For each service, what are the producers (sources) and consumers (destinations)
  • Data streams - limits, shards, enhanced fan-out, KCL/KPL/Kinesis agent/PutRecord(s)
  • Firehose - record format conversion and lambda transformation, buffer size and interval, compression, PutRecord/ PutRecordBatch, source record backup
  • Data Analytics - SQL/Apache Flink apps, multiple in-application input streams, windowed queries, the Random Cut Forest function
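
The Firehose bullet above mentions buffer size and interval: delivery triggers when either threshold is reached first. A toy sketch of that rule (sizes in MB, interval in seconds; not the real API):

```python
import time

class BufferSketch:
    """Flush when either the size hint or the interval is hit, whichever
    comes first - the rule Firehose applies to its delivery buffer."""

    def __init__(self, size_mb: float, interval_s: float):
        self.size_mb = size_mb
        self.interval_s = interval_s
        self.buffered_mb = 0.0
        self.opened_at = time.monotonic()

    def add(self, record_mb: float) -> bool:
        """Buffer a record; return True if the buffer should now flush."""
        self.buffered_mb += record_mb
        return self.should_flush()

    def should_flush(self) -> bool:
        full = self.buffered_mb >= self.size_mb
        stale = time.monotonic() - self.opened_at >= self.interval_s
        return full or stale

# With a 5 MB / 300 s buffer, 1 MB does not flush, but crossing 5 MB does.
buf = BufferSketch(size_mb=5, interval_s=300)
print(buf.add(1.0))  # -> False
print(buf.add(4.5))  # -> True
```

The practical consequence, and a common exam angle, is that Firehose is near real-time at best: a small buffer interval still adds up to that many seconds of delivery latency.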
MSK
  • ZooKeeper nodes and broker nodes, writing to topics, scaling the cluster
EMR
  • Primary, core, and task nodes + when to use on-demand vs. spot instances (instance fleets vs. instance groups)
  • Storage - HDFS vs. EMRFS
  • Transient vs. Long-running clusters
  • Bootstrap actions, jobs, steps
  • Security configuration for Encryption, Kerberos authentication, and EMRFS authorization for S3
  • Data replication across nodes
  • Compression algorithms
  • S3DistCp
  • Managed Scaling vs. Custom Auto Scaling
  • Storing logs (S3)
  • Orchestration with Step Functions
OpenSearch
  • Data sources
  • Shards, replicas
  • Storage types (Hot, UltraWarm, Cold) + ISM
  • Slow logs
  • Cross-cluster search and replication + Multi-AZ deployment
  • SAML authentication
  • Fine-grained access control
  • Refresh interval
  • OpenSearch Dashboards - authentication, permissions, and sharing
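
The storage-tier bullet (Hot, UltraWarm, Cold + ISM) boils down to age-based rollover that an ISM policy automates for you. A toy decision function with made-up thresholds, just to fix the idea:

```python
def target_tier(index_age_days: int,
                warm_after: int = 7,
                cold_after: int = 30) -> str:
    """Decide which storage tier an index belongs to by age - the kind
    of transition an ISM policy automates. Thresholds are illustrative."""
    if index_age_days >= cold_after:
        return "cold"       # cheapest, detached from the cluster
    if index_age_days >= warm_after:
        return "ultrawarm"  # read-only, S3-backed
    return "hot"            # actively indexed, on instance storage

for age in (1, 10, 45):
    print(age, target_tier(age))
```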
Lake Formation
  • Blueprints
  • Handling permissions and sharing between accounts
DMS
  • Tasks + change data capture (CDC)
DataSync
  • Compare with the Transfer Family and Snowball

If I have missed something or you have additional recommendations, please leave a comment. Hopefully, my steps and bullet points will help you to prepare well, and I wish you the best of luck if you've already booked the exam :)
