Gowtham Potureddi

Posted on May 26

AWS Data Engineer Associate (DEA-C01) Certification: Prep Roadmap

#sql #python #interview #dataengineering

AWS DEA-C01, the AWS Certified Data Engineer - Associate exam, is the cloud certification that finally maps cleanly to the data engineering day job: ingest, store, transform, catalog, secure, and operate data on AWS. Released in 2024, the exam has 65 questions (50 scored plus 15 unscored) in 130 minutes, with a passing scaled score of 720 on a 100-1000 scale, and covers four domains: Data Ingestion and Transformation (34%), Data Store Management (26%), Data Operations and Support (22%), and Data Security and Governance (18%). If you've spent years gluing AWS data services together (S3, Glue, Athena, Kinesis, Redshift, Lake Formation, Step Functions), this exam is the credential that proves it.

This guide is a DEA-C01 field manual: a complete roadmap for the AWS Certified Data Engineer - Associate exam that takes you from "I think I'm ready" to a booked exam slot in eight focused weeks. You'll see the exam blueprint broken down domain by domain, an 8-week study plan with reading and lab hours per week, the six minimum-viable hands-on labs that cover every exam domain end-to-end, a four-tier resource stack (official → hands-on → practice → exam day), and the exam-day playbook: proctor setup, time budget per question, flagging strategy, and what to do in the last 10 minutes. Every section walks through a DEA-C01-style scenario so you can pattern-match on the day.

When you want hands-on reps between study sessions, browse the Python practice library, work through the ETL Python drills, sharpen your SQL practice, rehearse the streaming Python drills, or widen coverage with the full data-analysis library.

On this page

Why DEA-C01 matters and what the exam actually tests
The four DEA-C01 exam domains and how to weight your time
The 8-week DEA-C01 study plan — week by week
Six minimum-viable hands-on labs that cover every domain
The four-tier resource stack and exam-day playbook
Choosing the right DEA-C01 study lever (cheat sheet)
Frequently asked questions
Practice on PipeCode

1. Why DEA-C01 matters and what the exam actually tests

`AWS DEA-C01` — the first AWS certification built for the data engineering job, not the analytics one

The one-sentence invariant: AWS DEA-C01 is AWS's first associate-tier certification that maps directly to the data-engineering job description — pipelines, storage, transformation, security — instead of bolting analytics onto a generalist track. If you've previously side-eyed the now-retired DAS-C01 (Data Analytics — Specialty) for being half BI dashboards, DEA-C01 is the cleaner replacement: every domain is something you actually do in a DE seat.

The exam at a glance.

Code — DEA-C01.
Full name — AWS Certified Data Engineer — Associate.
Release — March 2024 (general availability); current as of 2025-2026.
Format — multiple choice and multiple response.
Question count — 65 questions total (50 scored + 15 unscored pretest).
Time — 130 minutes (plus 30 minutes total for non-disclosure / surveys = ~2h 40m chair time).
Pass mark — scaled score of 720 on a 100-1000 scale (AWS does not publish a fixed percentage).
Cost — USD 150 (associate tier); plus optional Official Practice Question Set on Skill Builder.
Delivery — Pearson VUE or PSI test centre, or online proctored from home (OnVUE is Pearson VUE's online-proctoring service).
Validity — 3 years; recertify by passing the latest version.
Prerequisites — none required; 2-3 years AWS / data engineering experience recommended.

Who DEA-C01 is for.

Working data engineers on AWS who want a credential that matches the actual day-job.
Cloud or DevOps engineers moving sideways into data.
Analytics engineers who use AWS but mostly through dbt + Snowflake / Redshift and want broader AWS fluency.
Career switchers from BI / analytics / SWE backgrounds preparing for their first DE role at an AWS-shop company.
AWS Solutions Architect Associate (SAA-C03) graduates who want the data-specific follow-on cert.

Who DEA-C01 is not strictly for.

Pure ML practitioners — that's the MLS-C01 (Machine Learning Specialty) lane.
Pure BI / dashboard engineers — the data-engineering scenarios on DEA-C01 will feel orthogonal.
Teams on GCP or Azure — different vendor certifications (PDE for GCP, DP-203 for Azure) cover the equivalent ground.

What changed when DAS-C01 retired.

DAS-C01 (Data Analytics — Specialty) was retired in April 2024.
DEA-C01 is the spiritual successor for the data-engineering side; the BI / visualisation half effectively folded into other learning paths.
DEA-C01 is associate-tier (DAS-C01 was specialty-tier), so the scope is narrower and the fee is half: USD 150 versus USD 300.
Modern services — DEA-C01 explicitly tests Glue Studio, Lake Formation, Iceberg on Athena, Redshift Serverless, MWAA, DataZone, and Step Functions Distributed Map, several of which launched after the DAS-C01 blueprint was written.

What the exam does test (the headline themes).

Designing data pipelines end-to-end on AWS — pick the right ingest, store, transform, and serve services for a given scenario.
Service trade-offs — Glue vs EMR vs Lambda for compute; Redshift vs Athena vs Aurora for serving; Kinesis Data Streams vs Kinesis Firehose vs MSK for streaming.
Operating and monitoring pipelines — CloudWatch metrics, alarms, dashboards, X-Ray traces; Step Functions error handling and retries; DLQs.
Securing data on AWS — IAM least-privilege, KMS encryption (SSE-S3 / SSE-KMS / CSE), Lake Formation tag-based / column-level access, VPC endpoints, PrivateLink, Macie for PII.
Cost optimisation — S3 storage classes and lifecycle, partitioning + compression for Athena, RA3 + AQUA for Redshift, Spot pricing for EMR.

What the exam does not test.

Hand-coding bespoke Spark UDFs from memory.
Memorising every single CLI flag for every service.
Deep ML / model-training internals.
Pure visualisation tooling (QuickSight is mentioned, but not the focus).
Hadoop-only on-prem topics.

Why most candidates fail.

Studied SAA-C03 thinking it overlaps. It mostly doesn't: the data services barely come up there.
Watched videos but never opened the console — DEA-C01 is heavy on scenario questions where you must know which knob to turn.
Memorised service names, not service trade-offs — the exam writers love "service A vs service B vs service C, which fits this constraint?" prompts.
Skipped governance / Lake Formation — 18% of the exam, and the part candidates with pure pipeline backgrounds most often skip.
No mock exams — without practice tests, your timing and your weakest domain stay hidden until exam day.

DEA-C01 vs DAS-C01 — the comparison that still comes up.

Aspect	DEA-C01 (current)	DAS-C01 (retired)
Tier	Associate	Specialty
Focus	Data engineering	Data analytics + BI
Cost	USD 150	USD 300
Question count	65	65
Time	130 min	180 min
Visualisation weight	Light	Heavy (QuickSight)
Streaming weight	Heavy (Kinesis + MSK)	Heavy (Kinesis + MSK)
Status	Active	Retired April 2024

Worked example — the most-common DEA-C01 scenario shape

Detailed explanation. Almost every DEA-C01 question is a short scenario followed by four options. The right answer is rarely the "fanciest" service — it's the one that meets the stated constraint (cost / latency / governance / scale) without over-engineering. Learn this shape and you'll save 20 seconds per question.

Question (DEA-C01-style sample).

A data engineering team ingests clickstream events at a steady rate of ~5 MB/s with bursty spikes up to 20 MB/s for short periods. They need to land the raw data in S3 as Parquet, partitioned by event date, with no custom code to run, and they want to minimise operational overhead. Which solution meets these requirements?

A. Amazon Kinesis Data Streams → AWS Lambda → Amazon S3
B. Amazon Kinesis Data Firehose with dynamic partitioning and Parquet conversion → Amazon S3
C. Amazon MSK → Apache Spark on Amazon EMR → Amazon S3
D. Amazon SQS → AWS Glue streaming job → Amazon S3

Input (the constraints to weigh).

Constraint	Wording in question	What it points at
Throughput	"~5 MB/s, spikes to 20 MB/s"	Firehose or KDS comfortably handle this
Format	"land as Parquet"	Firehose has built-in format conversion
Partitioning	"partitioned by event date"	Firehose dynamic partitioning
Code	"no custom code"	Rules out Lambda + EMR + Glue streaming
Overhead	"minimise operational overhead"	Serverless, fully managed = Firehose

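Code. No application code is needed: the right answer is fully managed. A Firehose configuration along these lines does it (the role ARN and the event_time field used in the metadata-extraction query are placeholders for illustration):

{
  "DeliveryStreamName": "clickstream-to-s3",
  "ExtendedS3DestinationConfiguration": {
    "RoleARN": "arn:aws:iam::111122223333:role/FirehoseDeliveryRole",
    "BucketARN": "arn:aws:s3:::analytics-raw",
    "DynamicPartitioningConfiguration": { "Enabled": true },
    "ProcessingConfiguration": {
      "Enabled": true,
      "Processors": [{
        "Type": "MetadataExtraction",
        "Parameters": [
          { "ParameterName": "MetadataExtractionQuery",
            "ParameterValue": "{year: .event_time[0:4], month: .event_time[5:7], day: .event_time[8:10]}" },
          { "ParameterName": "JsonParsingEngine", "ParameterValue": "JQ-1.6" }
        ]
      }]
    },
    "Prefix": "clickstream/year=!{partitionKeyFromQuery:year}/month=!{partitionKeyFromQuery:month}/day=!{partitionKeyFromQuery:day}/",
    "ErrorOutputPrefix": "errors/",
    "DataFormatConversionConfiguration": {
      "Enabled": true,
      "InputFormatConfiguration": { "Deserializer": { "OpenXJsonSerDe": {} } },
      "OutputFormatConfiguration": {
        "Serializer": { "ParquetSerDe": {} }
      },
      "SchemaConfiguration": {
        "RoleARN": "arn:aws:iam::111122223333:role/FirehoseDeliveryRole",
        "DatabaseName": "analytics",
        "TableName": "clickstream_raw"
      }
    }
  }
}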

Step-by-step explanation.

Eliminate option C (MSK + EMR) — explicit "no custom code" rules out a Spark job.
Eliminate option D (SQS + Glue streaming) — Glue streaming jobs are Spark code, so this also violates the "no custom code" constraint.
Eliminate option A (KDS + Lambda) — Lambda is custom code; you'd hand-write the Parquet conversion + partitioning logic.
Pick B (Firehose) — Firehose's built-in Parquet conversion + dynamic partitioning meets every constraint without a single line of code.
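Sanity-check — throughput (~5-20 MB/s) is within what Firehose can sustain, although default per-stream quotas vary by Region and the 20 MB/s bursts may need a quota increase; cost is per-GB ingested + delivered, predictable and low.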
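
To make the dynamic-partitioning step concrete, here is a minimal Python sketch of how the year / month / day keys end up in the S3 prefix. It only mimics the template substitution locally; it is not how Firehose is implemented, and the `event_time` field name is an assumption about the event payload.

```python
from datetime import datetime, timezone


def resolve_prefix(event: dict) -> str:
    """Mimic how year/month/day partition keys fill the S3 prefix template.

    Assumes a hypothetical ISO-8601 "event_time" field on each event.
    """
    ts = datetime.fromisoformat(event["event_time"]).astimezone(timezone.utc)
    return f"clickstream/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


print(resolve_prefix({"event_time": "2025-03-09T23:15:00+00:00"}))
# clickstream/year=2025/month=03/day=09/
```

Each distinct prefix becomes its own partition folder, which is what lets Athena prune by event date later.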

Output.

Field	Answer
Correct option	B
Why	Only fully-managed, no-code path that converts to Parquet and partitions on the fly
Common wrong pick	A — candidates default to "KDS + Lambda" out of habit
Time it should take	< 60 seconds once you spot the "no custom code" anchor

Rule of thumb: The DEA-C01 exam writers anchor each scenario on one or two constraints (no code, sub-second latency, < $X / month, ACID, exactly-once). Find the anchor first; the right answer usually falls out once you eliminate the options that violate it or over-engineer around it.
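
The elimination habit can be sketched in a few lines of Python. The attributes assigned to each option below are a simplified reading of the sample question above, not an official service comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    needs_custom_code: bool          # would the team write and operate code?
    builtin_parquet_conversion: bool


# Simplified attributes for the four sample options (illustrative only).
OPTIONS = [
    Option("A", needs_custom_code=True, builtin_parquet_conversion=False),   # KDS + Lambda
    Option("B", needs_custom_code=False, builtin_parquet_conversion=True),   # Firehose
    Option("C", needs_custom_code=True, builtin_parquet_conversion=False),   # MSK + EMR
    Option("D", needs_custom_code=True, builtin_parquet_conversion=False),   # SQS + Glue streaming
]

# The anchor constraints pulled from the question wording.
ANCHORS = [
    lambda o: not o.needs_custom_code,
    lambda o: o.builtin_parquet_conversion,
]


def surviving(options, anchors):
    """Return the labels of options that satisfy every anchor constraint."""
    return [o.label for o in options if all(check(o) for check in anchors)]


print(surviving(OPTIONS, ANCHORS))  # ['B']
```

On the real exam you do this in your head, but writing the anchors down for a few practice questions makes the pattern stick.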

2. The four DEA-C01 exam domains and how to weight your time

`DEA-C01 exam domains` — Ingestion 34%, Store 26%, Ops 22%, Security 18%

DEA-C01 exam domains are the single most important thing to memorise before you plan your study weeks — the percentages dictate where your time goes. Every scored question maps to exactly one of these four buckets.
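
As a rough illustration of letting the percentages drive the schedule, the sketch below splits a study budget in proportion to the domain weights. The 64-hour figure is the total used in the study plan later in this guide; the proportional split itself is only a starting point, not a rule.

```python
DOMAIN_WEIGHTS = {
    "Data Ingestion and Transformation": 34,
    "Data Store Management": 26,
    "Data Operations and Support": 22,
    "Data Security and Governance": 18,
}


def allocate_hours(total_hours: float, weights: dict) -> dict:
    """Split total_hours across domains in proportion to their exam weight."""
    total_weight = sum(weights.values())
    return {d: round(total_hours * w / total_weight, 1) for d, w in weights.items()}


for domain, hours in allocate_hours(64, DOMAIN_WEIGHTS).items():
    print(f"{domain}: {hours} h")
```

Shift hours toward whichever domain you score worst on in mocks; the weights only set the default.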

Domain 1 — Data Ingestion and Transformation (34%).

The biggest chunk of the exam. Expect ~17 of the 50 scored questions here.

Streaming ingest — Kinesis Data Streams, Kinesis Firehose, Amazon MSK (managed Kafka), Kinesis Data Analytics (Apache Flink).
Batch ingest — AWS DataSync, AWS Snow family, AWS DMS (Database Migration Service), AWS Transfer Family (SFTP).
CDC / database replication — DMS with CDC tasks, Aurora zero-ETL integration with Redshift.
Transformation engines — AWS Glue (Spark + Python shell), Amazon EMR (Spark / Hive / Presto), AWS Lambda for lightweight transforms.
Glue specifics — Glue Studio visual editor, Glue Crawlers, Glue Data Catalog, Glue Job bookmarks (incremental processing), Glue DataBrew.
EMR specifics — EMR Serverless, EMR on EC2, EMR on EKS, instance fleets, Spot pricing, managed scaling.
Orchestration — AWS Step Functions, Amazon MWAA (Managed Airflow), EventBridge, Step Functions Distributed Map.

Domain 2 — Data Store Management (26%).

Expect ~13 of the 50 scored questions here.

Object storage — S3, S3 storage classes (Standard / Standard-IA / Intelligent-Tiering / Glacier / Glacier Deep Archive), lifecycle policies, S3 Object Lambda, S3 Select.
Lake formats — Parquet vs ORC vs Avro, Apache Iceberg on Athena, Apache Hudi, Delta Lake.
Data warehouse — Redshift (RA3 nodes, Serverless), Redshift Spectrum, Redshift materialised views, distribution + sort keys.
NoSQL — DynamoDB (LSI / GSI, on-demand vs provisioned, DAX, streams), DocumentDB.
Relational — Aurora (Postgres + MySQL), RDS.
Specialty — OpenSearch Service, Timestream, Neptune (graph).
Catalog — AWS Glue Data Catalog, Lake Formation governed tables, DataZone.

Domain 3 — Data Operations and Support (22%).

Expect ~11 of the 50 scored questions here.

Monitoring — CloudWatch metrics, custom metrics, alarms, dashboards, Container Insights.
Logging — CloudWatch Logs, CloudWatch Logs Insights queries, log groups, log retention.
Tracing — AWS X-Ray for Lambda + Step Functions chains.
Auditing — AWS CloudTrail (management + data events), Config rules.
Error handling — Step Functions Retry / Catch, Lambda DLQs (SQS), Kinesis Firehose error records, Glue job retries.
Performance — Athena partition projection, Glue dynamic frames vs DataFrames, EMR Spot fleets, Redshift workload management (WLM) queues, RA3 + AQUA.
Cost — Cost Explorer, Cost Allocation tags, S3 storage class analysis, Glue auto-scaling, Redshift Serverless RPU caps.

Domain 4 — Data Security and Governance (18%).

Expect ~9 of the 50 scored questions here.

Identity — IAM roles, policies, conditions, IAM Identity Center (formerly SSO), service control policies (SCPs) in AWS Organizations.
Encryption — KMS keys (AWS-managed vs customer-managed), key rotation, SSE-S3 / SSE-KMS / SSE-C, client-side encryption, envelope encryption.
Network isolation — VPC endpoints (Interface + Gateway), PrivateLink, VPC peering, Direct Connect.
Lake Formation — fine-grained access (table / column / row / cell), LF-Tags, cross-account sharing.
PII / sensitive data — Amazon Macie, Glue PII detection transforms, KMS-backed tokenisation.
Compliance frameworks — GDPR, HIPAA, PCI DSS, SOC; AWS Artifact for audit reports.
Data quality — Glue Data Quality (DQDL rules), AWS Deequ.

Worked example — a Domain 4 (Security and Governance) scenario

Detailed explanation. Domain 4 questions are notorious because pipeline engineers haven't usually set up Lake Formation themselves. The exam writers love LF-Tag and column-level scenarios because they sit at the intersection of IAM, Glue Data Catalog, and S3. Pattern: a multi-team scenario where one team must see one column subset and another team a different subset.

Question (DEA-C01-style sample).

A company stores a customers table in S3 (Parquet) cataloged in AWS Glue Data Catalog. Team A (Marketing) must see all columns except ssn and dob. Team B (Finance) must see all columns. Both teams query via Amazon Athena. The solution must be manageable centrally and must not require duplicating the data. Which approach is best?

A. Copy the table twice — once without ssn / dob for Marketing, once full for Finance — and grant each team access to its copy via IAM bucket policies.
B. Use AWS Lake Formation to grant column-level permissions on the customers table — exclude ssn and dob for Marketing's IAM role; grant all columns to Finance's role.
C. Use Athena workgroups with WHERE clauses and trust each user to omit ssn / dob.
D. Encrypt ssn and dob with different KMS keys and only grant Finance access to those keys.

Input.

Constraint	What it rules in / out
"Manageable centrally"	Rules out A (two copies = double maintenance)
"Must not duplicate data"	Rules out A again
"Manageable centrally"	Rules out C (trust isn't access control)
Two teams, different column subsets	Rules in column-level grants
Athena queries	Lake Formation integrates natively

Code (Lake Formation column-level grant via AWS CLI).

# Grant Finance role full SELECT on every column
aws lakeformation grant-permissions \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:role/FinanceRole \
  --permissions SELECT \
  --resource '{
    "Table": {
      "DatabaseName": "analytics",
      "Name": "customers"
    }
  }'

# Grant Marketing role SELECT on all columns EXCEPT ssn, dob
aws lakeformation grant-permissions \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:role/MarketingRole \
  --permissions SELECT \
  --resource '{
    "TableWithColumns": {
      "DatabaseName": "analytics",
      "Name": "customers",
      "ColumnWildcard": {
        "ExcludedColumnNames": ["ssn", "dob"]
      }
    }
  }'

Step-by-step explanation.

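Lake Formation layers its own permissions on the Glue Data Catalog — once the S3 location is registered with LF and the default IAMAllowedPrincipals grant is revoked from the database and table, catalog access is decided by LF grants (principals still need IAM permission to call Athena, Glue, and lakeformation:GetDataAccess).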
ColumnWildcard.ExcludedColumnNames lets you grant SELECT * minus a denylist — perfect for the "all columns except" pattern.
Marketing assumes its IAM role and queries Athena; Athena asks Lake Formation for the permitted columns, so ssn and dob do not appear in Marketing's view of the table and a query that names them fails.
Finance role queries the same table; Lake Formation grants every column.
No data duplication — both teams query the same underlying S3 Parquet files; Lake Formation filters the visible columns per role.
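
If you prefer scripting the grant in Python, a minimal sketch follows. It only builds the keyword arguments that boto3's Lake Formation `grant_permissions` call accepts and shows the intended effect on the visible columns; the helper names are hypothetical, and nothing here talks to AWS.

```python
def marketing_grant_kwargs(role_arn: str, database: str, table: str, excluded: list) -> dict:
    """Build kwargs for lakeformation.grant_permissions (SELECT minus a denylist)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnWildcard": {"ExcludedColumnNames": list(excluded)},
            }
        },
        "Permissions": ["SELECT"],
    }


def visible_columns(all_columns: list, excluded: list) -> list:
    """Local model of what the grantee should see: every column except the denylist."""
    return [c for c in all_columns if c not in set(excluded)]


kwargs = marketing_grant_kwargs(
    "arn:aws:iam::111122223333:role/MarketingRole", "analytics", "customers", ["ssn", "dob"]
)
# With credentials configured, this would be:
#   boto3.client("lakeformation").grant_permissions(**kwargs)
print(visible_columns(["customer_id", "name", "ssn", "dob", "email"], ["ssn", "dob"]))
# ['customer_id', 'name', 'email']
```

The column list passed to `visible_columns` is made up for illustration; the scenario only names ssn and dob.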

Output.

Field	Answer
Correct option	B
Why	Only LF column-level grants are centrally managed, no-duplicate, and Athena-native
Common wrong pick	D — KMS key separation doesn't hide columns in query results
Time	< 75 seconds once you spot "centrally managed + no duplication"

Rule of thumb: Every Lake Formation question on DEA-C01 has the same shape — multiple teams, different column / row subsets, must avoid duplicating data. The answer is always LF grants (column-level, row-level filters, or LF-Tags) — never IAM-only, never bucket policies, never "copy the data".

3. The 8-week DEA-C01 study plan — week by week

`DEA-C01 study plan` — eight focused weeks, ~8 hours per week, half reading + half hands-on

The DEA-C01 study plan works best as eight weeks at ~8 hours per week (64 hours total). The proportions matter more than the order: re-arrange weeks if you already know storage or already use Spark, but don't compress the lab time.

Week 0 — set the foundation (do this before W1).

Download the official Exam Guide PDF (free) from the AWS certification page.
Skim it once end-to-end in 90 minutes — don't try to memorise.
Highlight the task statements under each domain; these are the closest you'll get to the actual exam blueprint.
Pin the Exam Guide on your second monitor and re-read the task list every week.
Create a free AWS account (or use a sandbox / employer account if you have one); the labs need real console access.
Bookmark — Skill Builder, AWS Workshops, the Tutorials Dojo cheat sheets, and the Whizlabs / Tutorials Dojo practice exam pages.

Weeks 1-2 — Storage and ingestion (Domain 1 + Domain 2 core).

Day	Topic	Reading hours	Lab hours
W1 D1-2	S3 basics — buckets, keys, storage classes, lifecycle	2	1
W1 D3-4	S3 advanced — versioning, replication, encryption, Object Lambda	2	1
W1 D5-7	Kinesis Data Streams + Firehose + Lambda	2	1
W2 D1-2	Amazon MSK + Kinesis Data Analytics (Flink)	2	1
W2 D3-4	AWS DMS + Aurora zero-ETL + AWS DataSync	2	1
W2 D5-7	Lab 1 — S3 + Glue + Athena lakehouse + Lab 2 — Kinesis + Firehose streaming	1	4

Reading goal — be able to name every storage class + every ingest service and one trade-off for each.
Lab goal — finish Lab 1 (S3 + Glue + Athena lakehouse) and Lab 2 (Kinesis + Firehose streaming) — see §4.

Weeks 3-4 — Compute and transform (rest of Domain 1).

Day	Topic	Reading hours	Lab hours
W3 D1-2	AWS Glue — Studio, Crawlers, Catalog, Bookmarks	2	1
W3 D3-4	Glue jobs — Spark vs Python Shell; DataBrew	2	1
W3 D5-7	Amazon EMR — Serverless, on EC2, on EKS; managed scaling; Spot fleets	2	1
W4 D1-2	Athena — partitioning, partition projection, query plan, workgroups	2	1
W4 D3-4	Redshift — RA3, Serverless, Spectrum, materialised views, dist + sort keys	2	1
W4 D5	Iceberg on Athena, Hudi, Delta Lake on AWS	2	1
W4 D6-7	Lab 3 — Glue job + bookmarks + Lab 4 — EMR + Spark + Iceberg	1	4

Reading goal — know Glue vs EMR vs Lambda trade-offs cold; know Redshift vs Athena vs Aurora trade-offs cold.
Lab goal — finish Lab 3 (Glue bookmarks) and Lab 4 (EMR + Iceberg).

Week 5 — Orchestration and ops (Domain 3).

Day	Topic	Reading hours	Lab hours
D1-2	Step Functions — states, error handling, Distributed Map	2	1
D3	Amazon MWAA (Managed Airflow) — DAGs, secrets, Glue / EMR operators	1	1
D4	EventBridge + EventBridge Scheduler	1	1
D5	CloudWatch — metrics, alarms, dashboards, Logs Insights	1	1
D6	CloudTrail + X-Ray + Config	1	1
D7	Lab 5 — Redshift + Spectrum + RA3 (covers DW ops)	1	3

Reading goal — be able to design a Step Functions DAG with retries + DLQ from memory.
Lab goal — finish Lab 5 (Redshift Spectrum) and a small Step Functions side-project chaining Glue + Lambda + Athena.
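
One way to practise the "retries + DLQ from memory" goal is to write the state machine definition out by hand. The sketch below builds an Amazon States Language definition as a Python dict: a Glue job task with a Retry block, a Catch that forwards the failure to an SQS queue acting as the dead-letter queue, and a terminal Fail state. The state names, retry numbers, job name, and queue URL are all illustrative assumptions.

```python
import json


def build_state_machine(job_name: str, dlq_url: str) -> dict:
    """Return an ASL definition: Glue job with retries, failures routed to an SQS DLQ."""
    return {
        "Comment": "Glue job with retry and dead-letter handling (illustrative)",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "Retry": [{
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 30,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }],
                "Catch": [{
                    "ErrorEquals": ["States.ALL"],
                    "ResultPath": "$.error",
                    "Next": "SendToDLQ",
                }],
                "End": True,
            },
            "SendToDLQ": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sqs:sendMessage",
                "Parameters": {
                    "QueueUrl": dlq_url,
                    "MessageBody.$": "States.JsonToString($)",
                },
                "Next": "Failed",
            },
            "Failed": {"Type": "Fail", "Error": "GlueJobFailed"},
        },
    }


def max_retry_wait_seconds(retry: dict) -> float:
    """Total wait across all retry attempts, given exponential backoff."""
    return sum(
        retry["IntervalSeconds"] * retry["BackoffRate"] ** attempt
        for attempt in range(retry["MaxAttempts"])
    )


definition = build_state_machine("nightly-sales-etl", "https://sqs.example/dlq")
print(json.dumps(definition, indent=2)[:60])
print(max_retry_wait_seconds(definition["States"]["RunGlueJob"]["Retry"][0]))  # 210.0
```

Retry handles transient failures in place; Catch only fires once retries are exhausted (or for errors the Retry block does not match), which is the distinction scenario questions tend to probe.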

Week 6 — Security and governance (Domain 4).

Day	Topic	Reading hours	Lab hours
D1-2	IAM — roles, policies, conditions, IAM Identity Center	2	1
D3	KMS — keys, rotation, SSE-S3 vs SSE-KMS, envelope encryption	1	1
D4	Lake Formation — fine-grained access, LF-Tags, cross-account	1	2
D5	VPC endpoints, PrivateLink, Direct Connect	1	1
D6	Macie + Glue PII detection + Glue Data Quality (DQDL)	1	1
D7	Lab 6 — Lake Formation + IAM + column-level ACL	1	3

Reading goal — be able to recite the IAM trust policy / permissions policy split + the Lake Formation column-level grant syntax.
Lab goal — finish Lab 6 (LF column ACL).
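
The trust policy / permissions policy split is easier to recite once you have seen the two documents side by side. A minimal sketch, assuming a Glue job role that reads from the analytics-raw bucket used earlier (the role's purpose and the single action are illustrative):

```python
import json

# Trust policy: WHO may assume the role. It has a Principal and no Resource.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "glue.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy: WHAT the role may do once assumed. It has a Resource and no Principal.
PERMISSIONS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::analytics-raw/*",
    }],
}

print(json.dumps(TRUST_POLICY, indent=2))
print(json.dumps(PERMISSIONS_POLICY, indent=2))
```

A quick self-check for exam questions: if a statement names a Principal it belongs in a trust or resource policy, not in an identity-based permissions policy.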

Week 7 — Mock exams and gap analysis.

Day	Activity	Hours
D1	Mock 1 (Official Practice Question Set on Skill Builder, 20 questions) + review	2
D2	Gap-fill — re-read weakest two domains	2
D3	Mock 2 (Tutorials Dojo or Whizlabs, 65 questions, timed) + review	3
D4	Gap-fill — drill the 5 services you scored worst on	2
D5	Mock 3 (Tutorials Dojo or Whizlabs, 65 questions, timed) + review	3
D6	Gap-fill	2
D7	Mock 4 (different provider, 65 questions, timed) — target ≥ 80%	3

Reading goal — for every wrong answer, write a one-line "why I missed it" note in a single document; re-read this document every morning of W8.
Lab goal — none; this week is pure practice questions.

Week 8 — Final review and book the exam.

Day	Activity
D1	Re-read the "why I missed it" document; re-read the Exam Guide PDF
D2	Re-read your weakest domain's task statements end-to-end
D3	Re-watch your two weakest service videos (Skill Builder)
D4	Skim cheat sheets (Tutorials Dojo summary PDFs)
D5	Final mock — aim ≥ 85%
D6	Rest day — no AWS content; sleep
D7	Exam day — see §5 for the playbook

Booking the exam — book it during W7 once you're scoring ≥ 75% on mocks. AWS schedules through Pearson VUE or PSI; pick a morning slot if you're an early person, otherwise mid-afternoon.

The 8-week budget in one line.

Total — ~64 hours.
Reading — ~30 hours.
Hands-on labs — ~25 hours.
Mocks + review — ~9 hours.
Per week — ~8 hours; comfortable alongside a day job.

If you only have 4 weeks (cram plan).

Compress W1-2 into W1, W3-4 into W2, W5+W6 into W3, W7+W8 into W4.
Cut the second lab in each section — keep Lab 1, Lab 4, Lab 6.
Take 2 mocks instead of 4.
Doable but stressful — only attempt if you already have 2+ years AWS experience.

4. Six minimum-viable hands-on labs that cover every domain

`DEA-C01 hands-on labs` — the six labs that touch every domain end-to-end

DEA-C01 hands-on labs are non-negotiable. Reading without building leaves gaps that scenario questions will exploit. Six small labs — each ~4-6 hours — cover every exam domain at least once.

Lab 1 — S3 + Glue + Athena lakehouse (Domain 1 + 2).

What you build — CSV → S3 raw → Glue Crawler → Glue Data Catalog → Athena query → S3 results.
Why it matters — the canonical "lakehouse on a budget" pattern; appears on ~10% of exam scenarios.
Key services — S3, Glue (Crawler + Data Catalog), Athena, IAM, KMS.
Time — ~4 hours.
Stretch goal — re-run with the data in Parquet (partitioned by date) and compare Athena scan size + cost.

Lab 2 — Kinesis Data Streams + Firehose + S3 (Domain 1).

What you build — producer (Python boto3) writes events to Kinesis Data Stream → Firehose consumes → converts to Parquet → lands in S3 partitioned by date.
Why it matters — exercises streaming ingest, format conversion, dynamic partitioning, and the KDS vs Firehose trade-off (which the exam loves).
Key services — Kinesis Data Streams, Kinesis Data Firehose, Lambda (optional transform), S3, Glue (for table registration).
Time — ~5 hours.
Stretch goal — swap Firehose for an MSK cluster + a Lambda consumer; observe the operational delta.
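
For the Lab 2 producer, a minimal sketch of the batching logic is shown below. It assumes a boto3 Kinesis client is passed in (so the code can be exercised with a stub), a hypothetical `user_id` field as the partition key, and the PutRecords limit of 500 records per request. A production producer would also re-send the individual records reported as failed.

```python
import json
from typing import Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_record(event: dict, key_field: str = "user_id") -> dict:
    """Shape one event the way Kinesis PutRecords expects it."""
    return {
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": str(event[key_field]),
    }


def batches(events: Iterable[dict], size: int = MAX_RECORDS_PER_CALL) -> Iterator[List[dict]]:
    """Yield lists of at most `size` records."""
    batch: List[dict] = []
    for event in events:
        batch.append(to_record(event))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send(client, stream_name: str, events: Iterable[dict]) -> int:
    """Send events in batches; return how many records Kinesis reported as failed."""
    failed = 0
    for batch in batches(events):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With credentials configured, usage would look like `send(boto3.client("kinesis"), "clickstream", events)`; the stream name is a placeholder.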

Lab 3 — Glue job + bookmarks + partitions (Domain 1).

What you build — A Glue Spark job that reads incremental data from S3 (using job bookmarks to skip files already processed), transforms, and writes partitioned Parquet to S3.
Why it matters — Glue bookmarks are a heavily-tested "how do I avoid re-processing" pattern; partitioning + compression is the cost-control answer to half the Athena scenarios.
Key services — Glue (Spark + bookmarks), S3, Glue Data Catalog, CloudWatch (job metrics).
Time — ~5 hours.
Stretch goal — add Glue Data Quality rules (DQDL) and fail the job on rule breach; emit results to CloudWatch.

Lab 4 — EMR + Spark + Iceberg (Domain 1 + 2).

What you build — an EMR Serverless application that runs a PySpark job creating an Apache Iceberg table on S3, doing an ACID MERGE INTO (upsert), and querying it from Athena.
Why it matters — Iceberg shows up across both ingest and store domains; EMR Serverless vs EMR on EC2 is a frequent trade-off question.
Key services — EMR Serverless, Spark, Apache Iceberg, Glue Data Catalog, S3, Athena.
Time — ~6 hours.
Stretch goal — schema-evolve the table (add a column) and verify Athena query reads old + new partitions correctly.

Lab 5 — Redshift + Spectrum + RA3 (Domain 2 + 3).

What you build — a Redshift Serverless workgroup with one materialised view + one Redshift Spectrum external table backed by S3 Parquet.
Why it matters — Redshift questions test dist + sort keys, RA3 vs DS2, Spectrum vs COPY, and Serverless RPU limits.
Key services — Redshift Serverless, Redshift Spectrum, S3, Glue Data Catalog, CloudWatch.
Time — ~5 hours.
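Stretch goal — leave the Serverless workgroup idle and confirm compute (RPU) charges stop; set a usage limit on RPU-hours; on a provisioned cluster, compare by configuring WLM queues.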

Lab 6 — Lake Formation + IAM + column-level ACL (Domain 4).

What you build — register your Lab 1 lakehouse with Lake Formation; create two IAM roles (Marketing, Finance); grant column-level + LF-Tag access; verify each role sees the correct columns from Athena.
Why it matters — Domain 4 is the part most candidates skip and the part the exam writers love most. Building it once will save you 8-10 questions on exam day.
Key services — Lake Formation, IAM, Glue Data Catalog, S3, Athena.
Time — ~5 hours.
Stretch goal — add a row-level filter (e.g. region = 'EMEA') for a third role; cross-account share via LF.
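
For the row-level stretch goal, the sketch below builds the request body that boto3's Lake Formation `create_data_cells_filter` call takes. The filter name, the account ID, and the presence of a `region` column on the table are assumptions for illustration; nothing here calls AWS.

```python
def emea_filter_request(account_id: str, database: str, table: str) -> dict:
    """Build kwargs for lakeformation.create_data_cells_filter: EMEA rows, all columns."""
    return {
        "TableData": {
            "TableCatalogId": account_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": "emea_only",  # hypothetical filter name
            "RowFilter": {"FilterExpression": "region = 'EMEA'"},
            "ColumnWildcard": {},  # empty wildcard = no columns excluded
        }
    }


request = emea_filter_request("111122223333", "analytics", "customers")
# With credentials configured, this would be:
#   boto3.client("lakeformation").create_data_cells_filter(**request)
print(request["TableData"]["RowFilter"]["FilterExpression"])  # region = 'EMEA'
```

After creating the filter you still have to grant SELECT on it to the third role, in the same way as the column-level grants.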

Lab order — read this carefully.

Do Lab 1 first — sets up the catalog and S3 layout you'll reuse.
Do Lab 6 last — depends on Lab 1's catalog; tying it off at the end cements security thinking.
Labs 2-5 can be done in any order but follow the W1-W6 schedule for momentum.

Where to find ready-made lab scripts.

AWS Workshops (workshops.aws) — Data Engineering on AWS — Foundations, Building Data Lakes, Lake Formation workshops.
AWS Skill Builder Builder Labs — gated paid sandbox labs that give you a real account for 1 hour.
GitHub — search "aws data engineering workshop" or "dea-c01 lab"; the AWS Samples org publishes most templates as CloudFormation.

Worked example — Lab 1 end-to-end (the canonical S3 + Glue + Athena lakehouse)

Detailed explanation. This is the most-built lab in DEA-C01 prep. Walk through it once and you'll recognise every Glue + Athena exam question for the next year. The flow: drop CSV in S3 → run a Crawler → query in Athena → rewrite to Parquet → re-query and compare scan size + cost.

Question (lab task).

A sales.csv file (1 GB, columns order_id, customer_id, order_date, region, amount) is in s3://my-raw/sales/sales.csv. Build a queryable Athena table over it, then create a partitioned Parquet copy in s3://my-curated/sales/ and confirm the Parquet query scans less data.

Input.

Item	Value
Raw file	`s3://my-raw/sales/sales.csv`
Size	1 GB CSV
Columns	`order_id`, `customer_id`, `order_date`, `region`, `amount`
Target	Athena queries; minimise scan cost
Partition column	`order_date` (truncate to month)

Code.

# 1. Create the Glue database
aws glue create-database --database-input Name=sales_lake

# 2. Create the Glue Crawler
aws glue create-crawler \
  --name sales-csv-crawler \
  --role AWSGlueServiceRoleDefault \
  --database-name sales_lake \
  --targets '{"S3Targets": [{"Path": "s3://my-raw/sales/"}]}'

# 3. Run the Crawler — it discovers schema and registers the table
aws glue start-crawler --name sales-csv-crawler

# 4. Query in Athena (the CSV table is "sales")
#    Run in the Athena console:
#      SELECT region, SUM(amount) FROM sales_lake.sales GROUP BY region;
#    Note the data scanned (~1 GB).

# 5. Use CREATE TABLE AS SELECT (CTAS) to write a partitioned Parquet copy
#    Run in the Athena console:
#      CREATE TABLE sales_lake.sales_parquet
#      WITH (
#        format = 'PARQUET',
#        external_location = 's3://my-curated/sales/',
#        partitioned_by = ARRAY['order_month']
#      ) AS
#      SELECT
#        order_id, customer_id, order_date, region, amount,
#        date_format(order_date, '%Y-%m') AS order_month
#      FROM sales_lake.sales;

# 6. Re-run the aggregation against sales_parquet — note the much smaller data scan.

Step-by-step explanation.

Glue database — the namespace for tables; sales_lake is the catalog DB.
Glue Crawler — points at the S3 prefix, scans files, infers schema, registers a table sales in sales_lake.
Run the Crawler — takes 30-60 seconds; check the Glue console for the new table.
Athena query against CSV — scans the full 1 GB on every query (Athena charges per TB scanned).
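CTAS to Parquet + partition — Athena writes columnar Parquet partitioned by order_month; it now reads only the columns a query references, and skips whole partitions when the query filters on order_month.
Re-query Parquet — scan drops to ~50-100 MB for the region aggregation (only the compressed region and amount columns are read); cost drops ~10-20×.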
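
If you script the lab instead of clicking through the console, you need to wait for the Crawler before querying. A minimal polling sketch, assuming a boto3 Glue client is passed in and relying on `get_crawler` reporting a state of READY once the run has finished (the timeout values are arbitrary):

```python
import time


def wait_for_crawler(glue, name: str, poll_seconds: int = 10,
                     timeout_seconds: int = 600, sleep=time.sleep) -> int:
    """Poll the crawler until it is READY again; return the seconds waited."""
    waited = 0
    while waited <= timeout_seconds:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":  # other states are RUNNING and STOPPING
            return waited
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"crawler {name} still busy after {timeout_seconds}s")
```

Call it right after `start-crawler`, for example `wait_for_crawler(boto3.client("glue"), "sales-csv-crawler")`. The `sleep` parameter exists only so the function can be tested without real delays.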

Output.

Query	Format	Partitioned	Data scanned	Athena cost (approx)
`SELECT region, SUM(amount)…`	CSV	No	~1 GB	$0.005
`SELECT region, SUM(amount)…`	Parquet	Yes (by month)	~50 MB	$0.00025
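
The cost column above is simple arithmetic, sketched here under two assumptions: a price of USD 5 per TB scanned (check the pricing page for your Region) and decimal units (1 TB = 10^12 bytes). The 10 MB per-query minimum is included because it matters for very small scans.

```python
PRICE_PER_TB_USD = 5.00          # assumed price; varies by Region
BYTES_PER_TB = 10 ** 12          # decimal TB, an assumption for round numbers
MIN_BILLABLE_BYTES = 10 * 10 ** 6  # queries are billed for at least 10 MB


def athena_cost_usd(bytes_scanned: int) -> float:
    """Approximate Athena charge for one query given the bytes it scanned."""
    billable = max(bytes_scanned, MIN_BILLABLE_BYTES)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


csv_cost = athena_cost_usd(10 ** 9)           # ~1 GB CSV scan
parquet_cost = athena_cost_usd(50 * 10 ** 6)  # ~50 MB Parquet scan
print(round(csv_cost, 5), round(parquet_cost, 5), round(csv_cost / parquet_cost))
# 0.005 0.00025 20
```

The absolute numbers are tiny for a 1 GB file; the 20× ratio is what scales when the table is terabytes.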

Rule of thumb: Every Athena cost question on the exam has the same answer — partition + compress (Parquet) + columnar. Build this lab once and the answer is muscle memory.

5. The four-tier resource stack and exam-day playbook

`DEA-C01 study resources` — official → hands-on → practice → exam day, in that order

DEA-C01 study resources are best stacked as four tiers that compound. Skip the bottom tier and the upper tiers cost more; over-invest in the middle tiers and you forget the official wording the exam grades against.

Tier 1 — Official (start here, free or near-free).

AWS Certified Data Engineer — Associate Exam Guide PDF — the single most important document. Re-read weekly.
AWS Skill Builder Learning Plan — "Standard Exam Prep Plan: AWS Certified Data Engineer — Associate" (free Skill Builder tier).
AWS Skill Builder Official Practice Question Set — ~20 official questions; same authoring team as the live exam; identical wording cadence (low-cost subscription).
AWS Whitepapers (read just these — not all of them):
- AWS Well-Architected Framework — Data Analytics Lens (essential).
- Lake Formation Best Practices.
- Big Data Analytics Options on AWS.
- Data Warehousing on AWS.
- Securing Data on AWS.
AWS re:Invent talks — search YouTube for "DEA-C01 exam prep" and the latest re:Invent "What's new in Glue / EMR / Redshift" sessions.

Tier 2 — Hands-on (do every workshop you can fit).

AWS Workshops (workshops.aws) — Data Engineering on AWS, Building Data Lakes, Lake Formation, Iceberg on AWS, Redshift Serverless workshops.
AWS Skill Builder Builder Labs — paid sandbox accounts; 1-hour scoped labs with real consoles.
AWS Free Tier sandbox — your own account; you can finish every lab in §4 for < $10 of charges if you tear down after each session.
GitHub — aws-samples org publishes CloudFormation + CDK templates for almost every reference architecture.

Tier 3 — Practice exams (the highest-leverage tier in W7).

AWS Official Practice Question Set — the gold standard; 20 questions, same authors as the live exam.
Tutorials Dojo (Jon Bonso) — widely considered the closest third-party question style; ~390 questions across multiple test modes.
Whizlabs — older third-party; cheaper; question quality is mixed but volume is high.
Stéphane Maarek / Neal Davis Udemy practice tests — variable quality but cheap; useful as fill-in volume.
Score-target rule — aim for ≥ 80% on three different providers before booking the exam.

Tier 4 — Exam day.

Pearson VUE test centre — quiet, in-person, no proctor camera; book early as slots fill quickly.
PSI test centre — alternative test centre operator; similar experience.
Online proctored (Pearson VUE OnVUE) — at-home; webcam, microphone, room scan; bring your patience, since check-in can take 30 minutes.

Online-proctor checklist (do this 24 hours before).

Quiet room with a door you can lock; no second monitor; clear desk.
Government ID (passport or driver's licence); name must match the registration exactly.
Webcam and microphone working; run the proctor app's system test 24 hours in advance.
Wired internet if possible; mobile hotspot as backup.
Close every other app; the proctor app will refuse to start otherwise.
No water bottle on the desk during the exam (rules vary by provider; check yours).
Bathroom break — allowed but the clock keeps running; pee first.

Exam-day time budget.

130 minutes / 65 questions = 120 seconds per question.
Aim for 75 seconds per question on the first pass (~81 minutes), leaving ~49 minutes for the flagged-question second pass and a final check.
Flag any question you're not 90% sure of; don't agonise. Come back.
Never leave a question blank — there's no penalty for wrong answers; guess if you must.

The two-pass strategy.

Pass 1 (~80 minutes) — answer everything quickly; flag the unsure ones.
Pass 2 (40 minutes) — re-read every flagged question; eliminate one wrong option at a time.
Last 10 minutes — sanity-check the unflagged answers; trust your gut on flagged ones.
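
The budget is easy to recompute for your own pace. The sketch below assumes a 130-minute sitting with 65 questions (the length of the full mocks this guide recommends) and a 75-second first-pass pace; `exam_time_budget` is an illustrative helper, and the question count is a parameter so you can plug in whatever the current exam guide states.

```python
def exam_time_budget(total_minutes: float, questions: int,
                     first_pass_seconds: float) -> dict[str, float]:
    """Split an exam sitting into a first pass and a review reserve."""
    average_seconds = total_minutes * 60 / questions
    first_pass_minutes = questions * first_pass_seconds / 60
    return {
        "average_seconds_per_question": average_seconds,
        "first_pass_minutes": first_pass_minutes,
        "review_minutes": total_minutes - first_pass_minutes,
    }


budget = exam_time_budget(total_minutes=130, questions=65, first_pass_seconds=75)
for name, value in budget.items():
    print(f"{name}: {value:.1f}")
```

With those inputs the first pass takes about 81 minutes and leaves roughly 49 minutes, enough for a 40-minute second pass plus a 10-minute final check.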

Pattern-matching tricks for the day.

"No custom code" / "no operational overhead" → fully managed (Firehose, Athena, Glue Studio, MWAA, Step Functions).
"Sub-second latency" → DynamoDB or Redshift Serverless (not Athena, not Glue, not S3 alone).
"Petabyte-scale ad-hoc SQL on S3" → Athena or Redshift Spectrum (not Aurora).
"Multi-team, different column subsets" → Lake Formation column grants (not IAM-only).
"Cost-optimise S3" → Lifecycle → Intelligent-Tiering or Glacier; compression + partition.
"Stream + windowed aggregation" → Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) or Spark Structured Streaming on EMR.
"Exactly-once + ACID on a data lake" → Iceberg / Hudi / Delta — not raw Parquet.
"Audit who queried what" → CloudTrail data events + S3 access logs.

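For self-quizzing, a few of these cues can be kept as a lookup table. This is a deliberately naive sketch: the substring matching and the `match_cues` helper are illustrative assumptions, and the answers are copied from the cue list in this section rather than from any official mapping.

```python
# Cue phrase (lower-case) -> answer family, taken from the cue list in this section.
CUES = {
    "no custom code": "fully managed (Firehose, Athena, Glue Studio, MWAA, Step Functions)",
    "no operational overhead": "fully managed (Firehose, Athena, Glue Studio, MWAA, Step Functions)",
    "sub-second latency": "DynamoDB or Redshift Serverless",
    "ad-hoc sql on s3": "Athena or Redshift Spectrum",
    "different column subsets": "Lake Formation column grants",
    "acid": "Iceberg / Hudi / Delta",
    "audit who queried": "CloudTrail data events + S3 access logs",
}


def match_cues(stem: str) -> list[str]:
    """Return the answer families whose cue phrase appears in the question stem."""
    lowered = stem.lower()
    matches = []
    for cue, answer in CUES.items():
        if cue in lowered and answer not in matches:
            matches.append(answer)
    return matches


stem = ("A team needs ad-hoc SQL on S3 at petabyte scale with "
        "no operational overhead.")
print(match_cues(stem))
```

Real stems rarely use the exact phrase, so treat the table as a flashcard deck: read a practice question, name the cue yourself, then check which family it maps to.
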
What happens after you click Submit.

Provisional pass / fail shown on the screen immediately (online proctor) or at the test centre.
Official score report in your AWS Certification account within 5 business days.
Detailed domain-level breakdown in the score report; useful even if you passed (to see what to reinforce).
Digital badge issued via Credly within a week.
Validity — 3 years; recertify by passing the latest exam version (no separate "recert" exam).

If you fail.

Wait 14 days before retake; AWS-enforced cooldown.
Re-read your score report; identify the bottom domain.
Spend two weeks rebuilding that domain end-to-end (re-do the lab, re-read the whitepaper, re-take the practice test).
Retake fee — full USD 150 again.

Common day-of mistakes.

Over-thinking obvious questions — if three of the four options are obvious eliminations, pick the remaining one and move on.
Changing too many answers on Pass 2 — first-instinct accuracy is usually higher; only change if you spot a misread.
Misreading "EXCEPT" or "NOT" — the exam loves negation in stems; underline the negation on your scratchpad.
Running out of time on the last 10 questions — the two-pass strategy prevents this; stick to it.

Choosing the right DEA-C01 study lever (cheat sheet)

A one-screen cheat sheet for the most-asked AWS DEA-C01 prep questions.

You want to …	Lever	Notes
Understand the exam scope	Exam Guide PDF	Single source of truth; re-read weekly
Build foundational fluency	AWS Skill Builder Learning Plan	Free tier covers most of it
Get hands dirty	Six labs from §4	Build them on your own AWS account
Practice exam writing	Tutorials Dojo + Skill Builder Official Practice	Target ≥ 80% on three providers
Compare Glue vs EMR vs Lambda	Domain 1 of Exam Guide	Pick by code-amount + scale + cost
Compare Redshift vs Athena vs Aurora	Domain 2 of Exam Guide	Pick by query pattern + freshness + scale
Understand Lake Formation	Whitepaper + Lab 6	LF-Tags + column-level grants are exam-favourite
Get fluent with Step Functions	AWS Workshop + Lab 5 stretch goal	Retry / Catch / Distributed Map are tested
Tune Athena cost	Lab 1 stretch goal	Partition + Parquet + compression; never `SELECT *`
Diagnose a slow Glue job	Glue job metrics in CloudWatch	Spark UI + executor count; consider auto-scaling
Book the exam	Pearson VUE or PSI portal via aws.training	Morning slot if early-bird; mid-afternoon if not
Pass the exam	Two-pass strategy + flag every uncertain	75s per question on Pass 1, 40 min for Pass 2
Recertify in 3 years	Pass the latest DEA-C01 version	No separate "recert" exam
Upgrade after DEA-C01	SAP-C02 (Solutions Architect Professional)	Or MLS-C01 if you're pivoting to ML

Frequently asked questions

Is the AWS DEA-C01 certification worth it in 2026?

Yes — for data engineers already on AWS or moving to an AWS-shop company, AWS DEA-C01 is the most relevant cloud certification on the market. It's the first AWS associate-tier certification built specifically for the data-engineering job description (ingest, store, transform, secure, operate) rather than bolting analytics onto a generalist track. Most AWS-shop employers (Amazon, Capital One, JPMorgan, Disney+, hundreds of mid-market companies) explicitly list DEA-C01 in DE job descriptions; some pay a per-cert bonus. The cert also signals that you've built (not just read about) lakehouse, streaming, orchestration, and governance pipelines — which is exactly what AWS-shop interviews probe. If you're on GCP or Azure, the equivalents (Google PDE, Azure DP-203) carry the same signal in their ecosystems; DEA-C01 is specifically the AWS one.

How long does it take to study for the DEA-C01?

DEA-C01 study time depends heavily on your starting point. For an engineer with 2-3 years of AWS data-engineering experience, 8 weeks at ~8 hours per week (64 total) is comfortable — see the W1-W8 plan in §3. For a candidate with general AWS experience (e.g. SAA-C03 holder) but light data exposure, plan 10-12 weeks to give yourself two to four extra weeks of labs. For a complete AWS beginner, 3-4 months is realistic — you'll need to learn IAM, VPC, S3 basics before tackling the data services. The single biggest time-saver is building the labs (§4) — reading without console reps leaves the kinds of gaps that scenario questions exploit. The single biggest time-waster is watching tutorial videos without taking notes or building anything alongside.

DEA-C01 vs DAS-C01 — what's the difference?

DEA-C01 vs DAS-C01 is a non-question in 2026 because DAS-C01 (Data Analytics — Specialty) was retired in April 2024. DEA-C01 (Data Engineer — Associate) is its spiritual successor for the engineering side. The differences: DEA-C01 is associate-tier (DAS-C01 was specialty-tier), cheaper (USD 150 vs USD 300), shorter (130 vs 180 minutes) for the same 65 questions, and explicitly covers newer services and features that DAS-C01 material underweights — Glue Studio, Lake Formation column-level grants, Iceberg on Athena, Redshift Serverless, MWAA, DataZone, Step Functions Distributed Map. DAS-C01 also leaned heavier on QuickSight and BI; DEA-C01 is heavier on pipeline orchestration and governance. If you're still seeing DAS-C01 in a study guide, that guide is outdated — use DEA-C01 material.

What kind of salary uplift does DEA-C01 unlock?

The certification itself isn't a magic salary lever — the underlying skills and the projects you can now talk about are. That said, US-market signals (Levels.fyi, Glassdoor, Burning Glass / Lightcast data through 2025) show DEA-C01-holders self-reporting a 5-15% uplift when they switch employers, with the higher end concentrated in financial-services + healthcare AWS-shop roles where the cert is a soft prerequisite for compliance interviews. Internal promotion uplift is usually smaller (1-3%) but the cert often accelerates promotion by 6-12 months by un-blocking access to more senior data-platform projects. The biggest non-salary win is role mobility — the cert is one of the few credentials that's portable across every AWS-shop employer worldwide, which expands your hiring pool dramatically.

How many practice tests should I take before booking the exam?

DEA-C01 practice tests are the single highest-leverage activity in W7 — plan for at least four timed sessions, each locked in a room: one official AWS Skill Builder practice set (20 questions, same authors as the live exam) plus three third-party full-length mocks (65 questions, 130 minutes each; Tutorials Dojo is the closest in style; Whizlabs and Stéphane Maarek's Udemy sets fill volume). Your score-target rule: ≥ 80% on three different providers before booking the exam, and ≥ 85% on the final mock the day before. Mock-exam review is more important than the mock itself: for every wrong answer, write a one-line "why I missed it" note in a single document, then re-read that document the morning of the exam. Skipping mocks is the #1 reason candidates who "felt ready" fail on the day.

Can I pass DEA-C01 without hands-on AWS experience?

Technically yes, practically no. The exam is heavily scenario-based — almost every question describes a real architecture and asks you to choose the service combination that meets the constraints. Without hands-on console time you'll struggle to weigh trade-offs (Glue vs EMR vs Lambda; Redshift vs Athena vs Aurora; Kinesis Data Streams vs Firehose vs MSK) because those trade-offs are easier to feel than to memorise. The good news is AWS Free Tier + the six labs in §4 cost less than USD 10 in total charges if you tear down each lab after the session. If your employer doesn't give you a sandbox account, spin up a personal one for the prep period. Two months of weekend console time will outperform two months of pure video watching on every metric the exam cares about.

Practice on PipeCode

PipeCode ships 450+ data-engineering interview problems — including Python practice and SQL practice keyed to the same patterns the AWS DEA-C01 exam tests: pipeline thinking, partition / cost trade-offs, streaming aggregation, set-based SQL on lake / warehouse tables, and the operational scenarios that show up in both certification mocks and real interview loops. Whether you're prepping for the cert, a DE interview at an AWS-shop company, or both, the practice library mirrors the same mental model this roadmap teaches.

Kick off via Explore practice →; drill the Python practice lane →; fan out into the ETL lane →; rehearse SQL practice →; reinforce data-manipulation drills →; widen coverage on the full streaming Python library →.
