Discussion on: Serverless ETL/ELT Architecture with S3, EventBridge, Lambda, Step Functions, and Glue

Making the ETL vs ELT decision point explicit is worth highlighting. Landing raw or validated data first and deferring transformation to downstream engines is the right default for a data lake architecture because it preserves optionality: if your transformation logic is wrong, the raw data is still there, and if a new use case needs differently shaped data, you don't have to re-ingest.

The quarantine handling and schema drift sections cover the part that most architecture walkthroughs skip. Schema drift in particular is an underappreciated operational problem: data sources change their schemas without warning, and a pipeline that fails silently on a schema change is much worse than one that fails loudly. The EventBridge + Step Functions orchestration gives you good hooks for that, but the drift detection logic itself (what counts as a breaking change vs an additive one) is where most teams need to make explicit decisions early.
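
To make the breaking vs additive distinction concrete, here is a minimal sketch of one possible policy, not necessarily what the post implements. It assumes schemas are flat mappings of column name to type string (nested types and type widening such as int to bigint would need extra rules), and the names `DriftReport` and `classify_drift` are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class DriftReport:
    added: list = field(default_factory=list)    # columns only in the new data
    removed: list = field(default_factory=list)  # columns missing from the new data
    retyped: list = field(default_factory=list)  # columns whose type changed

    @property
    def breaking(self) -> bool:
        # Assumed policy: removals and type changes are breaking,
        # new columns alone are additive.
        return bool(self.removed or self.retyped)


def classify_drift(expected: dict, observed: dict) -> DriftReport:
    """Compare an expected schema against an observed one."""
    report = DriftReport()
    for name, dtype in expected.items():
        if name not in observed:
            report.removed.append(name)
        elif observed[name] != dtype:
            report.retyped.append(name)
    report.added = [name for name in observed if name not in expected]
    return report


expected = {"order_id": "string", "amount": "double"}
observed = {"order_id": "string", "amount": "string", "coupon": "string"}
report = classify_drift(expected, observed)
print(report.breaking, report.retyped, report.added)
```

Under this policy a breaking report would route the file to quarantine and fail loudly, while an additive-only report could let the load proceed and just raise a notification.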

What's your approach to schema versioning in Glue: catalog-managed, or metadata stored alongside the data?