DEV Community

Discussion on: AWS Glue first experience - part 4 - Deployment & packaging

Mauro Mascia

A note on the TempDir in the Glue ETL PySpark case: because it is "temporary", it is easy to underestimate its role in the process. In my experience, it is better to point it to a specific prefix in a bucket, to avoid hitting the S3 request-rate limit of 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per prefix.
So it is better to use, e.g., s3://aws-glue-temporary--us-east-1/ds1_raw_to_refined/, specifying a separate prefix for each job.
In fact, an error like "503 Slow Down" may appear without indicating the source of the problem, which can be the temporary folder rather than the destination one.
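A minimal sketch of the "one prefix per job" idea, assuming you create jobs programmatically. The helper names `temp_dir_for_job` and `glue_default_arguments` and the bucket `my-glue-temp-bucket` are hypothetical; `--TempDir` is Glue's special job parameter for the temporary directory, and the resulting dict has the shape expected by the `DefaultArguments` parameter of boto3's `create_job`.

```python
def temp_dir_for_job(bucket: str, job_name: str) -> str:
    """Return a dedicated TempDir prefix for one Glue job (hypothetical helper)."""
    # Use the job name as its own key prefix so each job's temporary
    # traffic counts against a separate S3 per-prefix request limit.
    prefix = job_name.strip("/").replace(" ", "_")
    return f"s3://{bucket}/{prefix}/"


def glue_default_arguments(bucket: str, job_name: str) -> dict:
    """Build a DefaultArguments map that sets a per-job --TempDir."""
    return {"--TempDir": temp_dir_for_job(bucket, job_name)}


if __name__ == "__main__":
    # Hypothetical bucket name; replace it with your own temporary bucket.
    bucket = "my-glue-temp-bucket"
    for job in ["ds1_raw_to_refined", "ds2_raw_to_refined"]:
        args = glue_default_arguments(bucket, job)
        print(job, "->", args["--TempDir"])
        # The dict could then be passed along, e.g.:
        # boto3.client("glue").create_job(..., DefaultArguments=args)
```

Inside the job script the value can then be read with `getResolvedOptions(sys.argv, ["TempDir"])` and passed wherever a temporary directory is required (for example `redshift_tmp_dir`), so every job writes its intermediate files under its own prefix.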