1. What happens when you move data to Amazon S3 Glacier
When you use an S3 Lifecycle rule to transition .csv files from S3 Standard to S3 Glacier Flexible Retrieval, the objects are archived for long-term, low-cost storage.
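They’re still in S3 and still listed in the bucket, but you can’t read them directly until you restore them.
A restore does not change the object's storage class. It creates a temporary, readable copy alongside the archived object, and that copy is billed at the S3 Standard storage rate for as long as it exists.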
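As a rough sketch of what such a lifecycle rule could look like with boto3, the helper below builds the configuration as a plain dictionary. The function names, the rule ID, the `training-data/` prefix, and the 30-day threshold are all hypothetical. Lifecycle filters match on key prefix, tags, or object size rather than on a file extension, so this assumes the .csv files live under a common prefix.

```python
import json


def build_glacier_lifecycle(prefix: str, days: int, rule_id: str = "archive-csv-to-glacier") -> dict:
    """Build a lifecycle configuration that transitions objects under
    `prefix` to S3 Glacier Flexible Retrieval after `days` days."""
    return {
        "Rules": [
            {
                "ID": rule_id,  # hypothetical rule name
                "Status": "Enabled",
                # Filters work on key prefix, not on a ".csv" suffix.
                "Filter": {"Prefix": prefix},
                # "GLACIER" is the API name for S3 Glacier Flexible Retrieval.
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(s3_client, bucket: str, config: dict) -> None:
    """Apply the configuration with a boto3 S3 client.
    This call replaces any lifecycle configuration already on the bucket."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    # Hypothetical prefix and age threshold, printed for inspection only.
    print(json.dumps(build_glacier_lifecycle("training-data/", 30), indent=2))
```

To apply it, you would pass a client created with `boto3.client("s3")` and your own bucket name to `apply_lifecycle`.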
2. Glacier retrieval options (for S3 Glacier Flexible Retrieval)
| Retrieval Type | Typical Time to Access | Relative Retrieval Cost | Notes |
|---|---|---|---|
| Expedited | 1–5 minutes | Highest | Good for urgent, small retrievals |
| Standard | 3–5 hours | Moderate | Default, good balance for planned jobs |
| Bulk | 5–12 hours | Lowest | Best for large data restores |
Because your question says:
“ML trainings and audits are planned weeks in advance”
You can schedule a Standard or Bulk retrieval well ahead of training, for example the day before, since a Bulk restore can take up to 12 hours. Both tiers cost far less than Expedited.
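To make the scheduling concrete, here is a small sketch that works backwards from a planned training start to the latest moment a restore should be requested. It uses the upper end of each typical window from the table above. The function name, the 12-hour safety margin, and the example date are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Upper bound of each typical retrieval window from the table, in minutes.
MAX_RESTORE_MINUTES = {"Expedited": 5, "Standard": 5 * 60, "Bulk": 12 * 60}


def latest_restore_start(
    training_start: datetime, tier: str, safety_margin_hours: int = 12
) -> datetime:
    """Return the latest time to request a restore so that data should be
    readable before `training_start`, assuming the worst typical case for
    the tier plus an extra safety margin."""
    worst_case = timedelta(minutes=MAX_RESTORE_MINUTES[tier])
    margin = timedelta(hours=safety_margin_hours)
    return training_start - worst_case - margin


if __name__ == "__main__":
    start = datetime(2025, 6, 2, 9, 0)  # hypothetical training start
    for tier in ("Standard", "Bulk"):
        print(tier, latest_restore_start(start, tier))
```

These windows are typical times rather than guarantees, which is why the sketch adds a margin instead of cutting it close.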
3. Can the ML jobs still read the data?
✅ Yes — absolutely.
You just need to initiate a restore from Glacier before training.
Once restored, the .csv objects are available for normal access for the number of days you specify in the restore request (for example, 7 days).
When that period ends, S3 removes the temporary copy. The archived object stays in Glacier the whole time, so storage costs remain low.
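A minimal sketch of that restore step with boto3 might look like the following. `restore_object` and `head_object` are boto3 S3 client methods; the helper names, the 7-day duration, and the choice of the Bulk tier are assumptions. The client is passed in, so you would create it yourself with `boto3.client("s3")`.

```python
import re


def request_restore(s3_client, bucket: str, key: str, days: int = 7, tier: str = "Bulk") -> None:
    """Ask S3 to make an archived object readable for `days` days.
    `tier` is one of "Expedited", "Standard", or "Bulk"."""
    s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": tier}},
    )


def parse_restore_header(restore) -> str:
    """Interpret the `Restore` field of a head_object response.
    Returns "not-requested", "in-progress", or "restored"."""
    if not restore:
        return "not-requested"
    match = re.search(r'ongoing-request="(true|false)"', restore)
    if match is None:
        raise ValueError(f"unexpected Restore value: {restore!r}")
    return "in-progress" if match.group(1) == "true" else "restored"


def restore_status(s3_client, bucket: str, key: str) -> str:
    """Look up the current restore state of one object."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return parse_restore_header(response.get("Restore"))
```

A training pipeline could call `request_restore` for each .csv key and then poll `restore_status` until every key reports `"restored"` before starting the job.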
4. Why this works well for the question
- Training only happens twice a year, so the .csv files spend most of their time cold in Glacier.
- A retrieval delay of a few hours is acceptable because the ML runs are pre-scheduled.
- Glacier cost per GB/month is much lower than S3 Standard or One Zone-IA, so total cost is minimal.
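The storage-cost argument can be checked with simple arithmetic. The per-GB-month prices below are illustrative assumptions only, not quoted from the question, and they vary by region and over time, so check the current S3 pricing page before relying on them. The sketch also ignores retrieval fees, request fees, the temporary restored copy, and minimum storage duration charges.

```python
# ASSUMED illustrative storage prices in USD per GB-month.
# Replace these with current figures for your region.
ASSUMED_PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 One Zone-IA": 0.01,
    "S3 Glacier Flexible Retrieval": 0.0036,
}


def monthly_storage_cost(size_gb: float, storage_class: str, prices=None) -> float:
    """Storage-only monthly cost for `size_gb` in the given class."""
    prices = ASSUMED_PRICE_PER_GB_MONTH if prices is None else prices
    return size_gb * prices[storage_class]


if __name__ == "__main__":
    size_gb = 1000  # hypothetical dataset size
    for storage_class in ASSUMED_PRICE_PER_GB_MONTH:
        cost = monthly_storage_cost(size_gb, storage_class)
        print(f"{storage_class}: ${cost:.2f} per month")
```

Under these assumed prices, the archived data costs a small fraction of what it would in S3 Standard, which is the point the list above makes.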
✅ In short:
With Glacier, your data is still retrievable, just not immediately.
Typical retrieval delay = 3–5 hours (Standard retrieval) — perfect for planned ML jobs.