DEV Community

Camille Chang

From Certification to Real-World AWS: Troubleshooting

When I started working in IT, I thought AWS certifications would give me the confidence to handle real-world challenges. I passed several exams, but in practice I realized that certifications only provide a broad understanding of AWS services. The real lessons come when you're stuck debugging production issues; that's when the learning truly sticks.

Recently, I ran into a particularly challenging problem at work. In repo A, I had written a feature in AWS SAM (YAML) using Lambda, SNS, EventBridge, and S3. Then a colleague told me the new feature needed to live in repo B instead. Repo B was written in Terraform, it didn't yet have reusable modules for Lambda, and I had zero Terraform experience. What I thought would take a single day to deploy and test ended up taking several days, even with the help of AI. Finally, my PR was approved, the code was merged and deployed… and then IAM role issues appeared: insufficient permissions.

The role's definition was in another repo, C, built with CloudFormation and CodeBuild. Many AWS policies couldn't be reused because they granted overly broad "Resource": "*" permissions. We needed fine-grained policies specifying the exact actions and resources. At first, I tried to add many actions to the role at once, but CodeBuild kept failing. I switched to adding them incrementally: build repo C, deploy repo B, test permissions, repeat. That cost me an entire day.
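To make "fine-grained" concrete, here is a minimal sketch in Python of what a scoped policy document might look like. The bucket name, topic ARN, account ID, and the `scoped_statement` helper are all hypothetical and only for illustration; the real policy lived in a CloudFormation template in repo C.

```python
import json

# Hypothetical names for illustration only; not taken from the real project.
BUCKET = "example-feature-bucket"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:example-topic"


def scoped_statement(sid, actions, resources):
    """Build one Allow statement limited to explicit actions and resources."""
    if "*" in resources:
        raise ValueError("a wildcard resource defeats the purpose of scoping")
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Action": sorted(actions),
        "Resource": list(resources),
    }


policy = {
    "Version": "2012-10-17",
    "Statement": [
        scoped_statement(
            "ReadWriteObjects",
            ["s3:GetObject", "s3:PutObject"],
            [f"arn:aws:s3:::{BUCKET}/*"],
        ),
        scoped_statement("PublishToTopic", ["sns:Publish"], [TOPIC_ARN]),
    ],
}

if __name__ == "__main__":
    print(json.dumps(policy, indent=2))
```

Spelling out every action and ARN like this is exactly what makes a policy grow quickly, which matters for what happened next.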

In the evening, a colleague joined me and pointed out the real problem:

Cannot exceed quota for policy size: 6144. ServiceLimitExceeded.

At that moment, everything clicked. I created a new IAM policy, attached it to the role, and the issue was resolved.
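In hindsight, a small pre-check could have flagged the size problem before any deploy. The sketch below is an assumption on my part, not what we actually ran: it approximates IAM's character count by stripping whitespace from the serialized JSON, then greedily packs statements into several policy documents that each stay under the limit. The function names are hypothetical.

```python
import json
import re

MANAGED_POLICY_LIMIT = 6144  # characters per managed policy; whitespace is not counted


def _document(statements):
    """Wrap a list of statements in a policy document."""
    return {"Version": "2012-10-17", "Statement": list(statements)}


def policy_size(document):
    """Approximate IAM's size count: serialized JSON with all whitespace removed."""
    return len(re.sub(r"\s+", "", json.dumps(document)))


def split_policy(statements, limit=MANAGED_POLICY_LIMIT):
    """Greedily pack statements into policy documents that each fit under limit."""
    documents, current = [], []
    for statement in statements:
        if policy_size(_document([statement])) > limit:
            raise ValueError("a single statement is larger than the limit")
        if current and policy_size(_document(current + [statement])) > limit:
            # The current document is full; start a new one.
            documents.append(_document(current))
            current = []
        current.append(statement)
    if current:
        documents.append(_document(current))
    return documents
```

Each resulting document would become its own policy attached to the role, which is essentially the fix we applied by hand.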

Looking back, I realized I was only seeing part of the problem. Each CodeBuild run wrote logs to S3, and those logs were the only thing I looked at. They simply said "role failed to update." I hadn't opened the corresponding CloudFormation stack to find the exact reason for the failure. If I had checked that right away, I could have avoided wasting so much time adding permissions piece by piece.
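Pulling the failure reason out of a stack doesn't need the console. Here is a rough sketch, assuming boto3 is installed and credentials are configured; the stack name in the usage note is hypothetical. `failure_reasons` is a plain function over event dictionaries, so it can be tried without touching AWS.

```python
def failure_reasons(events):
    """Return (resource, status, reason) for each failed event, oldest first."""
    failed = [e for e in events if e.get("ResourceStatus", "").endswith("_FAILED")]
    failed.sort(key=lambda e: e["Timestamp"])
    return [
        (e["LogicalResourceId"], e["ResourceStatus"], e.get("ResourceStatusReason", ""))
        for e in failed
    ]


def fetch_stack_events(stack_name):
    """Fetch all events for one stack. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so failure_reasons works without it

    client = boto3.client("cloudformation")
    paginator = client.get_paginator("describe_stack_events")
    events = []
    for page in paginator.paginate(StackName=stack_name):
        events.extend(page["StackEvents"])
    return events
```

Something like `failure_reasons(fetch_stack_events("example-role-stack"))` would list the failed resources with their status reasons, which is where the quota message was hiding in my case.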

Lessons Learned

  • Check root causes early – Logs in S3 only said “role failed to update.” The detailed error was in CloudFormation. I should have traced it sooner.
  • Don't overload IAM policies – An IAM managed policy can be at most 6,144 characters (whitespace is not counted). Split large policies into smaller ones and attach them to the role separately.
  • Hands-on beats theory – Certifications gave me a foundation, but real troubleshooting taught me far more.
  • Ask for help – Sometimes a colleague’s perspective saves hours (or days).
