DEV Community

Shankar


Adding API Gateway to My Cloud Resume

Five Failures in One Evening: Adding API Gateway to My Cloud Resume

In my previous article, I documented migrating my Cloud Resume from ClickOps to Terraform. The system worked: S3 + CloudFront for the frontend, a Lambda Function URL for the visitor counter, DynamoDB for persistence, and GitHub Actions for CI/CD.

But the Lambda Function URL had a problem. It was a bare endpoint with no throttling, no API key, and no usage tracking. Anyone with the URL could call it a million times and I'd be paying for a million DynamoDB writes.

I added API Gateway in front of the Lambda. Five things broke.

What I added

Three new Terraform modules:

An api-gateway module (modules/api-gateway/) with a REST API exposing GET and POST on /visitors, both requiring an API key. It includes a usage plan with rate limiting (5 requests/sec, burst of 10), a monthly quota of 10,000 requests, a MOCK integration for CORS preflight, and CloudWatch access logging on the prod stage.

A vpc module (modules/vpc/) with a 10.0.0.0/16 network, two public subnets and two private subnets across us-east-1a and us-east-1b. I skipped the NAT Gateway because that's $32/month I don't need yet. This is prep for Phase 2 when I add containers or RDS.

A dns module (modules/dns/) for an ACM certificate and an API Gateway custom domain mapping to api.arlingtonhood21.work, gated behind a feature flag (enable_custom_domain = false) because it requires manual DNS validation.
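AWS describes API Gateway throttling as a token bucket: the rate limit refills tokens every second and the burst limit is the bucket's size. The Python sketch below is a simplified model of that behavior using the api-gateway module's usage plan numbers (5 req/s, burst of 10, 10,000 per month). The `TokenBucket` class is hypothetical and is not how API Gateway is implemented internally.

```python
from dataclasses import dataclass, field

RATE_LIMIT = 5          # steady-state requests per second (usage plan)
BURST_LIMIT = 10        # bucket capacity (usage plan)
MONTHLY_QUOTA = 10_000  # requests per month (usage plan)


@dataclass
class TokenBucket:
    """Simplified token-bucket throttle (hypothetical model, not AWS code)."""
    rate: float
    capacity: int
    tokens: float = field(init=False)
    last: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.tokens = float(self.capacity)  # the bucket starts full

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the previous call, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # API Gateway would answer 429 Too Many Requests


bucket = TokenBucket(rate=RATE_LIMIT, capacity=BURST_LIMIT)
at_t0 = [bucket.allow(0.0) for _ in range(20)]  # 20 requests in the same instant
at_t1 = [bucket.allow(1.0) for _ in range(20)]  # 20 more, one second later
print(sum(at_t0), sum(at_t1))  # 10 5

minutes_to_quota = MONTHLY_QUOTA / RATE_LIMIT / 60
print(f"{minutes_to_quota:.1f} minutes at the sustained limit exhausts the quota")
```

A client hammering the endpoint gets ten requests through immediately, then five per second. Even at that sustained rate it would burn through the monthly quota in about 33 minutes, after which API Gateway rejects requests made with that key until the quota period resets.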

The frontend JavaScript changed from calling the Lambda Function URL directly to calling the API Gateway with an x-api-key header.
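The frontend does this in JavaScript, but the request shape is the same from any client. A minimal Python sketch using only the standard library; the API ID in the URL and the key value are placeholders, and the code builds the request without sending it.

```python
import urllib.request

# Placeholder values: substitute the real stage URL and usage-plan key.
API_URL = "https://example123.execute-api.us-east-1.amazonaws.com/prod/visitors"
API_KEY = "REPLACE_WITH_USAGE_PLAN_KEY"


def build_visitor_request(method: str = "POST") -> urllib.request.Request:
    """Build (but don't send) a visitor-counter call carrying the x-api-key header."""
    return urllib.request.Request(
        API_URL,
        method=method,
        headers={"x-api-key": API_KEY},
    )


req = build_visitor_request()
print(req.get_method(), req.full_url, req.header_items())
# Sending it would be: urllib.request.urlopen(req)
```

Because the key is embedded in code that every visitor downloads, it isn't a secret. Its job is to tie requests to the usage plan so the rate limit and quota apply.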

Failure 1: Resources already exist

I pushed everything to main. The backend CI ran terraform apply and failed with four errors:

ResourceAlreadyExistsException: CloudWatch Logs log group /aws/apigateway/visitor-api already exists
EntityAlreadyExists: Role with name api-gateway-cloudwatch-role already exists
ResourceConflictException: The statement id (AllowAPIGatewayGetInvoke) provided already exists

I had created these resources locally with terraform apply before the CI existed. The CI's import step only covered root-level resources (DynamoDB, Lambda, SNS). The new API Gateway module resources weren't in the import list.

The fix was adding six terraform import commands for the module resources: the CloudWatch log group, IAM role, role policy attachment, both Lambda permissions, and the API Gateway account.
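One way to keep that import list from drifting is to drive it from a single table and skip addresses already in state. In the sketch below, the Terraform addresses (`module.api_gateway.*`), the Lambda function name `visitor-counter`, the `AllowAPIGatewayPostInvoke` statement ID, and the attached managed policy are all assumptions. The import ID formats (resource name, `role/policy-arn`, `function/statement-id`, and the literal `api-gateway-account`) follow the AWS provider's documented formats for those resource types. It prints the commands instead of running them.

```python
import shlex

FUNCTION_NAME = "visitor-counter"  # hypothetical Lambda function name

# Terraform address -> import ID. Addresses are hypothetical; ID formats follow each resource type.
IMPORTS = {
    "module.api_gateway.aws_cloudwatch_log_group.access_logs": "/aws/apigateway/visitor-api",
    "module.api_gateway.aws_iam_role.cloudwatch": "api-gateway-cloudwatch-role",
    "module.api_gateway.aws_iam_role_policy_attachment.cloudwatch": (
        "api-gateway-cloudwatch-role/"
        "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"
    ),
    "module.api_gateway.aws_lambda_permission.get": f"{FUNCTION_NAME}/AllowAPIGatewayGetInvoke",
    "module.api_gateway.aws_lambda_permission.post": f"{FUNCTION_NAME}/AllowAPIGatewayPostInvoke",
    "module.api_gateway.aws_api_gateway_account.this": "api-gateway-account",
}


def import_commands(already_in_state: set[str]) -> list[str]:
    """Return shell-quoted terraform import commands for addresses not yet in state."""
    return [
        shlex.join(["terraform", "import", address, import_id])
        for address, import_id in IMPORTS.items()
        if address not in already_in_state
    ]


# In CI, already_in_state would come from the output of `terraform state list`.
for cmd in import_commands({"module.api_gateway.aws_iam_role.cloudwatch"}):
    print(cmd)
```

On Terraform 1.5 or later, declarative `import` blocks in the configuration are an alternative to scripting `terraform import`.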

Failure 2: CI didn't trigger

I pushed the import fix to .github/workflows/backend-cicd.yml. Nothing happened.

The workflow trigger only watched resume-backend/**. The workflow file itself lives at .github/workflows/backend-cicd.yml, outside that path. GitHub Actions path filters are glob patterns matched against the paths of the changed files: if none of the files in a push match a filter, the workflow doesn't run.
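To see why that push was ignored, here is a simplified matcher that supports only the `*` (within one path segment) and `**` (across segments) wildcards from GitHub's filter syntax. It is a sketch for reasoning about filters, not GitHub's implementation.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a GitHub-style path glob (only * and ** supported) into an anchored regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")     # ** may cross '/' boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # * stays inside one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def workflow_runs(changed_files, path_filters) -> bool:
    """A push triggers the workflow if any changed file matches any filter."""
    regexes = [glob_to_regex(p) for p in path_filters]
    return any(rx.match(path) for path in changed_files for rx in regexes)


changed = [".github/workflows/backend-cicd.yml"]
before = ["resume-backend/**"]
after = before + [".github/workflows/backend-cicd.yml"]
print(workflow_runs(changed, before))  # False
print(workflow_runs(changed, after))   # True
```

Path filters only apply to `push` and `pull_request` events, so a `workflow_dispatch` trigger gives a manual escape hatch that ignores them.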

I added the workflow file to its own trigger and threw in workflow_dispatch for manual runs:

on:
  push:
    paths:
      - 'resume-backend/**'
      - '.github/workflows/backend-cicd.yml'
  workflow_dispatch:

Failure 3: New API Gateway, old URL

After the CI succeeded, the visitor counter showed "--" instead of a number.

The import step only covered resources whose fixed names collide on creation (the IAM role, the CloudWatch log group, the Lambda permission statement IDs). The REST API itself, its methods, integrations, stage, and deployment weren't imported. API Gateway identifies each REST API by a generated ID and doesn't require unique names, so nothing collided: Terraform quietly created a brand new API Gateway with a different ID.

My frontend was still pointing at the old URL. And Terraform had updated the Lambda permissions to reference the new API Gateway, so the old URL lost its ability to invoke the Lambda. Both URLs were broken.
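Both the invoke URL and the Lambda permission's source ARN embed the REST API's ID, which is why a new ID broke the old URL. The sketch below uses a hypothetical account ID and a hypothetical old API ID (`pb5rav4teh` is the ID from the curl example in this post), and approximates the `ArnLike` wildcard match that Lambda resource policies apply to the source ARN with Python's `fnmatch`.

```python
from fnmatch import fnmatchcase

REGION = "us-east-1"
ACCOUNT_ID = "123456789012"  # hypothetical account ID
OLD_API_ID = "oldapi0001"    # hypothetical ID of the API created before CI
NEW_API_ID = "pb5rav4teh"    # ID from the curl example in this post


def invoke_url(api_id: str, stage: str = "prod", path: str = "visitors") -> str:
    return f"https://{api_id}.execute-api.{REGION}.amazonaws.com/{stage}/{path}"


def execute_api_arn(api_id: str, stage: str, method: str, path: str) -> str:
    # Format: arn:aws:execute-api:region:account:api-id/stage/METHOD/resource-path
    return f"arn:aws:execute-api:{REGION}:{ACCOUNT_ID}:{api_id}/{stage}/{method}/{path}"


def permission_allows(source_arn_pattern: str, caller_arn: str) -> bool:
    # ArnLike-style wildcard match, approximated with fnmatch
    return fnmatchcase(caller_arn, source_arn_pattern)


# After the apply, the Lambda permission is scoped to the *new* API ID.
permission = execute_api_arn(NEW_API_ID, "*", "POST", "visitors")

for api_id in (OLD_API_ID, NEW_API_ID):
    caller = execute_api_arn(api_id, "prod", "POST", "visitors")
    print(invoke_url(api_id), permission_allows(permission, caller))  # False, then True
```

Scoping the permission to one API ID is the right default. It just means every new API needs a new permission, which Terraform handled, and a new URL in the frontend, which it couldn't.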

I pulled the new URL and API key from terraform output, updated index.js, pushed, and invalidated the CloudFront cache.
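Copying these values by hand is error-prone when the URL changes on every run. A minimal sketch that reads them from `terraform output -json`, which prints each output as an object with a `value` field; the output names `api_url` and `api_key` are hypothetical. Note that `-json` prints sensitive outputs in plain text, so keep the result out of CI logs.

```python
import json
import subprocess


def parse_outputs(raw_json: str, names: tuple[str, ...]) -> dict[str, str]:
    """Pick the `value` of selected outputs from `terraform output -json` text."""
    outputs = json.loads(raw_json)
    return {name: outputs[name]["value"] for name in names}


def read_outputs(workdir: str, names: tuple[str, ...] = ("api_url", "api_key")) -> dict[str, str]:
    """Run `terraform output -json` in workdir (terraform must be on PATH and initialized)."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return parse_outputs(result.stdout, names)


# Offline example using captured output instead of a live terraform run.
sample = json.dumps({
    "api_url": {"sensitive": False, "type": "string",
                "value": "https://example123.execute-api.us-east-1.amazonaws.com/prod/visitors"},
    "api_key": {"sensitive": True, "type": "string", "value": "placeholder-key"},
})
values = parse_outputs(sample, ("api_url", "api_key"))
print(values["api_url"])
```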

Still "--".

Failure 4: CORS preflight mismatch

I tested with curl and got {"count": 162} back. The API worked. The browser was blocking it.

I sent an OPTIONS request mimicking the browser's preflight check:

curl -D- -X OPTIONS \
  "https://pb5rav4teh.execute-api.us-east-1.amazonaws.com/prod/visitors" \
  -H "Origin: https://shankar-resume.arlingtonhood21.work" \
  -H "Access-Control-Request-Method: POST"

The response:

Access-Control-Allow-Origin: https://arlingtonhood21.work

My site loads from https://shankar-resume.arlingtonhood21.work. The browser does an exact string match. arlingtonhood21.work does not equal shankar-resume.arlingtonhood21.work. Preflight rejected, POST blocked.
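The browser's side of that decision fits in a few lines. This is a simplified sketch of the preflight checks (origin compared as an exact string, method and requested headers looked up in the allow lists), assuming no credentials are sent; it is not a full implementation of the Fetch specification. The Allow-Origin value is the one from the response above; the Allow-Methods and Allow-Headers values are assumed.

```python
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}  # allowed even if absent from Allow-Methods


def preflight_passes(origin: str, method: str, requested_headers: list[str],
                     response_headers: dict[str, str]) -> bool:
    """Simplified browser decision after an OPTIONS preflight (no credentials)."""
    allow_origin = response_headers.get("Access-Control-Allow-Origin", "")
    if allow_origin != "*" and allow_origin != origin:
        return False  # exact string comparison: no subdomain or suffix matching
    allowed_methods = {m.strip() for m in
                       response_headers.get("Access-Control-Allow-Methods", "").split(",")}
    if method not in SAFELISTED_METHODS and method not in allowed_methods:
        return False
    allowed_headers = {h.strip().lower() for h in
                       response_headers.get("Access-Control-Allow-Headers", "").split(",")}
    return all(h.lower() in allowed_headers for h in requested_headers)


site = "https://shankar-resume.arlingtonhood21.work"
broken = {
    "Access-Control-Allow-Origin": "https://arlingtonhood21.work",
    "Access-Control-Allow-Methods": "GET,POST,OPTIONS",        # assumed value
    "Access-Control-Allow-Headers": "Content-Type,x-api-key",  # assumed value
}
fixed = {**broken, "Access-Control-Allow-Origin": site}

print(preflight_passes(site, "POST", ["x-api-key"], broken))  # False
print(preflight_passes(site, "POST", ["x-api-key"], fixed))   # True
```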

The problem was in how API Gateway handles CORS. The actual POST request flows through the Lambda proxy integration, where my Python code checks the Origin header dynamically and returns the matching origin. But the OPTIONS preflight never reaches the Lambda. It hits a MOCK integration that returns a hardcoded, static value. I had set that static value to the apex domain instead of the subdomain.
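For contrast, a Lambda behind a proxy integration sees every real request, so it can echo the caller's origin back. A minimal sketch of that pattern, assuming the API Gateway proxy event shape (a `headers` dict whose key casing depends on the client); the allowlist and the hard-coded count are hypothetical stand-ins, not the function this site runs.

```python
import json

ALLOWED_ORIGINS = {  # hypothetical allowlist
    "https://shankar-resume.arlingtonhood21.work",
    "https://arlingtonhood21.work",
}


def cors_headers(event: dict) -> dict:
    """Echo the request's Origin back only if it is on the allowlist."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    origin = headers.get("origin", "")
    if origin not in ALLOWED_ORIGINS:
        return {}
    return {
        "Access-Control-Allow-Origin": origin,
        "Vary": "Origin",  # the response differs per origin, so caches must key on it
    }


def handler(event, context):
    count = 162  # placeholder; a real handler would read and update DynamoDB
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json", **cors_headers(event)},
        "body": json.dumps({"count": count}),
    }
```

The MOCK integration has no equivalent of this lookup. It returns one fixed string, so that string must be the exact origin the page loads from.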

Curl doesn't send preflight requests, which is why it worked from the terminal. Browsers send an OPTIONS preflight first for any cross-origin request that carries a custom header such as x-api-key.
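Whether a preflight happens at all is decided by the browser from the request itself. A simplified sketch of the Fetch specification's rule: only GET, HEAD, and POST requests whose headers are all CORS-safelisted skip the preflight. It checks only the common safelisted headers and the three simple Content-Type values, not the spec's full value restrictions.

```python
SAFELISTED_METHODS = {"GET", "HEAD", "POST"}
SAFELISTED_HEADERS = {"accept", "accept-language", "content-language", "content-type"}
SIMPLE_CONTENT_TYPES = {"application/x-www-form-urlencoded", "multipart/form-data", "text/plain"}


def needs_preflight(method: str, headers: dict[str, str]) -> bool:
    """Simplified: would a cross-origin fetch with this method and these headers send OPTIONS first?"""
    if method.upper() not in SAFELISTED_METHODS:
        return True
    for name, value in headers.items():
        lowered = name.lower()
        if lowered not in SAFELISTED_HEADERS:
            return True  # any custom header, such as x-api-key
        if lowered == "content-type":
            essence = value.split(";")[0].strip().lower()  # drop parameters like charset
            if essence not in SIMPLE_CONTENT_TYPES:
                return True  # e.g. application/json
    return False


print(needs_preflight("POST", {"x-api-key": "placeholder"}))   # True
print(needs_preflight("GET", {"Accept": "application/json"}))  # False
```

Any call carrying x-api-key therefore gets a preflight, whatever its method.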

Failure 5: The state kept disappearing

I fixed the CORS config, pushed, and the CI created a third API Gateway. Apply complete! Resources: 35 added, 4 changed, 4 destroyed. 35 new resources for a one-line change. Something was destroying the Terraform state between runs.

I checked S3:

$ aws s3 ls s3://shankar-resume-2025/ --recursive
2026-04-11  404.html
2026-04-11  favicon.svg
2026-04-11  index.html
2026-04-11  index.js
2026-04-11  style.css

No resume-backend/terraform.tfstate. Gone.

My Terraform backend stores state in the same S3 bucket that hosts the frontend. The frontend CI pipeline runs aws s3 sync . s3://bucket --delete on every push. That --delete flag removes any object in the destination that isn't present in the source directory. The source directory holds the five site files (HTML, CSS, JS, and the SVG favicon). It does not hold resume-backend/terraform.tfstate.

Every frontend deploy deleted the Terraform state. Every backend CI run started from zero, imported a subset of resources, and created everything else from scratch. That's why the API Gateway URL kept changing.
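A simplified model of what `aws s3 sync --delete` removes: every destination object with no counterpart in the source, unless an `--exclude` pattern matches it. The AWS CLI's `*` matches across `/` boundaries, as `fnmatch` does; the model ignores uploads, timestamps, and `--include`. The file list is the one from the `aws s3 ls` output above.

```python
from fnmatch import fnmatchcase


def keys_deleted_by_sync(local_files, remote_keys, excludes=()):
    """Remote keys that `sync --delete` would remove (simplified model)."""
    local = set(local_files)

    def excluded(key):
        return any(fnmatchcase(key, pattern) for pattern in excludes)

    return sorted(k for k in remote_keys if k not in local and not excluded(k))


site_files = ["404.html", "favicon.svg", "index.html", "index.js", "style.css"]
bucket = site_files + ["resume-backend/terraform.tfstate"]

print(keys_deleted_by_sync(site_files, bucket))
# ['resume-backend/terraform.tfstate']
print(keys_deleted_by_sync(site_files, bucket, excludes=["resume-backend/*"]))
# []
```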

The fix:

aws s3 sync . s3://${{ secrets.AWS_S3_BUCKET_NAME }} --delete --exclude "resume-backend/*"

After adding the exclude flag, I triggered one final backend CI run, updated the frontend with the fourth and final API Gateway URL, and confirmed the state file survived the next frontend deploy.

What I'd change next time

Terraform state and application assets should not share a bucket. I stored infrastructure state in the same S3 bucket as the website. One --delete flag on a sync command was all it took to wipe the state on every deploy. If I were starting over, the state bucket would be its own resource with no other purpose.

CORS on API Gateway has two paths, and they don't share configuration. The Lambda handles CORS dynamically for actual requests. The MOCK integration returns static headers for preflight. If those two don't agree on the allowed origin, the browser blocks everything. Curl won't catch this on its own because it never sends a preflight; you have to craft the OPTIONS request by hand, as I eventually did. I should have started with the browser's DevTools instead of curl. A 200 from a plain curl GET or POST tells you nothing about whether a browser can reach your API.

The import-on-every-run pattern is a workaround, not a design. It exists because I deployed manually before CI existed. If the first deploy had gone through CI, I would never have needed imports. For new projects: set up CI first, then deploy through it.

Current state

Browser (shankar-resume.arlingtonhood21.work)
  |
  +-- Static files: CloudFront -> S3
  |
  +-- Visitor API: POST /prod/visitors
        |
        API Gateway (EDGE, api-key-required)
          Rate limit: 5 req/sec, burst 10
          Monthly quota: 10,000
          CloudWatch access logging
          |
          Lambda (Python 3.9) -> DynamoDB

Kill Switch:
  AWS Budget ($2/mo) -> SNS -> Lambda -> disables CloudFront
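The last hop of the kill switch isn't covered in this post. A minimal sketch of a Lambda that disables a CloudFront distribution, assuming a hypothetical `DISTRIBUTION_ID` environment variable; it uses boto3's `get_distribution_config` and `update_distribution` and passes the returned `ETag` as `IfMatch`, which CloudFront requires for updates. It is not the function this site actually runs.

```python
import os


def disabled_config(config: dict) -> dict:
    """Return a shallow copy of a CloudFront DistributionConfig with Enabled=False."""
    updated = dict(config)
    updated["Enabled"] = False
    return updated


def handler(event, context):
    """Triggered by the budget alert's SNS topic; the SNS payload itself is ignored here."""
    import boto3  # bundled with the Lambda Python runtime; imported here so the helper runs without it

    distribution_id = os.environ["DISTRIBUTION_ID"]  # hypothetical environment variable
    cloudfront = boto3.client("cloudfront")
    current = cloudfront.get_distribution_config(Id=distribution_id)
    config = current["DistributionConfig"]
    if not config["Enabled"]:
        return {"status": "already disabled"}
    cloudfront.update_distribution(
        Id=distribution_id,
        IfMatch=current["ETag"],  # CloudFront rejects updates without the current ETag
        DistributionConfig=disabled_config(config),
    )
    return {"status": "disabled"}
```

Disabling rather than deleting keeps the distribution's settings, so recovery is the same update with `Enabled` set back to `True`.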

Four API Gateways were created and destroyed in the process. The fifth one stuck.

Live site: shankar-resume.arlingtonhood21.work
