DEV Community

Josh Blair
Josh Blair

Posted on

Zero-Secret CI/CD: GitHub Actions + OIDC on AWS (Part 6)

Zero-Secret CI/CD: GitHub Actions + OIDC on AWS (Part 6)

No AWS_ACCESS_KEY_ID in your GitHub secrets. Ever. Here's how OIDC trust works and why it's strictly better.


The most common GitHub Actions setup I see in portfolios stores AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY as repository secrets. Those are long-lived credentials tied to an IAM user. One breach of your GitHub account — a compromised OAuth token, a compromised third-party Action, a secret accidentally logged in workflow output — and an attacker has permanent AWS access until someone notices and rotates the keys.

OIDC federation eliminates the stored credentials entirely. GitHub Actions assumes an IAM role using a short-lived signed token. When the job ends, the session expires. There are no keys to rotate because there are no keys.

This post covers how the trust relationship works, how the CI and deploy workflows are structured, and how the frontend gets deployed to CloudFront with correct cache headers.


How GitHub Actions OIDC Works

GitHub operates as an OpenID Connect identity provider. When a workflow job runs with the id-token: write permission, GitHub can mint a signed JWT asserting the identity of the running job:

{
  "sub": "repo:joshblair/sift:ref:refs/heads/main",
  "aud": "sts.amazonaws.com",
  "iss": "https://token.actions.githubusercontent.com",
  "repository": "joshblair/sift",
  "ref": "refs/heads/main"
}
Enter fullscreen mode Exit fullscreen mode

AWS STS accepts this JWT via AssumeRoleWithWebIdentity and issues a short-lived role session — credentials that expire when the job ends, typically within an hour. The exchange only works if AWS has been told to trust GitHub's OIDC provider and the role's trust policy permits the specific repository making the request.

Setting Up the Trust (Once Per Account)

scripts/bootstrap.sh runs once to wire this up. It does three things:

1. Creates the GitHub OIDC provider in IAM:

aws iam create-open-id-connect-provider \
  --url "https://token.actions.githubusercontent.com" \
  --client-id-list "sts.amazonaws.com" \
  --thumbprint-list "6938fd4d98bab03faadb97b34396831e3780aea1"
Enter fullscreen mode Exit fullscreen mode

This tells AWS to trust JWTs signed by GitHub's OIDC endpoint. It's a one-time setup for the AWS account — not per-repo.

2. Creates the IAM deploy role with a scoped trust policy:

{
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Federated": "arn:aws:iam::ACCOUNT_ID:oidc-provider/token.actions.githubusercontent.com" },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
      },
      "StringLike": {
        "token.actions.githubusercontent.com:sub": "repo:joshblair/sift:*"
      }
    }
  }]
}
Enter fullscreen mode Exit fullscreen mode

The StringLike condition on sub limits trust to jobs running from the joshblair/sift repository. The wildcard * allows both branch pushes and pull request checks. For a production setup, you'd tighten this to repo:org/repo:ref:refs/heads/main to prevent deploy jobs from running on feature branches.

3. Prints the values to add as GitHub Actions variables:

Bootstrap complete. Add these as GitHub Actions variables in your repo settings:

  AWS_REGION       = us-west-2
  SAM_BUCKET       = sift-sam-123456789-us-west-2
  DEPLOY_ROLE_ARN  = arn:aws:iam::123456789:role/sift-github-actions-deploy
Enter fullscreen mode Exit fullscreen mode

These go in the repository's Variables (not Secrets) — they're not sensitive values. Cognito configuration (VITE_USER_POOL_ID, VITE_USER_POOL_CLIENT_ID, VITE_COGNITO_DOMAIN) is also stored as variables.

In the Workflow

With the provider and role in place, a single step handles authentication in every job that needs AWS access:

permissions:
  id-token: write   # allows the job to request an OIDC token
  contents: read

- uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: ${{ vars.DEPLOY_ROLE_ARN }}
    aws-region:     ${{ vars.AWS_REGION }}
Enter fullscreen mode Exit fullscreen mode

After this step, the job has temporary AWS credentials in its environment — the same AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN that the AWS CLI and SDKs look for, but populated automatically for the duration of the job.


CI: Three Parallel Jobs on Every Pull Request

ci.yml triggers on pull requests to main. Three jobs run in parallel — each covers a different part of the stack.

.NET Build and Test

- name: Restore
  run: dotnet restore
  working-directory: backend
- name: Build
  run: dotnet build --no-restore --configuration Release
- name: Test
  run: dotnet test --no-build --configuration Release --logger "console;verbosity=normal"
Enter fullscreen mode Exit fullscreen mode

Standard dotnet pipeline. The --no-restore and --no-build flags avoid redundant work between steps. Tests run against unit test doubles — no database, no Bedrock — so they complete in a few seconds with no external dependencies.

Python Pipeline Tests

The six pipeline Lambda functions live in separate directories under backend/pipeline/. Each has its own tests/ subdirectory. The shared utilities (database connection, Bedrock client) live in a Lambda layer at backend/pipeline/layers/shared/python/.

In the Lambda execution environment, that layer is mounted at a path Python automatically searches. In CI there's no layer mounting — so the path is added to PYTHONPATH instead:

- name: Add shared layer to PYTHONPATH
  run: echo "PYTHONPATH=$PYTHONPATH:$GITHUB_WORKSPACE/backend/pipeline/layers/shared/python" >> $GITHUB_ENV
Enter fullscreen mode Exit fullscreen mode

Writing to $GITHUB_ENV makes the variable available to all subsequent steps in the job — not just the current shell. This is the correct approach; export would only persist for the current run block.

The test suites then run one per handler:

- name: pytest — chunk (no AWS deps)
  run: pytest backend/pipeline/chunk/tests/ -v
- name: pytest — extract
  run: pytest backend/pipeline/extract/tests/ -v
- name: pytest — embed
  run: pytest backend/pipeline/embed/tests/ -v
- name: pytest — metadata
  run: pytest backend/pipeline/metadata/tests/ -v
Enter fullscreen mode Exit fullscreen mode

The chunker tests are noted as having no AWS dependencies — that handler is pure Python stdlib, so no mocking needed. The others mock boto3 calls to Bedrock and S3.

TypeScript Check, Lint, and Build

- name: TypeScript check
  run: npx tsc --noEmit
- name: ESLint
  run: npx eslint src --max-warnings 0
- name: Build (smoke test)
  run: npm run build
  env:
    VITE_API_URL: https://placeholder.execute-api.us-west-2.amazonaws.com/dev
    VITE_USER_POOL_ID: us-west-2_PLACEHOLDER
    VITE_USER_POOL_CLIENT_ID: placeholder
    VITE_COGNITO_DOMAIN: placeholder.auth.us-west-2.amazoncognito.com
Enter fullscreen mode Exit fullscreen mode

TypeScript checking and ESLint catch type errors and style issues. The vite build step is a smoke test: TypeScript's tsc --noEmit checks types but doesn't bundle. Vite's bundler can still fail on import cycles, missing environment variable references, or tree-shaking edge cases that tsc doesn't see. Placeholder values satisfy Vite's env var requirements without needing real infrastructure.

--max-warnings 0 on ESLint means warnings are treated as errors — a warning that gets committed and ignored accumulates into noise. Zero tolerance keeps the lint output meaningful.


Deploy: Two Sequential Jobs on Every Push to Main

deploy.yml triggers on pushes to main. It has two jobs: deploy-backend builds and deploys the infrastructure stacks, then deploy-frontend builds the React app with the real API URL and syncs it to S3.

Job 1: SAM Build and Deploy

After authenticating with OIDC, the job builds the Lambda functions:

- name: SAM build
  run: sam build --template-file infrastructure/template.yaml
Enter fullscreen mode Exit fullscreen mode

The build step does the work of compiling or packaging each Lambda function according to the Metadata.BuildMethod in the template. The .NET functions use a Makefile target that runs dotnet publish. The Python pipeline functions are packaged with their dependencies. Because all functions target x86_64 — matching the ubuntu-latest runner — no cross-compilation or Docker containerization is needed.

The deploy step reads from the build output, not the source template:

- name: SAM deploy — main stack
  run: |
    sam deploy \
      --no-confirm-changeset \
      --no-fail-on-empty-changeset \
      --s3-bucket ${{ vars.SAM_BUCKET }} \
      --s3-prefix sift-main \
      --stack-name sift-dev \
      --parameter-overrides Env=dev SamBucket=${{ vars.SAM_BUCKET }} \
      --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
Enter fullscreen mode Exit fullscreen mode

Notice there's no --template-file flag here. When sam deploy runs without specifying a template, it reads from .aws-sam/build/template.yaml — the processed template that references the compiled Lambda artifacts from the build step. Passing --template-file infrastructure/template.yaml would bypass the build and re-package raw source, which would break the .NET functions.

--no-fail-on-empty-changeset means the deploy step succeeds even if nothing changed. Without it, a push that only modifies the frontend would fail the backend deploy job with "No changes to deploy."

The frontend hosting stack is a plain CloudFormation template (no SAM transforms), deployed separately:

- name: Deploy frontend hosting stack
  run: |
    aws cloudformation deploy \
      --template-file infrastructure/template-frontend.yaml \
      --stack-name sift-frontend-dev \
      --parameter-overrides Env=dev \
      --no-fail-on-empty-changeset
Enter fullscreen mode Exit fullscreen mode

The job then captures the stack outputs — API URL, S3 bucket name, CloudFront distribution ID — and writes them to $GITHUB_OUTPUT so the next job can read them:

- name: Capture stack outputs
  id: outputs
  run: |
    API_URL=$(aws cloudformation describe-stacks \
      --stack-name sift-dev \
      --query "Stacks[0].Outputs[?OutputKey=='ApiUrl'].OutputValue" \
      --output text)
    echo "api-url=$API_URL" >> $GITHUB_OUTPUT
    # ... bucket name and CF distribution ID
Enter fullscreen mode Exit fullscreen mode

Job 2: Frontend Build and Sync

deploy-frontend declares needs: deploy-backend, which both enforces ordering and makes the first job's outputs available as needs.deploy-backend.outputs.*.

- name: Build
  run: npm run build
  env:
    VITE_API_URL:             ${{ needs.deploy-backend.outputs.api-url }}
    VITE_USER_POOL_ID:        ${{ vars.VITE_USER_POOL_ID }}
    VITE_USER_POOL_CLIENT_ID: ${{ vars.VITE_USER_POOL_CLIENT_ID }}
    VITE_COGNITO_DOMAIN:      ${{ vars.VITE_COGNITO_DOMAIN }}
Enter fullscreen mode Exit fullscreen mode

VITE_API_URL is the only value that comes from the previous job's runtime output — it's only known after the SAM stack deploys. Everything else is static configuration stored as repository variables.

The S3 sync uses split cache headers:

- name: Sync to S3
  run: |
    aws s3 sync dist/ s3://${{ needs.deploy-backend.outputs.frontend-bucket }} \
      --delete \
      --cache-control "public,max-age=31536000,immutable" \
      --exclude "index.html"
    aws s3 cp dist/index.html s3://$BUCKET/index.html \
      --cache-control "no-cache,no-store,must-revalidate"
Enter fullscreen mode Exit fullscreen mode

This is the standard SPA cache strategy. Vite includes a content hash in every asset filename — main-Dz9a8bK2.js, vendor-x4jKLmY8.css. These filenames change when the content changes. The browser can safely cache them for a year (max-age=31536000,immutable); if the file changes, it gets a new URL.

index.html doesn't have a hash in its name — it's always index.html. It contains <script src="./assets/main-Dz9a8bK2.js">, pointing to the current hashed filenames. If index.html is cached, the browser never fetches the new asset filenames after a deploy. no-cache,no-store,must-revalidate forces the browser to revalidate index.html on every navigation — it fetches fresh, reads the new asset filenames, and the rest loads from cache.

Finally, the CloudFront edge caches are invalidated:

- name: Invalidate CloudFront
  run: |
    aws cloudfront create-invalidation \
      --distribution-id ${{ needs.deploy-backend.outputs.cloudfront-id }} \
      --paths "/*"
Enter fullscreen mode Exit fullscreen mode

Without this, CloudFront would serve the cached previous version of index.html from edge locations for up to 24 hours — undermining the no-cache header set on the origin.


CloudFront + S3 Hosting

template-frontend.yaml configures CloudFront to serve a private S3 bucket using Origin Access Control:

FrontendBucket:
  Type: AWS::S3::Bucket
  Properties:
    PublicAccessBlockConfiguration:
      BlockPublicAcls:       true
      IgnorePublicAcls:      true
      BlockPublicPolicy:     true
      RestrictPublicBuckets: true
Enter fullscreen mode Exit fullscreen mode

The bucket has all public access blocked. The only way to read objects is through CloudFront.

Origin Access Control (OAC) replaces the older Origin Access Identity (OAI) pattern. OAC uses SigV4 request signing rather than a special IAM principal:

OriginAccessControl:
  Type: AWS::CloudFront::OriginAccessControl
  Properties:
    OriginAccessControlConfig:
      OriginAccessControlOriginType: s3
      SigningBehavior:               always
      SigningProtocol:               sigv4
Enter fullscreen mode Exit fullscreen mode

The bucket policy grants s3:GetObject to CloudFront, scoped to this specific distribution's ARN:

Condition:
  StringEquals:
    AWS:SourceArn: !Sub arn:aws:cloudfront::${AWS::AccountId}:distribution/${CloudFrontDistribution}
Enter fullscreen mode Exit fullscreen mode

The AWS:SourceArn condition means even if someone obtained CloudFront's service principal, they couldn't use it to access this bucket from a different distribution. The permission is tied to the specific CloudFront resource, not just the service.

The distribution handles the SPA routing requirement with custom error responses:

CustomErrorResponses:
  - ErrorCode: 403
    ResponseCode: 200
    ResponsePagePath: /index.html
  - ErrorCode: 404
    ResponseCode: 200
    ResponsePagePath: /index.html
Enter fullscreen mode Exit fullscreen mode

When a user bookmarks app.example.com/documents and navigates directly to it, S3 returns a 403 (no object at that key) or 404. Without this configuration, the user sees an XML error response. With it, CloudFront intercepts that error and serves index.html instead — React Router then handles the /documents path client-side. The ErrorCachingMinTTL: 0 on each rule prevents CloudFront from caching the error responses themselves.


The Result

The full pipeline, end to end:

  1. Developer opens a pull request → CI runs three parallel jobs (45–90 seconds)
  2. Merge to main → Deploy job builds and deploys all infrastructure stacks
  3. Frontend build runs with the real API URL from stack outputs
  4. aws s3 sync with correct cache headers, CloudFront invalidation

The only AWS credentials that exist are the temporary role session credentials inside the running job. There's no AWS_ACCESS_KEY_ID in repository secrets, no IAM user to audit, and no credential rotation to schedule. The IAM role trust policy limits which repos can assume it, and the role itself is scoped to exactly the permissions needed to deploy Sift — no more.


The Complete Series

That's all six parts:

Part Topic
1 Architecture overview — service choices and cost breakdown
2 Multi-tenant auth — Cognito JWT, API Gateway validation, Postgres RLS
3 Step Functions pipeline — state machine, Map state, Express Workflows
4 RAG and vector search — pgvector, Titan Embed v2, citations
5 React frontend — Amplify auth, presigned upload, React Query polling
6 CI/CD — OIDC federation, SAM build/deploy, CloudFront cache strategy

The live demo is at sift.bonefishsoftware.com. The code is at github.com/joshblair/sift.


Part of the Sift series: building a production-ready multi-tenant RAG platform on AWS.

Top comments (0)