Victor Ojeje

How I Built a PCI-Ready Merchant Onboarding API on AWS for Under $5/Month

Payment processors lose enterprise contracts when their infrastructure isn't auditable. PCI DSS cares whether every data access event is logged, whether every record is encrypted with a key you control, and whether you can recover a merchant record to a specific second if a dispute arises.

This post walks through a serverless merchant onboarding API built to those standards, with full cost transparency.


What This Solves

A payment processor needs to register merchants: collect business details, store KYC documents, and expose that data to internal admin systems. Straightforward on the surface. The compliance overhead is where most implementations fall apart:

  • PCI DSS Requirement 3.5.1 — stored data must be encrypted with a key the organization controls, not a cloud-provider default
  • PCI DSS Requirement 8 — every API call must be authenticated and tied to an identity
  • PCI DSS Requirement 10 — every data access event must produce audit evidence

The goal is infrastructure that can survive a compliance audit without emergency rework.


Architecture

Architectural Diagram

The full flow:

  1. Merchant authenticates with email and password via Amazon Cognito, receives a JWT (JSON Web Token)
  2. Merchant calls API Gateway with that JWT in the Authorization header
  3. API Gateway validates the JWT against Cognito, so Lambda never receives an unauthenticated request
  4. Lambda processes the request, creates the merchant profile in DynamoDB, and stores KYC documents in S3 if present
  5. DynamoDB stores profiles encrypted at rest with a customer-managed KMS key, PITR enabled
  6. S3 stores KYC documents encrypted with SSE-KMS, Bucket Key enabled
  7. CloudTrail captures every API call, Lambda invocation, and DynamoDB operation as mandated by PCI DSS Requirement 10
  8. CloudWatch monitors operational metrics

Assumptions and Traffic Model

| Dimension | Value | Basis |
|---|---|---|
| Monthly Active Users (MAU) | 10,000 | Initial operating scale |
| Projected new merchants/month | 500 | Growth estimate |
| DynamoDB writes/month | 500 | One write per new merchant registration |
| DynamoDB reads/month | 330,000 | 10,000 users × 3 reads + 500 new records × 20 reads × 30 days |
| S3 PUT requests/month | 2,000 | 500 merchants × 4 KYC documents each |
| S3 GET requests/month | 6,000 | 2,000 PUTs × 3 retrieval average |
| API Gateway + Lambda requests | 330,500 | Matches read/write totals |

These figures were derived from behavioral assumptions: how often a merchant logs in, how often admins pull records, and how often documents are retrieved.
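The traffic figures can be reproduced with a few lines of arithmetic. This is a minimal sketch: the constants are the assumptions stated in the traffic table, and the constant and function names are made up for illustration.

```python
# Inputs are the assumptions stated in the traffic table.
MAU = 10_000
NEW_MERCHANTS_PER_MONTH = 500
READS_PER_USER = 3
READS_PER_NEW_RECORD_PER_DAY = 20
DAYS_PER_MONTH = 30
KYC_DOCS_PER_MERCHANT = 4
RETRIEVALS_PER_DOC = 3


def traffic_model():
    """Recompute each monthly volume from the stated assumptions."""
    writes = NEW_MERCHANTS_PER_MONTH
    reads = (
        MAU * READS_PER_USER
        + NEW_MERCHANTS_PER_MONTH * READS_PER_NEW_RECORD_PER_DAY * DAYS_PER_MONTH
    )
    s3_puts = NEW_MERCHANTS_PER_MONTH * KYC_DOCS_PER_MERCHANT
    s3_gets = s3_puts * RETRIEVALS_PER_DOC
    return {
        "dynamodb_writes": writes,
        "dynamodb_reads": reads,
        "s3_puts": s3_puts,
        "s3_gets": s3_gets,
        # Every DynamoDB read or write is one API Gateway + Lambda request.
        "api_requests": reads + writes,
    }


if __name__ == "__main__":
    for name, value in traffic_model().items():
        print(f"{name}: {value:,}")
```

Keeping the model in code makes it easy to change one assumption (say, reads per user) and see which downstream numbers move.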


Security Decisions and Why They Were Made

1. JWT Validation at API Gateway, Not Lambda

Authentication is enforced at the API Gateway layer using a Cognito authorizer. Lambda is never invoked for an unauthenticated request, so unauthenticated traffic incurs no compute cost and never reaches application code.

If you push JWT validation into Lambda, you pay for Lambda invocations even for rejected requests, and you've moved your authentication boundary inward, increasing blast radius.
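Once the authorizer has done its job, Lambda can still read who the caller is, which is useful for tying each request to an identity in application logs. The sketch below assumes a REST API with a Cognito user pool authorizer and Lambda proxy integration, where API Gateway passes the validated token's claims under requestContext.authorizer.claims. The helper name caller_identity is hypothetical and is not part of the function shown later in this post.

```python
def caller_identity(event):
    """Return the authenticated caller's identity from the proxy event.

    Assumes API Gateway has already validated the JWT and copied its claims
    into event["requestContext"]["authorizer"]["claims"]. Returns None if
    the claims are missing, which should not happen behind the authorizer.
    """
    request_context = event.get("requestContext") or {}
    authorizer = request_context.get("authorizer") or {}
    claims = authorizer.get("claims") or {}

    sub = claims.get("sub")  # Cognito's stable, unique user identifier
    if not sub:
        return None
    return {"sub": sub, "email": claims.get("email")}


if __name__ == "__main__":
    sample_event = {
        "requestContext": {
            "authorizer": {
                "claims": {"sub": "user-123", "email": "admin@example.com"}
            }
        }
    }
    print(caller_identity(sample_event))
```

Lambda does not re-validate the token here; it only reads what API Gateway has already verified.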

2. Customer-Managed KMS Key on DynamoDB (Not AWS-Owned)

The DynamoDB table uses a customer-managed KMS key.

DynamoDB Details

AWS-Owned Keys (the free default) fail PCI DSS Requirement 3.5.1 for a specific reason: you cannot produce key usage evidence for an auditor because you have no visibility into key operations. CloudTrail does not log AWS-Owned Key usage. Customer-managed keys generate CloudTrail entries on every encrypt and decrypt operation. That's the audit evidence.

3. S3 SSE-KMS with Bucket Key Enabled

KYC documents in S3 are encrypted with SSE-KMS. Bucket Key is enabled, which reduces the number of KMS API calls: S3 requests a short-lived bucket-level key from KMS and derives per-object data keys from it, rather than calling KMS for every object.

S3 Details

Without Bucket Key, every S3 object PUT and GET generates a KMS API call. At 8,000 S3 requests/month, that's 8,000 additional KMS calls on top of DynamoDB operations. Bucket Key collapses that significantly.
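For illustration, here is roughly what an upload with these settings looks like through boto3. ServerSideEncryption, SSEKMSKeyId, and BucketKeyEnabled are real put_object parameters. The helper name and the kyc/{merchant_id}/{filename} key layout are assumptions made for this sketch, not what the deployed function does.

```python
def kyc_put_object_kwargs(bucket, merchant_id, filename, body, kms_key_arn):
    """Build keyword arguments for S3 put_object with SSE-KMS and Bucket Key."""
    return {
        "Bucket": bucket,
        "Key": f"kyc/{merchant_id}/{filename}",  # hypothetical key layout
        "Body": body,
        "ServerSideEncryption": "aws:kms",  # SSE-KMS rather than SSE-S3
        "SSEKMSKeyId": kms_key_arn,  # the customer-managed key
        "BucketKeyEnabled": True,  # let S3 reuse a bucket-level key
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   kwargs = kyc_put_object_kwargs(
#       "my-kyc-bucket", merchant_id, "cac_certificate.pdf", pdf_bytes, key_arn
#   )
#   boto3.client("s3").put_object(**kwargs)
```

If the bucket's default encryption already specifies SSE-KMS with Bucket Key, the per-request parameters are redundant, but setting them makes the intent visible in code review.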

4. Point-in-Time Recovery (PITR) on DynamoDB

PITR is enabled on the merchants table. This provides continuous backups with one-second granularity for the last 35 days.

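In a payment dispute or fraud investigation, you may need to prove what a merchant record looked like at a specific timestamp. Without PITR, you either rebuild from CloudTrail (tedious) or you can't answer the question. With PITR, you restore the table as of that second into a new table and read the record there; starting the restore takes well under a minute in the console, although the restore itself takes longer to complete.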
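The same restore can be scripted. The sketch below only builds the arguments for DynamoDB's restore_table_to_point_in_time call, so it runs without AWS access. The helper name and the target table name in the usage comment are hypothetical, and the 35-day check mirrors the retention window described above.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)


def pitr_restore_kwargs(source_table, target_table, restore_at, now=None):
    """Build arguments for restore_table_to_point_in_time.

    restore_at must be a timezone-aware datetime inside the PITR window.
    PITR granularity is one second, so microseconds are dropped.
    """
    now = now or datetime.now(timezone.utc)
    if restore_at.tzinfo is None:
        raise ValueError("restore_at must be timezone-aware")
    if restore_at > now:
        raise ValueError("restore_at is in the future")
    if now - restore_at > PITR_WINDOW:
        raise ValueError("restore_at is outside the 35-day PITR window")
    return {
        "SourceTableName": source_table,
        "TargetTableName": target_table,  # PITR always restores to a new table
        "UseLatestRestorableTime": False,
        "RestoreDateTime": restore_at.replace(microsecond=0),
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   when = datetime(2026, 3, 1, 12, 0, 5, tzinfo=timezone.utc)
#   boto3.client("dynamodb").restore_table_to_point_in_time(
#       **pitr_restore_kwargs("merchants", "merchants-dispute-review", when)
#   )
```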

5. GSI for Read Efficiency

The GET /merchants endpoint (admin listing) queries a Global Secondary Index (GSI) called entity-type-index rather than scanning the full table:

query_kwargs = {
    "IndexName": GSI_NAME,
    "KeyConditionExpression": Key("entity_type").eq("MERCHANT"),
    "Limit": limit,
    "ScanIndexForward": False,
}

DynamoDB Scan reads every item in the table regardless of what you need, and you are billed for every item it reads. A GSI query reads only the items under the requested key. With 300,000+ reads per month, using Scan instead of a GSI query would make read capacity consumption grow with the size of the table rather than with the size of the page you asked for.

Cursor-based pagination is implemented using ExclusiveStartKey and base64-encoded tokens, which is the correct DynamoDB pagination pattern.
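The cursor handling can be pulled into two small helpers. This is a sketch rather than the code the function uses: it swaps in URL-safe base64 because the cursor travels in a query string, and it checks the decoded key's shape before handing it to DynamoDB. The expected key names (entity_type, merchant_id, created_date) are an assumption about the table and index schema.

```python
import base64
import json


def encode_cursor(last_evaluated_key):
    """Turn DynamoDB's LastEvaluatedKey into an opaque, URL-safe token."""
    raw = json.dumps(last_evaluated_key, sort_keys=True).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor, expected_keys):
    """Decode a client-supplied cursor; return None if it is not usable.

    The cursor is untrusted input, so besides decoding it we check that it
    contains exactly the expected key attributes and only string values.
    """
    try:
        key = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    except ValueError:  # covers bad base64, bad UTF-8, and bad JSON
        return None
    if not isinstance(key, dict) or set(key) != set(expected_keys):
        return None
    if not all(isinstance(value, str) for value in key.values()):
        return None
    return key


if __name__ == "__main__":
    schema = ("entity_type", "merchant_id", "created_date")  # assumed schema
    token = encode_cursor({
        "entity_type": "MERCHANT",
        "merchant_id": "01fcd787-47d1-4db5-a396-f2adcf0c3e18",
        "created_date": "2026-03-22T10:00:00+00:00",
    })
    print(decode_cursor(token, schema))
```

A decoded key that passes these checks can be passed as ExclusiveStartKey; anything else maps to the existing 400 "Invalid cursor" response.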

6. Duplicate Registration Prevention at the Database Level

table.put_item(
    Item=item,
    ConditionExpression="attribute_not_exists(cac_number)",
)

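A merchant's CAC (Corporate Affairs Commission) registration number is unique to their business, and the intent is for DynamoDB, not application logic, to reject a second registration with the same number. One caveat matters here: a condition expression is evaluated only against the existing item that shares the new item's primary key. Because each write uses a freshly generated merchant_id (the key get_merchant_by_id queries on), attribute_not_exists(cac_number) finds no existing item and passes, even when another record already holds that CAC number. For the storage layer to enforce uniqueness, the CAC number has to be part of a key being written, for example as the table's partition key or as a separate marker item keyed on the CAC number and written in the same transaction. Enforced that way, the check cannot be bypassed by a Lambda bug or a race between parallel requests.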
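One common way to get storage-level uniqueness on a non-key attribute is a marker item written in the same transaction as the record. The sketch below builds the TransactItems list for DynamoDB's transact_write_items in the low-level attribute-value format. It is an alternative pattern, not what the function in this post does. The CAC# prefix, the UNIQUE sentinel, and the owner_merchant_id attribute are made-up names, and the sketch assumes every attribute on the merchant item is a string.

```python
def s(value):
    """Wrap a string in DynamoDB's low-level attribute-value format."""
    return {"S": value}


def unique_cac_transaction(table_name, item):
    """Build TransactItems that write a merchant plus a CAC uniqueness marker.

    The marker's partition key is derived from the CAC number, so a second
    registration with the same number collides on that key and fails the
    condition, which cancels the whole transaction.
    """
    marker_key = f"CAC#{item['cac_number']}"  # hypothetical key convention
    return [
        {
            "Put": {
                "TableName": table_name,
                "Item": {
                    "merchant_id": s(marker_key),
                    # Fixed sentinel in case created_date is the sort key.
                    "created_date": s("UNIQUE"),
                    "owner_merchant_id": s(item["merchant_id"]),
                },
                "ConditionExpression": "attribute_not_exists(merchant_id)",
            }
        },
        {
            "Put": {
                "TableName": table_name,
                "Item": {name: s(value) for name, value in item.items()},
                "ConditionExpression": "attribute_not_exists(merchant_id)",
            }
        },
    ]


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("dynamodb")
#   client.transact_write_items(
#       TransactItems=unique_cac_transaction("merchants", item)
#   )
#   # A duplicate CAC number raises
#   # client.exceptions.TransactionCanceledException.
```

The marker carries no entity_type attribute, so it stays out of a GSI keyed on entity_type and does not show up in the admin listing.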

7. CloudTrail for PCI DSS Requirement 10

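CloudTrail is configured to log management events and Lambda data events. Item-level DynamoDB operations (PutItem, GetItem, Query) and S3 object operations (PUT and GET) are also data events, which CloudTrail does not record by default; they appear in the trail only when data event logging is enabled for the merchants table and the KYC bucket. Data events are billed per event, so enabling them adds to the CloudTrail line in the cost breakdown.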

PCI DSS Requirement 10.2 specifies the exact event types that must be logged: user access to cardholder data, invalid logical access attempts, use of privilege escalation mechanisms. CloudTrail at this configuration covers all of them.
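As a rough illustration, data event logging for a specific table and bucket can be expressed with CloudTrail advanced event selectors. The field names (eventCategory, resources.type, resources.ARN) and resource types below are CloudTrail's own. The function name, selector names, and the ARN and bucket arguments are placeholders for this sketch.

```python
def audit_event_selectors(table_arn, bucket_name):
    """Build AdvancedEventSelectors covering management and data events."""
    def data_selector(name, resource_type, arn_match):
        return {
            "Name": name,
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": [resource_type]},
                {"Field": "resources.ARN", **arn_match},
            ],
        }

    return [
        {
            "Name": "Management events",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Management"]}
            ],
        },
        data_selector(
            "Merchants table item-level events",
            "AWS::DynamoDB::Table",
            {"Equals": [table_arn]},
        ),
        data_selector(
            "KYC bucket object-level events",
            "AWS::S3::Object",
            {"StartsWith": [f"arn:aws:s3:::{bucket_name}/"]},
        ),
    ]


# Usage (requires boto3 and AWS credentials). put_event_selectors replaces
# the trail's existing selectors, so include everything you want logged,
# such as a Lambda data event selector (AWS::Lambda::Function).
#   import boto3
#   boto3.client("cloudtrail").put_event_selectors(
#       TrailName="<trail-name>",
#       AdvancedEventSelectors=audit_event_selectors(table_arn, "<bucket>"),
#   )
```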


The Full Lambda Function

This single function handles all three API routes. Every design decision in the security section above maps directly to something in this code.

import json
import uuid
import boto3
import os
import re
import base64
from datetime import datetime, timezone
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
TABLE_NAME = os.environ.get("TABLE_NAME", "merchants")
table = dynamodb.Table(TABLE_NAME)

# GSI (Global Secondary Index) used for GET /merchants — avoids a full table
# Scan which reads every item regardless of what you need. At 300,000 admin
# reads/month, Scan would multiply read capacity consumption by the full
# table size factor.
GSI_NAME = "entity-type-index"
DEFAULT_PAGE_SIZE = 20
MAX_PAGE_SIZE = 100
EMAIL_REGEX = re.compile(r"^[\w\.-]+@[\w\.-]+\.\w+$")


def respond(status_code, body):
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


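def validate_post_body(body):
    # Reject non-object JSON (e.g. a list) and non-string field values up
    # front so they produce a 400 rather than an unhandled AttributeError.
    if not isinstance(body, dict):
        return None, "Request body must be a JSON object"

    required = ["business_name", "cac_number", "contact_email"]
    missing = [
        f for f in required
        if not isinstance(body.get(f), str) or not body[f].strip()
    ]
    if missing:
        return None, f"Missing or empty fields: {', '.join(missing)}"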

    email = body["contact_email"].strip().lower()
    if not EMAIL_REGEX.match(email):
        return None, "Invalid contact_email format"

    # CAC = Corporate Affairs Commission registration number (Nigeria).
    # Exactly 6 digits — validated here before it ever touches DynamoDB.
    cac = body["cac_number"].strip()
    if not cac.isdigit() or len(cac) != 6:
        return None, "cac_number must be exactly 6 digits"

    return {
        "business_name": body["business_name"].strip(),
        "cac_number": cac,
        "contact_email": email,
    }, None


def post_merchant(event):
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return respond(400, {"error": "Request body must be valid JSON"})

    validated, error = validate_post_body(body)
    if error:
        return respond(400, {"error": error})

    merchant_id = str(uuid.uuid4())
    created_date = datetime.now(timezone.utc).isoformat()

    item = {
        "merchant_id": merchant_id,
        "created_date": created_date,
        "entity_type": "MERCHANT",
        "business_name": validated["business_name"],
        "cac_number": validated["cac_number"],
        "contact_email": validated["contact_email"],
        "status": "active",
    }

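    # The condition is evaluated against the existing item with the same
    # primary key. On its own it guards this merchant_id against being
    # overwritten; it does not detect the same CAC number stored under a
    # different merchant_id. A failed check raises
    # ConditionalCheckFailedException, returned here as a 409 instead of
    # surfacing as an unhandled 500.
    try:
        table.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(cac_number)",
        )
    except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
        return respond(409, {"error": "Merchant already registered"})

    return respond(201, {"merchant_id": merchant_id, "created_date": created_date})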


def get_all_merchants(event):
    params = event.get("queryStringParameters") or {}

    try:
        # Clamp to 1..MAX_PAGE_SIZE; DynamoDB rejects a Limit below 1.
        requested = int(params.get("limit", DEFAULT_PAGE_SIZE))
        limit = max(1, min(requested, MAX_PAGE_SIZE))
    except ValueError:
        return respond(400, {"error": "limit must be an integer"})

    query_kwargs = {
        "IndexName": GSI_NAME,
        "KeyConditionExpression": Key("entity_type").eq("MERCHANT"),
        "Limit": limit,
        "ScanIndexForward": False,  # newest first
    }

    # Cursor-based pagination using DynamoDB's ExclusiveStartKey.
    # The cursor is base64-encoded JSON of the last evaluated key — safe
    # to pass to clients without exposing internal DynamoDB key structure.
    cursor = params.get("cursor")
    if cursor:
        try:
            exclusive_start_key = json.loads(
                base64.b64decode(cursor.encode()).decode()
            )
            query_kwargs["ExclusiveStartKey"] = exclusive_start_key
        except Exception:
            return respond(400, {"error": "Invalid cursor"})

    response = table.query(**query_kwargs)

    next_cursor = None
    if "LastEvaluatedKey" in response:
        next_cursor = base64.b64encode(
            json.dumps(response["LastEvaluatedKey"]).encode()
        ).decode()

    return respond(200, {
        "merchants": response["Items"],
        "count": len(response["Items"]),
        "next_cursor": next_cursor,
    })


def get_merchant_by_id(merchant_id):
    response = table.query(
        KeyConditionExpression=Key("merchant_id").eq(merchant_id),
        Limit=1,
    )
    items = response.get("Items", [])
    if not items:
        return respond(404, {"error": "Merchant not found"})
    return respond(200, items[0])


# Route table — static dispatch is faster than regex matching at Lambda
# invocation frequency and easier to read during a code review.
ROUTES = {
    ("POST", "/merchants"): post_merchant,
    ("GET", "/merchants"): get_all_merchants,
}


def lambda_handler(event, context):
    method = event.get("httpMethod", "")
    path = event.get("path", "")
    path_params = event.get("pathParameters") or {}

    if (method, path) in ROUTES:
        return ROUTES[(method, path)](event)

    # Dynamic route: /merchants/{id}
    if method == "GET" and path_params.get("id"):
        return get_merchant_by_id(path_params["id"])

    return respond(404, {"error": "Route not found"})

Three things worth noting in this code that matter for regulated environments:

Input validation happens before any AWS SDK call. Email format, CAC number format, and required field presence are all checked before Lambda touches DynamoDB. Malformed requests never generate KMS decrypt operations, which keeps the KMS call count accurate to your cost model.

entity_type: "MERCHANT" is set by the server, not the client. The GSI partition key cannot be spoofed by a bad actor sending a crafted payload. If it were client-supplied, a request sending entity_type: "ADMIN" could pollute the index.

The route table (ROUTES dict) replaces a chain of if/elif checks. At 330,500 invocations per month this is not a performance concern, but during a code review at a financial institution, explicit dispatch tables are easier to audit than nested conditionals.

How Authentication Was Tested

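# Generate SECRET_HASH (required for Cognito app clients with a secret).
# This first snippet is Python; fill in the placeholders before running it.
import base64
import hashlib
import hmac

username = "<username>"
client_id = "3einfu39fvd2opcoldicn0fh2q"
client_secret = "<client_secret>"

# SECRET_HASH = Base64(HMAC-SHA256(key=client_secret, msg=username + client_id))
message = username + client_id
secret_hash = base64.b64encode(
    hmac.new(client_secret.encode(), message.encode(), hashlib.sha256).digest()
).decode()
print(secret_hash)

# Get JWT from Cognito (shell, AWS CLI)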
aws cognito-idp initiate-auth \
  --client-id 3einfu39fvd2opcoldicn0fh2q \
  --auth-flow USER_PASSWORD_AUTH \
  --auth-parameters USERNAME=<username>,PASSWORD=<password>,SECRET_HASH=<hash>

# Call the API with the JWT
curl -X POST https://93z9qxnyv4.execute-api.us-east-1.amazonaws.com/prod/merchants \
  -H "Authorization: <IdToken>" \
  -H "Content-Type: application/json" \
  -d '{"business_name": "Jiji", "cac_number": "128957", "contact_email": "jiji@example.com"}'

# List all merchants
curl -X GET https://93z9qxnyv4.execute-api.us-east-1.amazonaws.com/prod/merchants \
  -H "Authorization: <IdToken>"

# Retrieve a specific merchant
curl -X GET https://93z9qxnyv4.execute-api.us-east-1.amazonaws.com/prod/merchants/01fcd787-47d1-4db5-a396-f2adcf0c3e18 \
  -H "Authorization: <IdToken>"

Requests without a valid JWT are rejected at the API Gateway layer before Lambda is invoked.


Cost Breakdown

Source: AWS Pricing Calculator export, 03/22/2026, us-east-1. 10,000 MAU, 330,500 requests/month.

| Service | Monthly Cost | % of Bill | Notes |
|---|---|---|---|
| KMS | $2.01 | 41.3% | 669,000 symmetric API calls |
| API Gateway | $1.16 | 23.8% | 330,500 requests, avg 34KB |
| Lambda | $0.76 | 15.6% | 330,500 invocations, 512MB |
| CloudTrail | $0.34 | 7.0% | Lambda data events at volume |
| CloudWatch | $0.32 | 6.6% | 0.63GB log ingestion |
| DynamoDB | $0.24 | 4.9% | On-demand, PITR, KMS |
| S3 | $0.04 | 0.8% | 1GB storage, 8,000 requests |
| Cognito | $0.00 | 0.0% | 10,000 MAU = within free tier |
| Total | $4.87/month | | |

What most engineers miss about this bill:

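KMS is 41% of the monthly cost at this scale. The moment you add customer-managed encryption to DynamoDB and S3 (which PCI DSS requires), KMS becomes the dominant cost driver. The cost model budgets KMS API calls for every DynamoDB read and write; DynamoDB caches table keys for short periods instead of calling KMS on every operation, so treat the 669,000 figure as an upper estimate. Without Bucket Key on S3, every object operation would add to that count.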

At 10,000 MAU this is $4.87/month. At 100,000 MAU, KMS call volume scales roughly linearly with request volume. Planning for this in the initial architecture prevents a compliance requirement from arriving as a surprise bill.
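The bill can be cross-checked in a few lines. The per-service costs below are copied from the cost table. The KMS estimate assumes two KMS calls per API request plus one per S3 request, a multiplier that is an inference because it reproduces the 669,000 figure, not something the calculator export states. The function names are illustrative.

```python
# Monthly cost per service in cents, copied from the cost table.
COSTS_CENTS = {
    "KMS": 201,
    "API Gateway": 116,
    "Lambda": 76,
    "CloudTrail": 34,
    "CloudWatch": 32,
    "DynamoDB": 24,
    "S3": 4,
    "Cognito": 0,
}


def cost_shares(costs_cents):
    """Return (total in cents, {service: percentage of the bill})."""
    total = sum(costs_cents.values())
    shares = {
        name: round(100 * cents / total, 1)
        for name, cents in costs_cents.items()
    }
    return total, shares


def kms_call_estimate(api_requests, s3_requests, calls_per_api_request=2):
    """Estimate monthly KMS calls; the multiplier of 2 is an assumption."""
    return api_requests * calls_per_api_request + s3_requests


if __name__ == "__main__":
    total, shares = cost_shares(COSTS_CENTS)
    print(f"total: ${total / 100:.2f}")
    for name, share in shares.items():
        print(f"{name}: {share}%")
    print(f"KMS calls: {kms_call_estimate(330_500, 8_000):,}")
```

Scaling api_requests by 10 in kms_call_estimate gives a first-order view of how the KMS line grows at 100,000 MAU.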

Cognito at this scale is free. The first 10,000 MAU are within the AWS free tier perpetually.


What Breaks at Scale

This architecture is appropriate for the stated scale. Three things need revisiting before 10x growth:

1. CloudTrail cost at high Lambda data event volume

CloudTrail logs Lambda data events at $0.10 per 100,000 events. At 330,500 Lambda invocations, that's $0.34/month. At 3.3M invocations, it's $3.30/month. At 33M, $33/month. Worth modelling before reaching that range.
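The projection is simple enough to model directly. The sketch uses the $0.10 per 100,000 events rate quoted above and Decimal arithmetic so that rounding to cents is predictable. At exactly 330,500 events the rate gives $0.33; the $0.34 in the cost table matches if the 8,000 S3 requests are counted too, which is an inference from the arithmetic rather than something stated in the export.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_100K_EVENTS = Decimal("0.10")  # rate quoted in this section


def cloudtrail_data_event_cost(events):
    """Monthly CloudTrail data event cost in dollars, rounded to cents."""
    cost = Decimal(events) * RATE_PER_100K_EVENTS / Decimal(100_000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for events in (330_500, 338_500, 3_300_000, 33_000_000):
        print(f"{events:>12,} events -> ${cloudtrail_data_event_cost(events)}")
```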

2. GSI read capacity under admin query load

300,000 admin reads per month assumes 20 reads per day against each of the 500 new merchant records, as in the traffic model. If admin tooling increases query frequency, the GSI read cost scales with it. At that point, consider adding DynamoDB Accelerator (DAX); it adds $150+/month at minimum instance size.

3. KMS request limits

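KMS enforces a per-second quota on symmetric cryptographic requests. The figure used here is 5,500 requests per second; the default varies by Region, so check Service Quotas for the value that applies to your account in us-east-1. At the traffic levels modelled here, this is not a constraint. At high-throughput payment processing scale, it becomes one. Request a quota increase before hitting the ceiling, not after.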
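To put the headroom in numbers, the sketch below converts the monthly KMS call estimate into an average request rate and compares it with the quota. The 5,500 requests per second and 669,000 calls per month come from this post; a 30-day month is assumed. Averages hide bursts, so read the result as a lower bound on peak load.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def average_rps(calls_per_month):
    """Average requests per second implied by a monthly call count."""
    return calls_per_month / SECONDS_PER_MONTH


def quota_headroom(calls_per_month, quota_rps):
    """How many times average traffic could grow before reaching the quota."""
    return quota_rps / average_rps(calls_per_month)


if __name__ == "__main__":
    rps = average_rps(669_000)
    print(f"average KMS rate: {rps:.3f} requests/second")
    print(f"headroom vs 5,500 rps: {quota_headroom(669_000, 5_500):,.0f}x")
```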


Why This Architecture Over EC2-Based Alternatives

The comparable EC2 architecture (t3.micro + RDS + ALB) runs approximately $50-80/month before adding KMS, CloudTrail, and WAF. Serverless brings this to $4.87/month at 10,000 MAU.

More importantly: there are no servers to patch. A common PCI DSS finding in EC2-based environments is unpatched OS vulnerabilities. Lambda takes OS patching out of your scope because AWS manages the underlying hosts and runtime, and the function execution environment is ephemeral.

The tradeoff is cold starts. Lambda cold starts for Python average 100-300ms. For a merchant onboarding API (not a real-time payment path), this is acceptable. For a transaction authorization endpoint, it is not.


Stack

  • Amazon Cognito (authentication, JWT issuance)
  • Amazon API Gateway (JWT validation, request routing)
  • AWS Lambda (Python 3.14, business logic)
  • Amazon DynamoDB (merchant profiles, SSE-KMS, PITR, on-demand)
  • Amazon S3 (KYC documents (for demonstration), SSE-KMS, Bucket Key, versioning)
  • AWS CloudTrail (audit logging, PCI DSS Requirement 10)
  • Amazon CloudWatch (operational monitoring)
  • AWS KMS (customer-managed encryption keys)

If you're hiring for Cloud Engineer, DevOps, or SRE roles with infrastructure security depth, I'm open to remote opportunities.

LinkedIn | GitHub | ojejevictor@gmail.com
