Ntombizakhona Mabaso

for AWS Community Builders

Posted on May 12

Use Data Stores In Application Development | 🏗️ Build A Product Catalog API

#aws #certification #cloud #developer

Exam Guide: Developer - Associate
🏗️ Domain 1: Development with AWS Services
📘 Task 3: Use Data Stores In Application Development

DynamoDB dominates this task. You need to understand table design, key selection, indexing, consistency models, and how to write efficient queries, as well as caching with ElastiCache and DAX and specialized stores like OpenSearch.

📘 Concepts

DynamoDB Key Concepts

Primary Keys

Every table needs one. Two options:

Simple primary key: Partition key only (PK). Each item has a unique PK.
Composite primary key: Partition key (PK) + Sort key (SK). Multiple items can share a PK if they have different SKs.

Partition Key Selection

The partition key determines which physical partition stores your data. A good partition key has high cardinality (many distinct values), so the data spreads evenly across partitions.

Good Partition Keys	Bad Partition Keys
userId, orderId, sessionId	status ("active"/"inactive")
deviceId, transactionId	country (few values, uneven distribution)
email, accountId	date (hot partition for today)
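To see why cardinality matters, here is a rough simulation. It is only a sketch: DynamoDB's real hash function is internal, so MD5 is used as a stand-in, and the four buckets and the `partition_for` / `spread` helpers are made up for illustration.

```python
import hashlib
from collections import Counter

def partition_for(key: str, partitions: int = 4) -> int:
    """Map a partition key value to a bucket (MD5 is a stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions

def spread(keys, partitions: int = 4) -> Counter:
    """Count how many items land in each bucket."""
    return Counter(partition_for(k, partitions) for k in keys)

# 1,000 items keyed by a high-cardinality attribute (userId)
by_user = spread(f"user-{i}" for i in range(1000))

# 1,000 items keyed by a low-cardinality attribute (status)
by_status = spread("active" if i % 10 else "inactive" for i in range(1000))

print("userId buckets:", dict(by_user))
print("status buckets:", dict(by_status))
```

With `userId`, items land in every bucket. With `status`, at most two buckets are ever used and one of them holds at least 900 of the 1,000 items, which is the hot partition problem in miniature.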

Consistency Models

Model	Behaviour	Cost	Available On
Eventually Consistent	May return stale data (usually consistent within 1 second)	1x read capacity	Base table + GSIs
Strongly Consistent	Always returns the most up-to-date data	2x read capacity	Base table + LSIs (NOT GSIs)
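The 1x vs 2x cost can be worked out by hand. The sketch below assumes a single `GetItem` read, where an item of up to 4 KB costs 1 RCU when strongly consistent and 0.5 RCU when eventually consistent. The function name `get_item_rcus` is hypothetical.

```python
import math

def get_item_rcus(item_size_bytes: int, strongly_consistent: bool = False) -> float:
    """Estimate read capacity units for a single GetItem."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))  # billed in 4 KB steps
    return blocks * (1.0 if strongly_consistent else 0.5)

print(get_item_rcus(1_000))                            # 0.5
print(get_item_rcus(1_000, strongly_consistent=True))  # 1.0
print(get_item_rcus(9_000, strongly_consistent=True))  # 3.0
```

A 9,000-byte item spans three 4 KB blocks, so a strongly consistent read of it costs 3 RCUs and an eventually consistent read costs 1.5.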

Query vs Scan

Operation	How It Works	Cost	When to Use
Query	Finds items by partition key + optional sort key condition	Reads only matching items	Always prefer this
Scan	Reads every item in the table	Reads entire table	Analytics, one-time migrations only
GetItem	Fetches one item by its full primary key	Reads exactly one item	When you know the exact key

FilterExpression does NOT reduce the amount of data read. It only filters what's returned to you. You still pay for the full scan/query. To reduce reads, use better key design or GSIs.
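A toy in-memory model can make the difference concrete. This is not DynamoDB code: the `query` and `scan` functions and the five sample items are made up for illustration, and they only mimic how `Count` and `ScannedCount` behave.

```python
ITEMS = [
    {"PK": "PRODUCT#laptop-001", "SK": "METADATA", "category": "electronics"},
    {"PK": "PRODUCT#laptop-001", "SK": "REVIEW#2026-04-24#user-001"},
    {"PK": "PRODUCT#mouse-002", "SK": "METADATA", "category": "electronics"},
    {"PK": "CATEGORY#electronics", "SK": "PRODUCT#laptop-001"},
    {"PK": "CATEGORY#electronics", "SK": "PRODUCT#mouse-002"},
]

def query(items, pk, sk_prefix=""):
    """Key condition: only items matching the key are read at all."""
    read = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return {"Items": read, "Count": len(read), "ScannedCount": len(read)}

def scan(items, filter_fn=None):
    """Scan: every item is read; the filter only trims the response."""
    kept = [i for i in items if filter_fn is None or filter_fn(i)]
    return {"Items": kept, "Count": len(kept), "ScannedCount": len(items)}

q = query(ITEMS, "PRODUCT#laptop-001", "REVIEW#")
s = scan(ITEMS, lambda i: i.get("category") == "electronics")
print(q["Count"], q["ScannedCount"])  # 1 1
print(s["Count"], s["ScannedCount"])  # 2 5
```

The filtered scan returns two items but still reads all five, and the read cost follows `ScannedCount`, not `Count`.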

Global Secondary Index (GSI) vs Local Secondary Index (LSI)

Feature	GSI	LSI
Partition Key	Different from base table	Same as base table
Sort Key	Different from base table	Different from base table
When To Create	Anytime	At table creation only
Throughput	Has its own (separate from base table)	Shares with base table
Consistency	Eventually consistent only	Supports strongly consistent
Limit	20 per table	5 per table

GSI Projection Types:

ALL: all attributes (most flexible, most storage cost)
KEYS_ONLY: only key attributes (cheapest)
INCLUDE: keys + specified attributes (balanced)

Caching Options

Service	Use Case	Latency	Works With
DAX	DynamoDB read cache	Microseconds	DynamoDB only; caches eventually consistent reads only
ElastiCache Redis	General-purpose cache	Sub-millisecond	Any data source, complex data types, persistence
ElastiCache Memcached	Simple caching	Sub-millisecond	Any data source, multi-threaded, no persistence

Specialized Data Stores

Store	Use Case
DynamoDB	Key-value lookups, known access patterns, serverless
RDS/Aurora	Relational data, complex joins, ACID transactions
OpenSearch	Full-text search, log analytics, complex queries
S3	Object storage, data lake, large files
ElastiCache	Session storage, leaderboards, real-time analytics

Data Lifecycle

DynamoDB TTL

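Automatically deletes expired items at no cost. Deletion runs in the background and is not immediate: expired items can remain readable for a while (AWS has historically cited up to 48 hours, and current documentation says typically within a few days).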

S3 Lifecycle Policies

Transition objects between storage classes (Standard → IA → Glacier) or expire them after a set time.
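As a rough illustration, a lifecycle rule for that Standard → IA → Glacier path could look like the dictionary below. The shape follows what boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. The rule ID, the `logs/` prefix, and the day counts are placeholders, not values from this tutorial.

```python
import json

# Hypothetical rule: the ID, prefix, and day counts are placeholders
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```

Objects under the prefix would move to Standard-IA after 30 days, to Glacier after 90, and be deleted after a year.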

🏗️ Build A Product Catalog API

Now let's put these concepts into practice by building a Product Catalog API backed by DynamoDB:

A DynamoDB table with a composite primary key and a Global Secondary Index (GSI)
A Lambda function that performs queries, scans, and writes
TTL configured for automatic data expiration
DAX caching in front of DynamoDB
A clear understanding of when to use query vs scan, GSI vs LSI, and strong vs eventual consistency

Prerequisites

An AWS account (free tier covers DynamoDB and Lambda)

Part I

Design the DynamoDB Table

Before creating anything, let's think about access patterns. This is the most important step in DynamoDB design.

Our Access Patterns

Access Pattern	How We'll Query
Get a product by ID	Query PK = `PRODUCT#123`
List all products in a category	Query PK = `CATEGORY#electronics`
Get a product's reviews	Query PK = `PRODUCT#123`, SK begins_with `REVIEW#`
Find products by price range in a category	Query GSI with category + price
List recently added products	Query GSI with status + createdAt

Create the Table

Step 01: Open the DynamoDB console

Step 02: Click Create table

Table name: ProductCatalog
Partition key: PK (String)
Sort key — optional: SK (String)

Step 03: Under Table settings, choose Customize settings

Step 04: Read/write capacity settings: On-demand

⚠️ Don't create the table just yet; let's add a GSI first

Add a Global Secondary Index

Step 05: Scroll down to Secondary indexes → click Create global index:

Index name: GSI1
Partition key: GSI1PK (String)
Sort key: GSI1SK (String)
Attribute projections: All

Click Create index, then click Create table.

✅Green banner: The ProductCatalog table was created successfully.

Why This Design?

We're using the single-table design pattern with overloaded keys:

Base table:
  PK = "PRODUCT#laptop-001"    SK = "METADATA"           → product details
  PK = "PRODUCT#laptop-001"    SK = "REVIEW#2026-04-24"  → a review
  PK = "CATEGORY#electronics"  SK = "PRODUCT#laptop-001" → category listing

GSI1:
  GSI1PK = "CATEGORY#electronics"  GSI1SK = "PRICE#00079.99" → find by price
  GSI1PK = "STATUS#active"         GSI1SK = "2026-04-24"     → find by date
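The overloaded keys above are plain strings, so small helper functions can keep their format consistent. The helpers below are hypothetical (they are not part of any AWS SDK), and the 8-character price padding is an assumption that prices stay below 100,000.

```python
from decimal import Decimal

def product_pk(product_id: str) -> str:
    return f"PRODUCT#{product_id}"

def category_pk(category: str) -> str:
    return f"CATEGORY#{category}"

def review_sk(date: str, user_id: str) -> str:
    return f"REVIEW#{date}#{user_id}"

def price_sk(price) -> str:
    """Zero-pad to 8 characters so string order matches numeric order."""
    return f"PRICE#{Decimal(str(price)):08.2f}"

padded = sorted(price_sk(p) for p in [999.99, 29.99, 79.99])
unpadded = sorted(f"PRICE#{p}" for p in [999.99, 29.99, 79.99, 1000.0])
print(padded)    # ['PRICE#00029.99', 'PRICE#00079.99', 'PRICE#00999.99']
print(unpadded)  # ['PRICE#1000.0', 'PRICE#29.99', 'PRICE#79.99', 'PRICE#999.99']
```

Sort keys are compared as strings, so without padding `1000.0` sorts before `29.99`. Padding every price to the same width is what makes a price-ordered GSI query possible.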

Partition Key Selection: A good partition key has high cardinality (many distinct values). Bad examples: status (only a few values → hot partition), country (uneven distribution). Good examples: userId, productId, orderId.

Part II

Add Sample Data

Using the Console

Step 01: In the DynamoDB console, click on ProductCatalog

Step 02: Click Explore table items

Step 03: Click Create item
Switch to JSON view (toggle at the top)
Paste this item:

{
  "PK": {"S": "PRODUCT#laptop-001"},
  "SK": {"S": "METADATA"},
  "GSI1PK": {"S": "CATEGORY#electronics"},
  "GSI1SK": {"S": "PRICE#00999.99"},
  "name": {"S": "Pro Laptop 15"},
  "description": {"S": "15-inch laptop with 16GB RAM"},
  "price": {"N": "999.99"},
  "category": {"S": "electronics"},
  "status": {"S": "active"},
  "createdAt": {"S": "2026-04-20T10:00:00Z"},
  "stock": {"N": "50"}
}

Step 04: Click Create item

Add a few more products:

{
  "PK": {"S": "PRODUCT#mouse-002"},
  "SK": {"S": "METADATA"},
  "GSI1PK": {"S": "CATEGORY#electronics"},
  "GSI1SK": {"S": "PRICE#00029.99"},
  "name": {"S": "Wireless Mouse"},
  "description": {"S": "Ergonomic wireless mouse"},
  "price": {"N": "29.99"},
  "category": {"S": "electronics"},
  "status": {"S": "active"},
  "createdAt": {"S": "2026-04-22T10:00:00Z"},
  "stock": {"N": "200"}
}

{
  "PK": {"S": "PRODUCT#laptop-001"},
  "SK": {"S": "REVIEW#2026-04-24#user-001"},
  "rating": {"N": "5"},
  "comment": {"S": "Great laptop, fast and reliable"},
  "userId": {"S": "user-001"},
  "createdAt": {"S": "2026-04-24T14:30:00Z"}
}

{
  "PK": {"S": "CATEGORY#electronics"},
  "SK": {"S": "PRODUCT#laptop-001"},
  "name": {"S": "Pro Laptop 15"},
  "price": {"N": "999.99"}
}

{
  "PK": {"S": "CATEGORY#electronics"},
  "SK": {"S": "PRODUCT#mouse-002"},
  "name": {"S": "Wireless Mouse"},
  "price": {"N": "29.99"}
}

Part III

Query vs Scan

Query (Efficient)

Step 01: In the ▼ Scan or query items tab, switch from Scan to Query

Step 02: Set:

Partition key Value: PRODUCT#laptop-001
Sort key Value: Begins with ▼ REVIEW
Click Run

✅Green banner: Completed · Items returned: 1 · Items scanned: 1 · Efficiency: 100% · RCUs consumed: 0.5

You'll see only the review items for that product. DynamoDB read exactly the items you asked for. Efficient.

Scan (Expensive)

Step 03: Switch back to Scan
Click Run with no filters

✅Green banner: Completed · Items returned: 5 · Items scanned: 5 · Efficiency: 100% · RCUs consumed: 2

You'll see ALL items in the table. DynamoDB read every single item. On a table with millions of items, this is slow and expensive.

Step 04: Click Add filter

Attribute name: category
Type: String
Condition: Equal to
Value: electronics
Click Run

✅Green banner: Completed · Items returned: 2 · Items scanned: 5 · Efficiency: 40% · RCUs consumed: 2

You'll see only electronics items, but DynamoDB still read the entire table and filtered afterward. The filter doesn't reduce the read cost.

💡 FilterExpression does NOT reduce the amount of data read. It only filters what's returned to you. You still pay for the full scan. To reduce reads, use better key design or GSIs.

Query the GSI

Step 05: Switch to Query

Step 06: Change the Index dropdown to GSI1

Partition key (GSI1PK) Value: CATEGORY#electronics
Sort key (GSI1SK) Value: Begins with ▼ PRICE#
Click Run

✅Green banner: Completed · Items returned: 2 · Items scanned: 2 · Efficiency: 100% · RCUs consumed: 0.5

This efficiently finds all electronics products, sorted by price. That's the power of a well-designed GSI.
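The same index can also serve the "price range in a category" access pattern. The sketch below only builds the request parameters in the low-level DynamoDB format. The function name `price_range_query` is hypothetical, and it assumes every `GSI1SK` uses the 8-character zero-padded form seen in the sample items.

```python
from decimal import Decimal

def price_range_query(category: str, low, high) -> dict:
    """Build Query parameters for products in a category within a price range."""
    def sk(price):
        return f"PRICE#{Decimal(str(price)):08.2f}"

    return {
        "TableName": "ProductCatalog",
        "IndexName": "GSI1",
        "KeyConditionExpression": "GSI1PK = :pk AND GSI1SK BETWEEN :low AND :high",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"CATEGORY#{category}"},
            ":low": {"S": sk(low)},
            ":high": {"S": sk(high)},
        },
    }

params = price_range_query("electronics", 20, 100)
print(params["ExpressionAttributeValues"])
# With credentials configured, these could be passed to the low-level client:
#   boto3.client("dynamodb").query(**params)
```

Because the range condition is part of the key condition rather than a filter, only items between the two prices are read.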

Part IV

Build the API with Lambda

Create the Lambda Function

Step 01: Open the Lambda console → Create function

Function name: ProductCatalogAPI
Runtime: Python 3.12
Click Create function

✅Green banner: Successfully created the function "ProductCatalogAPI".

Step 02: Configuration → General configuration → click Edit

Memory: 256 MB
Timeout: 10 seconds
Click Save

Step 03: Add DynamoDB Permissions
Configuration → Permissions → click the role name
Add permissions → Attach policies
Search for and attach AmazonDynamoDBFullAccess

⚠️ In production, scope this down to only the ProductCatalog table. For this tutorial, full access keeps things simple.
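For reference, a scoped-down policy might look roughly like the one generated below. This is a sketch, not a vetted production policy: the region and account ID are placeholders, `table_policy` is a hypothetical helper, and the action list is simply the four operations this tutorial's function calls.

```python
import json

def table_policy(region: str, account_id: str, table: str = "ProductCatalog") -> dict:
    """Build an IAM policy limited to one table and its indexes."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:Query",
                    "dynamodb:Scan",
                ],
                # The index ARN is needed for queries against GSI1
                "Resource": [table_arn, f"{table_arn}/index/*"],
            }
        ],
    }

# Placeholder region and account ID
print(json.dumps(table_policy("us-east-1", "123456789012"), indent=2))
```

The printed JSON could be pasted into an inline policy on the Lambda execution role in place of AmazonDynamoDBFullAccess.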

Step 04: Write the Function Code

import json
import boto3
from decimal import Decimal
from boto3.dynamodb.conditions import Key, Attr

# Initialize OUTSIDE the handler — reused across warm invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProductCatalog')

class DecimalEncoder(json.JSONEncoder):
    """DynamoDB returns Decimal types — this converts them to float for JSON."""
    def default(self, obj):
        if isinstance(obj, Decimal):
            return float(obj)
        return super().default(obj)

def lambda_handler(event, context):
    """
    Routes requests based on the HTTP method and path.
    Demonstrates query, scan, get_item, and put_item operations.
    """
    http_method = event.get('httpMethod', 'GET')
    path = event.get('path', '/')
    path_params = event.get('pathParameters') or {}
    query_params = event.get('queryStringParameters') or {}

    try:
        if path == '/products' and http_method == 'GET':
            return list_products(query_params)

        elif path.startswith('/products/') and path.endswith('/reviews') and http_method == 'GET':
            # Checked before the single-product route, which matches the same prefix
            product_id = path_params.get('productId') or path.split('/')[2]
            return get_reviews(product_id)

        elif path.startswith('/products/') and http_method == 'GET':
            product_id = path_params.get('productId') or path.split('/')[-1]
            return get_product(product_id)

        elif path == '/products' and http_method == 'POST':
            # 'body' can be present but null, so fall back to an empty JSON object
            body = json.loads(event.get('body') or '{}')
            return create_product(body)

        else:
            return response(404, {'error': 'Not found'})

    except Exception as e:
        print(f"Error: {str(e)}")
        return response(500, {'error': 'Internal server error'})


def list_products(query_params):
    """
    List products by category using QUERY (efficient).
    Falls back to SCAN if no category is specified (expensive — avoid in production).
    """
    category = query_params.get('category')

    if category:
        # QUERY — efficient, reads only matching items
        print(f"Querying products in category: {category}")
        result = table.query(
            KeyConditionExpression=Key('PK').eq(f'CATEGORY#{category}')
        )
    else:
        # SCAN — reads entire table, expensive!
        # In production, require a category or use pagination
        print("WARNING: Scanning entire table — this is expensive!")
        result = table.scan(
            FilterExpression=Attr('SK').eq('METADATA')
        )

    return response(200, {
        'products': result['Items'],
        'count': result['Count'],
        'scannedCount': result['ScannedCount']  # Shows the difference between query and scan
    })


def get_product(product_id):
    """
    Get a single product by ID using GET_ITEM.
    This is the most efficient read — directly accesses one item by its full key.
    """
    result = table.get_item(
        Key={
            'PK': f'PRODUCT#{product_id}',
            'SK': 'METADATA'
        },
        # ConsistentRead=True  # Uncomment for strongly consistent read (2x cost)
    )

    item = result.get('Item')
    if not item:
        return response(404, {'error': f'Product {product_id} not found'})

    return response(200, item)


def get_reviews(product_id):
    """
    Get all reviews for a product using QUERY with sort key condition.
    begins_with on the sort key efficiently finds all reviews.
    """
    result = table.query(
        KeyConditionExpression=(
            Key('PK').eq(f'PRODUCT#{product_id}') &
            Key('SK').begins_with('REVIEW#')
        ),
        ScanIndexForward=False  # Sort descending (newest first)
    )

    return response(200, {
        'productId': product_id,
        'reviews': result['Items'],
        'count': result['Count']
    })


def create_product(body):
    """
    Create a new product using PUT_ITEM.
    Also creates the category listing item for efficient category queries.
    """
    product_id = body['productId']
    category = body['category']
    price = Decimal(str(body['price']))  # DynamoDB requires Decimal, not float!

    # Write the product metadata
    table.put_item(Item={
        'PK': f'PRODUCT#{product_id}',
        'SK': 'METADATA',
        'GSI1PK': f'CATEGORY#{category}',
        'GSI1SK': f'PRICE#{price:08.2f}',  # Zero-padded to 8 characters (e.g. 00089.99) to match the sample items and sort correctly
        'name': body['name'],
        'description': body.get('description', ''),
        'price': price,
        'category': category,
        'status': 'active',
        'stock': body.get('stock', 0)
    })

    # Write the category listing (denormalization for efficient queries)
    table.put_item(Item={
        'PK': f'CATEGORY#{category}',
        'SK': f'PRODUCT#{product_id}',
        'name': body['name'],
        'price': price
    })

    return response(201, {'message': 'Product created', 'productId': product_id})


def response(status_code, body):
    return {
        'statusCode': status_code,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(body, cls=DecimalEncoder)
    }

Step 05: Click Deploy

✅Green banner: Successfully updated the function "ProductCatalogAPI".

Notice Decimal(str(body['price'])). The boto3 resource API does NOT accept Python float values for DynamoDB numbers; you must use Decimal. This is a common gotcha.
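A short standard-library example shows why the conversion goes through `str()`, plus an alternative that parses JSON numbers straight into `Decimal`. The request body here is a made-up sample.

```python
import json
from decimal import Decimal

body = '{"productId": "keyboard-003", "price": 89.99, "stock": 100}'

# Option 1: convert after parsing, going through str() to keep the literal digits
price = Decimal(str(json.loads(body)["price"]))

# Option 2: have the JSON parser produce Decimal directly
parsed = json.loads(body, parse_float=Decimal)

print(price)                    # 89.99
print(parsed["price"])          # 89.99
print(Decimal(89.99) == price)  # False: the binary float is not exactly 89.99
```

Passing the float directly to `Decimal` carries over its binary rounding error, which is why the tutorial converts to a string first.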

Step 06: Test The Function
Click the Test tab
Create test events for each operation:

List products by category:

{
  "httpMethod": "GET",
  "path": "/products",
  "queryStringParameters": {"category": "electronics"},
  "pathParameters": null
}

Get a single product:

{
  "httpMethod": "GET",
  "path": "/products/laptop-001",
  "pathParameters": {"productId": "laptop-001"},
  "queryStringParameters": null
}

Get reviews:

{
  "httpMethod": "GET",
  "path": "/products/laptop-001/reviews",
  "pathParameters": {"productId": "laptop-001"},
  "queryStringParameters": null
}

Create a product:

{
  "httpMethod": "POST",
  "path": "/products",
  "body": "{\"productId\":\"keyboard-003\",\"name\":\"Mechanical Keyboard\",\"category\":\"electronics\",\"price\":89.99,\"stock\":100}",
  "pathParameters": null,
  "queryStringParameters": null
}

💡 Run each test and check the results. Look at the count vs scannedCount in the list response. When using query, they'll be equal. When scanning, scannedCount will be higher.
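One thing the handler does not do is paginate: a single Query or Scan call returns at most 1 MB of data, and the response carries a `LastEvaluatedKey` when more remains. A minimal sketch of following those pages is below. `query_all` is a hypothetical helper, and `FakeTable` is a stand-in so the loop can run without AWS access.

```python
def query_all(table, **kwargs):
    """Collect every page of a Query by following LastEvaluatedKey."""
    items = []
    while True:
        page = table.query(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Stand-in for a boto3 Table that returns two items per page."""
    def __init__(self, items, page_size=2):
        self.items, self.page_size, self.calls = items, page_size, 0

    def query(self, ExclusiveStartKey=None, **_):
        self.calls += 1
        start = ExclusiveStartKey["index"] if ExclusiveStartKey else 0
        end = start + self.page_size
        page = {"Items": self.items[start:end]}
        if end < len(self.items):
            page["LastEvaluatedKey"] = {"index": end}
        return page


fake = FakeTable([{"SK": f"PRODUCT#{n}"} for n in range(5)])
all_items = query_all(fake, KeyConditionExpression="unused by the fake")
print(len(all_items), fake.calls)  # 5 3
```

With a real boto3 `Table`, the same `query_all(table, KeyConditionExpression=...)` call should work, since `ExclusiveStartKey` and `LastEvaluatedKey` are the actual parameter and response names.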

Part V

Consistency Models

See the Difference

Step 01: In the Lambda function, find the get_product function
Uncomment the ConsistentRead=True line
Deploy

✅Green banner: Successfully updated the function "ProductCatalogAPI".

Now get_product uses strongly consistent reads:

Always returns the latest data

Costs 2x the read capacity

Only works on the base table and LSIs (not GSIs)

💡 GSIs only support eventually consistent reads.

When to Use Each

Use Case	Consistency	Why
Product catalog browsing	Eventually consistent	Stale data for a second is fine
Shopping cart	Strongly consistent	User expects to see what they just added
Inventory check before purchase	Strongly consistent	Must be accurate to avoid overselling
Analytics dashboard	Eventually consistent	Slight delay is acceptable

Part VI

TTL: Automatic Data Expiration

Enable TTL

Step 01: In the DynamoDB console, click on ProductCatalog

Step 02: Click ▼ Actions
Click Turn on TTL

Step 03: Turn on Time to Live (TTL)
TTL attribute name: ttl
Click Turn on TTL

✅Green banner: Successfully activated Time to Live for the ProductCatalog table with the ttl attribute.

Step 04: Add Items with TTL

Create an item with a TTL value (Unix epoch timestamp):

{
  "PK": {"S": "SESSION#user-001"},
  "SK": {"S": "DATA"},
  "userId": {"S": "user-001"},
  "cartItems": {"L": [{"S": "laptop-001"}, {"S": "mouse-002"}]},
  "ttl": {"N": "1745625600"}
}

The ttl value is a Unix timestamp. Items are deleted after this time. Use an epoch converter to set a time a few minutes in the future for testing.

💡 TTL deletion is not immediate: items may persist for 48 hours or more after expiration. Don't rely on TTL for exact timing. Always filter out expired items in your queries as a safety measure.
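Instead of an epoch converter, the timestamp can be computed in code, and the same comparison can be used to ignore items that have expired but not yet been deleted. The helpers `ttl_in` and `is_live` are hypothetical, and the filter dictionary at the end is one possible way to express the check server-side.

```python
import time
from typing import Optional

def ttl_in(minutes: int, now: Optional[float] = None) -> int:
    """Epoch seconds `minutes` from now, suitable for the `ttl` attribute."""
    now = time.time() if now is None else now
    return int(now) + minutes * 60

def is_live(item: dict, now: Optional[float] = None) -> bool:
    """Treat items whose ttl has passed as gone, even if still stored."""
    now = time.time() if now is None else now
    return "ttl" not in item or item["ttl"] > now

now = 1_800_000_000  # fixed clock so the example is repeatable
session = {"PK": "SESSION#user-001", "SK": "DATA", "ttl": ttl_in(5, now)}
print(session["ttl"])               # 1800000300
print(is_live(session, now))        # True
print(is_live(session, now + 600))  # False

# Server-side equivalent for a low-level Query or Scan
# (#ttl is an alias, which sidesteps reserved-word conflicts)
expired_filter = {
    "FilterExpression": "attribute_not_exists(#ttl) OR #ttl > :now",
    "ExpressionAttributeNames": {"#ttl": "ttl"},
    "ExpressionAttributeValues": {":now": {"N": str(now)}},
}
```

Like any filter, the server-side version trims the response but does not reduce the read cost.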

Part VII

Caching with DAX

DAX (DynamoDB Accelerator) is an in-memory cache that sits in front of DynamoDB. It's a drop-in replacement. Same API, just change the client endpoint.

When to Use DAX

Scenario	Use DAX?
Read-heavy workload, same items queried repeatedly	Yes
Microsecond response times needed	Yes
Write-heavy workload	No (DAX is a read cache)
Need strongly consistent reads	No (DAX passes them through to DynamoDB uncached)
Diverse access patterns, rarely same item twice	No (low cache hit rate)

How DAX Works

Without DAX:
  App → DynamoDB (single-digit millisecond reads)

With DAX:
  App → DAX (microsecond reads if cached) → DynamoDB (on cache miss)
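The flow above is a read-through cache. The class below is a conceptual sketch of that idea only, not how DAX is implemented. `ReadThroughCache`, `fetch_from_table`, and the 300-second TTL are illustrative assumptions.

```python
import time

class ReadThroughCache:
    """Tiny read-through cache: serve hits from memory, fetch misses from the backend."""
    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self.fetch, self.ttl, self.clock = fetch, ttl_seconds, clock
        self.store = {}
        self.hits = self.misses = 0

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > self.clock():
            self.hits += 1
            return entry[0]
        # Cache miss (or expired entry): go to the backend and remember the result
        self.misses += 1
        value = self.fetch(key)
        self.store[key] = (value, self.clock() + self.ttl)
        return value

backend_calls = []

def fetch_from_table(key):
    """Stand-in for a DynamoDB read."""
    backend_calls.append(key)
    return {"PK": key, "name": "Pro Laptop 15"}

cache = ReadThroughCache(fetch_from_table)
for _ in range(3):
    cache.get("PRODUCT#laptop-001")
print(cache.hits, cache.misses, len(backend_calls))  # 2 1 1
```

Three reads of the same key reach the backend once, which is why a cache pays off when the same items are read repeatedly and adds little when every read is for a different key.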

Console Walkthrough (Don't Create. Just Understand)

💸 DAX requires a VPC and costs money even when idle. We'll walk through the setup without creating it.

Step 01: Click ▼ DAX

Step 02: Click Create cluster

Cluster name: product-cache
Node type family: t-type family
Node type: dax.t3.small
Cluster size: 3 nodes (for high availability)
Click Next

Step 03: Configure networks

Network Type: IPv4
Subnet group: Create new
Subnet group name: MySubnetGroup
VPC ID: defaultVPC
Subnets: Select all
Security group: default ▼
Click View in EC2 console

Step 04: Allow port 8111 from your Lambda functions

✅Green banner: Inbound security group rules successfully modified on security group

Step 05: Configure Security

IAM Service role for DynamoDB access: Create new
IAM role name: DaxToDynamoDB
Click Next → Click Next → Click Create cluster

✅Green banner: Successfully created the cluster product-cache.

Step 06: In your Lambda code, you'd change how the table object is created:

# Without DAX
dynamodb = boto3.resource('dynamodb')

# With DAX: same table API, created through the DAX client instead
# (needs the amazon-dax-client package; the endpoint below is a placeholder)
import amazondax
dax = amazondax.AmazonDaxClient.resource(
    endpoint_url='dax://product-cache.abc123.dax-clusters.us-east-1.amazonaws.com:8111'
)
table = dax.Table('ProductCatalog')
# Existing table.get_item / table.query calls work unchanged

DAX is a drop-in replacement for DynamoDB reads. Same API, same code. Just change the client. 💡 But remember: DAX only caches eventually consistent reads (strongly consistent reads pass through to DynamoDB) and is meant for read-heavy workloads.

🏗️ What You Built | 📘Exam Concepts Recap

What You Did	Exam Concept
Designed a table around access patterns first	Access-pattern-driven DynamoDB design
Created a composite primary key (PK + SK)	Single-table design, sort key relationships
Added a Global Secondary Index with different keys	GSI for alternate access patterns
Used overloaded keys (`PRODUCT#`, `CATEGORY#`, `REVIEW#`)	Single-table design pattern
Queried by partition key with `begins_with` on sort key	Efficient query operations
Ran a scan and compared `Count` vs `ScannedCount`	Scan is expensive: it reads the entire table
Added a `FilterExpression` to a scan	Filters run AFTER reading: don't save capacity
Used `Decimal(str(price))` instead of `float`	DynamoDB type system: no float support
Toggled `ConsistentRead=True` on `get_item`	Strongly vs eventually consistent reads
Noted GSIs only support eventual consistency	GSI limitations
Enabled TTL on a `ttl` attribute	Automatic data lifecycle management
Walked through DAX setup	Read caching for DynamoDB, microsecond latency

⚠️ Clean Up Protocol

1. DynamoDB → Delete the ProductCatalog table
2. Lambda → Delete ProductCatalogAPI
3. IAM → Delete the Lambda execution role
4. CloudWatch → Delete the log groups

Key Takeaways

Partition key cardinality: high cardinality = even distribution = good performance
Query > Scan: always prefer query. Scan reads the entire table.
FilterExpression doesn't save reads: it filters after reading. Use key design or GSIs instead.
GSIs can be added anytime. LSIs must be created with the table
GSIs are eventually consistent only: no strongly consistent reads on GSIs
Use Decimal, not float for DynamoDB numbers in Python
TTL is free but not immediate (expired items can linger for 48 hours or more)
DAX = DynamoDB read cache (microsecond reads). Same API as DynamoDB.
DAX doesn't cache strongly consistent reads (they pass through to DynamoDB)
Single-table design with overloaded keys is a widely used DynamoDB pattern when access patterns are known up front

