Use Data Stores In Application Development | 🏗️ Build A Product Catalog API

Exam Guide: Developer - Associate
🏗️ Domain 1: Development with AWS Services
📘 Task 3: Use Data Stores In Application Development

DynamoDB dominates this task. You need to understand table design, key selection, indexing, consistency models, and how to write efficient queries, as well as caching with ElastiCache and DAX and specialized stores like OpenSearch.


📘 Concepts

DynamoDB Key Concepts

Primary Keys

Every table needs one. Two options:

  • Simple primary key: Partition key only (PK). Each item has a unique PK.
  • Composite primary key: Partition key (PK) + Sort key (SK). Multiple items can share a PK if they have different SKs.

Partition Key Selection

The partition key determines which physical partition stores your data. A good partition key has high cardinality (many distinct values), so the data spreads evenly across partitions.

| Good Partition Keys | Bad Partition Keys |
| --- | --- |
| userId, orderId, sessionId | status ("active"/"inactive") |
| deviceId, transactionId | country (few values, uneven distribution) |
| email, accountId | date (hot partition for today) |
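To make the cardinality point concrete, here is a minimal sketch that hashes key values onto a fixed number of partitions and counts where items land. The partition count and the use of MD5 are assumptions for illustration only; DynamoDB's real hash function and partition layout are internal.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical; DynamoDB manages the real partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key value to a partition with a stable hash (illustrative)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def distribution(keys):
    """Count how many items land on each partition."""
    return Counter(partition_for(k) for k in keys)


# High cardinality: 10,000 distinct user IDs
user_keys = [f"user-{i}" for i in range(10_000)]
# Low cardinality: the same number of items keyed only by status
status_keys = ["active" if i % 10 else "inactive" for i in range(10_000)]

by_user = distribution(user_keys)
by_status = distribution(status_keys)

print("userId partitions used:", len(by_user), "busiest:", max(by_user.values()))
print("status partitions used:", len(by_status), "busiest:", max(by_status.values()))
```

With only two distinct status values, at most two partitions ever receive traffic, and one of them carries at least 90% of the items.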

Consistency Models

| Model | Behaviour | Cost | Available On |
| --- | --- | --- | --- |
| Eventually Consistent | May return stale data (usually consistent within 1 second) | 0.5 RCU per 4 KB read | Base table, LSIs, and GSIs |
| Strongly Consistent | Always returns the most up-to-date data | 1 RCU per 4 KB read (2x the eventually consistent cost) | Base table and LSIs (NOT GSIs) |
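The cost difference can be worked through with a little arithmetic. This is a rough estimator, assuming the standard rule that one read capacity unit covers a strongly consistent read of up to 4 KB and that an eventually consistent read costs half as much; the function name is made up for this sketch.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one read capacity unit covers up to 4 KB


def read_capacity_units(total_bytes: int, strongly_consistent: bool = False) -> float:
    """Estimate RCUs for a read that touches `total_bytes` of item data.

    For GetItem, pass the size of the single item. For Query, pass the combined
    size of the items read, since the total is rounded up to the next 4 KB.
    """
    blocks = max(1, math.ceil(total_bytes / RCU_BLOCK_BYTES))
    return blocks * (1.0 if strongly_consistent else 0.5)


print(read_capacity_units(1_000))                            # small item, eventual
print(read_capacity_units(1_000, strongly_consistent=True))  # small item, strong
print(read_capacity_units(9_000, strongly_consistent=True))  # spans three 4 KB blocks
```

A small item read with eventual consistency costs 0.5 RCU, which is the figure the console reports for small queries.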

Query vs Scan

| Operation | How It Works | Cost | When to Use |
| --- | --- | --- | --- |
| Query | Finds items by partition key + optional sort key condition | Reads only matching items | Always prefer this |
| Scan | Reads every item in the table | Reads entire table | Analytics, one-time migrations only |
| GetItem | Fetches one item by its full primary key | Reads exactly one item | When you know the exact key |

FilterExpression does NOT reduce the amount of data read. It only filters what's returned to you. You still pay for the full scan/query. To reduce reads, use better key design or GSIs.
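The difference between what is read and what is returned can be modelled in a few lines. This is an in-memory simulation, not the DynamoDB API: `scan` and `query` are stand-in functions that only mimic the `Count` and `ScannedCount` fields of a real response.

```python
TABLE = [
    {"PK": "PRODUCT#laptop-001", "SK": "METADATA", "category": "electronics"},
    {"PK": "PRODUCT#laptop-001", "SK": "REVIEW#2026-04-24#user-001"},
    {"PK": "PRODUCT#mouse-002", "SK": "METADATA", "category": "electronics"},
    {"PK": "CATEGORY#electronics", "SK": "PRODUCT#laptop-001"},
    {"PK": "CATEGORY#electronics", "SK": "PRODUCT#mouse-002"},
]


def scan(items, filter_fn=None):
    """Read every item, then apply the optional filter to what was read."""
    scanned = list(items)  # every item counts towards consumed capacity
    returned = [i for i in scanned if filter_fn is None or filter_fn(i)]
    return {"Items": returned, "Count": len(returned), "ScannedCount": len(scanned)}


def query(items, pk, sk_prefix=""):
    """Model a key condition: only items in one partition with a matching sort key."""
    matched = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    matched.sort(key=lambda i: i["SK"])
    return {"Items": matched, "Count": len(matched), "ScannedCount": len(matched)}


filtered = scan(TABLE, lambda i: i.get("category") == "electronics")
reviews = query(TABLE, "PRODUCT#laptop-001", "REVIEW#")

print("scan + filter:", filtered["Count"], "returned,", filtered["ScannedCount"], "read")
print("query:", reviews["Count"], "returned,", reviews["ScannedCount"], "read")
```

The filtered scan returns two items but still reads all five, while the query reads only the one item it returns.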

Global Secondary Index (GSI) vs Local Secondary Index (LSI)

| Feature | GSI | LSI |
| --- | --- | --- |
| Partition Key | Different from base table | Same as base table |
| Sort Key | Different from base table | Different from base table |
| When To Create | Anytime | At table creation only |
| Throughput | Has its own (separate from base table) | Shares with base table |
| Consistency | Eventually consistent only | Supports strongly consistent |
| Limit | 20 per table | 5 per table |

GSI Projection Types:

  • ALL: all attributes (most flexible, most storage cost)
  • KEYS_ONLY: only key attributes (cheapest)
  • INCLUDE: keys + specified attributes (balanced)

Caching Options

| Service | Use Case | Latency | Works With |
| --- | --- | --- | --- |
| DAX | DynamoDB read cache | Microseconds | DynamoDB only; cached reads are eventually consistent (strongly consistent reads pass through to DynamoDB uncached) |
| ElastiCache Redis | General-purpose cache | Sub-millisecond | Any data source, complex data types, persistence |
| ElastiCache Memcached | Simple caching | Sub-millisecond | Any data source, multi-threaded, no persistence |

Specialized Data Stores

| Store | Use Case |
| --- | --- |
| DynamoDB | Key-value lookups, known access patterns, serverless |
| RDS/Aurora | Relational data, complex joins, ACID transactions |
| OpenSearch | Full-text search, log analytics, complex queries |
| S3 | Object storage, data lake, large files |
| ElastiCache | Session storage, leaderboards, real-time analytics |

Data Lifecycle

DynamoDB TTL

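Automatically deletes expired items at no cost. Deletion runs in the background rather than at the exact expiry time: expired items can remain in the table, and still show up in reads, for up to 48 hours.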

S3 Lifecycle Policies

Transition objects between storage classes (Standard → IA → Glacier) or expire them after a set time.
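As a sketch of what such a policy looks like in code, the dictionary below follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, prefix, day counts, and bucket name are all hypothetical values chosen for illustration.

```python
import json

# Hypothetical rule: the prefix and day counts are illustrative only.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # Standard -> IA
                {"Days": 90, "StorageClass": "GLACIER"},      # IA -> Glacier
            ],
            "Expiration": {"Days": 365},                      # delete after a year
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))

# With boto3 installed and credentials configured, it could be applied like this:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket",  # hypothetical bucket name
#     LifecycleConfiguration=lifecycle_configuration,
# )
```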


🏗️ Build A Product Catalog API

Now let's put these concepts into practice by building a Product Catalog API backed by DynamoDB:

  • A DynamoDB table with a composite primary key and a Global Secondary Index (GSI)
  • A Lambda function that performs queries, scans, and writes
  • TTL configured for automatic data expiration
  • DAX caching in front of DynamoDB
  • A clear understanding of when to use query vs scan, GSI vs LSI, and strong vs eventual consistency

Prerequisites


Part I

Design the DynamoDB Table

Before creating anything, let's think about access patterns. This is the most important step in DynamoDB design.

Our Access Patterns

| Access Pattern | How We'll Query |
| --- | --- |
| Get a product by ID | Query PK = PRODUCT#123 |
| List all products in a category | Query PK = CATEGORY#electronics |
| Get a product's reviews | Query PK = PRODUCT#123, SK begins_with REVIEW# |
| Find products by price range in a category | Query GSI with category + price |
| List recently added products | Query GSI with status + createdAt |

Create the Table

Step 01: Open the DynamoDB console

Step 02: Click Create table

  • Table name: ProductCatalog
  • Partition key: PK (String)
  • Sort key (optional): SK (String)

Step 03: Under Table settings, choose Customize settings

Step 04: Read/write capacity settings: On-demand

⚠️ Don't create the table just yet. Let's add a GSI first.

Add a Global Secondary Index

Step 05: Scroll down to Secondary indexes → click Create global index:

  • Index name: GSI1
  • Partition key: GSI1PK (String)
  • Sort key: GSI1SK (String)
  • Attribute projections: All

Click Create index, then click Create table.

✅ Green banner: The ProductCatalog table was created successfully.
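For reference, the console choices above map roughly onto the parameters of a `create_table` request. The sketch below only builds and checks that parameter dictionary; the helper name is made up here, and the commented-out call shows how it might be passed to boto3.

```python
def product_catalog_table_definition():
    """Build create_table parameters mirroring the console walkthrough."""
    return {
        "TableName": "ProductCatalog",
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity
        "AttributeDefinitions": [
            # Only key attributes are declared; other attributes are schemaless
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "GSI1PK", "AttributeType": "S"},
            {"AttributeName": "GSI1SK", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
            {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "GSI1",
                "KeySchema": [
                    {"AttributeName": "GSI1PK", "KeyType": "HASH"},
                    {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


params = product_catalog_table_definition()
print(sorted(params))

# With boto3 installed and credentials configured:
# import boto3
# boto3.client("dynamodb").create_table(**params)
```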

Why This Design?

We're using the single-table design pattern with overloaded keys:

Base table:
  PK = "PRODUCT#laptop-001"    SK = "METADATA"           → product details
  PK = "PRODUCT#laptop-001"    SK = "REVIEW#2026-04-24"  → a review
  PK = "CATEGORY#electronics"  SK = "PRODUCT#laptop-001" → category listing

GSI1:
  GSI1PK = "CATEGORY#electronics"  GSI1SK = "PRICE#00079.99" → find by price
  GSI1PK = "STATUS#active"         GSI1SK = "2026-04-24"     → find by date

Partition Key Selection: A good partition key has high cardinality (many distinct values). Bad examples: status (only a few values → hot partition), country (uneven distribution). Good examples: userId, productId, orderId.
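The overloaded keys in the layout above are plain strings, so their sort order is lexicographic. A small sketch (the helper names are invented for illustration) shows why the price in GSI1SK is zero-padded to a fixed width:

```python
from decimal import Decimal


def product_pk(product_id: str) -> str:
    """Partition key for a product's item collection."""
    return f"PRODUCT#{product_id}"


def category_pk(category: str) -> str:
    """Partition key for a category listing (also used as GSI1PK)."""
    return f"CATEGORY#{category}"


def price_sk(price: Decimal) -> str:
    """Zero-pad to 8 characters so string order matches numeric order."""
    return f"PRICE#{price:08.2f}"


prices = [Decimal("999.99"), Decimal("29.99"), Decimal("1500.00")]

padded = sorted(price_sk(p) for p in prices)
unpadded = sorted(f"PRICE#{p}" for p in prices)

print(padded)    # cheapest first
print(unpadded)  # 1500.00 sorts before 29.99 because "1" < "2"
```

Without padding, the most expensive product would sort first, and any range condition on the sort key would return the wrong items.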


Part II

Add Sample Data

Using the Console

Step 01: In the DynamoDB console, click on ProductCatalog

Step 02: Click Explore table items

Step 03: Click Create item
Switch to JSON view (toggle at the top)
Paste this item:

{
  "PK": {"S": "PRODUCT#laptop-001"},
  "SK": {"S": "METADATA"},
  "GSI1PK": {"S": "CATEGORY#electronics"},
  "GSI1SK": {"S": "PRICE#00999.99"},
  "name": {"S": "Pro Laptop 15"},
  "description": {"S": "15-inch laptop with 16GB RAM"},
  "price": {"N": "999.99"},
  "category": {"S": "electronics"},
  "status": {"S": "active"},
  "createdAt": {"S": "2026-04-20T10:00:00Z"},
  "stock": {"N": "50"}
}

Step 04: Click Create item

Add a few more products:

{
  "PK": {"S": "PRODUCT#mouse-002"},
  "SK": {"S": "METADATA"},
  "GSI1PK": {"S": "CATEGORY#electronics"},
  "GSI1SK": {"S": "PRICE#00029.99"},
  "name": {"S": "Wireless Mouse"},
  "description": {"S": "Ergonomic wireless mouse"},
  "price": {"N": "29.99"},
  "category": {"S": "electronics"},
  "status": {"S": "active"},
  "createdAt": {"S": "2026-04-22T10:00:00Z"},
  "stock": {"N": "200"}
}

{
  "PK": {"S": "PRODUCT#laptop-001"},
  "SK": {"S": "REVIEW#2026-04-24#user-001"},
  "rating": {"N": "5"},
  "comment": {"S": "Great laptop, fast and reliable"},
  "userId": {"S": "user-001"},
  "createdAt": {"S": "2026-04-24T14:30:00Z"}
}

{
  "PK": {"S": "CATEGORY#electronics"},
  "SK": {"S": "PRODUCT#laptop-001"},
  "name": {"S": "Pro Laptop 15"},
  "price": {"N": "999.99"}
}

{
  "PK": {"S": "CATEGORY#electronics"},
  "SK": {"S": "PRODUCT#mouse-002"},
  "name": {"S": "Wireless Mouse"},
  "price": {"N": "29.99"}
}
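The console's JSON view uses DynamoDB's typed format, where every value is wrapped in a type tag such as "S" or "N". The following is a minimal sketch of how that format maps to plain Python values. It handles only a handful of type tags and is not the boto3 implementation (boto3 ships its own `TypeDeserializer` in `boto3.dynamodb.types`).

```python
from decimal import Decimal


def from_dynamodb_json(value):
    """Convert one typed value such as {"S": "abc"} into a plain Python value."""
    (type_tag, raw), = value.items()  # each typed value has exactly one tag
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)  # numbers travel as strings to keep precision
    if type_tag == "BOOL":
        return bool(raw)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb_json(v) for v in raw]
    if type_tag == "M":
        return {k: from_dynamodb_json(v) for k, v in raw.items()}
    raise ValueError(f"Unsupported type tag: {type_tag}")


def item_from_dynamodb_json(item):
    """Convert a whole item from typed JSON to a plain dict."""
    return {name: from_dynamodb_json(value) for name, value in item.items()}


review = item_from_dynamodb_json({
    "PK": {"S": "PRODUCT#laptop-001"},
    "SK": {"S": "REVIEW#2026-04-24#user-001"},
    "rating": {"N": "5"},
    "tags": {"L": [{"S": "fast"}, {"S": "reliable"}]},  # hypothetical attribute
})
print(review)
```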

Part III

Query vs Scan

Query (Efficient)

Step 01: In the ▼ Scan or query items tab, switch from Scan to Query

Step 02: Set:

  • Partition key Value: PRODUCT#laptop-001
  • Sort key Value: Begins with ▼ REVIEW

Click Run

✅ Green banner: Completed · Items returned: 1 · Items scanned: 1 · Efficiency: 100% · RCUs consumed: 0.5

You'll see only the review items for that product. DynamoDB read exactly the items you asked for. Efficient.

Scan (Expensive)

Step 03: Switch back to Scan
Click Run with no filters

✅ Green banner: Completed · Items returned: 5 · Items scanned: 5 · Efficiency: 100% · RCUs consumed: 2

You'll see ALL items in the table. DynamoDB read every single item. On a table with millions of items, this is slow and expensive.

Step 04: Click Add filter

  • Attribute name: category
  • Type: String
  • Condition: Equal to
  • Value: electronics

Click Run

✅ Green banner: Completed · Items returned: 2 · Items scanned: 5 · Efficiency: 40% · RCUs consumed: 2

You'll see only electronics items, but DynamoDB still read the entire table and filtered afterward. The filter doesn't reduce the read cost.

💡 FilterExpression does NOT reduce the amount of data read. It only filters what's returned to you. You still pay for the full scan. To reduce reads, use better key design or GSIs.

Query the GSI

Step 05: Switch to Query

Step 06: Change the Index dropdown to GSI1

  • Partition key (GSI1PK) Value: CATEGORY#electronics
  • Sort key (GSI1SK) Value: Begins with ▼ PRICE#

Click Run

✅ Green banner: Completed · Items returned: 2 · Items scanned: 2 · Efficiency: 100% · RCUs consumed: 0.5

This efficiently finds all electronics products, sorted by price. That's the power of a well-designed GSI.
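Because the index keeps items ordered by GSI1SK, a price range becomes a contiguous slice of that order. The sketch below simulates this in memory with `bisect`. It is an illustration of the idea rather than DynamoDB code, and the keyboard row is borrowed from the product created later in the tutorial. In boto3, the equivalent key condition would combine `Key('GSI1PK').eq(...)` with `Key('GSI1SK').between(low, high)` and `IndexName='GSI1'`.

```python
from bisect import bisect_left, bisect_right

# (GSI1SK, name) rows for GSI1PK = "CATEGORY#electronics", kept sorted by GSI1SK
GSI1_ELECTRONICS = sorted([
    ("PRICE#00999.99", "Pro Laptop 15"),
    ("PRICE#00029.99", "Wireless Mouse"),
    ("PRICE#00089.99", "Mechanical Keyboard"),
])


def price_key(price: float) -> str:
    """Build a sort key with the same 8-character zero padding as the stored keys."""
    return f"PRICE#{price:08.2f}"


def products_between(rows, low: float, high: float):
    """Return product names whose sort key falls in [low, high], cheapest first."""
    keys = [sk for sk, _ in rows]
    start = bisect_left(keys, price_key(low))
    end = bisect_right(keys, price_key(high))
    return [name for _, name in rows[start:end]]


print(products_between(GSI1_ELECTRONICS, 20, 100))
print(products_between(GSI1_ELECTRONICS, 500, 2000))
```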


Part IV

Build the API with Lambda

Create the Lambda Function

Step 01: Open the Lambda console → Create function

  • Function name: ProductCatalogAPI
  • Runtime: Python 3.12

Click Create function

✅ Green banner: Successfully created the function "ProductCatalogAPI".

Step 02: Configuration → General configuration → click Edit

  • Memory: 256 MB
  • Timeout: 10 seconds

Click Save

Step 03: Add DynamoDB Permissions
Configuration → Permissions → click the role name
Add permissions → Attach policies
Search for and attach AmazonDynamoDBFullAccess

⚠️ In production, scope this down to only the ProductCatalog table. For this tutorial, full access keeps things simple.
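As an illustration of what "scoping down" could look like, here is a sketch of a policy document limited to the operations this tutorial's function uses, on the ProductCatalog table and its indexes. The region and account ID are placeholders, and the action list is an assumption based on the handler code, not an audited least-privilege policy.

```python
import json

REGION = "us-east-1"         # assumption: replace with your region
ACCOUNT_ID = "123456789012"  # placeholder account ID
TABLE_ARN = f"arn:aws:dynamodb:{REGION}:{ACCOUNT_ID}:table/ProductCatalog"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:PutItem",
                "dynamodb:Query",
                "dynamodb:Scan",
            ],
            # The table itself plus its secondary indexes (needed to query GSI1)
            "Resource": [TABLE_ARN, f"{TABLE_ARN}/index/*"],
        }
    ],
}

print(json.dumps(policy, indent=2))
```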

Step 04: Write the Function Code

import json
import boto3
from decimal import Decimal
from boto3.dynamodb.conditions import Key, Attr

# Initialize OUTSIDE the handler: reused across warm invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProductCatalog')

class DecimalEncoder(json.JSONEncoder):
    """DynamoDB returns Decimal types; this converts them to float for JSON."""
    def default(self, obj):
        if isinstance(obj, Decimal):
            return float(obj)
        return super().default(obj)

def lambda_handler(event, context):
    """
    Routes requests based on the HTTP method and path.
    Demonstrates query, scan, get_item, and put_item operations.
    """
    http_method = event.get('httpMethod', 'GET')
    path = event.get('path', '/')
    path_params = event.get('pathParameters') or {}
    query_params = event.get('queryStringParameters') or {}

    try:
        if path == '/products' and http_method == 'GET':
            return list_products(query_params)

        elif path == '/products' and http_method == 'POST':
            # body can be present but null, so fall back to an empty JSON object
            body = json.loads(event.get('body') or '{}')
            return create_product(body)

        # Match the more specific /reviews route before the generic product route;
        # otherwise /products/{id}/reviews would be handled as a product lookup.
        elif path.startswith('/products/') and path.endswith('/reviews'):
            product_id = path_params.get('productId', path.split('/')[2])
            return get_reviews(product_id)

        elif path.startswith('/products/') and http_method == 'GET':
            product_id = path_params.get('productId', path.split('/')[-1])
            return get_product(product_id)

        else:
            return response(404, {'error': 'Not found'})

    except Exception as e:
        print(f"Error: {str(e)}")
        return response(500, {'error': 'Internal server error'})


def list_products(query_params):
    """
    List products by category using QUERY (efficient).
    Falls back to SCAN if no category is specified (expensive: avoid in production).
    """
    category = query_params.get('category')

    if category:
        # QUERY: efficient, reads only matching items
        print(f"Querying products in category: {category}")
        result = table.query(
            KeyConditionExpression=Key('PK').eq(f'CATEGORY#{category}')
        )
    else:
        # SCAN: reads entire table, expensive!
        # In production, require a category or use pagination
        print("WARNING: Scanning entire table: this is expensive!")
        result = table.scan(
            FilterExpression=Attr('SK').eq('METADATA')
        )

    return response(200, {
        'products': result['Items'],
        'count': result['Count'],
        'scannedCount': result['ScannedCount']  # Shows the difference between query and scan
    })


def get_product(product_id):
    """
    Get a single product by ID using GET_ITEM.
    This is the most efficient read: it directly accesses one item by its full key.
    """
    result = table.get_item(
        Key={
            'PK': f'PRODUCT#{product_id}',
            'SK': 'METADATA'
        },
        # ConsistentRead=True  # Uncomment for strongly consistent read (2x cost)
    )

    item = result.get('Item')
    if not item:
        return response(404, {'error': f'Product {product_id} not found'})

    return response(200, item)


def get_reviews(product_id):
    """
    Get all reviews for a product using QUERY with sort key condition.
    begins_with on the sort key efficiently finds all reviews.
    """
    result = table.query(
        KeyConditionExpression=(
            Key('PK').eq(f'PRODUCT#{product_id}') &
            Key('SK').begins_with('REVIEW#')
        ),
        ScanIndexForward=False  # Sort descending (newest first)
    )

    return response(200, {
        'productId': product_id,
        'reviews': result['Items'],
        'count': result['Count']
    })


def create_product(body):
    """
    Create a new product using PUT_ITEM.
    Also creates the category listing item for efficient category queries.
    """
    product_id = body['productId']
    category = body['category']
    price = Decimal(str(body['price']))  # DynamoDB requires Decimal, not float!

    # Write the product metadata
    table.put_item(Item={
        'PK': f'PRODUCT#{product_id}',
        'SK': 'METADATA',
        'GSI1PK': f'CATEGORY#{category}',
        'GSI1SK': f'PRICE#{price:08.2f}',  # Zero-padded to 8 chars, matching the sample items (e.g. PRICE#00999.99)
        'name': body['name'],
        'description': body.get('description', ''),
        'price': price,
        'category': category,
        'status': 'active',
        'stock': body.get('stock', 0)
    })

    # Write the category listing (denormalization for efficient queries)
    table.put_item(Item={
        'PK': f'CATEGORY#{category}',
        'SK': f'PRODUCT#{product_id}',
        'name': body['name'],
        'price': price
    })

    return response(201, {'message': 'Product created', 'productId': product_id})


def response(status_code, body):
    return {
        'statusCode': status_code,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(body, cls=DecimalEncoder)
    }

Step 05: Click Deploy

✅ Green banner: Successfully updated the function "ProductCatalogAPI".

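Notice Decimal(str(body['price'])). The boto3 resource API does NOT accept Python float values for DynamoDB numbers: it raises a TypeError, so you must use Decimal. Converting through str() avoids carrying binary floating-point error into the Decimal. This is a common gotcha.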
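An alternative to converting each field by hand is to have the JSON parser produce Decimal values directly. This is a small sketch using the standard library's `parse_float` hook; the request body is a made-up example in the shape used by this tutorial.

```python
import json
from decimal import Decimal

raw_body = '{"productId": "keyboard-003", "price": 89.99, "stock": 100}'

as_float = json.loads(raw_body)                       # price becomes a float
as_decimal = json.loads(raw_body, parse_float=Decimal)  # price becomes a Decimal

print(type(as_float["price"]).__name__, as_float["price"])
print(type(as_decimal["price"]).__name__, as_decimal["price"])

# Why str() matters when converting an existing float:
print(Decimal(str(89.99)) == Decimal("89.99"))  # exact decimal value
print(Decimal(89.99) == Decimal("89.99"))       # carries binary float error
```

Integers such as `stock` are unaffected by `parse_float` and stay as `int`, which boto3 accepts.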

Step 06: Test The Function
Click the Test tab
Create test events for each operation:

List products by category:

{
  "httpMethod": "GET",
  "path": "/products",
  "queryStringParameters": {"category": "electronics"},
  "pathParameters": null
}

Get a single product:

{
  "httpMethod": "GET",
  "path": "/products/laptop-001",
  "pathParameters": {"productId": "laptop-001"},
  "queryStringParameters": null
}

Get reviews:

{
  "httpMethod": "GET",
  "path": "/products/laptop-001/reviews",
  "pathParameters": {"productId": "laptop-001"},
  "queryStringParameters": null
}

Create a product:

{
  "httpMethod": "POST",
  "path": "/products",
  "body": "{\"productId\":\"keyboard-003\",\"name\":\"Mechanical Keyboard\",\"category\":\"electronics\",\"price\":89.99,\"stock\":100}",
  "pathParameters": null,
  "queryStringParameters": null
}

💡 Run each test and check the results. Look at count vs scannedCount in the list response. When using query, they'll be equal. When scanning with a filter, scannedCount will be higher because it counts every item read before the filter is applied.
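One thing the handler does not do is paginate: a single Query or Scan call returns at most 1 MB of data, and larger result sets come back in pages linked by `LastEvaluatedKey`. Below is a minimal sketch of the loop, assuming a table object with a boto3-style `query` method. `FakeTable` is a stand-in invented here so the loop can run locally without AWS.

```python
def query_all_pages(table, **query_kwargs):
    """Collect items from every page of a Query by following LastEvaluatedKey."""
    items = []
    while True:
        page = table.query(**query_kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the next request where the previous page stopped
        query_kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Stand-in for a boto3 Table that returns a fixed number of items per page."""

    def __init__(self, items, page_size=2):
        self._items = items
        self._page_size = page_size

    def query(self, ExclusiveStartKey=None, **_ignored):
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["index"]
        end = start + self._page_size
        page = {"Items": self._items[start:end]}
        if end < len(self._items):
            page["LastEvaluatedKey"] = {"index": end}
        return page


all_items = query_all_pages(FakeTable(["a", "b", "c", "d", "e"]))
print(all_items)
```

The same loop shape applies to `scan`, which uses the same `LastEvaluatedKey` and `ExclusiveStartKey` fields.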


Part V

Consistency Models

See the Difference

Step 01: In the Lambda function, find the get_product function
Uncomment the ConsistentRead=True line
Deploy

✅ Green banner: Successfully updated the function "ProductCatalogAPI".

Now get_product uses strongly consistent reads:

  • Always returns the latest data
  • Costs 2x the read capacity
  • Only works on the base table (not GSIs)

💡 GSIs only support eventually consistent reads.

When to Use Each

| Use Case | Consistency | Why |
| --- | --- | --- |
| Product catalog browsing | Eventually consistent | Stale data for a second is fine |
| Shopping cart | Strongly consistent | User expects to see what they just added |
| Inventory check before purchase | Strongly consistent | Must be accurate to avoid overselling |
| Analytics dashboard | Eventually consistent | Slight delay is acceptable |

Part VI

TTL: Automatic Data Expiration

Enable TTL

Step 01: In the DynamoDB console, click on ProductCatalog

Step 02: Click ▼ Actions
Click Turn on TTL

Step 03: Turn on Time to Live (TTL)
TTL attribute name: ttl
Click Turn on TTL

✅ Green banner: Successfully activated Time to Live for the ProductCatalog table with the ttl attribute.

Step 04: Add Items with TTL

Create an item with a TTL value (Unix epoch timestamp):

{
  "PK": {"S": "SESSION#user-001"},
  "SK": {"S": "DATA"},
  "userId": {"S": "user-001"},
  "cartItems": {"L": [{"S": "laptop-001"}, {"S": "mouse-002"}]},
  "ttl": {"N": "1745625600"}
}

The ttl value is a Unix epoch timestamp in seconds, stored as a Number. Items become eligible for deletion after this time. Use an epoch converter to set a time a few minutes in the future for testing.

💡 TTL deletion is not immediate: items may persist for up to 48 hours after expiration. Don't rely on TTL for exact timing. Always filter out expired items in your queries as a safety measure.
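Both halves of that advice fit in a few lines: computing a ttl value in epoch seconds, and ignoring items that have expired but have not been deleted yet. The helper names are invented for this sketch, and the `now` parameter exists only to make the behaviour easy to test.

```python
import time


def ttl_from_now(seconds: int, now=None) -> int:
    """Return an epoch-seconds timestamp `seconds` in the future."""
    current = time.time() if now is None else now
    return int(current) + seconds


def is_live(item: dict, now=None) -> bool:
    """True if the item has no ttl attribute or its ttl is still in the future."""
    current = time.time() if now is None else now
    ttl = item.get("ttl")
    return ttl is None or ttl > current


session = {"PK": "SESSION#user-001", "SK": "DATA", "ttl": ttl_from_now(300)}
print(session["ttl"], is_live(session))

# Filter out items that DynamoDB has not physically removed yet
items = [session, {"PK": "SESSION#user-002", "SK": "DATA", "ttl": 1}]
print([i["PK"] for i in items if is_live(i)])
```

On the DynamoDB side, the same check can be expressed as a filter on the ttl attribute compared against the current epoch time.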


Part VII

Caching with DAX

DAX (DynamoDB Accelerator) is an in-memory cache that sits in front of DynamoDB. It's close to a drop-in replacement: the API is the same, and you swap the DynamoDB client for a DAX client pointed at the cluster endpoint.

When to Use DAX

| Scenario | Use DAX? |
| --- | --- |
| Read-heavy workload, same items queried repeatedly | Yes |
| Microsecond response times needed | Yes |
| Write-heavy workload | No (DAX is a read cache) |
| Need strongly consistent reads | No (DAX passes them through to DynamoDB without caching) |
| Diverse access patterns, rarely same item twice | No (low cache hit rate) |

How DAX Works

Without DAX:
  App → DynamoDB (single-digit millisecond reads)

With DAX:
  App → DAX (microsecond reads if cached) → DynamoDB (on cache miss)
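The read path in the diagram is a read-through cache. The class below is a toy model of that behaviour, not DAX itself: the 300-second TTL, the class name, and the loader function are assumptions made for the sketch.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the loader on a miss."""

    def __init__(self, load_item, ttl_seconds=300.0, clock=time.monotonic):
        self._load_item = load_item  # called on a miss (stands in for DynamoDB)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # key -> (expires_at, item)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the backing store and cache the result
        self.misses += 1
        item = self._load_item(key)
        self._entries[key] = (self._clock() + self._ttl, item)
        return item


backend_calls = []


def load_from_table(key):
    """Pretend DynamoDB read that records how often it is called."""
    backend_calls.append(key)
    return {"PK": key, "name": "Pro Laptop 15"}


cache = ReadThroughCache(load_from_table)
first = cache.get("PRODUCT#laptop-001")   # miss: goes to the backing store
second = cache.get("PRODUCT#laptop-001")  # hit: served from memory

print("hits:", cache.hits, "misses:", cache.misses, "backend calls:", len(backend_calls))
```

The model also shows why a cache helps little when the same item is rarely read twice: every first read is a miss.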

Console Walkthrough (Don't Create. Just Understand)

💸 DAX requires a VPC and costs money even when idle. We'll walk through the setup without creating it.

Step 01: Click ▼ DAX

Step 02: Click Create cluster

  • Cluster name: product-cache
  • Node type family: t-type family
  • Node type: dax.t3.small
  • Cluster size: 3 nodes (for high availability)

Click Next

Step 03: Configure networks

  • Network Type: IPv4
  • Subnet group: Create new
  • Subnet group name: MySubnetGroup
  • VPC ID: defaultVPC
  • Subnets: Select all
  • Security group: default ▼

Click View in EC2 console

Step 04: Allow port 8111 from your Lambda functions

✅ Green banner: Inbound security group rules successfully modified on security group

Step 05: Configure Security

  • IAM Service role for DynamoDB access: Create new
  • IAM role name: DaxToDynamoDB

Click Next → Click Next → Click Create cluster

✅ Green banner: Successfully created the cluster product-cache.

Step 06: In your Lambda code, you'd swap the client that creates the table object:

# Without DAX
dynamodb = boto3.resource('dynamodb')

# With DAX: same Table API, different client (requires the amazon-dax-client package)
import amazondax
dax = amazondax.AmazonDaxClient.resource(
    endpoint_url='dax://product-cache.abc123.dax-clusters.us-east-1.amazonaws.com'
)
table = dax.Table('ProductCatalog')
# The handler code that uses `table` works unchanged

DAX is close to a drop-in replacement for DynamoDB reads: same API, same code, different client. 💡 But remember: DAX serves only eventually consistent reads from its cache (strongly consistent reads pass through to DynamoDB) and is meant for read-heavy workloads.


🏗️ What You Built | 📘 Exam Concepts Recap

| What You Did | Exam Concept |
| --- | --- |
| Designed a table around access patterns first | Access-pattern-driven DynamoDB design |
| Created a composite primary key (PK + SK) | Single-table design, sort key relationships |
| Added a Global Secondary Index with different keys | GSI for alternate access patterns |
| Used overloaded keys (PRODUCT#, CATEGORY#, REVIEW#) | Single-table design pattern |
| Queried by partition key with begins_with on sort key | Efficient query operations |
| Ran a scan and compared Count vs ScannedCount | Scan is expensive: it reads the entire table |
| Added a FilterExpression to a scan | Filters run AFTER reading: they don't save capacity |
| Used Decimal(str(price)) instead of float | DynamoDB numbers in Python: boto3 requires Decimal, not float |
| Toggled ConsistentRead=True on get_item | Strongly vs eventually consistent reads |
| Noted GSIs only support eventual consistency | GSI limitations |
| Enabled TTL on a ttl attribute | Automatic data lifecycle management |
| Walked through DAX setup | Read caching for DynamoDB, microsecond latency |

⚠️ Clean Up Protocol

1. DynamoDB β†’ Delete the ProductCatalog table
2. Lambda β†’ Delete ProductCatalogAPI
3. IAM β†’ Delete the Lambda execution role
4. CloudWatch β†’ Delete the log groups


Key Takeaways

  1. Partition key cardinality: high cardinality = even distribution = good performance
  2. Query > Scan: always prefer query. Scan reads the entire table.
  3. FilterExpression doesn't save reads: it filters after reading. Use key design or GSIs instead.
  4. GSIs can be added anytime. LSIs must be created with the table.
  5. GSIs are eventually consistent only: no strongly consistent reads on GSIs
  6. Use Decimal, not float, for DynamoDB numbers in Python
  7. TTL is free, but deletion is not immediate (up to 48 hours delay)
  8. DAX = DynamoDB read cache (microsecond reads). Same API as DynamoDB.
  9. DAX doesn't cache strongly consistent reads: they pass through to DynamoDB
  10. Single-table design with overloaded keys is a widely used DynamoDB pattern

Additional Resources

