Exam Guide: Developer - Associate
Domain 1: Development with AWS Services
Task 3: Use Data Stores in Application Development
DynamoDB dominates this task. You need to understand table design, key selection, indexing, consistency models, and how to write efficient queries, as well as caching with ElastiCache and DAX and specialized stores such as OpenSearch.
Concepts
DynamoDB Key Concepts
Primary Keys
Every table needs one. Two options:
- Simple primary key: Partition key only (PK). Each item has a unique PK.
- Composite primary key: Partition key (PK) + Sort key (SK). Multiple items can share a PK if they have different SKs.
Partition Key Selection
The partition key determines which physical partition stores your data. A good partition key has high cardinality (many distinct values), so that data and traffic spread evenly across partitions.
| Good Partition Keys | Bad Partition Keys |
|---|---|
| userId, orderId, sessionId | status ("active"/"inactive") |
| deviceId, transactionId | country (few values, uneven distribution) |
| email, accountId | date (hot partition for today) |
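To make the cardinality point concrete, here is a toy model of how partition key values spread items across partitions. It is a sketch under loud assumptions: DynamoDB's real hash function and partition count are internal, so MD5 and a fixed four partitions stand in for them, and the helper names are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key value to a partition number.

    MD5 is only a stand-in: DynamoDB's actual internal hash is not public.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def item_counts_per_partition(keys, num_partitions=4):
    """Count how many items land on each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# 1,000 items keyed by userId (high cardinality) vs. by status (low cardinality)
user_keys = [f"user-{i}" for i in range(1000)]
status_keys = ["active" if i % 10 else "inactive" for i in range(1000)]

user_counts = item_counts_per_partition(user_keys)
status_counts = item_counts_per_partition(status_keys)
print(user_counts)    # roughly 250 items per partition
print(status_counts)  # only two distinct keys, so at most 2 partitions get anything
```

With only two distinct status values, every read and write is funnelled onto at most two partitions no matter how many partitions exist, which is exactly the hot-partition problem.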
Consistency Models
| Model | Behaviour | Cost | Available On |
|---|---|---|---|
| Eventually Consistent | May return stale data (usually consistent within 1 second) | 1x read capacity (0.5 RCU per 4 KB) | Base table, LSIs, and GSIs |
| Strongly Consistent | Always returns the most up-to-date data | 2x read capacity (1 RCU per 4 KB) | Base table and LSIs only (NOT GSIs) |
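The 1x/2x cost column comes from how reads are metered: one RCU covers one strongly consistent read per second of an item up to 4 KB, and an eventually consistent read costs half that. A small calculator for a single-item read (GetItem) is sketched below; the function name is hypothetical. For a Query, DynamoDB sums the sizes of all items read before rounding up to the next 4 KB.

```python
import math


def read_capacity_units(item_size_kb: float, strongly_consistent: bool) -> float:
    """RCUs for reading one item: 1 RCU per 4 KB block (rounded up) when
    strongly consistent, half of that when eventually consistent."""
    blocks = math.ceil(item_size_kb / 4)  # reads are metered in 4 KB increments
    return blocks * (1.0 if strongly_consistent else 0.5)


print(read_capacity_units(3.5, strongly_consistent=True))   # 1.0
print(read_capacity_units(3.5, strongly_consistent=False))  # 0.5
print(read_capacity_units(9, strongly_consistent=True))     # 3.0 (9 KB -> 3 blocks)
```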
Query vs Scan
| Operation | How It Works | Cost | When to Use |
|---|---|---|---|
| Query | Finds items by partition key + optional sort key condition | Reads only matching items | Always prefer this |
| Scan | Reads every item in the table | Reads entire table | Analytics, one-time migrations only |
| GetItem | Fetches one item by its full primary key | Reads exactly one item | When you know the exact key |
FilterExpression does NOT reduce the amount of data read. It only filters what's returned to you, so you still pay for the full scan or query. To reduce reads, use better key design or GSIs.
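A tiny in-memory model (not a DynamoDB client) makes the Count vs. ScannedCount difference visible. The `scan` and `query` functions below are hypothetical stand-ins that only mimic the shape of DynamoDB's responses; the five sample items mirror the ones added in the lab.

```python
def scan(items, filter_fn=None):
    """Model of Scan: every item is read (and billed), then the filter trims the result."""
    scanned = list(items)
    returned = [i for i in scanned if filter_fn is None or filter_fn(i)]
    return {"Items": returned, "Count": len(returned), "ScannedCount": len(scanned)}


def query(items, pk, sk_prefix=""):
    """Model of Query: only items in one partition (and sort-key range) count as read.

    Real DynamoDB jumps straight to that key range; this toy version just
    reports the same ScannedCount.
    """
    matched = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return {"Items": matched, "Count": len(matched), "ScannedCount": len(matched)}


table_items = [
    {"PK": "PRODUCT#laptop-001", "SK": "METADATA", "category": "electronics"},
    {"PK": "PRODUCT#mouse-002", "SK": "METADATA", "category": "electronics"},
    {"PK": "PRODUCT#laptop-001", "SK": "REVIEW#2026-04-24#user-001"},
    {"PK": "CATEGORY#electronics", "SK": "PRODUCT#laptop-001"},
    {"PK": "CATEGORY#electronics", "SK": "PRODUCT#mouse-002"},
]

filtered = scan(table_items, lambda i: i.get("category") == "electronics")
print(filtered["Count"], filtered["ScannedCount"])  # 2 5

reviews = query(table_items, "PRODUCT#laptop-001", "REVIEW#")
print(reviews["Count"], reviews["ScannedCount"])    # 1 1
```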
Global Secondary Index (GSI) vs Local Secondary Index (LSI)
| Feature | GSI | LSI |
|---|---|---|
| Partition Key | Different from base table | Same as base table |
| Sort Key | Different from base table | Different from base table |
| When To Create | Anytime | At table creation only |
| Throughput | Has its own (separate from base table) | Shares with base table |
| Consistency | Eventually consistent only | Supports strongly consistent |
| Limit | 20 per table (default, adjustable quota) | 5 per table |
GSI Projection Types:
- ALL: all attributes (most flexible, most storage cost)
- KEYS_ONLY: only key attributes (cheapest)
- INCLUDE: keys + specified attributes (balanced)
Caching Options
| Service | Use Case | Latency | Works With |
|---|---|---|---|
| DAX | DynamoDB read cache | Microseconds | DynamoDB only; caches eventually consistent reads only (strongly consistent reads pass through to DynamoDB) |
| ElastiCache Redis | General-purpose cache | Sub-millisecond | Any data source, complex data types, persistence |
| ElastiCache Memcached | Simple caching | Sub-millisecond | Any data source, multi-threaded, no persistence |
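Unlike DAX, ElastiCache does not sit transparently in front of DynamoDB: the application decides what to cache. A common pattern is cache-aside (lazy loading): check the cache, and on a miss read the database and populate the cache with an expiry. The sketch below is hypothetical: `TTLCache` is an in-process dict standing in for a Redis or Memcached client, and `load_from_db` stands in for a DynamoDB read.

```python
import time


class TTLCache:
    """In-process stand-in for a cache such as ElastiCache (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] < time.monotonic():
            return None  # miss, or the entry has expired
        return entry[0]

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


def get_product_cache_aside(cache, load_from_db, product_id, ttl_seconds=300):
    """Cache-aside: serve from cache if present, otherwise load and populate."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(product_id)  # e.g. a DynamoDB get_item in real code
    cache.set(key, value, ttl_seconds)
    return value


db_reads = []


def fake_db(product_id):
    db_reads.append(product_id)
    return {"id": product_id, "name": "Pro Laptop 15"}


cache = TTLCache()
get_product_cache_aside(cache, fake_db, "laptop-001")  # miss: reads the DB
get_product_cache_aside(cache, fake_db, "laptop-001")  # hit: served from cache
print(len(db_reads))  # 1
```

The expiry bounds how stale a cached item can get; DAX handles this kind of bookkeeping for you, which is why it needs no cache code in the application.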
Specialized Data Stores
| Store | Use Case |
|---|---|
| DynamoDB | Key-value lookups, known access patterns, serverless |
| RDS/Aurora | Relational data, complex joins, ACID transactions |
| OpenSearch | Full-text search, log analytics, complex queries |
| S3 | Object storage, data lake, large files |
| ElastiCache | Session storage, leaderboards, real-time analytics |
Data Lifecycle
DynamoDB TTL
Automatically deletes expired items at no extra cost (TTL deletes don't consume write capacity). Deletion runs in the background, so expired items can linger for up to 48 hours.
S3 Lifecycle Policies
Transition objects between storage classes (Standard → IA → Glacier) or expire them after a set time.
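As a sketch of the Standard → IA → Glacier flow, here is a lifecycle configuration in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The rule ID, the `logs/` prefix, the day counts, and the bucket name in the comment are illustrative assumptions.

```python
import json


def lifecycle_rules(prefix="logs/"):
    """Build an S3 lifecycle configuration (hypothetical rule values)."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": prefix},  # apply only to keys under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # Standard -> IA
                    {"Days": 90, "StorageClass": "GLACIER"},      # IA -> Glacier
                ],
                "Expiration": {"Days": 365},  # delete objects after a year
            }
        ]
    }


config = lifecycle_rules()
print(json.dumps(config, indent=2))

# Applying it requires AWS credentials and a real bucket:
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket", LifecycleConfiguration=config)
```

S3 requires objects to stay in Standard for at least 30 days before a transition to STANDARD_IA, which is why the first transition uses 30 days.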
Build A Product Catalog API
Now let's put these concepts into practice by building a Product Catalog API backed by DynamoDB. By the end you'll have:
- A DynamoDB table with a composite primary key and a Global Secondary Index (GSI)
- A Lambda function that performs queries, scans, and writes
- TTL configured for automatic data expiration
- DAX caching in front of DynamoDB
- A clear understanding of when to use query vs scan, GSI vs LSI, and strong vs eventual consistency
Prerequisites
- An AWS account (free tier covers DynamoDB and Lambda)
Part I
Design the DynamoDB Table
Before creating anything, let's think about access patterns. This is the most important step in DynamoDB design.
Our Access Patterns
| Access Pattern | How We'll Query |
|---|---|
| Get a product by ID | GetItem PK = PRODUCT#123, SK = METADATA |
| List all products in a category | Query PK = CATEGORY#electronics |
| Get a product's reviews | Query PK = PRODUCT#123, SK begins_with REVIEW# |
| Find products by price range in a category | Query GSI with category + price |
| List recently added products | Query GSI with status + createdAt |
Create the Table
Step 01: Open the DynamoDB console
Step 02: Click Create table
- Table name: ProductCatalog
- Partition key: PK (String)
- Sort key - optional: SK (String)
Step 03: Under Table settings, choose Customize settings
Step 04: Read/write capacity settings: On-demand
⚠️ Don't create the table just yet. Let's add a GSI first.
Add a Global Secondary Index
Step 05: Scroll down to Secondary indexes → click Create global index:
- Index name: GSI1
- Partition key: GSI1PK (String)
- Sort key: GSI1SK (String)
- Attribute projections: All
Click Create index, then click Create table.
✅ Green banner: The ProductCatalog table was created successfully.
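If you prefer scripting to clicking, the same table can be described as keyword arguments for boto3's DynamoDB `create_table`. The helper name is hypothetical; the values mirror the console steps (on-demand billing, PK/SK keys, GSI1 with an ALL projection).

```python
def product_catalog_table_params():
    """create_table kwargs mirroring the console setup (hypothetical helper)."""
    return {
        "TableName": "ProductCatalog",
        "BillingMode": "PAY_PER_REQUEST",  # On-demand capacity
        # Only attributes used in a key schema (table or index) are declared
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "GSI1PK", "AttributeType": "S"},
            {"AttributeName": "GSI1SK", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
            {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "GSI1",
                "KeySchema": [
                    {"AttributeName": "GSI1PK", "KeyType": "HASH"},
                    {"AttributeName": "GSI1SK", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},  # "Attribute projections: All"
            }
        ],
    }


params = product_catalog_table_params()
# With AWS credentials configured:
# import boto3
# boto3.client("dynamodb").create_table(**params)
```

Any LSIs would have to go into this same call (as `LocalSecondaryIndexes`), since LSIs can only be defined at table creation; GSIs can also be added later with `update_table`.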
Why This Design?
We're using the single-table design pattern with overloaded keys:
Base table:
PK = "PRODUCT#laptop-001" SK = "METADATA" → product details
PK = "PRODUCT#laptop-001" SK = "REVIEW#2026-04-24" → a review
PK = "CATEGORY#electronics" SK = "PRODUCT#laptop-001" → category listing
GSI1:
GSI1PK = "CATEGORY#electronics" GSI1SK = "PRICE#00079.99" → find by price
GSI1PK = "STATUS#active" GSI1SK = "2026-04-24" → find by date
Partition Key Selection: A good partition key has high cardinality (many distinct values). Bad examples: status (only a few values → hot partition), country (uneven distribution). Good examples: userId, productId, orderId.
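The overloaded keys above are just formatted strings, so small helpers keep them consistent. The helper names below are hypothetical. The one detail that matters is the zero-padded price: sort keys compare as strings, so numbers must be padded to a fixed width (8 characters here, matching the sample data such as PRICE#00079.99) for string order to equal numeric order.

```python
from decimal import Decimal


def product_pk(product_id: str) -> str:
    return f"PRODUCT#{product_id}"


def review_sk(date_iso: str, user_id: str) -> str:
    return f"REVIEW#{date_iso}#{user_id}"


def price_sk(price: Decimal) -> str:
    # Fixed width of 8 characters (supports up to 99999.99) so that
    # lexicographic order of the strings matches numeric order
    return f"PRICE#{price:08.2f}"


prices = [Decimal("999.99"), Decimal("29.99"), Decimal("100.00")]
padded = sorted(price_sk(p) for p in prices)
unpadded = sorted(f"PRICE#{p}" for p in prices)
print(padded)    # ['PRICE#00029.99', 'PRICE#00100.00', 'PRICE#00999.99']
print(unpadded)  # ['PRICE#100.00', 'PRICE#29.99', 'PRICE#999.99']  (wrong order)
```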
Part II
Add Sample Data
Using the Console
Step 01: In the DynamoDB console, click on ProductCatalog
Step 02: Click Explore table items
Step 03: Click Create item
Switch to JSON view (toggle at the top)
Paste this item:
{
"PK": {"S": "PRODUCT#laptop-001"},
"SK": {"S": "METADATA"},
"GSI1PK": {"S": "CATEGORY#electronics"},
"GSI1SK": {"S": "PRICE#00999.99"},
"name": {"S": "Pro Laptop 15"},
"description": {"S": "15-inch laptop with 16GB RAM"},
"price": {"N": "999.99"},
"category": {"S": "electronics"},
"status": {"S": "active"},
"createdAt": {"S": "2026-04-20T10:00:00Z"},
"stock": {"N": "50"}
}
Step 04: Click Create item
Add a few more items the same way, creating each JSON object below as a separate item:
{
"PK": {"S": "PRODUCT#mouse-002"},
"SK": {"S": "METADATA"},
"GSI1PK": {"S": "CATEGORY#electronics"},
"GSI1SK": {"S": "PRICE#00029.99"},
"name": {"S": "Wireless Mouse"},
"description": {"S": "Ergonomic wireless mouse"},
"price": {"N": "29.99"},
"category": {"S": "electronics"},
"status": {"S": "active"},
"createdAt": {"S": "2026-04-22T10:00:00Z"},
"stock": {"N": "200"}
}
{
"PK": {"S": "PRODUCT#laptop-001"},
"SK": {"S": "REVIEW#2026-04-24#user-001"},
"rating": {"N": "5"},
"comment": {"S": "Great laptop, fast and reliable"},
"userId": {"S": "user-001"},
"createdAt": {"S": "2026-04-24T14:30:00Z"}
}
{
"PK": {"S": "CATEGORY#electronics"},
"SK": {"S": "PRODUCT#laptop-001"},
"name": {"S": "Pro Laptop 15"},
"price": {"N": "999.99"}
}
{
"PK": {"S": "CATEGORY#electronics"},
"SK": {"S": "PRODUCT#mouse-002"},
"name": {"S": "Wireless Mouse"},
"price": {"N": "29.99"}
}
Part III
Query vs Scan
Query (Efficient)
Step 01: In the Scan or query items panel, switch from Scan to Query
Step 02: Set:
- Partition key value: PRODUCT#laptop-001
- Sort key: Begins with, value REVIEW#
Click Run
✅ Green banner: Completed · Items returned: 1 · Items scanned: 1 · Efficiency: 100% · RCUs consumed: 0.5
You'll see only the review items for that product: DynamoDB read exactly the items you asked for, and nothing else.
Scan (Expensive)
Step 03: Switch back to Scan
Click Run with no filters
✅ Green banner: Completed · Items returned: 5 · Items scanned: 5 · Efficiency: 100% · RCUs consumed: 2
You'll see ALL items in the table, because DynamoDB read every single item. On a table with millions of items, this is slow and expensive.
Step 04: Click Add filter
- Attribute name: category
- Type: String
- Condition: Equal to
- Value: electronics
Click Run
✅ Green banner: Completed · Items returned: 2 · Items scanned: 5 · Efficiency: 40% · RCUs consumed: 2
You'll see only electronics items, but DynamoDB still read the entire table and filtered afterward. The filter doesn't reduce the read cost.
💡 FilterExpression does NOT reduce the amount of data read. It only filters what's returned to you. You still pay for the full scan. To reduce reads, use better key design or GSIs.
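In code, there is one more scan detail to handle: a single Scan (or Query) request reads at most 1 MB of data. When more remains, the response includes `LastEvaluatedKey`, which you pass back as `ExclusiveStartKey` to fetch the next page. Because the filter is applied after the 1 MB read, a page can even come back empty while more pages remain. A minimal sketch, assuming a `Table.scan`-style callable; `scan_all_pages` and `fake_scan` are hypothetical, with `fake_scan` standing in for `table.scan` so the example runs without AWS.

```python
def scan_all_pages(scan_page, **scan_kwargs):
    """Call a Table.scan-style function until no LastEvaluatedKey is returned."""
    items = []
    while True:
        page = scan_page(**scan_kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        scan_kwargs["ExclusiveStartKey"] = last_key  # resume where the last page stopped


def fake_scan(**kwargs):
    """Two-page stand-in for table.scan (hypothetical data)."""
    pages = {
        None: {"Items": [1, 2], "LastEvaluatedKey": {"PK": "a"}},
        "a": {"Items": [3]},
    }
    start = kwargs.get("ExclusiveStartKey", {}).get("PK")
    return pages[start]


all_items = scan_all_pages(fake_scan)
print(all_items)  # [1, 2, 3]

# Against a real table (boto3 resource), the same helper would be called as:
# scan_all_pages(table.scan, FilterExpression=Attr('SK').eq('METADATA'))
```

Every page still consumes read capacity for everything it scanned, so pagination makes a full scan correct, not cheap.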
Query the GSI
Step 05: Switch to Query
Step 06: Change the Index dropdown to GSI1
- Partition key (GSI1PK) value: CATEGORY#electronics
- Sort key (GSI1SK): Begins with, value PRICE#
Click Run
✅ Green banner: Completed · Items returned: 2 · Items scanned: 2 · Efficiency: 100% · RCUs consumed: 0.5
This efficiently finds all electronics products, sorted by price. That's the power of a well-designed GSI.
Part IV
Build the API with Lambda
Create the Lambda Function
Step 01: Open the Lambda console → Create function
- Function name: ProductCatalogAPI
- Runtime: Python 3.12
Click Create function
✅ Green banner: Successfully created the function "ProductCatalogAPI".
Step 02: Configuration → General configuration → click Edit
- Memory: 256 MB
- Timeout: 10 seconds
Click Save
Step 03: Add DynamoDB Permissions
Configuration → Permissions → click the role name
Add permissions → Attach policies
Search for and attach AmazonDynamoDBFullAccess
⚠️ In production, scope this down to only the ProductCatalog table. For this tutorial, full access keeps things simple.
Step 04: Write the Function Code
```python
import json
import boto3
from decimal import Decimal
from boto3.dynamodb.conditions import Key, Attr

# Initialize OUTSIDE the handler: reused across warm invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProductCatalog')


class DecimalEncoder(json.JSONEncoder):
    """DynamoDB returns Decimal types; this converts them to float for JSON."""
    def default(self, obj):
        if isinstance(obj, Decimal):
            return float(obj)
        return super().default(obj)


def lambda_handler(event, context):
    """
    Routes requests based on the HTTP method and path.
    Demonstrates query, scan, get_item, and put_item operations.
    """
    http_method = event.get('httpMethod', 'GET')
    path = event.get('path', '/')
    path_params = event.get('pathParameters') or {}
    query_params = event.get('queryStringParameters') or {}

    try:
        if path == '/products' and http_method == 'GET':
            return list_products(query_params)
        # Match the more specific /products/{id}/reviews route BEFORE /products/{id};
        # otherwise every reviews request would be routed to get_product
        elif path.startswith('/products/') and path.endswith('/reviews') and http_method == 'GET':
            product_id = path_params.get('productId', path.split('/')[2])
            return get_reviews(product_id)
        elif path.startswith('/products/') and http_method == 'GET':
            product_id = path_params.get('productId', path.split('/')[-1])
            return get_product(product_id)
        elif path == '/products' and http_method == 'POST':
            # 'body' can be absent or null, so fall back to an empty JSON object
            body = json.loads(event.get('body') or '{}')
            return create_product(body)
        else:
            return response(404, {'error': 'Not found'})
    except Exception as e:
        print(f"Error: {str(e)}")
        return response(500, {'error': 'Internal server error'})


def list_products(query_params):
    """
    List products by category using QUERY (efficient).
    Falls back to SCAN if no category is specified (expensive: avoid in production).
    """
    category = query_params.get('category')
    if category:
        # QUERY: efficient, reads only matching items
        print(f"Querying products in category: {category}")
        result = table.query(
            KeyConditionExpression=Key('PK').eq(f'CATEGORY#{category}')
        )
    else:
        # SCAN: reads the entire table, expensive!
        # In production, require a category or use pagination
        print("WARNING: Scanning entire table, this is expensive!")
        result = table.scan(
            FilterExpression=Attr('SK').eq('METADATA')
        )
    return response(200, {
        'products': result['Items'],
        'count': result['Count'],
        'scannedCount': result['ScannedCount']  # Shows the difference between query and scan
    })


def get_product(product_id):
    """
    Get a single product by ID using GET_ITEM.
    This is the most efficient read: it directly accesses one item by its full key.
    """
    result = table.get_item(
        Key={
            'PK': f'PRODUCT#{product_id}',
            'SK': 'METADATA'
        },
        # ConsistentRead=True  # Uncomment for strongly consistent read (2x cost)
    )
    item = result.get('Item')
    if not item:
        return response(404, {'error': f'Product {product_id} not found'})
    return response(200, item)


def get_reviews(product_id):
    """
    Get all reviews for a product using QUERY with a sort key condition.
    begins_with on the sort key efficiently finds all reviews.
    """
    result = table.query(
        KeyConditionExpression=(
            Key('PK').eq(f'PRODUCT#{product_id}') &
            Key('SK').begins_with('REVIEW#')
        ),
        ScanIndexForward=False  # Sort descending (newest first)
    )
    return response(200, {
        'productId': product_id,
        'reviews': result['Items'],
        'count': result['Count']
    })


def create_product(body):
    """
    Create a new product using PUT_ITEM.
    Also creates the category listing item for efficient category queries.
    """
    product_id = body['productId']
    category = body['category']
    price = Decimal(str(body['price']))  # boto3 requires Decimal, not float!

    # Write the product metadata
    table.put_item(Item={
        'PK': f'PRODUCT#{product_id}',
        'SK': 'METADATA',
        'GSI1PK': f'CATEGORY#{category}',
        # Zero-padded to 8 chars (e.g. PRICE#00999.99, as in the sample data)
        # so that string order matches numeric order
        'GSI1SK': f'PRICE#{price:08.2f}',
        'name': body['name'],
        'description': body.get('description', ''),
        'price': price,
        'category': category,
        'status': 'active',
        'stock': body.get('stock', 0)
    })

    # Write the category listing (denormalization for efficient queries)
    table.put_item(Item={
        'PK': f'CATEGORY#{category}',
        'SK': f'PRODUCT#{product_id}',
        'name': body['name'],
        'price': price
    })
    return response(201, {'message': 'Product created', 'productId': product_id})


def response(status_code, body):
    return {
        'statusCode': status_code,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(body, cls=DecimalEncoder)
    }
```
Step 05: Click Deploy
✅ Green banner: Successfully updated the function "ProductCatalogAPI".
Notice Decimal(str(body['price'])). boto3 does NOT accept Python float values for DynamoDB numbers; you must use Decimal. This is a common gotcha.
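Why `Decimal(str(...))` rather than `Decimal(...)` directly? A float such as 89.99 is stored in binary and is not exactly 89.99, and `Decimal(float)` faithfully copies that rounding error, while `str()` first produces the short "89.99" form. boto3's serializer raises a `TypeError` if you pass a float at all. The sketch below shows both conversions, plus parsing the request body straight into Decimal; `to_dynamo_number` is a hypothetical helper name.

```python
import json
from decimal import Decimal


def to_dynamo_number(value):
    """Convert an int/float from parsed JSON into a Decimal boto3 will accept."""
    return Decimal(str(value))


price_float = 89.99
exact_copy = Decimal(price_float)           # carries the float's binary rounding error
clean = to_dynamo_number(price_float)       # Decimal('89.99')
print(exact_copy == Decimal("89.99"))       # False
print(clean)                                # 89.99

# Alternative: have json.loads build Decimals directly, skipping float entirely
parsed = json.loads('{"price": 89.99, "stock": 100}', parse_float=Decimal)
print(parsed["price"], parsed["stock"])     # 89.99 100
```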
Step 06: Test The Function
Click the Test tab
Create test events for each operation:
List products by category:
{
"httpMethod": "GET",
"path": "/products",
"queryStringParameters": {"category": "electronics"},
"pathParameters": null
}
Get a single product:
{
"httpMethod": "GET",
"path": "/products/laptop-001",
"pathParameters": {"productId": "laptop-001"},
"queryStringParameters": null
}
Get reviews:
{
"httpMethod": "GET",
"path": "/products/laptop-001/reviews",
"pathParameters": {"productId": "laptop-001"},
"queryStringParameters": null
}
Create a product:
{
"httpMethod": "POST",
"path": "/products",
"body": "{\"productId\":\"keyboard-003\",\"name\":\"Mechanical Keyboard\",\"category\":\"electronics\",\"price\":89.99,\"stock\":100}",
"pathParameters": null,
"queryStringParameters": null
}
💡 Run each test and check the results. Look at count vs scannedCount in the list response. When using query, they'll be equal. When scanning, scannedCount will be higher. To trigger the scan branch, run the list event with "queryStringParameters": null.
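The Get reviews event deserves a careful look: /products/laptop-001/reviews also starts with /products/, so whichever path check runs first wins. This stand-alone sketch uses a hypothetical `route` helper that returns a handler name instead of calling it, and shows the check order that sends each test event to the right function. You can use it to sanity-check the handler's if/elif chain.

```python
def route(method: str, path: str) -> str:
    """Return the name of the handler a request should reach (hypothetical router)."""
    if method == "GET" and path == "/products":
        return "list_products"
    # Specific pattern first: /products/{id}/reviews
    if method == "GET" and path.startswith("/products/") and path.endswith("/reviews"):
        return "get_reviews"
    # Generic pattern second: /products/{id}
    if method == "GET" and path.startswith("/products/"):
        return "get_product"
    if method == "POST" and path == "/products":
        return "create_product"
    return "not_found"


test_requests = [
    ("GET", "/products"),
    ("GET", "/products/laptop-001"),
    ("GET", "/products/laptop-001/reviews"),
    ("POST", "/products"),
    ("DELETE", "/products/laptop-001"),
]
for method, path in test_requests:
    print(method, path, "->", route(method, path))
```

Swapping the two GET checks would send the reviews request to `get_product`, which would return the laptop's METADATA item instead of its reviews.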
Part V
Consistency Models
See the Difference
Step 01: In the Lambda function, find the get_product function
Uncomment the ConsistentRead=True line
Deploy
✅ Green banner: Successfully updated the function "ProductCatalogAPI".
Now get_product uses strongly consistent reads:
- Always returns the latest data
- Costs 2x the read capacity
- Only works on the base table and LSIs (not GSIs)
💡 GSIs only support eventually consistent reads.
When to Use Each
| Use Case | Consistency | Why |
|---|---|---|
| Product catalog browsing | Eventually consistent | Stale data for a second is fine |
| Shopping cart | Strongly consistent | User expects to see what they just added |
| Inventory check before purchase | Strongly consistent | Must be accurate to avoid overselling |
| Analytics dashboard | Eventually consistent | Slight delay is acceptable |
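To turn the table into code, here is a sketch that builds extra read kwargs per use case. The use-case names and `read_kwargs` are hypothetical; the hard rule it encodes is real: DynamoDB rejects a strongly consistent read against a GSI, so the helper refuses it up front. The result can be splatted into `Table.query`, or into `Table.get_item` when no index is involved (get_item has no IndexName).

```python
from typing import Optional

# Use cases that need strong consistency, taken from the table above
STRONG_USE_CASES = {"shopping_cart", "inventory_check"}


def read_kwargs(use_case: str, gsi_name: Optional[str] = None) -> dict:
    """Extra kwargs for Table.query / Table.get_item for a given use case.

    Raises ValueError for a strongly consistent read on a GSI, which
    DynamoDB would reject anyway.
    """
    kwargs = {}
    if gsi_name is not None:
        kwargs["IndexName"] = gsi_name
    if use_case in STRONG_USE_CASES:
        if gsi_name is not None:
            raise ValueError("GSIs only support eventually consistent reads")
        kwargs["ConsistentRead"] = True  # always latest data, 2x read cost
    return kwargs


print(read_kwargs("inventory_check"))              # {'ConsistentRead': True}
print(read_kwargs("catalog_browsing", "GSI1"))     # {'IndexName': 'GSI1'}

# Real usage (needs the boto3 table from the Lambda code):
# table.get_item(Key={'PK': 'PRODUCT#laptop-001', 'SK': 'METADATA'},
#                **read_kwargs("inventory_check"))
```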
Part VI
TTL: Automatic Data Expiration
Enable TTL
Step 01: In the DynamoDB console, click on ProductCatalog
Step 02: Open the Actions dropdown
Click Turn on TTL
Step 03: Turn on Time to Live (TTL)
TTL attribute name: ttl
Click Turn on TTL
✅ Green banner: Successfully activated Time to Live for the ProductCatalog table with the ttl attribute.
Step 04: Add Items with TTL
Create an item with a TTL value (Unix epoch timestamp):
{
"PK": {"S": "SESSION#user-001"},
"SK": {"S": "DATA"},
"userId": {"S": "user-001"},
"cartItems": {"L": [{"S": "laptop-001"}, {"S": "mouse-002"}]},
"ttl": {"N": "1745625600"}
}
The ttl value is a Unix epoch timestamp in seconds, stored as a Number. Items become eligible for deletion after this time. Use an epoch converter to set a time a few minutes in the future for testing.
💡 TTL deletion is eventually consistent: items may persist for up to 48 hours after expiration. Don't rely on TTL for exact timing. Always filter out expired items in your queries as a safety measure.
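Instead of an epoch converter, you can compute the ttl value in code, and the same number drives the safety filter the tip recommends. A minimal sketch with hypothetical helper names; the fixed `now` makes the output reproducible. With boto3, the equivalent server-side filter would be `Attr('ttl').gt(now) | Attr('ttl').not_exists()`.

```python
import time


def ttl_in(minutes: int, now=None) -> int:
    """Epoch seconds `minutes` from now, for a Number attribute named 'ttl'."""
    now = time.time() if now is None else now
    return int(now) + minutes * 60


def live_items(items, now=None):
    """Drop items whose ttl has passed but which TTL hasn't deleted yet.

    Items without a ttl attribute never expire, so they are kept.
    """
    now = int(time.time()) if now is None else now
    return [i for i in items if "ttl" not in i or i["ttl"] > now]


now = 1_745_625_600  # fixed timestamp so the example is reproducible
session = {"PK": "SESSION#user-001", "SK": "DATA", "ttl": ttl_in(15, now=now)}
expired = {"PK": "SESSION#user-002", "SK": "DATA", "ttl": now - 60}

print(session["ttl"])                                          # 1745626500
print([i["PK"] for i in live_items([session, expired], now=now)])  # ['SESSION#user-001']
```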
Part VII
Caching with DAX
DAX (DynamoDB Accelerator) is an in-memory cache that sits in front of DynamoDB. It's a drop-in replacement: same API, you just change the client endpoint.
When to Use DAX
| Scenario | Use DAX? |
|---|---|
| Read-heavy workload, same items queried repeatedly | Yes |
| Microsecond response times needed | Yes |
| Write-heavy workload | No (DAX is a read cache) |
| Need strongly consistent reads | No (DAX passes strongly consistent reads straight through to DynamoDB without caching them) |
| Diverse access patterns, rarely same item twice | No (low cache hit rate) |
How DAX Works
Without DAX:
App → DynamoDB (single-digit millisecond reads)
With DAX:
App → DAX (microsecond reads if cached) → DynamoDB (on cache miss)
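The diagram describes a read-through cache: unlike the cache-aside pattern with ElastiCache, the application never fills the cache itself. The class below is a conceptual model only, not DAX's implementation: it ignores DAX's item TTL, eviction, and write-through behaviour, and `dynamodb_get` is a hypothetical stand-in for a GetItem call. It does capture the exam-relevant rule that strongly consistent reads bypass the cache.

```python
class ReadThroughCache:
    """Conceptual model of DAX's item cache (not DAX's actual implementation)."""

    def __init__(self, backend_get):
        self._backend_get = backend_get  # stand-in for DynamoDB GetItem
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get_item(self, key, consistent_read=False):
        if consistent_read:
            # Strongly consistent reads go straight to DynamoDB and are not cached
            return self._backend_get(key)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        item = self._backend_get(key)  # cache miss: read from DynamoDB
        self._cache[key] = item        # populate the cache for the next read
        return item


backend_calls = []


def dynamodb_get(key):
    backend_calls.append(key)
    return {"PK": key[0], "SK": key[1]}


dax = ReadThroughCache(dynamodb_get)
k = ("PRODUCT#laptop-001", "METADATA")
dax.get_item(k)                        # miss -> DynamoDB
dax.get_item(k)                        # hit -> microseconds
dax.get_item(k, consistent_read=True)  # bypasses the cache -> DynamoDB
print(dax.hits, dax.misses, len(backend_calls))  # 1 1 2
```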
Console Walkthrough (Don't Create, Just Understand)
💸 DAX requires a VPC and costs money even when idle. We'll walk through the setup without creating it.
Step 01: In the DynamoDB console navigation, expand DAX
Step 02: Click Create cluster
- Cluster name: product-cache
- Node type family: t-type family
- Node type: dax.t3.small
- Cluster size: 3 nodes (for high availability)
Click Next
Step 03: Configure networks
- Network type: IPv4
- Subnet group: Create new
- Subnet group name: MySubnetGroup
- VPC ID: default VPC
- Subnets: Select all
- Security group: default (choose it from the dropdown)
Click View in EC2 console
Step 04: In that security group, add an inbound rule allowing TCP port 8111 (the DAX port for unencrypted connections) from your Lambda functions
✅ Green banner: Inbound security group rules successfully modified on security group
Step 05: Configure Security
- IAM service role for DynamoDB access: Create new
- IAM role name: DaxToDynamoDB
Click Next → Click Next → Click Create cluster
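✅ Green banner (only if you actually create it): Successfully created the cluster product-cache. If you did, delete it when you're done, since DAX bills per node-hour.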
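Step 06: In your Lambda code, you'd swap the client initialization:
```python
import boto3
import amazondax  # DAX SDK: pip install amazon-dax-client (bundle it with the function)

# Without DAX
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ProductCatalog')

# With DAX: same Table API, just a different endpoint
# (placeholder cluster endpoint; the function must also run inside the cluster's VPC)
dax = amazondax.AmazonDaxClient.resource(
    endpoint_url='dax://product-cache.abc123.dax-clusters.us-east-1.amazonaws.com'
)
table = dax.Table('ProductCatalog')

# All your existing table.get_item / query / put_item code works unchanged!
```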
DAX is a drop-in replacement for DynamoDB: same API, same code, just a different client. 💡 But remember: DAX caches only eventually consistent reads (strongly consistent reads pass straight through to DynamoDB), and it pays off for read-heavy workloads.
What You Built | Exam Concepts Recap
| What You Did | Exam Concept |
|---|---|
| Designed a table around access patterns first | Access-pattern-driven DynamoDB design |
| Created a composite primary key (PK + SK) | Single-table design, sort key relationships |
| Added a Global Secondary Index with different keys | GSI for alternate access patterns |
| Used overloaded keys (PRODUCT#, CATEGORY#, REVIEW#) | Single-table design pattern |
| Queried by partition key with begins_with on sort key | Efficient query operations |
| Ran a scan and compared Count vs ScannedCount | Scan is expensive: it reads the entire table |
| Added a FilterExpression to a scan | Filters run AFTER reading: they don't save capacity |
| Used Decimal(str(price)) instead of float | DynamoDB type system: no float support in boto3 |
| Toggled ConsistentRead=True on get_item | Strongly vs eventually consistent reads |
| Noted GSIs only support eventual consistency | GSI limitations |
| Enabled TTL on a ttl attribute | Automatic data lifecycle management |
| Walked through DAX setup | Read caching for DynamoDB, microsecond latency |
⚠️ Clean Up Protocol
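1. DynamoDB → Delete the ProductCatalog table
2. Lambda → Delete ProductCatalogAPI
3. IAM → Delete the Lambda execution role
4. CloudWatch → Delete the log groups
5. DAX → If you created the product-cache cluster, delete it, along with MySubnetGroup and the DaxToDynamoDB role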
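Steps 1, 2, and 4 can also be scripted. This sketch takes boto3 clients as parameters so it can be exercised without AWS; `clean_up` is a hypothetical helper, and it assumes Lambda's default log group name (/aws/lambda/<function name>). The IAM role is left to the console, because its auto-generated name isn't known here and its policies must be detached before it can be deleted.

```python
def clean_up(dynamodb_client, lambda_client, logs_client,
             table_name="ProductCatalog", function_name="ProductCatalogAPI"):
    """Delete the tutorial's table, function, and log group.

    Expects boto3 clients for 'dynamodb', 'lambda', and 'logs'.
    """
    dynamodb_client.delete_table(TableName=table_name)
    lambda_client.delete_function(FunctionName=function_name)
    # Lambda writes to /aws/lambda/<function name> by default
    logs_client.delete_log_group(logGroupName=f"/aws/lambda/{function_name}")


# Real usage (needs AWS credentials):
# import boto3
# clean_up(boto3.client("dynamodb"), boto3.client("lambda"), boto3.client("logs"))
```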
Key Takeaways
- Partition key cardinality: high cardinality = even distribution = good performance
- Query > Scan: always prefer query. Scan reads the entire table.
- FilterExpression doesn't save reads: it filters after reading. Use key design or GSIs instead.
- GSIs can be added anytime. LSIs must be created with the table.
- GSIs are eventually consistent only: no strongly consistent reads on GSIs
- Use Decimal, not float, for DynamoDB numbers in Python
- TTL is free but eventually consistent (up to 48 hours delay)
- DAX = DynamoDB read cache (microsecond reads). Same API as DynamoDB.
- DAX doesn't cache strongly consistent reads: they pass straight through to DynamoDB
- Single-table design with overloaded keys is the recommended DynamoDB pattern
Additional Resources
- Best practices for designing and architecting with DynamoDB
- Querying tables in DynamoDB
- Using Global Secondary Indexes in DynamoDB
- In-memory acceleration with DynamoDB Accelerator (DAX)
- Using Time to Live (TTL) in DynamoDB