This week, I built a new project for my portfolio. Its purpose is to demonstrate my opinionated approach to developing a cost-optimized, autoscaling, and compliant alert monitoring system in the cloud.
Why a Cold Chain Monitoring Platform?
Simply put, it's a system that monitors temperature sensors in warehouses and commercial vehicles, sends alerts, and maintains records for compliance.
Pharmaceutical manufacturers lose a lot of money when they fail to stay compliant, and when they have to discard merchandise that was exposed to unsafe temperatures. Temperature excursions can make medication and other chemical products ineffective or even dangerous.
It's serious.
This means that if no one is tracking the temperature of your medication, you could end up with a bad batch that causes a negative reaction when you take it.
This sounds like a real problem worth solving.
Defining the MVP
Breaking this down into its smallest parts allowed me to focus only on the core features. Defining the functional and non-functional requirements ahead of time prevented me from building things I don't need.
Functional Requirements
- Users can sign up and authenticate using AWS Cognito
- IoT devices send temperature and humidity readings to AWS IoT Core
- Lambda functions process sensor data and store it in DynamoDB
- Users can query the latest 50 sensor readings via a secured API Gateway endpoint
- Alerts are triggered and published to an SNS topic when temperature excursions occur
- Admin users can view excursion counts and individual device status on a dashboard
- An email notification is sent when an excursion is detected
Non-Functional Requirements
- System must support at least 5,000 devices concurrently
- APIs should respond in under 500ms under normal load
- All services must be deployed using Infrastructure as Code (Terraform)
- The application must enforce JWT-based authentication for all API endpoints
- System must auto-scale based on demand using AWS Lambda concurrency
- All data must be encrypted in transit and at rest
- System uptime should meet 99.9% availability
- Audit logs must be retained for a minimum of 90 days
- Dashboard should be mobile-responsive and accessible
Project structure
cold-chain-platform/
├── terraform/
│ ├── modules/
│ │ ├── iot-core/
│ │ ├── lambda-function/
│ │ ├── dynamodb/
│ │ ├── sns/
│ │ ├── eventbridge/
│ │ └── s3-archive/
│ ├── environments/
│ │ ├── dev/
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── terraform.tfvars
│ │ │ └── backend.tf
│ │ └── prod/
│ └── versions.tf
├── lambdas/
│ ├── simulate_sensor/
│ │ ├── index.js
│ │ └── function.zip
│ └── other functions.../
│   ├── index.js
│   └── function.zip
├── docs/
│ ├── architecture.png
│ ├── api-spec.yaml
│ └── compliance.md
├── .gitignore
├── README.md
└── LICENSE
Client-Layer / Frontend:
AWS Amplify (using React.js + TypeScript) for rapid development and deployment of the monitoring dashboard UI. Amazon Cognito handles authentication. The API Gateway routes have JWT authorizers that only allow admins from the application's user pool to call them.
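On the client side, every request to those protected routes has to carry the Cognito-issued JWT. A minimal sketch (the endpoint path and function names are illustrative, not from the actual project):

```javascript
// Build the Authorization header the API Gateway JWT authorizer expects.
function buildAuthHeaders(idToken) {
  return { Authorization: `Bearer ${idToken}` };
}

// Illustrative fetch against a protected readings endpoint (Node 18+ / browser).
async function fetchReadings(apiUrl, idToken) {
  const res = await fetch(`${apiUrl}/readings`, {
    headers: buildAuthHeaders(idToken),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  return res.json(); // the latest sensor readings as JSON
}
```

If the token is missing or invalid, the JWT authorizer rejects the request before the Lambda ever runs.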
Business Logic / Backend:
Lambda and IoT Core process and retrieve data from the sensors. EventBridge acts as a broker that routes the notification events to SNS. For this project, I created an extra Lambda function to simulate temperature excursion events instead of wasting my money buying thousands of thermometers.
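To make the broker role concrete, here is a sketch of how an excursion reading could be shaped into an EventBridge entry. The bus name, source, and detail-type strings are my assumptions for illustration:

```javascript
// Shape an excursion reading into an EventBridge PutEvents entry.
// Source, DetailType, and EventBusName are illustrative assumptions.
function buildExcursionEvent(reading) {
  return {
    Source: "cold-chain.sensors",
    DetailType: "TemperatureExcursion",
    EventBusName: "cold-chain-bus",
    Detail: JSON.stringify(reading), // EventBridge requires Detail to be a string
  };
}

// With @aws-sdk/client-eventbridge, the entry would be sent roughly as:
// await client.send(new PutEventsCommand({ Entries: [buildExcursionEvent(reading)] }));
```

An EventBridge rule then matches on the detail type and forwards the event to the SNS topic.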
Data Layer / Database:
I went with DynamoDB because I needed something that could scale easily without having to worry about server management or performance bottlenecks. Since the system is event-driven and real-time, low-latency access was a must.
DynamoDB also gives me the flexibility to evolve the data model as needed without dealing with rigid schemas or painful migrations. It integrates seamlessly with other AWS services I’m using, like Lambda and SNS, and features like TTL and Streams made it a no-brainer for handling time-based data and triggering actions based on updates.
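As one example of how TTL fits in, DynamoDB expects an epoch-seconds attribute on each item. A sketch, assuming readings expire after 90 days to mirror the audit-retention requirement (the attribute name and retention window are my assumptions):

```javascript
// Assumed retention window, mirroring the 90-day audit-log requirement.
const RETENTION_DAYS = 90;

// Add a TTL attribute (epoch seconds) to an item before writing it.
// DynamoDB deletes the item automatically once this timestamp passes.
function withTtl(item, nowMs = Date.now()) {
  const expiresAt = Math.floor(nowMs / 1000) + RETENTION_DAYS * 24 * 60 * 60;
  return { ...item, expiresAt }; // "expiresAt" must match the table's TTL setting
}
```

The table's TTL configuration just has to point at the same attribute name.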
Security first
Each Lambda has its own IAM policies with strict, least-privilege permissions. Combining AWS Amplify with the other resources in this diagram ensures that data is encrypted in transit and at rest.
Below is a sample Terraform IAM policy attached to a Lambda function, allowing only a limited set of DynamoDB actions.
resource "aws_iam_policy" "dynamodb_access" {
  name = "${var.function_name}-ddb-access" # Name of the custom policy
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "dynamodb:PutItem", # for process-reading
          "dynamodb:Scan",    # for get_process_reading_records
          "dynamodb:GetItem"  # optional, for more flexibility
        ], # Only these actions are allowed on the table
        Resource = var.dynamodb_table_arn # Target DynamoDB table
      }
    ]
  })
}
SNS Notifications
I set a dummy temperature threshold to trigger email alerts through EventBridge. The Lambda function uses an if/else statement to determine whether an email should be sent.
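The check itself can be as small as this sketch. The 8°C threshold matches the simulator's ALERT rule below; the function name is illustrative:

```javascript
// Assumed excursion threshold, matching the simulator's ALERT rule (temp > 8°C).
const TEMP_THRESHOLD_C = 8;

// Decide whether a reading should trigger an email alert.
function shouldAlert(reading) {
  return reading.temperature > TEMP_THRESHOLD_C;
}
```

When `shouldAlert` returns true, the Lambda publishes the event; otherwise it just records the reading.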
Challenges
Lambda timeouts: I resolved this by simply increasing the timeout value. I tested with 5,000 simulated devices, a moderate size for a demo.
const now = new Date().toISOString(); // shared timestamp for the whole batch
const simulations = Array.from({ length: 5000 }, (_, i) => {
const temp = +(Math.random() * 10).toFixed(2); // Random temp between 0–10°C
return {
deviceId: `sensor-${i + 1}`,
timestamp: now,
temperature: temp,
humidity: +(30 + Math.random() * 30).toFixed(2), // Humidity between 30–60%
battery: +(50 + Math.random() * 50).toFixed(2), // Battery 50–100%
status: temp > 8 ? "ALERT" : "OK", // If temp > 8°C → mark as alert
type: "TEMPERATURE_READING",
location: {
facility: "Pfizer DC - Detroit",
zone: `Zone-${(i % 10) + 1}`,
gps: {
lat: +(42.3 + Math.random() * 0.1).toFixed(6), // Simulate Detroit lat
lon: +(-83.0 - Math.random() * 0.1).toFixed(6), // Simulate Detroit lon
},
},
};
});
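Writing 5,000 items one `PutItem` at a time is slow; DynamoDB's `BatchWriteItem` accepts at most 25 items per request, so the simulated readings have to be split into chunks first. A sketch of the chunking (the helper name is mine):

```javascript
// Split an array into batches of at most `size` items.
// DynamoDB BatchWriteItem caps each request at 25 items.
function chunk(items, size = 25) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch then becomes one `BatchWriteItem` request, which cuts 5,000 writes down to 200 API calls.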
Another mistake that wasn't obvious: I received a 500 Server Error with no explanation anywhere. I created a CloudWatch log group that captured each AWS resource involved in the sensor reading lifecycle, and saw that the error was caused by my not stringifying the Lambda response from DynamoDB.
const { DynamoDBClient, ScanCommand } = require("@aws-sdk/client-dynamodb");
const client = new DynamoDBClient({}); // region picked up from the Lambda environment

exports.handler = async (event) => {
console.log("🔒 Event received:", JSON.stringify(event));
// Check if authorizer is attached and parsed
const claims = event?.requestContext?.authorizer?.jwt?.claims;
if (!claims) {
console.error("🚨 Missing Cognito claims");
return {
statusCode: 401,
body: JSON.stringify({ message: "Unauthorized: No claims" }),
};
}
try {
// 📥 Query the DynamoDB table for up to 50 of the most recent sensor readings
const data = await client.send(
new ScanCommand({
TableName: process.env.DYNAMODB_TABLE, // Table name injected via environment variable
Limit: 50, // Prevent loading too many records to maintain speed and reduce cost
})
);
// ✅ On success, return the sensor readings as a JSON response
return {
statusCode: 200,
body: JSON.stringify(data.Items), // ✅ API Gateway requires body to be a string
};
} catch (err) {
// ❌ If something goes wrong (e.g., missing permissions or table), return a 500 error
return {
statusCode: 500,
body: JSON.stringify({
message: 'Error fetching data from DynamoDB',
error: err.message, // Include the raw error for debugging visibility
}),
};
}
};
Final thoughts and future improvements
As I continue refining this platform, I’m also thinking about how to harden it for real-world enterprise use. Improving security means enforcing strict IAM roles, using customer-managed KMS keys, and enabling detailed audit logs through CloudTrail.
For compliance, adding automated AWS Config rules and security checks through tools like AWS Security Hub can help flag misconfigurations early.
Regarding reliability for real-world use, I’d look at adding retry logic, dead-letter queues, and chaos testing to make sure the system gracefully handles edge cases and unexpected failures. These are the kinds of things that turn a functional prototype into a battle-tested backend that real businesses can trust.
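As a taste of the retry idea, here's a minimal sketch of exponential backoff around a flaky downstream call (the helper and its defaults are my assumptions, not part of the current platform):

```javascript
// Retry an async operation with exponential backoff: 100ms, 200ms, 400ms, ...
// Rethrows the last error once all attempts are exhausted.
async function withRetry(fn, attempts = 3, baseMs = 100) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err; // out of attempts
      await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** i));
    }
  }
}
```

In production, this would sit alongside a dead-letter queue so that calls that still fail after retries aren't silently lost.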
If you're building something similar or just enjoy talking backend architecture, hit me up — would love to hear what you're working on.