
Building a Serverless Sales Analytics Platform with AI Insights for Under $10/Month

I have a number of unfinished projects in progress, but this is one I wanted to complete before packing my bags and boarding the plane to AWS re:Invent in Las Vegas this year. I'm hoping to pick up new techniques and meet many of the people who build event-driven architectures every day so I can learn from them.

I see so many great examples of people using the managed and serverless services that cloud providers like AWS offer. Building a complete solution that costs less than $10 a month to run is common with this approach: you examine the requirements and budget for a given project, choose from the many tools available with just an API call, and get charged only for what you use.

You can try this project out for yourself by checking out the code in my GitHub repo here → GitHub Repo

The Challenge

Smurf Memorabilia Inc. is a fictional retail chain with multiple store locations that needs a way to:

  • Collect daily sales data from each store location

  • Transform and store that data efficiently

  • Generate AI-powered business insights

  • Visualize results in dashboards

The key requirements include: low cost, minimal operational overhead, and pay only for what you use.

Each store uploads its sales data daily in an agreed format. The platform processes the data, runs the analysis, updates the analytics, and generates AI-based recommendations. Key people receive a daily email or SMS summary of what is happening.

The Solution: 100% Serverless Architecture

My solution involves an event-driven ETL platform using managed AWS services. There are no servers to patch, no capacity to plan, and no minimum fees. You pay only when data flows through the system.

High-level architecture

Services Used

| Service | Role | Pricing Model |
| --- | --- | --- |
| AWS Lambda | All compute (17 functions) | Per invocation + duration |
| S3 | Object storage | Per GB stored + requests |
| DynamoDB | Metrics database | Per read/write unit (on-demand) |
| Step Functions | Workflow orchestration | Per state transition |
| EventBridge | Event routing | Free tier covers most use cases |
| Bedrock | AI analysis (Nova Lite) | Per token processed |
| API Gateway | REST API | Per request |
| SNS | Notifications | Per message |

These are just a few of the managed/serverless offerings from AWS. You can piece together as many of them as you need to build your architecture, and they scale automatically from zero to whatever capacity you need.


Smart Data Storage with Apache Parquet

One of the key architectural decisions was converting the raw uploaded JSON sales data into Apache Parquet format. This columnar storage format delivers significant benefits:

Huge Compression

Our 30-day dataset comparison:

  • Raw JSON uploads: 53.1 MB

  • Parquet files: 4.7 MB

My example dataset gets an 11x reduction in size using the default Parquet compression codec, and you can switch to an even higher-compression codec if needed. This results in real savings on storage and faster query performance.

Why Parquet?

  1. Columnar Storage: Only reads the columns you need, not entire rows

  2. Built-in Compression: Uses efficient encoding (dictionary, run-length, delta)

  3. Schema Enforcement: Explicit types prevent data quality issues

  4. Ecosystem Support: Works with Athena, Spark, Pandas, and most analytics tools

Type-Safe Schema

We define an explicit PyArrow schema to ensure data quality. We want to make sure we keep track of which Smurf loot is popular every day and follow the trends.

PARQUET_SCHEMA = pa.schema([
    ("transaction_id", pa.string()),
    ("transaction_timestamp", pa.timestamp("ms")),
    ("item_sku", pa.string()),
    ("item_name", pa.string()),
    ("quantity", pa.int32()),
    ("unit_price", pa.decimal128(10, 2)),
    ("line_total", pa.decimal128(10, 2)),
    ("discount_amount", pa.decimal128(10, 2)),
    ("payment_method", pa.string()),
    ("customer_id", pa.string()),
])

This schema ensures that decimal precision is maintained (critical for financial data) and timestamps are properly typed for time-series analysis.


Hive-Style Partitioning for Efficient Queries

Raw uploads arrive with flat filenames like store_0001_2025-11-27.json. We transform these into a Hive-style partition structure:

s3://bucket/processed/
└── year=2025/
    └── month=11/
        ├── day=27/
        │   ├── store_id=0001/data.parquet
        │   ├── store_id=0002/data.parquet
        │   └── ...
        └── day=28/
            └── ...

Why This Structure Matters

Partition Pruning: When you query "all sales for November 2025", tools like Amazon Athena only scan files in year=2025/month=11/ - not the entire dataset. This means:

  • Faster queries

  • Lower costs (Athena charges per TB scanned)

  • Better organization

The Transformation Code:

# Parse: store_0001_2025-11-27.json
store_id, year, month, day = parse_filename(filename)

# Output: year=2025/month=11/day=27/store_id=0001/data.parquet
output_key = f"processed/year={year}/month={month}/day={day}/store_id={store_id}/data.parquet"

This simple transformation enables sophisticated analytics without complex ETL pipelines.
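For illustration, a `parse_filename` helper like the one referenced above could be implemented as follows. This is a hypothetical version, not the repo's actual code:

```python
# Hypothetical implementation of the parse_filename helper referenced
# above; expects upload names like store_0001_2025-11-27.json.
import re

FILENAME_RE = re.compile(r"^store_(\d{4})_(\d{4})-(\d{2})-(\d{2})\.json$")

def parse_filename(filename: str) -> tuple[str, str, str, str]:
    """Split an upload filename into (store_id, year, month, day) strings."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unexpected upload filename: {filename}")
    store_id, year, month, day = match.groups()
    return store_id, year, month, day

store_id, year, month, day = parse_filename("store_0001_2025-11-27.json")
output_key = f"processed/year={year}/month={month}/day={day}/store_id={store_id}/data.parquet"
# output_key == "processed/year=2025/month=11/day=27/store_id=0001/data.parquet"
```

Keeping the partition values as zero-padded strings (rather than ints) matches the Hive-style key format exactly, so Athena can prune partitions without any casting.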


Two Analytics Options (Web-Based and a More Traditional Business Intelligence Approach)

I wanted to show how you could use multiple approaches to analyze the sales data. We need the best tools to keep track of those three-apple-tall blue creatures and all the ways their fans want to remember them. One is a simpler web version built in ReactJS that runs in your browser. I also built a prototype version of Amazon Quick Suite dashboards. Depending on the audience, one of these approaches will likely work (or you could build something else).

Option 1: React Dashboard (Developer-Friendly)

The project includes a custom ReactJS application that queries the API directly:

ReactJS web-based analytics

The Web-based analytics approach is likely best for:

  • Custom visualizations

  • Embedding in existing applications

  • Full control over the user experience

  • No additional licensing costs

The React dashboard provides:

  • Real-time metrics display

  • File upload interface with drag-and-drop

  • Historical trend charts

  • AI-generated insights and recommendations

Top selling products in web-based view

AI-based recommendations in web-based display

Option 2: Amazon Quick Suite (Business-Friendly)

This approach uses Amazon's managed Business Intelligence (BI) service, which imports data from S3:

Quick Suite Analytics

The Quick Suite approach is likely best for:

  • Business users who need self-service analytics

  • Ad-hoc exploration without writing code

  • Sharing dashboards with stakeholders

  • Built-in visualizations (no frontend development)

The current project exports five datasets to S3 in newline-delimited JSON format:

  • Store summaries (daily metrics per store)

  • Top products (best sellers)

  • Anomalies (AI-detected unusual patterns)

  • Trends (week-over-week analysis)

  • Recommendations (AI-generated action items)

Quick Suite's SPICE engine imports this data for fast, interactive dashboards.
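As a sketch of that export format, here is how one of those datasets could be serialized as newline-delimited JSON. The dataset shape is illustrative, and the real exporter would then `put_object` the resulting string to S3:

```python
# Sketch: serialize records as newline-delimited JSON, the format
# Quick Suite's SPICE engine imports from S3. Field names are
# illustrative, not necessarily the repo's.
import json

def to_ndjson(records: list[dict]) -> str:
    """One compact JSON object per line."""
    return "\n".join(json.dumps(record, separators=(",", ":")) for record in records)

store_summaries = [
    {"store_id": "0001", "date": "2025-11-27", "revenue": 1843.50, "transactions": 92},
    {"store_id": "0002", "date": "2025-11-27", "revenue": 2210.25, "transactions": 108},
]
body = to_ndjson(store_summaries)
# body holds two lines, one JSON object each, ready to upload to S3
```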

Choosing which analytics approach to use:

| Factor | React Dashboard | Quick Suite |
| --- | --- | --- |
| Cost | Included (API calls only) | $24/month per author, $3/month per reader |
| Setup | Requires development | Point-and-click |
| Customization | Unlimited | Template-based |
| User Type | Developers | Business analysts |
| Embedding | Full control | Quick Suite embedding |

Many organizations could use both: ReactJS for customer-facing features, Quick Suite for internal analytics.


Event-Driven Processing

The platform uses an event-driven architecture where each component reacts to events rather than polling for work. I always try to use this type of architecture unless the use-case really doesn’t fit it. AWS Step Functions are used to drive the data upload processing as well as the recommendation and analytics flow handling.

Upload Processing Flow

Step function for upload processing

  1. Store uploads JSON file to S3 (via presigned URL)

  2. S3 emits Object Created event

  3. EventBridge routes event to Step Functions

  4. Step Functions orchestrates the processing pipeline:

     • Validate schema

     • Convert to Parquet

     • Calculate metrics

     • Store in DynamoDB

     • Check if all stores reported
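As an illustration, the "validate schema" step in that pipeline might look like the sketch below. The required field names mirror the Parquet schema, but the real repo's checks may differ:

```python
# Hedged sketch of the "validate schema" step: check each uploaded
# transaction row before Parquet conversion. Field names mirror the
# Parquet schema shown earlier; the actual validation may differ.
REQUIRED_FIELDS = {
    "transaction_id", "transaction_timestamp", "item_sku",
    "quantity", "unit_price", "line_total",
}

def validate_transactions(transactions: list[dict]) -> list[str]:
    """Return a list of validation errors; an empty list means the upload is valid."""
    errors = []
    for i, row in enumerate(transactions):
        missing = REQUIRED_FIELDS - row.keys()
        if missing:
            errors.append(f"row {i}: missing fields {sorted(missing)}")
        elif row["quantity"] <= 0:
            errors.append(f"row {i}: quantity must be positive")
    return errors
```

Returning the error list (instead of raising on the first problem) lets the state machine report every issue in one notification to the store.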

Daily Analysis Trigger

When the last store uploads for a day, the system automatically triggers a smurfy comprehensive analysis:

Handle daily analysis flow

The analysis runs exactly when the data is ready. But what if a store fails to report? A scheduled EventBridge rule runs at 11 PM local time as a fallback, ensuring you always get a daily report - even with partial data. The scheduler checks if analysis already ran for that day and skips if so.
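The skip check in the fallback scheduler boils down to a small idempotency test. In this sketch, the set of completed dates stands in for a DynamoDB lookup (an assumed key shape, not necessarily the repo's):

```python
# Sketch of the 11 PM fallback decision: only run the daily analysis
# if the event-driven path has not already analyzed today's data. The
# completed-dates set stands in for a DynamoDB marker lookup.
from datetime import date

def should_run_fallback(analysis_dates_completed: set[str], today: date) -> bool:
    """Skip if analysis already ran for today's date."""
    return today.isoformat() not in analysis_dates_completed

# The scheduled rule fires; analysis for the 27th has not run yet:
run_it = should_run_fallback({"2025-11-26"}, date(2025, 11, 27))  # True
```

In production this check should be a conditional write (e.g. `attribute_not_exists` on the marker item) so that the event-driven trigger and the scheduler can never both run the analysis for the same day.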

If invalid data is uploaded, the key stakeholders receive email or SMS notifications (via SNS) so they can follow up with users. If the processing flow fails on the first attempt, it has built-in retry and backoff mechanisms.

Daily Email Reports

Once analysis completes, the platform automatically sends a daily summary email via SNS containing:

  • Total revenue across all stores

  • Top performing store of the day

  • AI-detected anomalies and unusual patterns

  • Business recommendations from Bedrock

Stakeholders receive insights in their inbox without logging into any dashboard.

Daily email of status
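Formatting that summary for SNS is straightforward. The field names in this sketch are assumptions, and the actual send would be an `sns.publish(TopicArn=..., Subject=..., Message=message)` call:

```python
# Illustrative formatting of the daily summary published to SNS.
# Field names are assumptions about the analysis output.
def format_daily_summary(summary: dict) -> str:
    """Render the daily analysis results as a plain-text email body."""
    lines = [
        f"Daily Sales Summary - {summary['date']}",
        f"Total revenue: ${summary['total_revenue']:,.2f}",
        f"Top store: {summary['top_store']}",
    ]
    lines += [f"Anomaly: {item}" for item in summary.get("anomalies", [])]
    lines += [f"Recommendation: {item}" for item in summary.get("recommendations", [])]
    return "\n".join(lines)

message = format_daily_summary({
    "date": "2025-11-27",
    "total_revenue": 4053.75,
    "top_store": "0002",
    "anomalies": ["Store 0003 revenue down vs baseline"],
    "recommendations": ["Restock Papa Smurf figurines before the weekend"],
})
```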


AI-Powered Insights with Amazon Bedrock

The solution uses Amazon Bedrock with the Nova Lite model (configurable to whatever model you want) to generate business intelligence:

  • Anomaly Detection: Identifies stores with unusual revenue patterns

  • Trend Analysis: Compares current performance to historical baselines

  • Recommendations: Generates actionable business advice

Bedrock is pay-per-token with no minimum commitment - so it’s perfect for batch processing workloads.
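A call to Nova Lite through the Bedrock Converse API could look like the sketch below. The prompt wording is an assumption, and the actual invocation requires AWS credentials plus Bedrock model access in your account:

```python
# Hedged sketch of invoking Nova Lite via the Bedrock Converse API.
# Prompt and metric fields are illustrative, not the repo's exact code.
import json

def build_analysis_prompt(metrics: list[dict]) -> str:
    """Bundle the day's store metrics into a single analysis request."""
    return (
        "You are a retail analyst. Given these daily store metrics as JSON, "
        "list anomalies, trends, and recommendations:\n"
        + json.dumps(metrics)
    )

def analyze_with_bedrock(metrics: list[dict]) -> str:
    import boto3  # requires AWS credentials and Bedrock model access
    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId="amazon.nova-lite-v1:0",
        messages=[{"role": "user", "content": [{"text": build_analysis_prompt(metrics)}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

prompt = build_analysis_prompt([{"store_id": "0001", "revenue": 1843.50}])
```

Running this once per day over aggregated metrics (rather than per transaction) is what keeps the token count, and therefore the Bedrock line item, so small.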


The Cost Breakdown

Here's what this platform actually costs for a typical month (e.g., 330 file uploads = multiple stores × 30 days):

| Service | Monthly Cost | Notes |
| --- | --- | --- |
| Lambda | ~$2.00 | 17 functions, ~1,000 invocations each |
| Step Functions | ~$0.50 | 360 workflow executions |
| DynamoDB | ~$1.00 | On-demand mode, ~1,000 ops |
| S3 | ~$0.01 | ~60 MB stored |
| Bedrock | ~$5.00 | Nova Lite, 30 daily analyses |
| EventBridge | ~$0.00 | Free tier |
| SNS | ~$0.10 | Email notifications |
| CloudWatch Alarms | ~$0.00 | 7 alarms (first 10 free) |
| **Total** | **~$8.61** | |

Add Quick Suite (if needed) for $24/month per author to build dashboards, or just $3/month per reader for view-only access.

Why is this all so cheap?

  1. ARM64 Architecture: Lambda on Graviton2 is ~20% cheaper than x86

  2. Parquet Compression: ~11x less storage than JSON

  3. On-Demand DynamoDB: Pay only for actual read/write operations

  4. Event-Driven: No idle compute costs


Infrastructure as Code (IaC)

I’m a big advocate of using IaC for everything. My favourite tools for this are Terraform, the Serverless Application Model (SAM), and the Cloud Development Kit (CDK). In this case there is VPC provisioning and a lot of resources, so I chose my go-to tool, Terraform. One command deploys everything:

terraform apply

Here are some key snippets from the infrastructure code:

Lambda Functions (ARM64 for Cost Savings)

Lambda is the best place to host your business logic when code execution times are short. All 17 Lambda functions use ARM64 architecture (Graviton2) for ~20% cost savings:

resource "aws_lambda_function" "process_upload" {
  filename      = data.archive_file.process_upload_zip.output_path
  function_name = "process_upload"
  role          = aws_iam_role.lambda_role.arn
  handler       = "process_upload.lambda_handler"
  runtime       = "python3.13"
  architectures = ["arm64"]
  timeout       = 30
  memory_size   = 1024

  layers = [local.powertools_layer_arn, local.pandas_layer_arn]

  tracing_config {
    mode = "Active"
  }

  environment {
    variables = merge(local.powertools_env_vars, {
      S3_BUCKET        = aws_s3_bucket.upload_bucket.id
      PROCESSED_PREFIX = var.processed_prefix
    })
  }
}

DynamoDB (Pay-Per-Request)

DynamoDB is my favourite database to use with AWS. It is truly serverless and tables are ready to use in seconds. It offers on-demand billing which means zero compute cost when idle:

resource "aws_dynamodb_table" "sales_data" {
  name         = "SalesData"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "PK"
  range_key    = "SK"

  attribute {
    name = "PK"
    type = "S"
  }

  attribute {
    name = "SK"
    type = "S"
  }

  # GSI for querying by date across all stores
  global_secondary_index {
    name            = "GSI1"
    hash_key        = "GSI1PK"
    range_key       = "GSI1SK"
    projection_type = "ALL"
  }
}
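To make the GSI's "query by date across all stores" purpose concrete, here is an illustrative single-table key layout. The PK/SK/GSI1 value shapes below are assumptions about the design, not necessarily the repo's exact patterns:

```python
# Illustrative key shapes for the SalesData single-table design above.
# These PK/SK/GSI1 patterns are assumptions, shown to explain the GSI.
def metrics_item(store_id: str, day: str, metrics: dict) -> dict:
    """Build a daily-metrics item addressable both per-store and per-date."""
    return {
        "PK": f"STORE#{store_id}",    # base table: all items for one store
        "SK": f"METRICS#{day}",       # sorted by date within the store
        "GSI1PK": f"DATE#{day}",      # GSI1: all stores for one date
        "GSI1SK": f"STORE#{store_id}",
        **metrics,
    }

item = metrics_item("0001", "2025-11-27", {"revenue": "1843.50"})
# A Query on GSI1 with GSI1PK = "DATE#2025-11-27" then returns every
# store's metrics for that day in a single call.
```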

EventBridge (S3 to Step Functions)

EventBridge is my favourite AWS service. It offers rules for reacting to events, pipes for bridging data across AWS services, and a nice scheduler. Here I’m using a simple rule that routes S3 uploads to the processing workflow:

resource "aws_cloudwatch_event_rule" "s3_upload" {
  name        = "capture-s3-uploads"
  description = "Capture all S3 object uploads"

  event_pattern = jsonencode({
    source      = ["aws.s3"]
    detail-type = ["Object Created"]
    detail = {
      bucket = {
        name = [aws_s3_bucket.upload_bucket.id]
      }
      object = {
        key = [{ prefix = var.upload_prefix }]
      }
    }
  })
}

resource "aws_cloudwatch_event_target" "step_function" {
  rule      = aws_cloudwatch_event_rule.s3_upload.name
  target_id = "UploadProcessorStepFunction"
  arn       = aws_sfn_state_machine.upload_processor.arn
  role_arn  = aws_iam_role.eventbridge_step_function_role.arn
}

Step Functions (Workflow Orchestration)

In many cases you want to tightly control and track the flow of processing in your app. AWS Step Functions state machines are defined as JSON templates with Lambda ARNs injected:

resource "aws_sfn_state_machine" "upload_processor" {
  name     = "upload-processor"
  role_arn = aws_iam_role.step_function_role.arn

  definition = templatefile("${path.module}/../backend/state-machines/upload-processor.json", {
    process_upload_lambda_arn        = aws_lambda_function.process_upload.arn
    calculate_metrics_lambda_arn     = aws_lambda_function.calculate_metrics.arn
    write_metrics_lambda_arn         = aws_lambda_function.write_metrics.arn
    check_all_stores_lambda_arn      = aws_lambda_function.check_all_stores.arn
    sns_alerts_topic_arn             = aws_sns_topic.sales_alerts.arn
    daily_analysis_state_machine_arn = aws_sfn_state_machine.daily_analysis.arn
  })
}

S3 Bucket (Secure by Default)

S3 is at the core of data storage for so many apps today. My setup has public access blocked, encryption enabled, and EventBridge notifications turned on:

resource "aws_s3_bucket_public_access_block" "upload_bucket_public_access_block" {
  bucket = aws_s3_bucket.upload_bucket.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_notification" "bucket_notification" {
  bucket      = aws_s3_bucket.upload_bucket.id
  eventbridge = true
}

The complete infrastructure includes:

  • 17 Lambda functions

  • 2 Step Functions state machines

  • API Gateway with 5 endpoints

  • DynamoDB table with GSI

  • S3 bucket with security policies

  • EventBridge rules

  • SNS topics

  • IAM roles with least-privilege policies

There is no clicking through console pages to set this up, and no manual configuration drift.


Key Takeaways

  1. Serverless doesn't mean simple - it means you focus on business logic instead of infrastructure.

  2. Parquet is worth the conversion - the great compression pays for itself in storage and query costs.

  3. Hive partitioning enables scale - organize data for how it will be queried, not how it arrives.

  4. Event-driven beats polling - let AWS route events instead of writing schedulers.

  5. Pay-as-you-go works - for variable workloads, managed services beat reserved capacity.

  6. Offer analytics options - different users have different needs; support both custom dashboards and BI tools.


Try It Yourself

The complete source code for my solution is available on GitHub, including:

  • Terraform infrastructure definitions

  • 17 Lambda functions (Python 3.13)

  • React frontend application

  • Sample data generator

  • Quick Suite setup scripts

Deploy your own instance and start processing data in under 30 minutes.


Built with AWS Lambda, Step Functions, S3, DynamoDB, EventBridge, Bedrock, API Gateway, SNS, and optionally Quick Suite.

CLEANUP (IMPORTANT!!)

If you do end up deploying this yourself, please understand that some of the included resources will cost you a small amount of real money. Please don’t forget about it.

Please MAKE SURE TO DELETE the stack when you are no longer using it. Running terraform destroy takes care of this, or you can delete the resources in the AWS console.

Try the setup in your AWS account

You can clone the GitHub repo and try this out in your own AWS account. The README.md file covers any changes you need to make for it to work in your account.

Please let me know if you have any suggestions or problems trying out this example project.

For more articles from me please visit my blog at Darryl's World of Cloud or find me on Bluesky, X, LinkedIn, Medium, Dev.to, or the AWS Community.

For tons of great serverless content and discussions please join the Believe In Serverless community we have put together at this link: Believe In Serverless Community
