saif ur rahman

6 Mistakes Developers Make When Deploying Generative AI on AWS (And How to Fix Them)

Generative AI is everywhere right now.

We’re building AI report generators, document summarizers, compliance checkers, risk engines, chatbots — and most of them work perfectly in local development.

Until they hit production.

Then things start breaking.

  • Timeouts
  • Retries gone wrong
  • Users refreshing the page 10 times
  • S3 buckets accidentally made public
  • No clear job status
  • Lambda costs climbing silently

I recently built a production-ready serverless Generative AI backend on AWS, and along the way I made (and fixed) almost every mistake in this list.

If you’re deploying GenAI workloads on AWS, especially with Lambda, this article will save you time, money, and headaches.

Let’s break it down.

Mistake #1: Blocking API Calls with LLM Requests

The Problem

The most common mistake I see:

```js
// Inside the API handler: the request blocks until the LLM responds
const result = await callLLM();
return result;
```

Looks simple.

But here’s what happens in production:

  • API Gateway has a 29-second timeout
  • LLM calls can take 10–60 seconds
  • External APIs (news, sanctions, risk feeds) add latency
  • Users sit there waiting

Eventually:

Timeout.

And your user thinks your AI “doesn’t work”.

The Fix: Asynchronous Architecture with SQS

Instead of blocking the API, decouple it.

Better flow:


```
Client → API Gateway → Lambda (Request Handler) → SQS
       → Worker Lambda (long timeout) → Bedrock / External APIs → S3 + DynamoDB
```

The API only:

  • Validates input
  • Creates a report record
  • Sends message to SQS
  • Returns immediately

The worker handles heavy AI processing.

This eliminates user-facing timeouts and lets the system scale horizontally.

Mistake #2: No Retry Logic for AI Failures

The Problem

LLMs fail.
External APIs fail.
Network calls fail.

If you call AI directly inside a request and it fails:

  • The user request fails
  • No retry
  • No recovery
  • No record of what happened

This is dangerous in compliance or risk systems.

The Fix: Let SQS Handle Retries

SQS + Lambda event source mapping automatically:

  • Retries failed messages
  • Respects visibility timeout
  • Supports Dead Letter Queues

Now if your worker fails:

  • The message returns to queue
  • Lambda retries
  • You can configure maxReceiveCount
  • You can attach a DLQ for failed jobs

You get retry logic without writing retry code.

That’s production engineering.
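As a sketch, the relevant queue settings look like this (the DLQ ARN and numbers are placeholders, not values from the project). Keep the visibility timeout at least as long as the worker Lambda's timeout so a message isn't retried while it's still being processed:

```json
{
  "RedrivePolicy": {
    "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:genai-jobs-dlq",
    "maxReceiveCount": 3
  },
  "VisibilityTimeout": 900
}
```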

Mistake #3: No Status Tracking for AI Jobs

The Problem

User submits request.

Now what?

You have no idea if the job is:

  • Pending
  • Processing
  • Completed
  • Failed

Users refresh blindly.
You cannot build dashboards.
You cannot monitor performance.


The Fix: DynamoDB Lifecycle Tracking

Use DynamoDB as a job state tracker.

When request is created:

```json
{
  "status": "PENDING",
  "risk_level": null,
  "s3_url": null
}
```

When the worker starts:

```
status → PROCESSING
```

When completed:

```
status → COMPLETED
risk_level → High
s3_url → https://...
```

Now your frontend can:

  • Poll job status
  • Show progress
  • Display result when ready

This is how long-running AI jobs should be handled.
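A sketch of building that DynamoDB update. The table and attribute names are assumptions matching the examples above; note that `status` is a DynamoDB reserved word, which is why expression attribute names are used:

```javascript
// Sketch: build DynamoDB UpdateItem parameters for a job status transition.
// Pass the result to the DocumentClient's UpdateCommand.
function buildStatusUpdate(reportId, status, extra = {}) {
  const fields = { status, ...extra };
  const names = {};
  const values = {};
  const sets = [];
  for (const [key, value] of Object.entries(fields)) {
    names[`#${key}`] = key; // "#status" sidesteps the reserved word
    values[`:${key}`] = value;
    sets.push(`#${key} = :${key}`);
  }
  return {
    TableName: "reports",
    Key: { reportId },
    UpdateExpression: `SET ${sets.join(", ")}`,
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}
```

The same helper covers every transition: `buildStatusUpdate(id, "PROCESSING")` when the worker picks the job up, and `buildStatusUpdate(id, "COMPLETED", { risk_level, s3_url })` when it finishes.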

Mistake #4: Making S3 Buckets Public

The Problem

You generate AI reports and store them in S3.

Quick solution?

Make bucket public.

```json
{
  "Effect": "Allow",
  "Principal": "*",
  "Action": "s3:GetObject",
  "Resource": "arn:aws:s3:::your-bucket/*"
}
```

Done.

Except now:

  • Anyone can download reports
  • Sensitive data is exposed
  • Compliance risk increases

And yes, I’ve seen this happen.

The Fix: Use Pre-Signed URLs

Keep your bucket private.

When job completes:

```js
const url = await getSignedUrl(s3Client, command, {
  expiresIn: 600 // 10 minutes
});
```

Now:

  • URL works temporarily
  • Only authorized user gets access
  • Bucket remains private
  • You avoid major security risks

Public buckets and AI-generated reports should never mix.
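As a belt-and-suspenders measure, you can also enforce privacy at the bucket level. This is the shape of S3's Block Public Access configuration with all four flags enabled, so a public policy like the one above is rejected outright:

```json
{
  "BlockPublicAcls": true,
  "IgnorePublicAcls": true,
  "BlockPublicPolicy": true,
  "RestrictPublicBuckets": true
}
```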

Mistake #5: Weak Input Validation

The Problem

Most GenAI systems accept user input like:

```json
{
  "companyName": "...",
  "corporateNumber": "..."
}
```

Without proper validation:

  • Invalid corporate numbers
  • Injection attempts
  • Broken workflows
  • Garbage-in → garbage-out AI responses

LLMs amplify bad input.

The Fix: Strong Validation Schema

Use a validation layer (e.g., Joi):

```js
corporateNumber: Joi.string()
  .pattern(/^[a-zA-Z0-9-]+$/)
  .required()
```

Validate:

  • Length
  • Format
  • Required fields
  • Country constraints
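If you'd rather not pull in a dependency, the same checks can be hand-rolled. A minimal sketch, where the field names and the 200-character limit are assumptions based on the examples above:

```javascript
// Dependency-free sketch of the validation layer.
function validateInput(body) {
  const errors = [];
  if (
    typeof body.companyName !== "string" ||
    body.companyName.length < 1 ||
    body.companyName.length > 200
  ) {
    errors.push("companyName must be a string of 1-200 characters");
  }
  if (
    typeof body.corporateNumber !== "string" ||
    !/^[a-zA-Z0-9-]+$/.test(body.corporateNumber)
  ) {
    errors.push("corporateNumber must be alphanumeric (dashes allowed)");
  }
  return { valid: errors.length === 0, errors };
}
```

Reject the request with a 400 before anything reaches the queue or the model.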

Never trust AI to fix bad input.

AI is powerful — not magical.

Mistake #6: Over-Permissive IAM Roles

The Problem

Many developers attach:


```
AdministratorAccess
```

To Lambda for convenience.

This is dangerous:

  • S3 access everywhere
  • DynamoDB access everywhere
  • Bedrock access unrestricted
  • Harder to audit

The Fix: Least Privilege IAM

Grant only what you need:

  • sqs:SendMessage
  • sqs:ReceiveMessage
  • dynamodb:UpdateItem
  • s3:PutObject
  • s3:GetObject
  • Specific resource ARNs
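A policy granting only the actions above might look like this sketch. The ARNs are placeholders, not real resources:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sqs:SendMessage", "sqs:ReceiveMessage"],
      "Resource": "arn:aws:sqs:us-east-1:123456789012:genai-jobs"
    },
    {
      "Effect": "Allow",
      "Action": ["dynamodb:UpdateItem"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/reports"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": "arn:aws:s3:::genai-reports/*"
    }
  ]
}
```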

Your GenAI backend becomes:

  • More secure
  • Easier to audit
  • Production compliant

Security is part of AI engineering.

The Real Lesson

Generative AI is not just about prompting.

It’s about:

  • Architecture
  • Reliability
  • Security
  • Observability
  • Lifecycle management

If you treat LLMs like simple API calls, your system will fail at scale.

If you treat them like long-running distributed workloads, you’ll build something production-ready.

Final Thoughts

AI is the exciting part.

But infrastructure is what makes it usable.

The difference between a demo and a real product is not the model — it’s the backend design.

If you’re deploying Generative AI on AWS:

  • Use async patterns
  • Track state
  • Avoid blocking APIs
  • Secure your storage
  • Validate aggressively
  • Follow least privilege

That’s how you build AI systems that survive production.
