
DEV Community

Anushka Singh

Posted on Jun 18

Building an AI image intelligence pipeline on AWS

#ai #aws #nextjs #cloudnative

I have begun practising on AWS, and what's more productive than building projects in the AWS console? Cloud computing is my new favourite, along with AI. After several ideation sessions with Claude about what could combine AI with a cloud-native application, we settled on an image intelligence pipeline. Claude gave me the names of all the AWS services that fall under the free tier, so yes, my project costs $0 and is 100% working. This made me realise how much we depend on AI bots to think and act. If I were working with AWS and somehow had unlimited credits, I would definitely have deployed the project.
Hi world! This is a simple project, but my happiness at seeing it work is boundless, because I take forever to debug. Nowadays everyone talks about Agentic Engineering helping to find bugs faster. I have used it a little, though the concept still goes over my head. I can tell you about Claude skills and their wonders some other day, but first let me get acquainted with the new SDLC itself.

What it can identify

Faces, with confidence scores for their emotions and an estimated age range
Objects, scenes, people, activities and graphic elements. The best part is that it recognises text and writes down whatever is written in the image, so you can upload your receipts too.

Demonstration

GitHub repository

s17anushka / image-intelligence

An AI image intelligence pipeline built fully on the AWS free tier, with a Next.js frontend

🧠 AI Image Intelligence Pipeline

A serverless AI-powered image analysis app built on AWS. Upload any image and get instant analysis — objects, faces, emotions, text, and receipt line items — all in under 5 seconds.

✨ What it does

Upload any image and the pipeline automatically:

Detects objects & scenes — labels everything visible with confidence scores
Reads faces — estimates age range, dominant emotion, gender, smile
Extracts text — reads any words visible in the image
Parses receipts — extracts line items and amounts from receipts/invoices
Moderates content — flags unsafe or explicit content automatically
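Rekognition's DetectFaces returns, for each face, an Emotions list with a confidence score per emotion, so the "dominant emotion" above is simply the highest-scoring entry. Below is a minimal sketch of that step. The local FaceDetail type mirrors only the response fields it reads, and summarizeFace and FaceSummary are hypothetical names, not code from the repository.

```typescript
// Minimal local types mirroring the parts of a Rekognition FaceDetail read here.
// Emotions and AgeRange are only present when DetectFaces is asked for them,
// for example with Attributes: ["ALL"].
interface Emotion {
  Type?: string;
  Confidence?: number;
}

interface FaceDetail {
  AgeRange?: { Low?: number; High?: number };
  Emotions?: Emotion[];
  Smile?: { Value?: boolean; Confidence?: number };
}

// Hypothetical summary shape for the analysis card.
interface FaceSummary {
  ageRange: string;
  dominantEmotion: string;
  confidence: number;
  smiling: boolean;
}

function summarizeFace(face: FaceDetail): FaceSummary {
  // Keep the emotion with the highest confidence score.
  let best: Emotion = { Type: "UNKNOWN", Confidence: 0 };
  for (const emotion of face.Emotions ?? []) {
    if ((emotion.Confidence ?? 0) > (best.Confidence ?? 0)) {
      best = emotion;
    }
  }
  const low = face.AgeRange?.Low;
  const high = face.AgeRange?.High;
  return {
    ageRange: low !== undefined && high !== undefined ? `${low}-${high}` : "unknown",
    dominantEmotion: best.Type ?? "UNKNOWN",
    confidence: best.Confidence ?? 0,
    smiling: face.Smile?.Value ?? false,
  };
}

// Usage with a hand-written sample, not real Rekognition output.
console.log(
  summarizeFace({
    AgeRange: { Low: 25, High: 35 },
    Emotions: [
      { Type: "CALM", Confidence: 12.4 },
      { Type: "HAPPY", Confidence: 85.1 },
    ],
    Smile: { Value: true, Confidence: 97.2 },
  }),
);
```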

The Lambda function acts as an AI agent — it branches conditionally based on what Rekognition finds. If text is detected, it automatically fires a Textract call to extract structured data.

🏗️ Architecture

User uploads image
↓
API Gateway → Lambda (presign) → S3 presigned URL
↓
Browser PUT → S3 bucket
↓
…

Tech Stack

Frontend

Next.js 15 with the App Router. My frontend is basic, but enough to demonstrate the idea.
TypeScript for type safety
Tailwind CSS for the styling
SWR for data fetching and polling

Backend

Serverless and fully managed by AWS

Amazon S3 - stores uploaded images and triggers the pipeline through an event notification

AWS Lambda (Node.js 24) - two functions:
image-intelligence-orchestrator - the AI agent, triggered by S3
image-intelligence-api-handler - handles REST API requests

Amazon Rekognition - AWS's computer vision (CV) service, which Lambda calls using the permissions granted by its IAM role
Amazon Textract — AWS's document text extraction service
Amazon DynamoDB — NoSQL database storing analysis results

API Gateway (HTTP API) - REST endpoints connecting the frontend to Lambda

Apart from these services, I set up an IAM role with scoped permissions so that Lambda can access each of them. Presigned URLs let the browser upload images directly to S3 without exposing credentials. I'd like to say I planned that myself, but actually no, this one was suggested by Claude. The whole pipeline runs in 3 to 5 seconds.
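The image-intelligence-api-handler function serves the endpoint the frontend polls. Here is a minimal sketch, assuming a hypothetical GET /results/{imageId} route on the HTTP API and an injected lookup function standing in for the DynamoDB read. The route, status codes and response bodies are illustrative, not taken from the repository. Returning 202 while the item is missing tells the client to keep polling.

```typescript
// Minimal shape of the API Gateway HTTP API (payload format 2.0) fields read here.
interface HttpApiEvent {
  requestContext: { http: { method: string } };
  pathParameters?: Record<string, string | undefined>;
}

interface HttpApiResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical lookup: in the real function this would read DynamoDB by imageId.
type LookupFn = (imageId: string) => Promise<Record<string, unknown> | undefined>;

function json(statusCode: number, payload: unknown): HttpApiResult {
  return {
    statusCode,
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Handles a hypothetical GET /results/{imageId} route.
async function handleGetResult(event: HttpApiEvent, lookup: LookupFn): Promise<HttpApiResult> {
  if (event.requestContext.http.method !== "GET") {
    return json(405, { error: "method not allowed" });
  }
  const imageId = event.pathParameters?.imageId;
  if (!imageId) {
    return json(400, { error: "imageId is required" });
  }
  const item = await lookup(imageId);
  // No item yet means the orchestrator is still working, so the client keeps polling.
  if (!item) {
    return json(202, { status: "pending", imageId });
  }
  return json(200, { status: "done", ...item });
}
```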

What the Project Does (the section below was written by Claude!!)

When you upload an image, here's exactly what happens:
Step 1 - Upload
Browser requests a presigned URL from API Gateway → Lambda generates a secure temporary S3 upload URL → browser uploads the image directly to S3.
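From the browser's side, Step 1 is two requests: one to API Gateway for the presigned URL and one PUT of the file to that URL. Here is a minimal sketch, assuming a hypothetical POST {apiBase}/upload-url endpoint that returns { uploadUrl, imageId }. fetch is passed in as a parameter so the function can be exercised without a browser; in the Next.js app you would pass the global fetch. On the Lambda side, such a URL would typically come from getSignedUrl in @aws-sdk/s3-request-presigner applied to a PutObjectCommand, though the post doesn't show that code.

```typescript
// Minimal fetch-like types; the global fetch is assignable to FetchLike.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  // body is `any` so it accepts whatever fetch accepts (string, Blob, File).
  init?: { method?: string; headers?: Record<string, string>; body?: any },
) => Promise<ResponseLike>;

// Hypothetical response shape of the presign endpoint.
interface PresignResponse {
  uploadUrl: string;
  imageId: string;
}

// Ask the API for a presigned URL, then PUT the file straight to S3.
async function uploadImage(
  file: { type: string },
  apiBase: string,
  fetchImpl: FetchLike,
): Promise<string> {
  const presignRes = await fetchImpl(`${apiBase}/upload-url`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ contentType: file.type }),
  });
  if (!presignRes.ok) {
    throw new Error(`presign request failed with status ${presignRes.status}`);
  }
  const { uploadUrl, imageId } = (await presignRes.json()) as PresignResponse;

  // If the Lambda signed the URL with a ContentType, this header must match it,
  // otherwise S3 rejects the PUT with a signature mismatch.
  const putRes = await fetchImpl(uploadUrl, {
    method: "PUT",
    headers: { "content-type": file.type },
    body: file,
  });
  if (!putRes.ok) {
    throw new Error(`S3 upload failed with status ${putRes.status}`);
  }
  return imageId;
}
```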
Step 2 - Trigger
S3 detects the new file and automatically triggers the orchestrator Lambda.
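When S3 invokes the orchestrator, the event carries the bucket name and object key of each new upload under Records[].s3. Below is a minimal sketch of pulling those out. S3 URL-encodes object keys in event notifications (spaces become "+"), so keys need decoding before they are passed to Rekognition. The imageIdFromKey convention (file name without its extension) is a hypothetical assumption, since the post doesn't say how keys map to imageId.

```typescript
// Minimal subset of the S3 event notification shape that Lambda receives.
interface S3EventLike {
  Records: Array<{
    eventName: string;
    s3: { bucket: { name: string }; object: { key: string; size?: number } };
  }>;
}

interface UploadedImage {
  bucket: string;
  key: string;
  imageId: string;
}

// S3 URL-encodes keys in notifications and encodes spaces as "+".
function decodeKey(rawKey: string): string {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Hypothetical convention: "uploads/abc123.jpg" -> "abc123".
function imageIdFromKey(key: string): string {
  const fileName = key.split("/").pop() ?? key;
  const dot = fileName.lastIndexOf(".");
  return dot > 0 ? fileName.slice(0, dot) : fileName;
}

// Keep only object-created events and extract what the orchestrator needs.
function uploadedImages(event: S3EventLike): UploadedImage[] {
  return event.Records
    .filter((record) => record.eventName.startsWith("ObjectCreated:"))
    .map((record) => {
      const key = decodeKey(record.s3.object.key);
      return { bucket: record.s3.bucket.name, key, imageId: imageIdFromKey(key) };
    });
}
```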
Step 3 - AI Analysis (the agentic part)
Lambda runs 4 Rekognition calls in parallel:

DetectLabels — identifies objects, scenes, concepts in the image
DetectFaces — finds faces, estimates age range, reads emotions and attributes
DetectText — reads any text visible in the image
DetectModerationLabels — flags unsafe content
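Running the four calls in parallel means starting all four requests before awaiting any of them. Below is a minimal sketch of that pattern. Each detector is an injected async function (in the Lambda, each would wrap a client.send call with DetectLabelsCommand, DetectFacesCommand, DetectTextCommand or DetectModerationLabelsCommand from @aws-sdk/client-rekognition), and runInParallel is a hypothetical helper. The post doesn't say whether it uses Promise.all or Promise.allSettled; this sketch picks allSettled so one failed call doesn't throw away the other results.

```typescript
// Each detector stands in for one Rekognition request.
type Detector = () => Promise<unknown>;

interface DetectorOutcome {
  ok: boolean;
  value?: unknown;
  error?: string;
}

// Start every detector before awaiting any of them, then collect the outcomes.
// allSettled (rather than all) keeps one rejected call from discarding the others.
async function runInParallel(
  detectors: Record<string, Detector>,
): Promise<Record<string, DetectorOutcome>> {
  const entries = Object.entries(detectors);
  const settled = await Promise.allSettled(entries.map(([, run]) => run()));
  const outcomes: Record<string, DetectorOutcome> = {};
  settled.forEach((result, i) => {
    const name = entries[i][0];
    outcomes[name] =
      result.status === "fulfilled"
        ? { ok: true, value: result.value }
        : { ok: false, error: String(result.reason) };
  });
  return outcomes;
}
```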

Then it makes a decision — if more than 3 words were found, it fires a 5th call to Textract which extracts structured data like receipt line items and tables. This conditional branching is the "agentic behaviour" — Lambda decides what to do next based on what it finds.
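The decision itself is small. DetectText returns both LINE and WORD detections for the same text, so a word count should look at WORD entries only. Below is a minimal sketch of the branch using the post's threshold of more than 3 words. runTextract is a hypothetical injected function standing in for the actual Textract request, whose exact operation (for example AnalyzeExpense or AnalyzeDocument) the post doesn't specify.

```typescript
// Subset of a Rekognition TextDetection. DetectText returns LINE and WORD
// detections for the same text, so only WORD entries are counted.
interface TextDetection {
  DetectedText?: string;
  Type?: string; // "LINE" or "WORD"
  Confidence?: number;
}

// Threshold from the post: more than 3 words triggers the Textract call.
const WORD_THRESHOLD = 3;

function countWords(detections: TextDetection[]): number {
  return detections.filter((d) => d.Type === "WORD").length;
}

function shouldRunTextract(detections: TextDetection[]): boolean {
  return countWords(detections) > WORD_THRESHOLD;
}

// runTextract is a hypothetical stand-in for the actual Textract request.
async function analyzeText(
  detections: TextDetection[],
  runTextract: () => Promise<unknown>,
): Promise<{ wordCount: number; textract?: unknown }> {
  const wordCount = countWords(detections);
  if (!shouldRunTextract(detections)) {
    return { wordCount };
  }
  return { wordCount, textract: await runTextract() };
}
```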
Step 4 - Store
All results are aggregated into a single DynamoDB item keyed by imageId.
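Below is a minimal sketch of the aggregation step, building one item keyed by imageId. The attribute names are hypothetical; the post only says all results land in a single item. The item could then be written with PutCommand from @aws-sdk/lib-dynamodb through a DynamoDBDocumentClient. Optional fields are only set when present because the document client rejects undefined values unless its removeUndefinedValues marshalling option is enabled.

```typescript
// Hypothetical input: already-parsed pieces of the Rekognition/Textract results.
interface AnalysisInput {
  labels: Array<{ Name?: string; Confidence?: number }>;
  faces: unknown[];
  textLines: string[];
  moderationLabels: Array<{ Name?: string }>;
  receipt?: unknown;
}

// Hypothetical item shape; imageId is the partition key.
interface AnalysisItem {
  imageId: string;
  analyzedAt: string;
  labels: string[];
  faceCount: number;
  text: string[];
  flagged: boolean;
  receipt?: unknown;
}

function buildItem(imageId: string, input: AnalysisInput, now: Date = new Date()): AnalysisItem {
  const item: AnalysisItem = {
    imageId,
    analyzedAt: now.toISOString(),
    // Keep only label names, dropping entries without one.
    labels: input.labels.map((l) => l.Name).filter((n): n is string => Boolean(n)),
    faceCount: input.faces.length,
    text: input.textLines,
    // Any moderation label at all marks the image as flagged.
    flagged: input.moderationLabels.length > 0,
  };
  // Only set receipt when Textract actually ran.
  if (input.receipt !== undefined) {
    item.receipt = input.receipt;
  }
  return item;
}
```

Storing summaries rather than raw API responses also helps keep the item under DynamoDB's 400 KB item size limit.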
Step 5 - Display
Frontend polls the API every 2 seconds until DynamoDB has the result, then renders the analysis card with labels, faces, text and receipt data.
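The project does this polling with SWR, whose refreshInterval option sets the polling period. The loop below is a framework-free sketch of the same idea, assuming a hypothetical fetchResult function that returns { status: "pending" } until the item exists. The 15-attempt cap is an arbitrary assumption so the client eventually gives up instead of polling forever.

```typescript
// Hypothetical response shape from the results endpoint.
type ResultResponse =
  | { status: "pending" }
  | { status: "done"; [key: string]: unknown };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll every intervalMs until the analysis is stored, giving up after maxAttempts.
async function pollForResult(
  fetchResult: () => Promise<ResultResponse>,
  intervalMs = 2000,
  maxAttempts = 15,
): Promise<ResultResponse> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fetchResult();
    if (result.status === "done") {
      return result;
    }
    // Don't sleep after the final attempt.
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`no result after ${maxAttempts} attempts`);
}
```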

More projects like these are on the way, and I'll be dropping the bombs later.
Stay tuned!
