Girish Mukim

How to Build a Scalable RAG-Based Chatbot on AWS?

This article was written in collaboration with Ajay Pokale, a Senior Architect at Cognizant.

TL;DR (Key Takeaways)

  • Retrieval-Augmented Generation (RAG) allows LLMs to answer questions using your private data
  • Amazon Bedrock + S3 Vectors enable a fully serverless RAG implementation
  • Amazon Nova Lite provides fast, cost-efficient responses for real-time chatbots
  • The solution is scalable, secure, and ideal for schools, enterprises, and internal knowledge systems
  • No servers to manage, minimal cost, production-ready architecture

Introduction

Large Language Models (LLMs) are incredibly powerful. They can generate text, summarize content, and answer complex questions.
However, they have one critical limitation:

👉 They do not know your private or domain-specific data.

This makes them unreliable for scenarios involving:

  • Internal company policies
  • School rules and schedules
  • Proprietary documents
  • Frequently changing information

Retrieval-Augmented Generation (RAG) solves this problem by combining:

  1. Information retrieval from your own data sources
  2. Text generation using an LLM

The result is an AI assistant that produces accurate, grounded, and trustworthy answers.

In this guide, we will build a scalable, serverless RAG chatbot on AWS using Amazon Bedrock and modern AWS services.

Real-World Use Case: School Assistant Chatbot

Let’s consider a use case for primary schools: a chatbot designed to provide quick answers to everyday questions for parents, students, and staff.

Current Challenges

  • Parents call the school office for routine questions
  • Staff repeatedly answer the same queries
  • Information delivery is slow and inconsistent

The Way Forward with a Chatbot

  • Parents ask the chatbot 24/7
  • Answers are retrieved directly from official school documents
  • Administrative workload is significantly reduced

Example Questions

  • When is the next school holiday?
  • What documents are required for admission?
  • What are the school lunch rules?

This makes RAG an ideal solution for education, HR, compliance, and internal knowledge systems.

Solution Architecture Overview

We use a fully serverless AWS architecture to achieve:

  • Automatic scaling
  • High availability
  • Low operational cost
  • Minimal infrastructure management

Key Components Explained:

Knowledge Base (Amazon S3 + S3 Vectors)

  • School documents are stored in Amazon S3
  • Amazon Bedrock converts the documents into vector embeddings using the Titan Embeddings model
  • The embeddings are stored in S3 Vectors for fast semantic search

Why S3 Vectors?

  • Fully serverless (no cluster management)
  • Cost-effective for RAG workloads
  • Seamless upgrade path to Amazon OpenSearch for advanced search needs

Amazon Bedrock Agent (RAG Intelligence Layer)

The Bedrock Agent acts as the brain of the chatbot:

  • Embeds the user’s question
  • Searches the vectorized knowledge base
  • Retrieves relevant context
  • Generates grounded responses

This ensures accuracy, relevance, and traceability.
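To make this flow concrete, you can exercise the same retrieve-then-generate loop directly against a Knowledge Base with the RetrieveAndGenerate API, even before any agent exists. A minimal sketch (the Knowledge Base ID and model ARN are placeholders for your own values):

```python
import boto3

runtime = boto3.client("bedrock-agent-runtime")

response = runtime.retrieve_and_generate(
    input={"text": "What documents are required for admission?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YYYYYYYYYY",  # placeholder: your Knowledge Base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0",
        },
    },
)

# The response carries the grounded answer plus citations back to the source chunks.
print(response["output"]["text"])
```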

API Gateway and Lambda (Serverless Backend)

  • Amazon API Gateway exposes a secure HTTP endpoint
  • AWS Lambda invokes the Bedrock Agent
  • Event-driven execution keeps costs low, and scaling is automatic

Frontend Chatbot (Amazon S3 Static Website)

  • Chatbot UI hosted on Amazon S3
  • Lightweight and highly available
  • Optional Amazon CloudFront for HTTPS, caching, and WAF protection

Step-by-Step Implementation

Step 1: Create the Knowledge Base

  • Create an Amazon S3 bucket

Head to the AWS Management Console and create an S3 bucket for the documents.

  • Upload school documents (PDFs, text files, etc.)

  • Create a Knowledge Base that uses this S3 bucket as its data source (managed RAG).

Head over to the Bedrock service page in the AWS Console. In the left-hand navigation, under the Build section, select Knowledge Bases. From there, the console will guide you through the initial setup steps. We have included screenshots below to walk you through the entire process visually.

Verify that the vector bucket was created by navigating to the Amazon S3 console, where it appears under Vector buckets.
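If you would rather script the bucket setup than click through the console, here is a minimal boto3 sketch (the bucket and file names are hypothetical; the Knowledge Base itself is still easiest to create in the console as described above):

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Hypothetical bucket name; S3 bucket names must be globally unique.
# Outside us-east-1, also pass CreateBucketConfiguration={"LocationConstraint": region}.
BUCKET = "school-assistant-kb-documents"
s3.create_bucket(Bucket=BUCKET)

# Upload the school documents that will feed the Knowledge Base.
for doc in ["admissions-policy.pdf", "holiday-calendar.pdf", "lunch-rules.txt"]:
    s3.upload_file(doc, BUCKET, f"documents/{doc}")
```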

Step 2: Choose the LLM (Amazon Nova Lite)

This is arguably the most critical decision: your choice of LLM is a direct trade-off between operational cost and response quality for your specific use case.

We selected Amazon Nova Lite because it provides:

  • Low latency
  • Cost-efficient inference
  • Strong performance for RAG use cases

It is ideal for real-time conversational workloads.

Amazon Bedrock offers a wide range of foundation models. You can explore the full catalog and their respective capabilities HERE.
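Before wiring Nova Lite into an agent, it can be worth sanity-checking that your account has access to the model. A minimal sketch using the Bedrock Converse API (the model ID is an assumption and varies by region; in many regions Nova models are invoked through a cross-region inference profile such as us.amazon.nova-lite-v1:0):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model ID is region-dependent; check the Bedrock model catalog for yours.
response = bedrock.converse(
    modelId="us.amazon.nova-lite-v1:0",
    messages=[{"role": "user", "content": [{"text": "Say hello in one sentence."}]}],
    inferenceConfig={"maxTokens": 128, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```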

Step 3: Configure the Bedrock Agent

With the Knowledge Base ready, the next step is to create the Bedrock Agent that will use it. You'll find Agents under the Build section in the left panel. Click there and follow the console instructions to define your agent. For a quick visual walkthrough, see the screenshots provided below.

  • Create an agent in Amazon Bedrock.

  • Select Nova Lite as the foundation model.

You can find “Instructions for the Agent” in the GitHub repository HERE.

  • Attach the Knowledge Base

💡 Tip: Save the agent before attaching the Knowledge Base to avoid configuration errors.

Once the Knowledge Base is successfully attached to the agent, click the “Prepare” button as shown in the screenshot below.
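If you prefer to script these two actions, the same attach-and-prepare flow is available through the bedrock-agent API. A sketch, with placeholder IDs:

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

AGENT_ID = "XXXXXXXXXX"  # placeholder: your agent ID
KB_ID = "YYYYYYYYYY"     # placeholder: your Knowledge Base ID

# Attach the Knowledge Base to the agent's working draft...
bedrock_agent.associate_agent_knowledge_base(
    agentId=AGENT_ID,
    agentVersion="DRAFT",
    knowledgeBaseId=KB_ID,
    description="Official school documents for grounded answers",
)

# ...then prepare the agent, the API equivalent of the console's Prepare button.
bedrock_agent.prepare_agent(agentId=AGENT_ID)
```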

Step 4: Backend Integration

Now that our Bedrock Agent, which serves as our backend core, is fully configured, we need a public, scalable interface through which the frontend chatbot application can securely interact with it.

This interface is a classic serverless pattern: using Amazon API Gateway as the secure HTTP endpoint and AWS Lambda as the compute layer to orchestrate the request. The Lambda function acts as the handler, taking the user's query from the chatbot frontend and passing it directly to the Bedrock Agent.

  • Create an AWS Lambda function.

The Lambda code is available on the GitHub repository HERE.
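For orientation, a handler of this shape does the job. The environment variable names and the request/response format here are assumptions, so treat the repository version as authoritative:

```python
import json
import os
import uuid

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")


def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "")

    # Invoke the Bedrock Agent; sessionId lets the agent keep conversation context.
    response = agent_runtime.invoke_agent(
        agentId=os.environ["AGENT_ID"],             # assumed env var
        agentAliasId=os.environ["AGENT_ALIAS_ID"],  # assumed env var
        sessionId=body.get("sessionId", str(uuid.uuid4())),
        inputText=question,
    )

    # The agent streams its answer back as chunks; concatenate them.
    answer = ""
    for evt in response["completion"]:
        chunk = evt.get("chunk")
        if chunk:
            answer += chunk["bytes"].decode("utf-8")

    return {
        "statusCode": 200,
        "headers": {"Access-Control-Allow-Origin": "*"},  # CORS for the S3-hosted widget
        "body": json.dumps({"answer": answer}),
    }
```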

  • Grant the bedrock:InvokeAgent permission

The Lambda execution role needs permission to invoke the agent.
The IAM policy below is available on the GitHub repository HERE.
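For reference, a policy of this shape grants the required access (region, account ID, and the agent/alias IDs are placeholders; the repository version is authoritative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "bedrock:InvokeAgent",
      "Resource": "arn:aws:bedrock:<region>:<account-id>:agent-alias/<agent-id>/<alias-id>"
    }
  ]
}
```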

  • Integrate Lambda with Amazon API Gateway

Check our YouTube video for a tutorial on creating an API Gateway and Lambda integration.

Step 5: Deploy the Chatbot Website

The final step is deploying the frontend interface (the chatbot widget on the school website) where users will interact with the agent. To achieve maximum availability, scalability, and cost-efficiency, we will host the static assets using Amazon S3 Static Website Hosting.

This approach ensures your chatbot widget is always available on the main school site. The process involves three simple steps:

  1. Bucket Configuration: Create an S3 bucket and enable Static Website Hosting, configuring the appropriate Index and Error documents.
  2. Asset Upload: Upload all HTML, CSS, and JavaScript assets to this S3 bucket.
  3. Access Control: Ensure the bucket policy grants public read access, allowing the website content to be served correctly.

The website files are available on the GitHub repository HERE.

Enable S3 static website hosting and attach an appropriate bucket policy.
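A typical public-read bucket policy looks like the following (the bucket name is a placeholder; note that Block Public Access must also be disabled on the bucket for this policy to take effect):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<your-website-bucket>/*"
    }
  ]
}
```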

For a production environment, we highly recommend using Amazon CloudFront with the S3 bucket as its origin. This provides better security, lower latency via the edge network, and allows you to keep the S3 bucket fully private. However, for simplicity in this tutorial, we opted for direct S3 hosting.

Final Outcome

After all the configuration and code deployment, the final and most satisfying result is the fully operational School Assistant chatbot. Below, you can see the assistant handling a couple of real-world queries, demonstrating how it correctly retrieves and grounds answers using the Knowledge Base.

Meet the School Assistant

  • Uses verified, document-based knowledge
  • Delivers accurate, explainable answers
  • Scales automatically with traffic
  • Runs at minimal cost

This is production-ready RAG with serverless simplicity.

Cleanup

While the components we used (S3, Lambda, API Gateway) are very cost-effective and offer generous free tiers, the key components for cost management are the Amazon Bedrock Agent and its associated resources. If you are finished with your prototype, cleaning up is essential to prevent ongoing charges.

1. Delete the Bedrock Agent:

  • Navigate to the Amazon Bedrock console, find the Agents section, and delete the agent you created (School Assistant).

2. Delete the Knowledge Base:

  • In the Bedrock console, go to Knowledge bases. Delete the Knowledge Base you created.

3. Delete the S3 Buckets and Lambda Function:

  • S3 Buckets: You must first empty the S3 bucket used for both your Knowledge Base data and your static website before you can delete the bucket itself.
  • Lambda Function: Delete the Lambda function that served as your API handler.
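Emptying a bucket object by object in the console is tedious; a small boto3 sketch handles it (bucket names are hypothetical):

```python
import boto3

s3 = boto3.resource("s3")

# Hypothetical bucket names; substitute your own.
for name in ["school-assistant-kb-documents", "school-assistant-website"]:
    bucket = s3.Bucket(name)
    bucket.objects.all().delete()  # empty the bucket first (required)
    bucket.delete()                # then delete the bucket itself
```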

4. Delete the API Gateway:

  • In the API Gateway console, delete the HTTP API that exposed your Lambda handler.

5. Review IAM Roles:

  • Finally, review the IAM Roles created for the Bedrock Agent and the Lambda function. While these generally incur no cost, deleting them is a good security practice to maintain the principle of least privilege.

Conclusion

We have successfully walked through the entire process of building a highly effective, cost-optimized, and scalable School Assistant powered by Amazon Bedrock. By combining the retrieval power of the Knowledge Base with the efficiency of the Amazon Nova Lite model and tying it all together with a serverless API layer, we have created a truly intelligent application. We encourage you to use the code repository we have provided to deploy this solution today.

We’ll continue building on this foundation, explore how the architecture can be extended to support additional real-world use cases, and keep sharing our knowledge along the way.
