Lingesh B for AWS Community Builders

Posted on Apr 25 • Originally published at awstip.com

Building an Automated AWS Security Advisor: RAG with AWS Bedrock and OpenSearch Serverless

#ai #aws #rag #tutorial

The Problem: Security Posture Debt at Scale

In large AWS environments spanning multiple accounts, developers and engineers create cloud resources every day: EC2 instances, S3 buckets, ECS and EKS clusters, RDS databases, Lambda functions, VPCs, and many more. That speed helps teams ship, but it also means security best practices often get deprioritized in the heat of delivery.

The result? AWS Security Hub flags dozens of findings every week. Resources that don’t conform to the CIS AWS Foundations Benchmark, PCI DSS controls, or AWS Foundational Security Best Practices (FSBP) pile up into a growing backlog. The security team then has to chase down resource owners, explain what’s wrong, and guide them through remediation, which is a reactive, time-intensive process.

What if you could shift security left by providing developers with an AI-powered Security Advisor? By leveraging Retrieval-Augmented Generation (RAG), we can build a system that crawls the latest official security standards and provides actionable, real-time remediation advice.

Reactive Mode

User provisions non-compliant resource → Security Hub flags it → Security team investigates → Manually notifies resource owner → Owner remediates (eventually)

Proactive Mode

User asks chatbot “how should I configure my S3 bucket securely?” → RAG retrieves exact CIS / PCI DSS controls → User gets actionable guidance before provisioning

The Solution: A RAG-Powered Security Advisor Chatbot
Build an internal security advisor chatbot powered by Retrieval-Augmented Generation (RAG) using AWS Bedrock Knowledge Bases. The system ingests official AWS security standard documentation, indexes it in a vector store, and answers natural language questions from developers with grounded, citation-backed responses.

The core premise: instead of security standards living in PDFs that no one reads, they become a queryable, conversational knowledge layer that any developer can access in seconds — directly integrated into their existing workflow.

Architecture Overview
🕸Data Sources

Web Crawler pulling CIS Benchmarks, PCI DSS controls, and AWS FSBP documentation from official URLs

🧠Bedrock Knowledge Base

Managed RAG service handling chunking, embedding generation, and retrieval orchestration

🔍OpenSearch Serverless

Vector store for semantic search — scales automatically, no cluster management overhead

💬Claude on Bedrock

Foundation model for response generation — grounded in retrieved context, not hallucinations

🛡️Security Hub

Posture score baseline — the north star metric our chatbot helps improve over time

👤Developer Interface

Chat UI exposed to development teams — internal Slack bot, portal, or CLI wrapper

Building the Knowledge Base in AWS Bedrock
AWS Bedrock Knowledge Bases is a managed service that abstracts the heavy lifting of a RAG pipeline — document ingestion, chunking strategy, embedding model selection, vector store integration, and retrieval. For our use case, it was the natural choice because we needed production-grade reliability without building custom orchestration.

(a) Create Knowledge Base with vector store

(b) Choose “Web Crawler” as data source

(c) Enter the URLs below as source URLs

AWS Foundational Security Best Practices standard in Security Hub CSPM
Learn about the AWS Foundational Security Best Practices standard and the applicable security controls in AWS Security…
docs.aws.amazon.com

CIS AWS Foundations Benchmark in Security Hub CSPM
The Center for Internet Security (CIS) AWS Foundations Benchmark serves as a set of security configuration best…
docs.aws.amazon.com

PCI DSS in Security Hub CSPM
AWS Security Hub CSPM supports v.3.2.1 and v4.0.1 of the Payment Card Industry Data Security Standard (PCI DSS). You…
docs.aws.amazon.com

(d) Continue with default selections for Sync scope, Parsing and chunking

(e) Choose “Amazon Titan Text Embeddings V2” as the embedding model

(f) Select quick vector store creation

(g) Wait for Knowledge Base and Vector database creation to complete

(h) Once the Knowledge Base is created, select the data source and click Sync. This populates the OpenSearch Serverless collection (the vector database) with embeddings
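
For teams that prefer to script steps (b) and (c) instead of using the console, the web crawler data source can be sketched with boto3. This is a minimal sketch, not the setup used in this article: the knowledge base ID, data source name, and seed URLs are placeholders you would supply, and `client` is assumed to be a `boto3.client("bedrock-agent")`. Leaving out crawler and chunking options keeps Bedrock's defaults, which matches step (d).

```python
from __future__ import annotations

from typing import Any


def build_web_crawler_data_source(
    knowledge_base_id: str,
    name: str,
    seed_urls: list[str],
) -> dict[str, Any]:
    """Build keyword arguments for bedrock-agent create_data_source (web crawler)."""
    if not seed_urls:
        raise ValueError("at least one seed URL is required")
    return {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "WEB",
            "webConfiguration": {
                "sourceConfiguration": {
                    # Each seed URL is a starting point for the crawl.
                    "urlConfiguration": {
                        "seedUrls": [{"url": url} for url in seed_urls]
                    }
                }
            },
        },
    }


def create_data_source(client: Any, **request: Any) -> str:
    """Call create_data_source and return the new data source ID."""
    response = client.create_data_source(**request)
    return response["dataSource"]["dataSourceId"]
```

Pass the three Security Hub documentation URLs from step (c) as `seed_urls`, then call `create_data_source(client, **request)` with the returned dictionary.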

Vector Store: OpenSearch Serverless
For a production-grade RAG system, a serverless vector database is ideal. It handles the indexing of high-dimensional embeddings without the overhead of managing clusters.

Collection Type: Vector Search.
Logic: When the web crawler ingests data, Bedrock breaks the text into chunks, converts them into vectors (using a model like Titan Text Embeddings), and stores them in OpenSearch Serverless.

I chose OpenSearch Serverless over the other supported vector stores (Pinecone, Redis Enterprise, Aurora PostgreSQL) for specific reasons: it’s native AWS, supports IAM-based access control, works with VPC endpoints, and removes the operational overhead of managing an OpenSearch cluster.

Why serverless over provisioned OpenSearch?
A knowledge base for internal developer queries has a spiky, unpredictable query pattern: almost no traffic at night, bursts during working hours and incident response. Serverless OCUs (OpenSearch Compute Units) scale up and down automatically with load. They do not scale all the way to zero (a collection keeps a minimum OCU baseline that is billed even when idle), but for this use case that is still typically simpler and more cost-efficient than sizing a provisioned domain with fixed capacity for peak traffic.

Important: Bedrock Knowledge Bases requires the OpenSearch Serverless collection to be of type vector search, not time series or search. Set this at collection creation — it cannot be changed later.
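
If you create the collection yourself instead of using quick create, the collection type is set in the `create_collection` call. The following is a minimal sketch under assumptions: `client` is a `boto3.client("opensearchserverless")`, the collection name is a placeholder, and the encryption policy uses an AWS-owned key. A real setup also needs a network policy and a data access policy that lets the Bedrock service role use the collection; both are omitted here.

```python
from __future__ import annotations

import json
from typing import Any


def encryption_policy_request(collection_name: str) -> dict[str, Any]:
    """Encryption policy that must exist before the collection can be created."""
    policy = {
        "Rules": [
            {
                "ResourceType": "collection",
                "Resource": [f"collection/{collection_name}"],
            }
        ],
        "AWSOwnedKey": True,
    }
    return {
        # Hypothetical naming scheme; policy names are limited in length.
        "name": f"{collection_name}-enc",
        "type": "encryption",
        "policy": json.dumps(policy),
    }


def collection_request(collection_name: str) -> dict[str, Any]:
    """Request a vector search collection; the type is fixed at creation."""
    return {"name": collection_name, "type": "VECTORSEARCH"}


def create_vector_collection(client: Any, collection_name: str) -> str:
    """Create the encryption policy, then the collection, and return its ID."""
    client.create_security_policy(**encryption_policy_request(collection_name))
    response = client.create_collection(**collection_request(collection_name))
    return response["createCollectionDetail"]["id"]
```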

Data Ingestion: Web Crawler for Security Standards
The choice of data source is what makes this system genuinely authoritative. Rather than uploading stale PDFs, we pointed the Bedrock web crawler at official, continuously-maintained documentation URLs.

URLs ingested: CIS AWS Foundations Benchmark, PCI DSS v4.0 Requirements, AWS FSBP Controls, AWS Security Hub Docs

Web crawler ingestion means our knowledge base stays current when AWS updates control documentation or when PCI DSS guidance is revised — we just re-run the sync job, no manual uploads needed.

The crawler follows links from the seed URLs automatically, and the sync job can be re-run on a schedule for freshness, for example with an EventBridge schedule that invokes a Lambda function to start the ingestion job.
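
One way to automate the re-sync is a small Lambda function invoked on a schedule. This is a sketch under assumptions, not the article's deployed code: the environment variable names `KNOWLEDGE_BASE_ID` and `DATA_SOURCE_ID` are hypothetical, and the function's role is assumed to have permission to start ingestion jobs.

```python
from __future__ import annotations

import os
from typing import Any


def start_sync(client: Any, knowledge_base_id: str, data_source_id: str) -> dict[str, str]:
    """Start a Knowledge Base ingestion (sync) job and return its ID and status."""
    response = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    job = response["ingestionJob"]
    return {"ingestionJobId": job["ingestionJobId"], "status": job["status"]}


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, str]:
    """Entry point for a scheduled invocation (for example from EventBridge)."""
    import boto3  # imported here so start_sync can be exercised without AWS

    client = boto3.client("bedrock-agent")
    return start_sync(
        client,
        os.environ["KNOWLEDGE_BASE_ID"],  # hypothetical variable name
        os.environ["DATA_SOURCE_ID"],     # hypothetical variable name
    )
```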

Test the Knowledge Base
Before integrating with your chatbot app, test whether the Knowledge Base (RAG) delivers the answers you expect based on the security standard recommendations.

Example:

Query: “What’s the Security Hub posture score impact if I leave port 22 open to 0.0.0.0/0 on my security group?”

Response: Unrestricted SSH access (0.0.0.0/0 on port 22) violates EC2.19 in AWS FSBP and CIS control 5.2. Security Hub assigns this a HIGH severity finding. It will negatively impact your overall posture score, especially within the Network Security category. Restrict inbound SSH to known IP ranges using your VPN CIDR, or use AWS Systems Manager Session Manager to eliminate the need for SSH entirely.
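
Besides the console test window, the retrieval half can be checked on its own, which makes it easier to see which chunks and source URLs back an answer and to verify cited control IDs against them. A minimal sketch, assuming `client` is a `boto3.client("bedrock-agent-runtime")` and the knowledge base ID is a placeholder:

```python
from __future__ import annotations

from typing import Any


def retrieve_chunks(
    client: Any,
    knowledge_base_id: str,
    question: str,
    top_k: int = 5,
) -> list[dict[str, Any]]:
    """Return score, source URL, and text for each retrieved chunk."""
    response = client.retrieve(
        knowledgeBaseId=knowledge_base_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    results = []
    for item in response["retrievalResults"]:
        location = item.get("location", {})
        results.append(
            {
                "score": item.get("score"),
                # Web crawler sources report their origin under webLocation.
                "url": location.get("webLocation", {}).get("url"),
                "text": item["content"]["text"],
            }
        )
    return results
```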

The RAG Pipeline in Action
When a developer asks a question, here’s what happens end-to-end, in under 3 seconds:

1. User query arrives

“What are the CIS benchmark requirements for S3 bucket encryption?”

↓

2. Query embedding

Bedrock embeds the query using Titan Text Embeddings V2, producing a 1024-dimensional vector (the model’s default output size) that represents the semantic intent.

↓

3. Vector retrieval from OpenSearch Serverless

Approximate nearest-neighbor search retrieves the top-K most semantically similar chunks from the indexed security standards. K=5 by default, tunable.

↓

4. Augmented prompt construction

Retrieved chunks are injected into a structured prompt alongside the user query. Source URLs are preserved for citation.

↓

5. Response generation via Claude on Bedrock

The foundation model generates a grounded, structured response citing specific control IDs. Because the prompt restricts it to the retrieved context, it is far less likely to fabricate controls that don’t exist there.
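
Steps 3 and 4 can be illustrated with a toy version in plain Python. Bedrock and OpenSearch Serverless do this internally, so the sketch below is only a mental model: it uses exact brute-force cosine similarity rather than approximate nearest-neighbor search, tiny hand-made vectors instead of real embeddings, and a prompt template that is an assumption, not Bedrock's actual one.

```python
from __future__ import annotations

import math
from typing import Any


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(
    query_vector: list[float],
    chunks: list[dict[str, Any]],
    k: int = 5,
) -> list[dict[str, Any]]:
    """Rank chunks by similarity to the query and keep the best k."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, retrieved: list[dict[str, Any]]) -> str:
    """Inject retrieved chunks and their source URLs ahead of the question."""
    context = "\n\n".join(
        f"[{i}] {chunk['text']} (source: {chunk['url']})"
        for i, chunk in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the numbered context below and cite the sources you use.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Keeping the source URL next to each chunk in the prompt is what lets the model cite where a control came from.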

Key configuration decisions
When creating the Knowledge Base, we made the following choices that significantly impacted retrieval quality:

1. Embedding model: Amazon Titan Text Embeddings V2

Works well for English technical documentation. Produces 1024-dimensional dense vectors by default (256 and 512 are also available). Good semantic fidelity for regulatory and standards language.

↓

2. Chunking strategy: default chunking

Automatically splits text into chunks of approximately 300 tokens.

↓

3. Vector store: OpenSearch Serverless collection

Created a dedicated vector search collection type. Bedrock auto-creates the index schema and handles sync.

↓

4. Retrieval: Hybrid search (semantic + keyword)

Combining dense vector search with BM25 keyword matching improves recall for specific control IDs like “CIS 2.1.2” or “PCI DSS Req 6.4”.
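
For readers scripting the setup, these decisions map onto request fragments for the Bedrock APIs. The sketch below only builds the dictionaries; it assumes you pass them to `create_knowledge_base`, `create_data_source`, and `retrieve` yourself. The 20 percent chunk overlap is an illustrative value of mine, not something stated in this article.

```python
from __future__ import annotations

from typing import Any

TITAN_V2_MODEL_ID = "amazon.titan-embed-text-v2:0"


def embedding_configuration(region: str, dimensions: int = 1024) -> dict[str, Any]:
    """knowledgeBaseConfiguration fragment selecting Titan Text Embeddings V2."""
    if dimensions not in (256, 512, 1024):
        raise ValueError("Titan Text Embeddings V2 supports 256, 512, or 1024 dimensions")
    return {
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": f"arn:aws:bedrock:{region}::foundation-model/{TITAN_V2_MODEL_ID}",
            "embeddingModelConfiguration": {
                "bedrockEmbeddingModelConfiguration": {"dimensions": dimensions}
            },
        },
    }


def chunking_configuration(max_tokens: int = 300, overlap_percentage: int = 20) -> dict[str, Any]:
    """vectorIngestionConfiguration fragment for explicit fixed-size chunking."""
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "FIXED_SIZE",
            "fixedSizeChunkingConfiguration": {
                "maxTokens": max_tokens,
                "overlapPercentage": overlap_percentage,  # assumed value
            },
        }
    }


def retrieval_configuration(top_k: int = 5, hybrid: bool = True) -> dict[str, Any]:
    """retrievalConfiguration fragment; HYBRID adds keyword matching to vector search."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "overrideSearchType": "HYBRID" if hybrid else "SEMANTIC",
        }
    }
```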

Implementation Architecture
The flow works as follows:

Frontend: A Slack bot, Microsoft Teams app, or a simple Streamlit web UI.
API Layer: Amazon API Gateway triggers a Lambda function.
Logic Layer: AWS Lambda calls the Bedrock RetrieveAndGenerate API.
Data Layer: Bedrock queries OpenSearch Serverless and generates a response using a model like Claude or Amazon Nova.
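
The logic layer can be sketched as a Lambda handler behind an API Gateway proxy integration. This is an illustration under assumptions rather than the article's deployed code: the request body shape (`{"question": "..."}`) and the environment variable names `KNOWLEDGE_BASE_ID` and `MODEL_ARN` are hypothetical.

```python
from __future__ import annotations

import json
import os
from typing import Any


def ask_security_advisor(
    client: Any,
    question: str,
    knowledge_base_id: str,
    model_arn: str,
) -> dict[str, Any]:
    """Call RetrieveAndGenerate and return the answer plus unique source URLs."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    )
    sources: list[str] = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            url = reference.get("location", {}).get("webLocation", {}).get("url")
            if url and url not in sources:
                sources.append(url)
    return {"answer": response["output"]["text"], "sources": sources}


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """API Gateway proxy handler: expects a JSON body with a 'question' field."""
    body = json.loads(event.get("body") or "{}")
    question = str(body.get("question", "")).strip()
    if not question:
        return {"statusCode": 400, "body": json.dumps({"error": "question is required"})}

    import boto3  # imported after validation so bad requests never touch AWS

    client = boto3.client("bedrock-agent-runtime")
    result = ask_security_advisor(
        client,
        question,
        os.environ["KNOWLEDGE_BASE_ID"],  # hypothetical variable name
        os.environ["MODEL_ARN"],          # hypothetical variable name
    )
    return {"statusCode": 200, "body": json.dumps(result)}
```

Returning the source URLs alongside the answer lets a Slack bot or web UI show developers where each recommendation comes from.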

Conclusion
Security posture improvement doesn’t have to be a reactive, ticket-driven grind. By treating security standards as a living knowledge base — queryable, conversational, and always current — you can shift security left and make best practices the path of least resistance for your development teams.

AWS Bedrock Knowledge Bases, OpenSearch Serverless, and web crawler ingestion make this remarkably accessible to build. The hardest part isn’t the technology — it’s getting developers to use the chatbot instead of guessing. Make it fast, make it actionable, and make it available where they already work.

"How are you currently handling security remediation in your organization? Have you experimented with RAG for internal documentation yet? Let’s discuss in the comments!"
