Table of Contents
- Overview
- Architecture
- Prerequisites
- Phase 1: ROSA Cluster Setup
- Phase 2: Red Hat OpenShift AI Installation
- Phase 3: Amazon Bedrock Integration via PrivateLink
- Phase 4: AWS Glue Data Pipeline
- Phase 5: Milvus Vector Database Deployment
- Phase 6: RAG Application Deployment
- Testing and Validation
- Resource Cleanup
Overview
Project Purpose
This platform provides an enterprise-grade Retrieval-Augmented Generation (RAG) solution that addresses a primary enterprise concern with generative AI: data privacy and security. By running the data plane on Red Hat OpenShift Service on AWS (ROSA) while using Amazon Bedrock for AI capabilities, organizations retain complete control over their sensitive data while still accessing state-of-the-art language models.
Key Value Propositions
- Privacy-First Architecture: All sensitive data remains within your controlled OpenShift environment
- Secure Connectivity: AWS PrivateLink ensures AI model calls never traverse the public internet
- Enterprise Compliance: Meets stringent data governance and compliance requirements
- Scalable Infrastructure: Leverages Kubernetes orchestration for production-grade reliability
- Best-of-Breed Components: Combines Red Hat's enterprise Kubernetes with AWS's managed AI services
Solution Components
| Component | Purpose | Layer |
|---|---|---|
| ROSA | Managed OpenShift cluster on AWS | Infrastructure |
| Red Hat OpenShift AI | Model serving gateway and ML platform | Control Plane |
| Amazon Bedrock | Claude 3.5 Sonnet LLM access | Intelligence Plane |
| AWS PrivateLink | Secure private connectivity | Network Security |
| AWS Glue | Document processing and ETL | Data Pipeline |
| Amazon S3 | Document storage | Data Lake |
| Milvus | Vector database for embeddings | Data Plane |
Architecture
High-Level Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│ AWS Cloud │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ ROSA Cluster (VPC) │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ Red Hat OpenShift AI │ │ │
│ │ │ ┌────────────────┐ ┌──────────────────────┐ │ │ │
│ │ │ │ Model Serving │ │ RAG Application │ │ │ │
│ │ │ │ Gateway │◄─────┤ (FastAPI/Flask) │ │ │ │
│ │ │ └────────┬───────┘ └──────────┬───────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ └───────────┼─────────────────────────┼───────────────┘ │ │
│ │ │ │ │ │
│ │ │ ┌───────────────▼──────────────┐ │ │
│ │ │ │ Milvus Vector Database │ │ │
│ │ │ │ (Embeddings & Metadata) │ │ │
│ │ │ └──────────────────────────────┘ │ │
│ └──────────────┼──────────────────────────────────────────┘ │
│ │ │
│ │ AWS PrivateLink (Private Connectivity) │
│ │ │
│ ┌──────────────▼──────────────┐ ┌──────────────────────┐ │
│ │ Amazon Bedrock │ │ AWS Glue │ │
│ │ (Claude 3.5 Sonnet) │ │ ┌────────────────┐ │ │
│ │ - Text Generation │ │ │ Glue Crawler │ │ │
│ │ - Embeddings │ │ ├────────────────┤ │ │
│ └─────────────────────────────┘ │ │ ETL Jobs │ │ │
│ │ └────────┬───────┘ │ │
│ └───────────┼──────────┘ │
│ │ │
│ ┌───────────▼──────────┐ │
│ │ Amazon S3 │ │
│ │ (Document Store) │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Data Flow
- Document Ingestion: Documents uploaded to S3 bucket
- ETL Processing: AWS Glue crawler discovers and processes documents
- Embedding Generation: Processed documents sent to Bedrock for embedding generation
- Vector Storage: Embeddings stored in Milvus running on ROSA
- Query Processing: User queries received by RAG application
- Vector Search: Application searches Milvus for relevant document chunks
- Context Retrieval: Relevant chunks retrieved from vector database
- LLM Inference: RHOAI gateway forwards the prompt plus retrieved context to Bedrock via PrivateLink (a condensed code sketch of this query path follows the list)
- Response Generation: Claude 3.5 generates response based on retrieved context
- Response Delivery: Answer returned to user through application
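The query path (steps 5 through 10) condenses to three calls: embed the question, search Milvus, and invoke the LLM with the retrieved context. The sketch below is illustrative only; it assumes the names used later in this guide (the rag_documents Milvus collection, Titan Text Embeddings V2, and Claude 3.5 Sonnet on Bedrock).
import json
import os
import boto3
from pymilvus import connections, Collection

# Assumed names from later phases; adjust to your environment
bedrock = boto3.client("bedrock-runtime", region_name=os.getenv("AWS_REGION", "us-east-1"))
connections.connect(host=os.getenv("MILVUS_HOST", "milvus.milvus.svc.cluster.local"), port="19530")
collection = Collection("rag_documents")
collection.load()

def answer(query: str, top_k: int = 3) -> str:
    # 1. Embed the query with Titan Text Embeddings V2
    emb = json.loads(bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        contentType="application/json",
        body=json.dumps({"inputText": query, "dimensions": 1024, "normalize": True}),
    )["body"].read())["embedding"]
    # 2. Retrieve the closest chunks from Milvus
    hits = collection.search(
        data=[emb], anns_field="embedding",
        param={"metric_type": "L2", "params": {"nprobe": 10}},
        limit=top_k, output_fields=["text"],
    )[0]
    context = "\n\n".join(h.entity.get("text") for h in hits)
    # 3. Ask Claude 3.5 Sonnet with the retrieved context (reaches Bedrock over PrivateLink)
    resp = json.loads(bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        contentType="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 500,
            "messages": [{"role": "user",
                          "content": f"Context:\n{context}\n\nQuestion: {query}"}],
        }),
    )["body"].read())
    return resp["content"][0]["text"]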
Security Architecture
- Network Isolation: ROSA cluster in private subnets with no public ingress
- PrivateLink Encryption: All Bedrock API calls encrypted in transit via AWS PrivateLink
- Data Sovereignty: Document content never leaves controlled environment
- RBAC: OpenShift role-based access control for all components
- Secrets Management: OpenShift secrets for API keys and credentials
Prerequisites
Required Accounts and Subscriptions
- [ ] AWS Account with administrative access
- [ ] Red Hat Account with OpenShift subscription
- [ ] ROSA Enabled in your AWS account (Enable ROSA)
- [ ] Amazon Bedrock Access with Claude 3.5 Sonnet model enabled in your region
Required Tools
Install the following CLI tools on your workstation:
# AWS CLI (v2)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
# ROSA CLI
wget https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz
tar -xvf rosa-linux.tar.gz
sudo mv rosa /usr/local/bin/rosa
rosa version
# OpenShift CLI (oc)
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz
tar -xvf openshift-client-linux.tar.gz
sudo mv oc kubectl /usr/local/bin/
oc version
# Helm (v3)
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
AWS Prerequisites
Service Quotas
Verify you have adequate service quotas in your target region:
# Check EC2 vCPU quota (need at least 100 for production ROSA)
aws service-quotas get-service-quota \
--service-code ec2 \
--quota-code L-1216C47A \
--region us-east-1
# Check VPC quota
aws service-quotas get-service-quota \
--service-code vpc \
--quota-code L-F678F1CE \
--region us-east-1
IAM Permissions
Your AWS IAM user/role needs permissions for:
- EC2 (VPC, subnets, security groups, instances)
- IAM (roles, policies)
- S3 (buckets, objects)
- Bedrock (InvokeModel, InvokeModelWithResponseStream)
- Glue (crawlers, jobs, databases)
- CloudWatch (logs, metrics)
Knowledge Prerequisites
You should be familiar with:
- AWS fundamentals (VPC, IAM, S3)
- Kubernetes basics (pods, deployments, services)
- Basic Linux command line
- YAML configuration files
- REST APIs and HTTP concepts
Phase 1: ROSA Cluster Setup
Step 1.1: Configure AWS CLI
# Configure AWS credentials
aws configure
# Verify configuration
aws sts get-caller-identity
Step 1.2: Initialize ROSA
# Log in to Red Hat
rosa login
# Verify ROSA prerequisites
rosa verify quota
rosa verify permissions
# Initialize ROSA in your AWS account (one-time setup)
rosa init
Step 1.3: Create ROSA Cluster
Create a ROSA cluster with appropriate specifications for the RAG workload:
# Set environment variables
export CLUSTER_NAME="rag-platform"
export AWS_REGION="us-east-1"
export MULTI_AZ="true"
export MACHINE_TYPE="m5.2xlarge"
export COMPUTE_NODES=3
# Create ROSA cluster (takes ~40 minutes)
rosa create cluster \
--cluster-name $CLUSTER_NAME \
--region $AWS_REGION \
--multi-az \
--compute-machine-type $MACHINE_TYPE \
--compute-nodes $COMPUTE_NODES \
--machine-cidr 10.0.0.0/16 \
--service-cidr 172.30.0.0/16 \
--pod-cidr 10.128.0.0/14 \
--host-prefix 23 \
--yes
Configuration Rationale:
- m5.2xlarge: 8 vCPUs, 32 GB RAM per node - suitable for vector database and ML workloads
- 3 nodes: High availability across multiple availability zones
- Multi-AZ: Ensures resilience against AZ failures
Step 1.4: Monitor Cluster Creation
# Watch cluster installation progress
rosa logs install --cluster=$CLUSTER_NAME --watch
# Check cluster status
rosa describe cluster --cluster=$CLUSTER_NAME
Wait until the cluster state shows ready.
Step 1.5: Create Admin User
# Create cluster admin user
rosa create admin --cluster=$CLUSTER_NAME
# Save the login command output - it will look like:
# oc login https://api.rag-platform.xxxx.p1.openshiftapps.com:6443 \
# --username cluster-admin \
# --password <generated-password>
Step 1.6: Connect to Cluster
# Use the login command from previous step
oc login https://api.rag-platform.xxxx.p1.openshiftapps.com:6443 \
--username cluster-admin \
--password <your-password>
# Verify cluster access
oc cluster-info
oc get nodes
oc get projects
Step 1.7: Create Project Namespaces
# Create namespace for RHOAI
oc new-project redhat-ods-applications
# Create namespace for RAG application
oc new-project rag-application
# Create namespace for Milvus
oc new-project milvus
Phase 2: Red Hat OpenShift AI Installation
Step 2.1: Install OpenShift AI Operator
# Create operator subscription
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
name: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
name: redhat-ods-operator
namespace: redhat-ods-operator
spec: {}
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
name: rhods-operator
namespace: redhat-ods-operator
spec:
channel: stable
name: rhods-operator
source: redhat-operators
sourceNamespace: openshift-marketplace
installPlanApproval: Automatic
EOF
Step 2.2: Verify Operator Installation
# Wait for operator to be ready (takes 3-5 minutes)
oc get csv -n redhat-ods-operator -w
# Verify operator is running
oc get pods -n redhat-ods-operator
You should see the rhods-operator pod in Running state.
Step 2.3: Create DataScienceCluster
# Create the DataScienceCluster custom resource
cat <<EOF | oc apply -f -
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
name: default-dsc
spec:
components:
codeflare:
managementState: Removed
dashboard:
managementState: Managed
datasciencepipelines:
managementState: Managed
kserve:
managementState: Managed
serving:
ingressGateway:
certificate:
type: SelfSigned
managementState: Managed
name: knative-serving
modelmeshserving:
managementState: Managed
ray:
managementState: Removed
workbenches:
managementState: Managed
EOF
Step 2.4: Verify RHOAI Installation
# Check DataScienceCluster status
oc get datasciencecluster -n redhat-ods-operator
# Verify all RHOAI components are running
oc get pods -n redhat-ods-applications
oc get pods -n redhat-ods-monitoring
# Get RHOAI dashboard URL
oc get route rhods-dashboard -n redhat-ods-applications -o jsonpath='{.spec.host}'
Access the dashboard URL in your browser and log in with your OpenShift credentials.
Step 2.5: Configure Model Serving
Create a serving runtime for Amazon Bedrock integration:
# Create custom serving runtime for Bedrock
cat <<EOF | oc apply -f -
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
name: bedrock-runtime
namespace: rag-application
labels:
opendatahub.io/dashboard: "true"
spec:
annotations:
prometheus.io/path: /metrics
prometheus.io/port: "8080"
containers:
- name: kserve-container
image: quay.io/modh/rest-proxy:latest
env:
- name: AWS_REGION
value: "us-east-1"
- name: BEDROCK_ENDPOINT_URL
value: "bedrock-runtime.us-east-1.amazonaws.com"
ports:
- containerPort: 8080
protocol: TCP
resources:
limits:
cpu: "2"
memory: 4Gi
requests:
cpu: "1"
memory: 2Gi
supportedModelFormats:
- autoSelect: true
name: bedrock
EOF
Phase 3: Amazon Bedrock Integration via PrivateLink
This phase establishes secure, private connectivity between your ROSA cluster and Amazon Bedrock using AWS PrivateLink.
Step 3.1: Enable Amazon Bedrock
# Verify Bedrock is reachable in your region by listing the available foundation models
aws bedrock list-foundation-models --region us-east-1
# Request access to Claude 3.5 Sonnet (if needed):
# AWS Console > Bedrock > Model access
# Optionally, enable model invocation logging (replace ACCOUNT_ID and make sure the logging role exists):
aws bedrock put-model-invocation-logging-configuration \
  --region us-east-1 \
  --logging-config '{"cloudWatchConfig":{"logGroupName":"/aws/bedrock/modelinvocations","roleArn":"arn:aws:iam::ACCOUNT_ID:role/BedrockLoggingRole"}}'
Step 3.2: Identify ROSA VPC
# Get the VPC ID of your ROSA cluster
export ROSA_VPC_ID=$(aws ec2 describe-vpcs \
--filters "Name=tag:Name,Values=*${CLUSTER_NAME}*" \
--query 'Vpcs[0].VpcId' \
--output text \
--region $AWS_REGION)
echo "ROSA VPC ID: $ROSA_VPC_ID"
# Get private subnet IDs
export PRIVATE_SUBNET_IDS=$(aws ec2 describe-subnets \
--filters "Name=vpc-id,Values=$ROSA_VPC_ID" "Name=tag:Name,Values=*private*" \
--query 'Subnets[*].SubnetId' \
--output text \
--region $AWS_REGION)
echo "Private Subnets: $PRIVATE_SUBNET_IDS"
Step 3.3: Create VPC Endpoint for Bedrock
# Create security group for VPC endpoint
export VPC_ENDPOINT_SG=$(aws ec2 create-security-group \
--group-name bedrock-vpc-endpoint-sg \
--description "Security group for Bedrock VPC endpoint" \
--vpc-id $ROSA_VPC_ID \
--region $AWS_REGION \
--output text \
--query 'GroupId')
echo "VPC Endpoint Security Group: $VPC_ENDPOINT_SG"
# Allow HTTPS traffic from ROSA worker nodes
aws ec2 authorize-security-group-ingress \
--group-id $VPC_ENDPOINT_SG \
--protocol tcp \
--port 443 \
--cidr 10.0.0.0/16 \
--region $AWS_REGION
# Create VPC endpoint for Bedrock Runtime
export BEDROCK_VPC_ENDPOINT=$(aws ec2 create-vpc-endpoint \
--vpc-id $ROSA_VPC_ID \
--vpc-endpoint-type Interface \
--service-name com.amazonaws.${AWS_REGION}.bedrock-runtime \
--subnet-ids $PRIVATE_SUBNET_IDS \
--security-group-ids $VPC_ENDPOINT_SG \
--private-dns-enabled \
--region $AWS_REGION \
--output text \
--query 'VpcEndpoint.VpcEndpointId')
echo "Bedrock VPC Endpoint: $BEDROCK_VPC_ENDPOINT"
# Check the VPC endpoint state and re-run until it reports "available"
aws ec2 describe-vpc-endpoints \
  --vpc-endpoint-ids $BEDROCK_VPC_ENDPOINT \
  --query 'VpcEndpoints[0].State' \
  --output text \
  --region $AWS_REGION
Step 3.4: Create IAM Role for Bedrock Access
# Create IAM policy for Bedrock model invocation (Claude 3.5 Sonnet for generation, Titan Text Embeddings V2 for embeddings)
cat > bedrock-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": [
"arn:aws:bedrock:${AWS_REGION}::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
]
}
]
}
EOF
aws iam create-policy \
--policy-name BedrockInvokePolicy \
--policy-document file://bedrock-policy.json \
--region $AWS_REGION
# Create trust policy for ROSA service account
export OIDC_PROVIDER=$(rosa describe cluster -c $CLUSTER_NAME -o json | jq -r .aws.sts.oidc_endpoint_url | sed 's|https://||')
cat > trust-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):oidc-provider/${OIDC_PROVIDER}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${OIDC_PROVIDER}:sub": "system:serviceaccount:rag-application:bedrock-sa"
}
}
}
]
}
EOF
# Create IAM role
export BEDROCK_ROLE_ARN=$(aws iam create-role \
--role-name rosa-bedrock-access \
--assume-role-policy-document file://trust-policy.json \
--query 'Role.Arn' \
--output text)
echo "Bedrock IAM Role ARN: $BEDROCK_ROLE_ARN"
# Attach policy to role
aws iam attach-role-policy \
--role-name rosa-bedrock-access \
--policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy
Step 3.5: Create Service Account in OpenShift
# Create service account with IAM role annotation
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
name: bedrock-sa
namespace: rag-application
annotations:
eks.amazonaws.com/role-arn: $BEDROCK_ROLE_ARN
EOF
# Verify service account
oc get sa bedrock-sa -n rag-application
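On ROSA with STS, the pod identity webhook recognizes the eks.amazonaws.com/role-arn annotation and injects AWS_ROLE_ARN plus AWS_WEB_IDENTITY_TOKEN_FILE into pods using this service account, which boto3 picks up automatically. A quick sketch to confirm the assumed role from inside any pod that runs with serviceAccountName: bedrock-sa and has Python and boto3 available:
import boto3

# boto3 reads AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE (injected by the
# pod identity webhook) and assumes the role through STS web identity federation.
sts = boto3.client("sts")
identity = sts.get_caller_identity()
# The returned ARN should contain "assumed-role/rosa-bedrock-access"
print("Caller identity:", identity["Arn"])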
Step 3.6: Test Bedrock Connectivity
# Create test pod with AWS CLI
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
name: bedrock-test
namespace: rag-application
spec:
serviceAccountName: bedrock-sa
containers:
- name: aws-cli
image: amazon/aws-cli:latest
command: ["/bin/sleep", "3600"]
env:
- name: AWS_REGION
value: "$AWS_REGION"
EOF
# Wait for pod to be ready
oc wait --for=condition=ready pod/bedrock-test -n rag-application --timeout=300s
# Test a Bedrock API call (--cli-binary-format lets the AWS CLI v2 accept the raw JSON body)
oc exec -n rag-application bedrock-test -- aws bedrock-runtime invoke-model \
  --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{"anthropic_version":"bedrock-2023-05-31","max_tokens":100,"messages":[{"role":"user","content":"Hello, this is a test"}]}' \
  /tmp/response.json
# Check the response
oc exec -n rag-application bedrock-test -- cat /tmp/response.json
# Clean up test pod
oc delete pod bedrock-test -n rag-application
If successful, you should see a JSON response from Claude.
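The IAM policy also allows bedrock:InvokeModelWithResponseStream; a boto3 sketch of the streaming variant (assumes it runs in a pod using the bedrock-sa service account so credentials resolve automatically):
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Stream tokens from Claude 3.5 Sonnet over the PrivateLink endpoint
stream = bedrock.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": "Hello, this is a streaming test"}],
    }),
)
for event in stream["body"]:
    chunk = json.loads(event["chunk"]["bytes"])
    # Print only the incremental text deltas as they arrive
    if chunk.get("type") == "content_block_delta":
        print(chunk["delta"].get("text", ""), end="", flush=True)
print()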
Phase 4: AWS Glue Data Pipeline
This phase sets up AWS Glue to process documents from S3 and prepare them for vectorization.
Step 4.1: Create S3 Bucket for Documents
# Create S3 bucket (name must be globally unique)
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export BUCKET_NAME="rag-documents-${ACCOUNT_ID}"
aws s3 mb s3://$BUCKET_NAME --region $AWS_REGION
# Enable versioning
aws s3api put-bucket-versioning \
--bucket $BUCKET_NAME \
--versioning-configuration Status=Enabled \
--region $AWS_REGION
# Create folder structure
aws s3api put-object --bucket $BUCKET_NAME --key raw-documents/
aws s3api put-object --bucket $BUCKET_NAME --key processed-documents/
aws s3api put-object --bucket $BUCKET_NAME --key embeddings/
echo "S3 Bucket created: s3://$BUCKET_NAME"
Step 4.2: Create IAM Role for Glue
# Create trust policy for Glue
cat > glue-trust-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "glue.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
EOF
# Create Glue service role
aws iam create-role \
--role-name AWSGlueServiceRole-RAG \
--assume-role-policy-document file://glue-trust-policy.json
# Attach AWS managed policy
aws iam attach-role-policy \
--role-name AWSGlueServiceRole-RAG \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
# Create custom policy for S3 access
cat > glue-s3-policy.json <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": [
"arn:aws:s3:::${BUCKET_NAME}/*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::${BUCKET_NAME}"
]
}
]
}
EOF
aws iam put-role-policy \
--role-name AWSGlueServiceRole-RAG \
--policy-name S3Access \
--policy-document file://glue-s3-policy.json
Step 4.3: Create Glue Database
# Create Glue database
aws glue create-database \
--database-input '{
"Name": "rag_documents_db",
"Description": "Database for RAG document metadata"
}' \
--region $AWS_REGION
# Verify database creation
aws glue get-database --name rag_documents_db --region $AWS_REGION
Step 4.4: Create Glue Crawler
# Create crawler for raw documents
aws glue create-crawler \
--name rag-document-crawler \
--role arn:aws:iam::${ACCOUNT_ID}:role/AWSGlueServiceRole-RAG \
--database-name rag_documents_db \
--targets '{
"S3Targets": [
{
"Path": "s3://'$BUCKET_NAME'/raw-documents/"
}
]
}' \
--schema-change-policy '{
"UpdateBehavior": "UPDATE_IN_DATABASE",
"DeleteBehavior": "LOG"
}' \
--region $AWS_REGION
# Start the crawler
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION
echo "Glue crawler created and started"
Step 4.5: Create Glue ETL Job
Create a Python script for document processing:
# Create ETL script
cat > glue-etl-script.py <<'PYTHON_SCRIPT'
import sys
import boto3
import json
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
# Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'BUCKET_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
bucket_name = args['BUCKET_NAME']
s3_client = boto3.client('s3')
# Read documents from Glue catalog
datasource = glueContext.create_dynamic_frame.from_catalog(
database="rag_documents_db",
table_name="raw_documents"
)
# Document processing function
def process_document(record):
"""
Process document: chunk text, extract metadata
"""
# Simple chunking strategy (500 chars with 50 char overlap)
text = record.get('content', '')
chunk_size = 500
overlap = 50
chunks = []
for i in range(0, len(text), chunk_size - overlap):
chunk = text[i:i + chunk_size]
if chunk:
chunks.append({
'document_id': record.get('document_id'),
'chunk_id': f"{record.get('document_id')}_{i}",
'chunk_text': chunk,
'chunk_index': i // (chunk_size - overlap),
'metadata': {
'source': record.get('source', ''),
'timestamp': record.get('timestamp', ''),
'file_type': record.get('file_type', '')
}
})
return chunks
# Process and write to S3
def process_and_write():
records = datasource.toDF().collect()
all_chunks = []
for record in records:
chunks = process_document(record.asDict())
all_chunks.extend(chunks)
# Write chunks to S3 as JSON
for chunk in all_chunks:
key = f"processed-documents/{chunk['chunk_id']}.json"
s3_client.put_object(
Bucket=bucket_name,
Key=key,
Body=json.dumps(chunk),
ContentType='application/json'
)
print(f"Processed {len(all_chunks)} chunks from {len(records)} documents")
process_and_write()
job.commit()
PYTHON_SCRIPT
# Upload script to S3
aws s3 cp glue-etl-script.py s3://$BUCKET_NAME/glue-scripts/
# Create Glue job
aws glue create-job \
--name rag-document-processor \
--role arn:aws:iam::${ACCOUNT_ID}:role/AWSGlueServiceRole-RAG \
--command '{
"Name": "glueetl",
"ScriptLocation": "s3://'$BUCKET_NAME'/glue-scripts/glue-etl-script.py",
"PythonVersion": "3"
}' \
--default-arguments '{
"--BUCKET_NAME": "'$BUCKET_NAME'",
"--job-language": "python",
"--enable-metrics": "true",
"--enable-continuous-cloudwatch-log": "true"
}' \
--glue-version "4.0" \
--max-retries 0 \
--timeout 60 \
--region $AWS_REGION
echo "Glue ETL job created"
Step 4.6: Test Glue Pipeline
# Upload sample document
cat > sample-document.txt <<EOF
This is a sample document for testing the RAG pipeline.
It contains multiple sentences that will be chunked and processed.
The Glue ETL job will extract this content and prepare it for vectorization.
This demonstrates the data pipeline from S3 to processed chunks.
EOF
# Upload to S3
aws s3 cp sample-document.txt s3://$BUCKET_NAME/raw-documents/
# Run crawler to detect new file
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION
# Check crawler state; re-run until it returns READY (it shows RUNNING while crawling)
aws glue get-crawler --name rag-document-crawler --region $AWS_REGION --query 'Crawler.State'
# Run ETL job
aws glue start-job-run --job-name rag-document-processor --region $AWS_REGION
# Check processed outputs
sleep 60
aws s3 ls s3://$BUCKET_NAME/processed-documents/
Phase 5: Milvus Vector Database Deployment
Deploy Milvus on your ROSA cluster to store and search document embeddings.
Step 5.1: Add the Milvus Helm Repository
# Add the Milvus Helm repository (the chart deploys Milvus directly in Step 5.3,
# so the separate milvus-operator is not required here)
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm repo update
# Confirm the chart is available
helm search repo milvus/milvus
Step 5.2: Create Persistent Storage
# Create PersistentVolumeClaims for Milvus
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: milvus-etcd-pvc
namespace: milvus
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: gp3-csi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: milvus-minio-pvc
namespace: milvus
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: gp3-csi
EOF
Step 5.3: Deploy Milvus Cluster
# Create Milvus cluster configuration
cat > milvus-values.yaml <<EOF
# Standalone deployment: a single Milvus instance; no Pulsar/Kafka message queue is needed
cluster:
  enabled: false
service:
type: ClusterIP
port: 19530
standalone:
replicas: 1
resources:
limits:
cpu: "4"
memory: 8Gi
requests:
cpu: "2"
memory: 4Gi
etcd:
replicaCount: 1
persistence:
enabled: true
existingClaim: milvus-etcd-pvc
minio:
mode: standalone
persistence:
enabled: true
existingClaim: milvus-minio-pvc
pulsar:
enabled: false
kafka:
enabled: false
metrics:
enabled: true
serviceMonitor:
enabled: true
EOF
# Install Milvus
helm install milvus milvus/milvus \
--namespace milvus \
--values milvus-values.yaml \
--wait
# Verify Milvus installation
oc get pods -n milvus
oc get svc -n milvus
Step 5.4: Configure Milvus Access
# Get Milvus service endpoint
export MILVUS_HOST=$(oc get svc milvus -n milvus -o jsonpath='{.spec.clusterIP}')
export MILVUS_PORT=19530
echo "Milvus Endpoint: $MILVUS_HOST:$MILVUS_PORT"
# Create config map with Milvus connection details
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: milvus-config
namespace: rag-application
data:
MILVUS_HOST: "$MILVUS_HOST"
MILVUS_PORT: "$MILVUS_PORT"
EOF
Step 5.5: Test Milvus Connectivity
# Create test pod with pymilvus
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
name: milvus-test
namespace: rag-application
spec:
containers:
- name: python
image: python:3.11-slim
command: ["/bin/sleep", "3600"]
env:
- name: MILVUS_HOST
valueFrom:
configMapKeyRef:
name: milvus-config
key: MILVUS_HOST
- name: MILVUS_PORT
valueFrom:
configMapKeyRef:
name: milvus-config
key: MILVUS_PORT
EOF
# Wait for pod
oc wait --for=condition=ready pod/milvus-test -n rag-application --timeout=120s
# Install pymilvus and test connection
oc exec -n rag-application milvus-test -- bash -c "
pip install pymilvus && python3 <<PYTHON
from pymilvus import connections, utility
import os
connections.connect(
alias='default',
host=os.environ['MILVUS_HOST'],
port=os.environ['MILVUS_PORT']
)
print('Connected to Milvus successfully!')
print('Milvus version:', utility.get_server_version())
PYTHON
"
# Clean up test pod
oc delete pod milvus-test -n rag-application
Step 5.6: Create Milvus Collection
Create a test collection for document embeddings:
# Create initialization job
cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
name: milvus-init
namespace: rag-application
spec:
template:
spec:
containers:
- name: init
image: python:3.11-slim
env:
- name: MILVUS_HOST
valueFrom:
configMapKeyRef:
name: milvus-config
key: MILVUS_HOST
- name: MILVUS_PORT
valueFrom:
configMapKeyRef:
name: milvus-config
key: MILVUS_PORT
command:
- /bin/bash
- -c
- |
pip install pymilvus
python3 <<PYTHON
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
import os
# Connect to Milvus
connections.connect(
alias='default',
host=os.environ['MILVUS_HOST'],
port=os.environ['MILVUS_PORT']
)
# Define collection schema
fields = [
FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
FieldSchema(name='chunk_id', dtype=DataType.VARCHAR, max_length=256),
FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=1024),
FieldSchema(name='text', dtype=DataType.VARCHAR, max_length=65535),
FieldSchema(name='metadata', dtype=DataType.JSON)
]
schema = CollectionSchema(
fields=fields,
description='RAG document embeddings collection'
)
# Create collection
collection = Collection(
name='rag_documents',
schema=schema
)
# Create index
index_params = {
'metric_type': 'L2',
'index_type': 'IVF_FLAT',
'params': {'nlist': 128}
}
collection.create_index(
field_name='embedding',
index_params=index_params
)
print(f'Collection created: {collection.name}')
print(f'Number of entities: {collection.num_entities}')
PYTHON
restartPolicy: Never
backoffLimit: 3
EOF
# Check job status
oc logs job/milvus-init -n rag-application
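Once embeddings are inserted (see Test 2 in Testing and Validation), the collection can be queried with a vector similarity search. A minimal pymilvus sketch; the query vector here is a random placeholder (in practice, use a Titan embedding of the question), and nprobe trades recall against latency for the IVF_FLAT index created above:
import random
from pymilvus import connections, Collection

connections.connect(host="milvus.milvus.svc.cluster.local", port="19530")
collection = Collection("rag_documents")
collection.load()  # the collection must be loaded into memory before searching

query_vector = [random.random() for _ in range(1024)]  # placeholder vector
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=3,
    output_fields=["chunk_id", "text"],
)
for hit in results[0]:
    print(hit.distance, hit.entity.get("chunk_id"))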
Phase 6: RAG Application Deployment
Deploy the RAG application that orchestrates the entire pipeline.
Step 6.1: Create Application Code
Create the RAG application source code:
# Create application directory structure
mkdir -p rag-app/{src,config,tests}
# Create requirements.txt
cat > rag-app/requirements.txt <<EOF
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
pymilvus==2.3.3
boto3==1.29.7
langchain==0.0.350
langchain-community==0.0.1
python-dotenv==1.0.0
httpx==0.25.2
EOF
# Create main application
cat > rag-app/src/main.py <<'PYTHON_CODE'
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional, Dict, Any
import os
import json
import boto3
from pymilvus import connections, Collection
import logging
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Initialize FastAPI app
app = FastAPI(
title="Enterprise RAG API",
description="RAG platform using OpenShift AI, Bedrock, and Milvus",
version="1.0.0"
)
# Configuration
MILVUS_HOST = os.getenv("MILVUS_HOST", "milvus.milvus.svc.cluster.local")
MILVUS_PORT = int(os.getenv("MILVUS_PORT", "19530"))
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
BEDROCK_MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
COLLECTION_NAME = "rag_documents"
# Initialize clients
bedrock_runtime = None
milvus_collection = None
@app.on_event("startup")
async def startup_event():
"""Initialize connections on startup"""
global bedrock_runtime, milvus_collection
try:
# Connect to Milvus
connections.connect(
alias="default",
host=MILVUS_HOST,
port=MILVUS_PORT
)
milvus_collection = Collection(COLLECTION_NAME)
milvus_collection.load()
logger.info(f"Connected to Milvus collection: {COLLECTION_NAME}")
# Initialize Bedrock client
bedrock_runtime = boto3.client(
service_name='bedrock-runtime',
region_name=AWS_REGION
)
logger.info("Initialized Bedrock client")
except Exception as e:
logger.error(f"Startup error: {str(e)}")
raise
@app.on_event("shutdown")
async def shutdown_event():
"""Cleanup on shutdown"""
try:
connections.disconnect("default")
logger.info("Disconnected from Milvus")
except Exception as e:
logger.error(f"Shutdown error: {str(e)}")
# Request/Response models
class QueryRequest(BaseModel):
query: str
top_k: Optional[int] = 5
max_tokens: Optional[int] = 1000
class QueryResponse(BaseModel):
answer: str
sources: List[Dict[str, Any]]
metadata: Dict[str, Any]
class HealthResponse(BaseModel):
status: str
milvus_connected: bool
bedrock_available: bool
# API endpoints
@app.get("/health", response_model=HealthResponse)
async def health_check():
"""Health check endpoint"""
milvus_ok = False
bedrock_ok = False
try:
if milvus_collection:
milvus_collection.num_entities
milvus_ok = True
except:
pass
try:
if bedrock_runtime:
bedrock_ok = True
except:
pass
return HealthResponse(
status="healthy" if (milvus_ok and bedrock_ok) else "degraded",
milvus_connected=milvus_ok,
bedrock_available=bedrock_ok
)
@app.post("/query", response_model=QueryResponse)
async def query_rag(request: QueryRequest):
"""
Process RAG query:
1. Generate embedding for query
2. Search similar documents in Milvus
3. Construct prompt with context
4. Call Bedrock for generation
"""
try:
# Step 1: Generate query embedding using Bedrock
query_embedding = await generate_embedding(request.query)
# Step 2: Search Milvus for similar documents
search_params = {
"metric_type": "L2",
"params": {"nprobe": 10}
}
results = milvus_collection.search(
data=[query_embedding],
anns_field="embedding",
param=search_params,
limit=request.top_k,
output_fields=["chunk_id", "text", "metadata"]
)
# Extract context from search results
contexts = []
sources = []
for hit in results[0]:
contexts.append(hit.entity.get("text"))
sources.append({
"chunk_id": hit.entity.get("chunk_id"),
"score": float(hit.score),
"metadata": hit.entity.get("metadata")
})
# Step 3: Construct prompt with context
context_text = "\n\n".join([f"Document {i+1}:\n{ctx}" for i, ctx in enumerate(contexts)])
prompt = f"""You are a helpful AI assistant. Use the following context to answer the user's question.
If the answer cannot be found in the context, say so.
Context:
{context_text}
User Question: {request.query}
Answer:"""
# Step 4: Call Bedrock for generation
response = bedrock_runtime.invoke_model(
modelId=BEDROCK_MODEL_ID,
contentType="application/json",
accept="application/json",
body=json.dumps({
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": request.max_tokens,
"messages": [
{
"role": "user",
"content": prompt
}
],
"temperature": 0.7
})
)
response_body = json.loads(response['body'].read())
answer = response_body['content'][0]['text']
return QueryResponse(
answer=answer,
sources=sources,
metadata={
"query": request.query,
"num_sources": len(sources),
"model": BEDROCK_MODEL_ID
}
)
except Exception as e:
logger.error(f"Query error: {str(e)}")
raise HTTPException(status_code=500, detail=str(e))
async def generate_embedding(text: str) -> List[float]:
"""Generate embedding using Bedrock Titan Embeddings"""
try:
response = bedrock_runtime.invoke_model(
modelId="amazon.titan-embed-text-v2:0",
contentType="application/json",
accept="application/json",
body=json.dumps({
"inputText": text,
"dimensions": 1024,
"normalize": True
})
)
response_body = json.loads(response['body'].read())
return response_body['embedding']
except Exception as e:
logger.error(f"Embedding generation error: {str(e)}")
raise
@app.get("/")
async def root():
"""Root endpoint"""
return {
"message": "Enterprise RAG API",
"version": "1.0.0",
"endpoints": {
"health": "/health",
"query": "/query",
"docs": "/docs"
}
}
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
PYTHON_CODE
# Create Dockerfile
cat > rag-app/Dockerfile <<EOF
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY src/ ./src/
# Expose port
EXPOSE 8000
# Run application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
EOF
Step 6.2: Build and Push Container Image
# Build container image (using podman or docker)
cd rag-app
# Option 1: Build with podman
podman build -t rag-application:v1.0 .
# Option 2: Build with docker
# docker build -t rag-application:v1.0 .
# Tag for the OpenShift internal registry
# (if the registry's default route is not exposed, enable it first:
#  oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"defaultRoute":true}}')
export IMAGE_REGISTRY=$(oc get route default-route -n openshift-image-registry -o jsonpath='{.spec.host}')
# Login to OpenShift registry
podman login -u $(oc whoami) -p $(oc whoami -t) $IMAGE_REGISTRY --tls-verify=false
# Create image stream
oc create imagestream rag-application -n rag-application
# Tag and push
podman tag rag-application:v1.0 $IMAGE_REGISTRY/rag-application/rag-application:v1.0
podman push $IMAGE_REGISTRY/rag-application/rag-application:v1.0 --tls-verify=false
cd ..
Step 6.3: Deploy Application to OpenShift
# Create deployment
cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: rag-application
namespace: rag-application
labels:
app: rag-application
spec:
replicas: 2
selector:
matchLabels:
app: rag-application
template:
metadata:
labels:
app: rag-application
spec:
serviceAccountName: bedrock-sa
containers:
- name: app
image: image-registry.openshift-image-registry.svc:5000/rag-application/rag-application:v1.0
ports:
- containerPort: 8000
protocol: TCP
env:
- name: MILVUS_HOST
valueFrom:
configMapKeyRef:
name: milvus-config
key: MILVUS_HOST
- name: MILVUS_PORT
valueFrom:
configMapKeyRef:
name: milvus-config
key: MILVUS_PORT
- name: AWS_REGION
value: "us-east-1"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "2"
memory: "4Gi"
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 10
periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: rag-application
namespace: rag-application
spec:
selector:
app: rag-application
ports:
- protocol: TCP
port: 80
targetPort: 8000
type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
name: rag-application
namespace: rag-application
spec:
to:
kind: Service
name: rag-application
port:
targetPort: 8000
tls:
termination: edge
insecureEdgeTerminationPolicy: Redirect
EOF
Step 6.4: Verify Deployment
# Check deployment status
oc get deployment rag-application -n rag-application
oc get pods -n rag-application -l app=rag-application
# Get application URL
export RAG_APP_URL=$(oc get route rag-application -n rag-application -o jsonpath='{.spec.host}')
echo "RAG Application URL: https://$RAG_APP_URL"
# Test health endpoint
curl https://$RAG_APP_URL/health
# View application logs
oc logs -f deployment/rag-application -n rag-application
Testing and Validation
End-to-End Testing
Test 1: Document Ingestion and Processing
# Upload test documents to S3
cat > test-doc-1.txt <<EOF
Red Hat OpenShift is an enterprise Kubernetes platform that provides
a complete application platform for developing and deploying containerized
applications. It includes integrated CI/CD, monitoring, and developer tools.
EOF
cat > test-doc-2.txt <<EOF
Amazon Bedrock is a fully managed service that offers foundation models
from leading AI companies through a single API. It provides access to
models like Claude, Llama, and Stable Diffusion for various use cases.
EOF
# Upload to S3
aws s3 cp test-doc-1.txt s3://$BUCKET_NAME/raw-documents/
aws s3 cp test-doc-2.txt s3://$BUCKET_NAME/raw-documents/
# Trigger Glue crawler
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION
# Wait and run ETL job
sleep 120
aws glue start-job-run --job-name rag-document-processor --region $AWS_REGION
# Check processed documents
sleep 60
aws s3 ls s3://$BUCKET_NAME/processed-documents/
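To confirm the ETL output shape programmatically (each chunk should carry chunk_id, chunk_text, chunk_index, and metadata), a short boto3 sketch that prints the first processed chunk:
import json
import os
import boto3

s3 = boto3.client("s3")
bucket = os.environ["BUCKET_NAME"]

# Grab the first processed chunk and print its fields
listing = s3.list_objects_v2(Bucket=bucket, Prefix="processed-documents/", MaxKeys=5)
for obj in listing.get("Contents", []):
    if obj["Key"].endswith(".json"):
        chunk = json.loads(s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read())
        print(obj["Key"], "->", sorted(chunk.keys()))
        print(chunk["chunk_text"][:120])
        break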
Test 2: Embedding Generation and Vector Storage
Create a job to process documents into Milvus:
# Create embedding job
cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
name: embed-documents
namespace: rag-application
spec:
template:
spec:
serviceAccountName: bedrock-sa
containers:
- name: embedder
image: python:3.11-slim
env:
- name: MILVUS_HOST
valueFrom:
configMapKeyRef:
name: milvus-config
key: MILVUS_HOST
- name: MILVUS_PORT
valueFrom:
configMapKeyRef:
name: milvus-config
key: MILVUS_PORT
- name: AWS_REGION
value: "us-east-1"
- name: BUCKET_NAME
value: "$BUCKET_NAME"
command:
- /bin/bash
- -c
- |
pip install pymilvus boto3
python3 <<PYTHON
import boto3
import json
import os
from pymilvus import connections, Collection
# Connect to services
s3 = boto3.client('s3')
bedrock = boto3.client('bedrock-runtime', region_name=os.environ['AWS_REGION'])
connections.connect(
host=os.environ['MILVUS_HOST'],
port=os.environ['MILVUS_PORT']
)
collection = Collection('rag_documents')
# Get processed documents
bucket = os.environ['BUCKET_NAME']
response = s3.list_objects_v2(Bucket=bucket, Prefix='processed-documents/')
for obj in response.get('Contents', []):
if obj['Key'].endswith('.json'):
# Read document chunk
doc = json.loads(s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read())
# Generate embedding
embed_response = bedrock.invoke_model(
modelId='amazon.titan-embed-text-v2:0',
body=json.dumps({
'inputText': doc['chunk_text'],
'dimensions': 1024,
'normalize': True
})
)
embedding = json.loads(embed_response['body'].read())['embedding']
# Insert into Milvus
collection.insert([
[doc['chunk_id']],
[embedding],
[doc['chunk_text']],
[doc['metadata']]
])
print(f"Inserted: {doc['chunk_id']}")
collection.flush()
print(f"Total entities in collection: {collection.num_entities}")
PYTHON
restartPolicy: Never
backoffLimit: 3
EOF
# Monitor job
oc logs job/embed-documents -n rag-application -f
Test 3: RAG Query
# Test RAG query endpoint
curl -X POST "https://$RAG_APP_URL/query" \
-H "Content-Type: application/json" \
-d '{
"query": "What is Red Hat OpenShift?",
"top_k": 3,
"max_tokens": 500
}' | jq .
# Test another query
curl -X POST "https://$RAG_APP_URL/query" \
-H "Content-Type: application/json" \
-d '{
"query": "Tell me about Amazon Bedrock foundation models",
"top_k": 3,
"max_tokens": 500
}' | jq .
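For programmatic access, the same queries can be issued from Python. A small client sketch using httpx (assumes RAG_APP_URL is exported as in Step 6.4; add verify=False only if your router certificate is not trusted by the client):
import os
import httpx

base_url = f"https://{os.environ['RAG_APP_URL']}"  # route host exported in Step 6.4

payload = {"query": "What is Red Hat OpenShift?", "top_k": 3, "max_tokens": 500}
resp = httpx.post(f"{base_url}/query", json=payload, timeout=60.0)
resp.raise_for_status()

body = resp.json()
print("Answer:", body["answer"])
for src in body["sources"]:
    print("  source:", src["chunk_id"], "score:", src["score"])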
Performance Testing
# Install Apache Bench for load testing
sudo yum install httpd-tools -y
# Create query payload
cat > query-payload.json <<EOF
{
"query": "What are the benefits of using OpenShift?",
"top_k": 5
}
EOF
# Run load test (100 requests, 10 concurrent)
ab -n 100 -c 10 -p query-payload.json \
-T application/json \
"https://$RAG_APP_URL/query"
Resource Cleanup
To avoid ongoing AWS charges, follow these steps to clean up all resources created during this implementation.
Step 1: Delete OpenShift Resources
# Delete RAG application
oc delete deployment rag-application -n rag-application
oc delete service rag-application -n rag-application
oc delete route rag-application -n rag-application
# Delete Milvus
helm uninstall milvus -n milvus
oc delete pvc --all -n milvus
# Delete RHOAI
oc delete datasciencecluster default-dsc -n redhat-ods-operator
oc delete subscription rhods-operator -n redhat-ods-operator
# Delete projects/namespaces
oc delete project rag-application
oc delete project milvus
oc delete project redhat-ods-applications
oc delete project redhat-ods-operator
oc delete project redhat-ods-monitoring
Step 2: Delete ROSA Cluster
# Delete ROSA cluster (takes ~10-15 minutes)
rosa delete cluster --cluster=$CLUSTER_NAME --yes
# Wait for cluster deletion to complete
rosa logs uninstall --cluster=$CLUSTER_NAME --watch
# Verify cluster is deleted
rosa list clusters
Step 3: Delete AWS Glue Resources
# Delete Glue job
aws glue delete-job --job-name rag-document-processor --region $AWS_REGION
# Delete Glue crawler
aws glue delete-crawler --name rag-document-crawler --region $AWS_REGION
# Delete Glue database
aws glue delete-database --name rag_documents_db --region $AWS_REGION
# Delete Glue IAM role
aws iam delete-role-policy --role-name AWSGlueServiceRole-RAG --policy-name S3Access
aws iam detach-role-policy --role-name AWSGlueServiceRole-RAG --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
aws iam delete-role --role-name AWSGlueServiceRole-RAG
Step 4: Delete S3 Bucket and Contents
# Delete all objects in bucket
aws s3 rm s3://$BUCKET_NAME --recursive --region $AWS_REGION
# Delete bucket
aws s3 rb s3://$BUCKET_NAME --region $AWS_REGION
echo "S3 bucket deleted: $BUCKET_NAME"
Step 5: Delete VPC Endpoint
# Delete VPC endpoint for Bedrock
aws ec2 delete-vpc-endpoints --vpc-endpoint-ids $BEDROCK_VPC_ENDPOINT --region $AWS_REGION
# Delete security group
aws ec2 delete-security-group --group-id $VPC_ENDPOINT_SG --region $AWS_REGION
echo "VPC endpoint and security group deleted"
Step 6: Delete IAM Resources
# Detach policy from Bedrock role
aws iam detach-role-policy \
--role-name rosa-bedrock-access \
--policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy
# Delete Bedrock role
aws iam delete-role --role-name rosa-bedrock-access
# Delete Bedrock policy
aws iam delete-policy \
--policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy
echo "IAM roles and policies deleted"
Step 7: Clean Up Local Files
# Remove temporary files
rm -f bedrock-policy.json
rm -f trust-policy.json
rm -f glue-trust-policy.json
rm -f glue-s3-policy.json
rm -f glue-etl-script.py
rm -f sample-document.txt
rm -f test-doc-1.txt
rm -f test-doc-2.txt
rm -f query-payload.json
rm -f milvus-values.yaml
rm -rf rag-app/
echo "Local temporary files cleaned up"
Verification
# Verify ROSA cluster is deleted
rosa list clusters
# Verify S3 bucket is deleted
aws s3 ls | grep $BUCKET_NAME
# Verify VPC endpoints are deleted
aws ec2 describe-vpc-endpoints --region $AWS_REGION | grep $BEDROCK_VPC_ENDPOINT
# Verify IAM roles are deleted
aws iam list-roles | grep -E "rosa-bedrock-access|AWSGlueServiceRole-RAG"
echo "Cleanup verification complete"