Marco Gonzalez
Enterprise-Grade RAG Platform: Orchestrating Amazon Bedrock Agents via Red Hat OpenShift AI

Table of Contents

  1. Overview
  2. Architecture
  3. Prerequisites
  4. Phase 1: ROSA Cluster Setup
  5. Phase 2: Red Hat OpenShift AI Installation
  6. Phase 3: Amazon Bedrock Integration via PrivateLink
  7. Phase 4: AWS Glue Data Pipeline
  8. Phase 5: Milvus Vector Database Deployment
  9. Phase 6: RAG Application Deployment
  10. Testing and Validation
  11. Resource Cleanup

Overview

Project Purpose

This platform provides an enterprise-grade Retrieval-Augmented Generation (RAG) solution that addresses the primary concerns of enterprises adopting generative AI: data privacy and security. By using Red Hat OpenShift Service on AWS (ROSA) to control the data plane and Amazon Bedrock for AI capabilities, organizations keep complete control over their sensitive data while still accessing state-of-the-art language models.

Key Value Propositions

  • Privacy-First Architecture: All sensitive data remains within your controlled OpenShift environment
  • Secure Connectivity: AWS PrivateLink ensures AI model calls never traverse the public internet
  • Enterprise Compliance: Meets stringent data governance and compliance requirements
  • Scalable Infrastructure: Leverages Kubernetes orchestration for production-grade reliability
  • Best-of-Breed Components: Combines Red Hat's enterprise Kubernetes with AWS's managed AI services

Solution Components

Component              Purpose                                  Layer
ROSA                   Managed OpenShift cluster on AWS         Infrastructure
Red Hat OpenShift AI   Model serving gateway and ML platform    Control Plane
Amazon Bedrock         Claude 3.5 Sonnet LLM access             Intelligence Plane
AWS PrivateLink        Secure private connectivity              Network Security
AWS Glue               Document processing and ETL              Data Pipeline
Amazon S3              Document storage                         Data Lake
Milvus                 Vector database for embeddings           Data Plane

Architecture

High-Level Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                          AWS Cloud                               │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                    ROSA Cluster (VPC)                       │ │
│  │  ┌──────────────────────────────────────────────────────┐  │ │
│  │  │           Red Hat OpenShift AI                        │  │ │
│  │  │  ┌────────────────┐      ┌──────────────────────┐   │  │ │
│  │  │  │ Model Serving  │      │  RAG Application     │   │  │ │
│  │  │  │   Gateway      │◄─────┤  (FastAPI/Flask)     │   │  │ │
│  │  │  └────────┬───────┘      └──────────┬───────────┘   │  │ │
│  │  │           │                         │               │  │ │
│  │  └───────────┼─────────────────────────┼───────────────┘  │ │
│  │              │                         │                  │ │
│  │              │         ┌───────────────▼──────────────┐   │ │
│  │              │         │   Milvus Vector Database     │   │ │
│  │              │         │   (Embeddings & Metadata)    │   │ │
│  │              │         └──────────────────────────────┘   │ │
│  └──────────────┼──────────────────────────────────────────┘ │
│                 │                                              │
│                 │ AWS PrivateLink (Private Connectivity)       │
│                 │                                              │
│  ┌──────────────▼──────────────┐    ┌──────────────────────┐ │
│  │   Amazon Bedrock            │    │    AWS Glue          │ │
│  │   (Claude 3.5 Sonnet)       │    │  ┌────────────────┐  │ │
│  │   - Text Generation         │    │  │ Glue Crawler   │  │ │
│  │   - Embeddings              │    │  ├────────────────┤  │ │
│  └─────────────────────────────┘    │  │ ETL Jobs       │  │ │
│                                      │  └────────┬───────┘  │ │
│                                      └───────────┼──────────┘ │
│                                                  │            │
│                                      ┌───────────▼──────────┐ │
│                                      │   Amazon S3          │ │
│                                      │   (Document Store)   │ │
│                                      └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Data Flow

  1. Document Ingestion: Documents uploaded to S3 bucket
  2. ETL Processing: AWS Glue crawler discovers and processes documents
  3. Embedding Generation: Processed documents sent to Bedrock for embedding generation
  4. Vector Storage: Embeddings stored in Milvus running on ROSA
  5. Query Processing: User queries received by RAG application
  6. Vector Search: Application searches Milvus for relevant document chunks
  7. Context Retrieval: Relevant chunks retrieved from vector database
  8. LLM Inference: RHOAI gateway forwards prompt + context to Bedrock via PrivateLink
  9. Response Generation: Claude 3.5 generates response based on retrieved context
  10. Response Delivery: Answer returned to the user through the application (a minimal code sketch of steps 5-9 follows below)
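
For orientation, steps 5-9 of this flow condense to a few lines of Python. The sketch below covers the query path only and reuses the collection name, model IDs, and search parameters that Phases 5 and 6 define; the in-cluster Milvus hostname is an assumption at this point.

# Minimal sketch of the query path (steps 5-9); see Phase 6 for the full service.
import json
import boto3
from pymilvus import connections, Collection

connections.connect(host="milvus.milvus.svc.cluster.local", port="19530")  # assumed in-cluster DNS name
collection = Collection("rag_documents")
collection.load()
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def answer(query: str, top_k: int = 5) -> str:
    # Embed the query with the same Titan model used at ingest time
    emb = json.loads(bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": query, "dimensions": 1024, "normalize": True}),
    )["body"].read())["embedding"]

    # Retrieve the most similar chunks from Milvus
    hits = collection.search(
        data=[emb], anns_field="embedding",
        param={"metric_type": "L2", "params": {"nprobe": 10}},
        limit=top_k, output_fields=["text"],
    )[0]
    context = "\n\n".join(h.entity.get("text") for h in hits)

    # Generate the grounded answer with Claude 3.5 Sonnet via PrivateLink
    body = json.loads(bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 500,
            "messages": [{"role": "user",
                          "content": f"Context:\n{context}\n\nQuestion: {query}"}],
        }),
    )["body"].read())
    return body["content"][0]["text"]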

Security Architecture

  • Network Isolation: ROSA cluster in private subnets with no public ingress
  • PrivateLink Encryption: All Bedrock API calls encrypted in transit via AWS PrivateLink
  • Data Sovereignty: Document content never leaves controlled environment
  • RBAC: OpenShift role-based access control for all components
  • Secrets Management: OpenShift secrets for API keys and credentials

Prerequisites

Required Accounts and Subscriptions

  • [ ] AWS Account with administrative access
  • [ ] Red Hat Account with OpenShift subscription
  • [ ] ROSA Enabled in your AWS account (Enable ROSA)
  • [ ] Amazon Bedrock Access with Claude 3.5 Sonnet model enabled in your region

Required Tools

Install the following CLI tools on your workstation:

# AWS CLI (v2)
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install

# ROSA CLI
wget https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz
tar -xvf rosa-linux.tar.gz
sudo mv rosa /usr/local/bin/rosa
rosa version

# OpenShift CLI (oc)
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz
tar -xvf openshift-client-linux.tar.gz
sudo mv oc kubectl /usr/local/bin/
oc version

# Helm (v3)
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version

AWS Prerequisites

Service Quotas

Verify you have adequate service quotas in your target region:

# Check EC2 vCPU quota (need at least 100 for production ROSA)
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-1216C47A \
  --region us-east-1

# Check VPC quota
aws service-quotas get-service-quota \
  --service-code vpc \
  --quota-code L-F678F1CE \
  --region us-east-1

IAM Permissions

Your AWS IAM user/role needs permissions for:

  • EC2 (VPC, subnets, security groups, instances)
  • IAM (roles, policies)
  • S3 (buckets, objects)
  • Bedrock (InvokeModel, InvokeModelWithResponseStream)
  • Glue (crawlers, jobs, databases)
  • CloudWatch (logs, metrics)

Knowledge Prerequisites

You should be familiar with:

  • AWS fundamentals (VPC, IAM, S3)
  • Kubernetes basics (pods, deployments, services)
  • Basic Linux command line
  • YAML configuration files
  • REST APIs and HTTP concepts

Phase 1: ROSA Cluster Setup

Step 1.1: Configure AWS CLI

# Configure AWS credentials
aws configure

# Verify configuration
aws sts get-caller-identity

Step 1.2: Initialize ROSA

# Log in to Red Hat
rosa login

# Verify ROSA prerequisites
rosa verify quota
rosa verify permissions

# Initialize ROSA in your AWS account (one-time setup)
rosa init

Step 1.3: Create ROSA Cluster

Create a ROSA cluster with appropriate specifications for the RAG workload:

# Set environment variables
export CLUSTER_NAME="rag-platform"
export AWS_REGION="us-east-1"
export MULTI_AZ="true"
export MACHINE_TYPE="m5.2xlarge"
export COMPUTE_NODES=3

# Create ROSA cluster (takes ~40 minutes)
rosa create cluster \
  --cluster-name $CLUSTER_NAME \
  --region $AWS_REGION \
  --multi-az \
  --compute-machine-type $MACHINE_TYPE \
  --compute-nodes $COMPUTE_NODES \
  --machine-cidr 10.0.0.0/16 \
  --service-cidr 172.30.0.0/16 \
  --pod-cidr 10.128.0.0/14 \
  --host-prefix 23 \
  --yes

Configuration Rationale:

  • m5.2xlarge: 8 vCPUs, 32 GB RAM per node - suitable for vector database and ML workloads
  • 3 nodes: High availability across multiple availability zones
  • Multi-AZ: Ensures resilience against AZ failures

Step 1.4: Monitor Cluster Creation

# Watch cluster installation progress
rosa logs install --cluster=$CLUSTER_NAME --watch

# Check cluster status
rosa describe cluster --cluster=$CLUSTER_NAME

Wait until the cluster state shows ready.

Step 1.5: Create Admin User

# Create cluster admin user
rosa create admin --cluster=$CLUSTER_NAME

# Save the login command output - it will look like:
# oc login https://api.rag-platform.xxxx.p1.openshiftapps.com:6443 \
#   --username cluster-admin \
#   --password <generated-password>

Step 1.6: Connect to Cluster

# Use the login command from previous step
oc login https://api.rag-platform.xxxx.p1.openshiftapps.com:6443 \
  --username cluster-admin \
  --password <your-password>

# Verify cluster access
oc cluster-info
oc get nodes
oc get projects

Step 1.7: Create Project Namespaces

# Create namespace for RHOAI
oc new-project redhat-ods-applications

# Create namespace for RAG application
oc new-project rag-application

# Create namespace for Milvus
oc new-project milvus

Phase 2: Red Hat OpenShift AI Installation

Step 2.1: Install OpenShift AI Operator

# Create operator subscription
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: redhat-ods-operator
  namespace: redhat-ods-operator
spec: {}
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  channel: stable
  name: rhods-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF

Step 2.2: Verify Operator Installation

# Wait for operator to be ready (takes 3-5 minutes)
oc get csv -n redhat-ods-operator -w

# Verify operator is running
oc get pods -n redhat-ods-operator

You should see the rhods-operator pod in Running state.

Step 2.3: Create DataScienceCluster

# Create the DataScienceCluster custom resource
cat <<EOF | oc apply -f -
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Managed
    kserve:
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            type: SelfSigned
        managementState: Managed
        name: knative-serving
    modelmeshserving:
      managementState: Managed
    ray:
      managementState: Removed
    workbenches:
      managementState: Managed
EOF

Step 2.4: Verify RHOAI Installation

# Check DataScienceCluster status (the resource is cluster-scoped)
oc get datasciencecluster

# Verify all RHOAI components are running
oc get pods -n redhat-ods-applications
oc get pods -n redhat-ods-monitoring

# Get RHOAI dashboard URL
oc get route rhods-dashboard -n redhat-ods-applications -o jsonpath='{.spec.host}'

Access the dashboard URL in your browser and log in with your OpenShift credentials.

Step 2.5: Configure Model Serving

Create a serving runtime for Amazon Bedrock integration:

# Create custom serving runtime for Bedrock
cat <<EOF | oc apply -f -
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: bedrock-runtime
  namespace: rag-application
  labels:
    opendatahub.io/dashboard: "true"
spec:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
  containers:
  - name: kserve-container
    image: quay.io/modh/rest-proxy:latest
    env:
    - name: AWS_REGION
      value: "us-east-1"
    - name: BEDROCK_ENDPOINT_URL
      value: "bedrock-runtime.us-east-1.amazonaws.com"
    ports:
    - containerPort: 8080
      protocol: TCP
    resources:
      limits:
        cpu: "2"
        memory: 4Gi
      requests:
        cpu: "1"
        memory: 2Gi
  supportedModelFormats:
  - autoSelect: true
    name: bedrock
EOF

Phase 3: Amazon Bedrock Integration via PrivateLink

This phase establishes secure, private connectivity between your ROSA cluster and Amazon Bedrock using AWS PrivateLink.

Step 3.1: Enable Amazon Bedrock

# Enable Bedrock in your region (if not already enabled)
aws bedrock list-foundation-models --region us-east-1

# Request access to Claude 3.5 Sonnet (if needed):
# model access is requested in the AWS Console under Bedrock > Model access.

# Optionally, enable model invocation logging to CloudWatch:
aws bedrock put-model-invocation-logging-configuration \
  --region us-east-1 \
  --logging-config '{"cloudWatchConfig":{"logGroupName":"/aws/bedrock/modelinvocations","roleArn":"arn:aws:iam::ACCOUNT_ID:role/BedrockLoggingRole"}}'
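
Model access itself is granted in the console, but you can confirm from code whether Claude 3.5 Sonnet is visible in your region. A small boto3 check (a sketch; it assumes your local credentials allow bedrock:ListFoundationModels):

# Sketch: confirm Claude 3.5 Sonnet is listed in the target region.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
models = bedrock.list_foundation_models()["modelSummaries"]

claude_ids = [m["modelId"] for m in models if "claude-3-5-sonnet" in m["modelId"]]
print("Claude 3.5 Sonnet model IDs visible in this region:", claude_ids or "none")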

Step 3.2: Identify ROSA VPC

# Get the VPC ID of your ROSA cluster
export ROSA_VPC_ID=$(aws ec2 describe-vpcs \
  --filters "Name=tag:Name,Values=*${CLUSTER_NAME}*" \
  --query 'Vpcs[0].VpcId' \
  --output text \
  --region $AWS_REGION)

echo "ROSA VPC ID: $ROSA_VPC_ID"

# Get private subnet IDs
export PRIVATE_SUBNET_IDS=$(aws ec2 describe-subnets \
  --filters "Name=vpc-id,Values=$ROSA_VPC_ID" "Name=tag:Name,Values=*private*" \
  --query 'Subnets[*].SubnetId' \
  --output text \
  --region $AWS_REGION)

echo "Private Subnets: $PRIVATE_SUBNET_IDS"

Step 3.3: Create VPC Endpoint for Bedrock

# Create security group for VPC endpoint
export VPC_ENDPOINT_SG=$(aws ec2 create-security-group \
  --group-name bedrock-vpc-endpoint-sg \
  --description "Security group for Bedrock VPC endpoint" \
  --vpc-id $ROSA_VPC_ID \
  --region $AWS_REGION \
  --output text \
  --query 'GroupId')

echo "VPC Endpoint Security Group: $VPC_ENDPOINT_SG"

# Allow HTTPS traffic from ROSA worker nodes
aws ec2 authorize-security-group-ingress \
  --group-id $VPC_ENDPOINT_SG \
  --protocol tcp \
  --port 443 \
  --cidr 10.0.0.0/16 \
  --region $AWS_REGION

# Create VPC endpoint for Bedrock Runtime
export BEDROCK_VPC_ENDPOINT=$(aws ec2 create-vpc-endpoint \
  --vpc-id $ROSA_VPC_ID \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.${AWS_REGION}.bedrock-runtime \
  --subnet-ids $PRIVATE_SUBNET_IDS \
  --security-group-ids $VPC_ENDPOINT_SG \
  --private-dns-enabled \
  --region $AWS_REGION \
  --output text \
  --query 'VpcEndpoint.VpcEndpointId')

echo "Bedrock VPC Endpoint: $BEDROCK_VPC_ENDPOINT"

# Wait for the VPC endpoint to become available
until [ "$(aws ec2 describe-vpc-endpoints \
  --vpc-endpoint-ids $BEDROCK_VPC_ENDPOINT \
  --query 'VpcEndpoints[0].State' \
  --output text \
  --region $AWS_REGION)" = "available" ]; do
  echo "Waiting for VPC endpoint..."
  sleep 15
done

echo "VPC Endpoint is now available"

Step 3.4: Create IAM Role for Bedrock Access

# Create IAM policy for Bedrock access
cat > bedrock-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:${AWS_REGION}::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
      ]
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name BedrockInvokePolicy \
  --policy-document file://bedrock-policy.json \
  --region $AWS_REGION

# Create trust policy for ROSA service account
export OIDC_PROVIDER=$(rosa describe cluster -c $CLUSTER_NAME -o json | jq -r .aws.sts.oidc_endpoint_url | sed 's|https://||')

cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:rag-application:bedrock-sa"
        }
      }
    }
  ]
}
EOF

# Create IAM role
export BEDROCK_ROLE_ARN=$(aws iam create-role \
  --role-name rosa-bedrock-access \
  --assume-role-policy-document file://trust-policy.json \
  --query 'Role.Arn' \
  --output text)

echo "Bedrock IAM Role ARN: $BEDROCK_ROLE_ARN"

# Attach policy to role
aws iam attach-role-policy \
  --role-name rosa-bedrock-access \
  --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy

Step 3.5: Create Service Account in OpenShift

# Create service account with IAM role annotation
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bedrock-sa
  namespace: rag-application
  annotations:
    eks.amazonaws.com/role-arn: $BEDROCK_ROLE_ARN
EOF

# Verify service account
oc get sa bedrock-sa -n rag-application

Step 3.6: Test Bedrock Connectivity

# Create test pod with AWS CLI
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: bedrock-test
  namespace: rag-application
spec:
  serviceAccountName: bedrock-sa
  containers:
  - name: aws-cli
    image: amazon/aws-cli:latest
    command: ["/bin/sleep", "3600"]
    env:
    - name: AWS_REGION
      value: "$AWS_REGION"
EOF

# Wait for pod to be ready
oc wait --for=condition=ready pod/bedrock-test -n rag-application --timeout=300s

# Test Bedrock API call (AWS CLI v2 needs --cli-binary-format to accept raw JSON for --body)
oc exec -n rag-application bedrock-test -- aws bedrock-runtime invoke-model \
  --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --body '{"anthropic_version":"bedrock-2023-05-31","max_tokens":100,"messages":[{"role":"user","content":"Hello, this is a test"}]}' \
  /tmp/response.json

# Check the response
oc exec -n rag-application bedrock-test -- cat /tmp/response.json

# Clean up test pod
oc delete pod bedrock-test -n rag-application

If successful, you should see a JSON response from Claude.
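
The same call can be made from application code. Below is a minimal boto3 sketch of what the Phase 6 service will do; it assumes it runs in a pod that uses the bedrock-sa service account, so credentials arrive via the injected web-identity token rather than static keys.

# Sketch: invoke Claude 3.5 Sonnet from Python inside a pod using bedrock-sa.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": "Hello, this is a test"}],
    }),
)

print(json.loads(response["body"].read())["content"][0]["text"])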

Phase 4: AWS Glue Data Pipeline

This phase sets up AWS Glue to process documents from S3 and prepare them for vectorization.

Step 4.1: Create S3 Bucket for Documents

# Create S3 bucket (name must be globally unique)
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export BUCKET_NAME="rag-documents-${ACCOUNT_ID}"

aws s3 mb s3://$BUCKET_NAME --region $AWS_REGION

# Enable versioning
aws s3api put-bucket-versioning \
  --bucket $BUCKET_NAME \
  --versioning-configuration Status=Enabled \
  --region $AWS_REGION

# Create folder structure
aws s3api put-object --bucket $BUCKET_NAME --key raw-documents/
aws s3api put-object --bucket $BUCKET_NAME --key processed-documents/
aws s3api put-object --bucket $BUCKET_NAME --key embeddings/

echo "S3 Bucket created: s3://$BUCKET_NAME"

Step 4.2: Create IAM Role for Glue

# Create trust policy for Glue
cat > glue-trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create Glue service role
aws iam create-role \
  --role-name AWSGlueServiceRole-RAG \
  --assume-role-policy-document file://glue-trust-policy.json

# Attach AWS managed policy
aws iam attach-role-policy \
  --role-name AWSGlueServiceRole-RAG \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

# Create custom policy for S3 access
cat > glue-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::${BUCKET_NAME}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::${BUCKET_NAME}"
      ]
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name AWSGlueServiceRole-RAG \
  --policy-name S3Access \
  --policy-document file://glue-s3-policy.json

Step 4.3: Create Glue Database

# Create Glue database
aws glue create-database \
  --database-input '{
    "Name": "rag_documents_db",
    "Description": "Database for RAG document metadata"
  }' \
  --region $AWS_REGION

# Verify database creation
aws glue get-database --name rag_documents_db --region $AWS_REGION

Step 4.4: Create Glue Crawler

# Create crawler for raw documents
aws glue create-crawler \
  --name rag-document-crawler \
  --role arn:aws:iam::${ACCOUNT_ID}:role/AWSGlueServiceRole-RAG \
  --database-name rag_documents_db \
  --targets '{
    "S3Targets": [
      {
        "Path": "s3://'$BUCKET_NAME'/raw-documents/"
      }
    ]
  }' \
  --schema-change-policy '{
    "UpdateBehavior": "UPDATE_IN_DATABASE",
    "DeleteBehavior": "LOG"
  }' \
  --region $AWS_REGION

# Start the crawler
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION

echo "Glue crawler created and started"
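
The crawler runs asynchronously, so it helps to wait for it to return to the READY state before kicking off the ETL job. A short boto3 polling loop (a sketch, assuming default credentials and the crawler name created above):

# Sketch: poll the Glue crawler until its run has finished.
import time
import boto3

glue = boto3.client("glue", region_name="us-east-1")

while True:
    state = glue.get_crawler(Name="rag-document-crawler")["Crawler"]["State"]
    print("Crawler state:", state)
    if state == "READY":  # READY means the crawler is idle again
        break
    time.sleep(30)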

Step 4.5: Create Glue ETL Job

Create a Python script for document processing:

# Create ETL script
cat > glue-etl-script.py <<'PYTHON_SCRIPT'
import sys
import boto3
import json
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame

# Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'BUCKET_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

bucket_name = args['BUCKET_NAME']
s3_client = boto3.client('s3')

# Read documents from Glue catalog
datasource = glueContext.create_dynamic_frame.from_catalog(
    database="rag_documents_db",
    table_name="raw_documents"
)

# Document processing function
def process_document(record):
    """
    Process document: chunk text, extract metadata
    """
    # Simple chunking strategy (500 chars with 50 char overlap)
    text = record.get('content', '')
    chunk_size = 500
    overlap = 50

    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunk = text[i:i + chunk_size]
        if chunk:
            chunks.append({
                'document_id': record.get('document_id'),
                'chunk_id': f"{record.get('document_id')}_{i}",
                'chunk_text': chunk,
                'chunk_index': i // (chunk_size - overlap),
                'metadata': {
                    'source': record.get('source', ''),
                    'timestamp': record.get('timestamp', ''),
                    'file_type': record.get('file_type', '')
                }
            })

    return chunks

# Process and write to S3
def process_and_write():
    records = datasource.toDF().collect()
    all_chunks = []

    for record in records:
        chunks = process_document(record.asDict())
        all_chunks.extend(chunks)

    # Write chunks to S3 as JSON
    for chunk in all_chunks:
        key = f"processed-documents/{chunk['chunk_id']}.json"
        s3_client.put_object(
            Bucket=bucket_name,
            Key=key,
            Body=json.dumps(chunk),
            ContentType='application/json'
        )

    print(f"Processed {len(all_chunks)} chunks from {len(records)} documents")

process_and_write()

job.commit()
PYTHON_SCRIPT

# Upload script to S3
aws s3 cp glue-etl-script.py s3://$BUCKET_NAME/glue-scripts/

# Create Glue job
aws glue create-job \
  --name rag-document-processor \
  --role arn:aws:iam::${ACCOUNT_ID}:role/AWSGlueServiceRole-RAG \
  --command '{
    "Name": "glueetl",
    "ScriptLocation": "s3://'$BUCKET_NAME'/glue-scripts/glue-etl-script.py",
    "PythonVersion": "3"
  }' \
  --default-arguments '{
    "--BUCKET_NAME": "'$BUCKET_NAME'",
    "--job-language": "python",
    "--enable-metrics": "true",
    "--enable-continuous-cloudwatch-log": "true"
  }' \
  --glue-version "4.0" \
  --max-retries 0 \
  --timeout 60 \
  --region $AWS_REGION

echo "Glue ETL job created"

Step 4.6: Test Glue Pipeline

# Upload sample document
cat > sample-document.txt <<EOF
This is a sample document for testing the RAG pipeline.
It contains multiple sentences that will be chunked and processed.
The Glue ETL job will extract this content and prepare it for vectorization.
This demonstrates the data pipeline from S3 to processed chunks.
EOF

# Upload to S3
aws s3 cp sample-document.txt s3://$BUCKET_NAME/raw-documents/

# Run crawler to detect new file
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION

# Wait for crawler to complete (check status)
aws glue get-crawler --name rag-document-crawler --region $AWS_REGION --query 'Crawler.State'

# Run ETL job
aws glue start-job-run --job-name rag-document-processor --region $AWS_REGION

# Check processed outputs
sleep 60
aws s3 ls s3://$BUCKET_NAME/processed-documents/
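
Before wiring up embeddings, it is worth reading one processed chunk back from S3 to confirm the format the ETL job produced. A sketch, assuming BUCKET_NAME is exported in your shell as above:

# Sketch: inspect one chunk written by the Glue ETL job.
import json
import os
import boto3

s3 = boto3.client("s3")
bucket = os.environ["BUCKET_NAME"]  # e.g. rag-documents-<account-id>

listing = s3.list_objects_v2(Bucket=bucket, Prefix="processed-documents/")
keys = [o["Key"] for o in listing.get("Contents", []) if o["Key"].endswith(".json")]
if not keys:
    raise SystemExit("No processed chunks found yet; check the Glue job run")

chunk = json.loads(s3.get_object(Bucket=bucket, Key=keys[0])["Body"].read())
print(chunk["chunk_id"], chunk["metadata"], chunk["chunk_text"][:100])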

Phase 5: Milvus Vector Database Deployment

Deploy Milvus on your ROSA cluster to store and search document embeddings.

Step 5.1: Add the Milvus Helm Repository

# Add the Milvus Helm repository (this guide deploys Milvus directly from the Helm chart)
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm repo update

# Confirm the chart is available
helm search repo milvus

Step 5.2: Create Persistent Storage

# Create PersistentVolumeClaims for Milvus
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-etcd-pvc
  namespace: milvus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3-csi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-minio-pvc
  namespace: milvus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: gp3-csi
EOF

Step 5.3: Deploy Milvus Cluster

# Create Milvus cluster configuration
cat > milvus-values.yaml <<EOF
# Standalone mode: no Pulsar/Kafka message queue required
cluster:
  enabled: false

service:
  type: ClusterIP
  port: 19530

standalone:
  replicas: 1
  resources:
    limits:
      cpu: "4"
      memory: 8Gi
    requests:
      cpu: "2"
      memory: 4Gi

etcd:
  replicaCount: 1
  persistence:
    enabled: true
    existingClaim: milvus-etcd-pvc

minio:
  mode: standalone
  persistence:
    enabled: true
    existingClaim: milvus-minio-pvc

pulsar:
  enabled: false

kafka:
  enabled: false

metrics:
  enabled: true
  serviceMonitor:
    enabled: true

EOF

# Install Milvus
helm install milvus milvus/milvus \
  --namespace milvus \
  --values milvus-values.yaml \
  --wait

# Verify Milvus installation
oc get pods -n milvus
oc get svc -n milvus

Step 5.4: Configure Milvus Access

# Get Milvus service endpoint
export MILVUS_HOST=$(oc get svc milvus -n milvus -o jsonpath='{.spec.clusterIP}')
export MILVUS_PORT=19530

echo "Milvus Endpoint: $MILVUS_HOST:$MILVUS_PORT"

# Create config map with Milvus connection details
cat <<EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: milvus-config
  namespace: rag-application
data:
  MILVUS_HOST: "$MILVUS_HOST"
  MILVUS_PORT: "$MILVUS_PORT"
EOF

Step 5.5: Test Milvus Connectivity

# Create test pod with pymilvus
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: milvus-test
  namespace: rag-application
spec:
  containers:
  - name: python
    image: python:3.11-slim
    command: ["/bin/sleep", "3600"]
    env:
    - name: MILVUS_HOST
      valueFrom:
        configMapKeyRef:
          name: milvus-config
          key: MILVUS_HOST
    - name: MILVUS_PORT
      valueFrom:
        configMapKeyRef:
          name: milvus-config
          key: MILVUS_PORT
EOF

# Wait for pod
oc wait --for=condition=ready pod/milvus-test -n rag-application --timeout=120s

# Install pymilvus and test connection
oc exec -n rag-application milvus-test -- bash -c "
pip install pymilvus && python3 <<PYTHON
from pymilvus import connections, utility
import os

connections.connect(
    alias='default',
    host=os.environ['MILVUS_HOST'],
    port=os.environ['MILVUS_PORT']
)

print('Connected to Milvus successfully!')
print('Milvus version:', utility.get_server_version())
PYTHON
"

# Clean up test pod
oc delete pod milvus-test -n rag-application

Step 5.6: Create Milvus Collection

Create a test collection for document embeddings:

# Create initialization job
cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: milvus-init
  namespace: rag-application
spec:
  template:
    spec:
      containers:
      - name: init
        image: python:3.11-slim
        env:
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_HOST
        - name: MILVUS_PORT
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_PORT
        command:
        - /bin/bash
        - -c
        - |
          pip install pymilvus
          python3 <<PYTHON
          from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
          import os

          # Connect to Milvus
          connections.connect(
              alias='default',
              host=os.environ['MILVUS_HOST'],
              port=os.environ['MILVUS_PORT']
          )

          # Define collection schema
          fields = [
              FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
              FieldSchema(name='chunk_id', dtype=DataType.VARCHAR, max_length=256),
              FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=1024),
              FieldSchema(name='text', dtype=DataType.VARCHAR, max_length=65535),
              FieldSchema(name='metadata', dtype=DataType.JSON)
          ]

          schema = CollectionSchema(
              fields=fields,
              description='RAG document embeddings collection'
          )

          # Create collection
          collection = Collection(
              name='rag_documents',
              schema=schema
          )

          # Create index
          index_params = {
              'metric_type': 'L2',
              'index_type': 'IVF_FLAT',
              'params': {'nlist': 128}
          }

          collection.create_index(
              field_name='embedding',
              index_params=index_params
          )

          print(f'Collection created: {collection.name}')
          print(f'Number of entities: {collection.num_entities}')
          PYTHON
      restartPolicy: Never
  backoffLimit: 3
EOF

# Check job status
oc logs job/milvus-init -n rag-application
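
Before loading real data, you can smoke-test the collection and index with a single random vector. This sketch assumes it runs in a pod wired to the milvus-config ConfigMap (as in the earlier test pod):

# Sketch: insert one random vector into rag_documents and search for it.
import os
import random
from pymilvus import connections, Collection

connections.connect(host=os.environ["MILVUS_HOST"], port=os.environ["MILVUS_PORT"])
collection = Collection("rag_documents")

vec = [random.random() for _ in range(1024)]
collection.insert([["smoke-test-chunk"], [vec], ["hello from the smoke test"], [{"source": "test"}]])
collection.flush()
collection.load()

hits = collection.search(
    data=[vec], anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=1, output_fields=["chunk_id", "text"],
)[0]
print(hits[0].entity.get("chunk_id"), hits[0].distance)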

Phase 6: RAG Application Deployment

Deploy the RAG application that orchestrates the entire pipeline.

Step 6.1: Create Application Code

Create the RAG application source code:

# Create application directory structure
mkdir -p rag-app/{src,config,tests}

# Create requirements.txt
cat > rag-app/requirements.txt <<EOF
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
pymilvus==2.3.3
boto3==1.29.7
langchain==0.0.350
langchain-community==0.0.1
python-dotenv==1.0.0
httpx==0.25.2
EOF

# Create main application
cat > rag-app/src/main.py <<'PYTHON_CODE'
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional, Dict, Any
import os
import json
import boto3
from pymilvus import connections, Collection
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Enterprise RAG API",
    description="RAG platform using OpenShift AI, Bedrock, and Milvus",
    version="1.0.0"
)

# Configuration
MILVUS_HOST = os.getenv("MILVUS_HOST", "milvus.milvus.svc.cluster.local")
MILVUS_PORT = int(os.getenv("MILVUS_PORT", "19530"))
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
BEDROCK_MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
COLLECTION_NAME = "rag_documents"

# Initialize clients
bedrock_runtime = None
milvus_collection = None

@app.on_event("startup")
async def startup_event():
    """Initialize connections on startup"""
    global bedrock_runtime, milvus_collection

    try:
        # Connect to Milvus
        connections.connect(
            alias="default",
            host=MILVUS_HOST,
            port=MILVUS_PORT
        )
        milvus_collection = Collection(COLLECTION_NAME)
        milvus_collection.load()
        logger.info(f"Connected to Milvus collection: {COLLECTION_NAME}")

        # Initialize Bedrock client
        bedrock_runtime = boto3.client(
            service_name='bedrock-runtime',
            region_name=AWS_REGION
        )
        logger.info("Initialized Bedrock client")

    except Exception as e:
        logger.error(f"Startup error: {str(e)}")
        raise

@app.on_event("shutdown")
async def shutdown_event():
    """Cleanup on shutdown"""
    try:
        connections.disconnect("default")
        logger.info("Disconnected from Milvus")
    except Exception as e:
        logger.error(f"Shutdown error: {str(e)}")

# Request/Response models
class QueryRequest(BaseModel):
    query: str
    top_k: Optional[int] = 5
    max_tokens: Optional[int] = 1000

class QueryResponse(BaseModel):
    answer: str
    sources: List[Dict[str, Any]]
    metadata: Dict[str, Any]

class HealthResponse(BaseModel):
    status: str
    milvus_connected: bool
    bedrock_available: bool

# API endpoints
@app.get("/health", response_model=HealthResponse)
async def health_check():
    """Health check endpoint"""
    milvus_ok = False
    bedrock_ok = False

    try:
        if milvus_collection:
            milvus_collection.num_entities
            milvus_ok = True
    except:
        pass

    try:
        if bedrock_runtime:
            bedrock_ok = True
    except:
        pass

    return HealthResponse(
        status="healthy" if (milvus_ok and bedrock_ok) else "degraded",
        milvus_connected=milvus_ok,
        bedrock_available=bedrock_ok
    )

@app.post("/query", response_model=QueryResponse)
async def query_rag(request: QueryRequest):
    """
    Process RAG query:
    1. Generate embedding for query
    2. Search similar documents in Milvus
    3. Construct prompt with context
    4. Call Bedrock for generation
    """
    try:
        # Step 1: Generate query embedding using Bedrock
        query_embedding = await generate_embedding(request.query)

        # Step 2: Search Milvus for similar documents
        search_params = {
            "metric_type": "L2",
            "params": {"nprobe": 10}
        }

        results = milvus_collection.search(
            data=[query_embedding],
            anns_field="embedding",
            param=search_params,
            limit=request.top_k,
            output_fields=["chunk_id", "text", "metadata"]
        )

        # Extract context from search results
        contexts = []
        sources = []
        for hit in results[0]:
            contexts.append(hit.entity.get("text"))
            sources.append({
                "chunk_id": hit.entity.get("chunk_id"),
                "score": float(hit.score),
                "metadata": hit.entity.get("metadata")
            })

        # Step 3: Construct prompt with context
        context_text = "\n\n".join([f"Document {i+1}:\n{ctx}" for i, ctx in enumerate(contexts)])

        prompt = f"""You are a helpful AI assistant. Use the following context to answer the user's question.
        If the answer cannot be found in the context, say so.

Context:
{context_text}

User Question: {request.query}

Answer:"""

        # Step 4: Call Bedrock for generation
        response = bedrock_runtime.invoke_model(
            modelId=BEDROCK_MODEL_ID,
            contentType="application/json",
            accept="application/json",
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": request.max_tokens,
                "messages": [
                    {
                        "role": "user",
                        "content": prompt
                    }
                ],
                "temperature": 0.7
            })
        )

        response_body = json.loads(response['body'].read())
        answer = response_body['content'][0]['text']

        return QueryResponse(
            answer=answer,
            sources=sources,
            metadata={
                "query": request.query,
                "num_sources": len(sources),
                "model": BEDROCK_MODEL_ID
            }
        )

    except Exception as e:
        logger.error(f"Query error: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

async def generate_embedding(text: str) -> List[float]:
    """Generate embedding using Bedrock Titan Embeddings"""
    try:
        response = bedrock_runtime.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            contentType="application/json",
            accept="application/json",
            body=json.dumps({
                "inputText": text,
                "dimensions": 1024,
                "normalize": True
            })
        )

        response_body = json.loads(response['body'].read())
        return response_body['embedding']

    except Exception as e:
        logger.error(f"Embedding generation error: {str(e)}")
        raise

@app.get("/")
async def root():
    """Root endpoint"""
    return {
        "message": "Enterprise RAG API",
        "version": "1.0.0",
        "endpoints": {
            "health": "/health",
            "query": "/query",
            "docs": "/docs"
        }
    }

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
PYTHON_CODE

# Create Dockerfile
cat > rag-app/Dockerfile <<EOF
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
EOF

Step 6.2: Build and Push Container Image

# Build container image (using podman or docker)
cd rag-app

# Option 1: Build with podman
podman build -t rag-application:v1.0 .

# Option 2: Build with docker
# docker build -t rag-application:v1.0 .

# Expose the internal registry's default route (if not already exposed)
oc patch configs.imageregistry.operator.openshift.io/cluster \
  --type merge --patch '{"spec":{"defaultRoute":true}}'

# Tag for OpenShift internal registry
export IMAGE_REGISTRY=$(oc get route default-route -n openshift-image-registry -o jsonpath='{.spec.host}')

# Login to OpenShift registry
podman login -u $(oc whoami) -p $(oc whoami -t) $IMAGE_REGISTRY --tls-verify=false

# Create image stream
oc create imagestream rag-application -n rag-application

# Tag and push
podman tag rag-application:v1.0 $IMAGE_REGISTRY/rag-application/rag-application:v1.0
podman push $IMAGE_REGISTRY/rag-application/rag-application:v1.0 --tls-verify=false

cd ..

Step 6.3: Deploy Application to OpenShift

# Create deployment
cat <<EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-application
  namespace: rag-application
  labels:
    app: rag-application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rag-application
  template:
    metadata:
      labels:
        app: rag-application
    spec:
      serviceAccountName: bedrock-sa
      containers:
      - name: app
        image: image-registry.openshift-image-registry.svc:5000/rag-application/rag-application:v1.0
        ports:
        - containerPort: 8000
          protocol: TCP
        env:
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_HOST
        - name: MILVUS_PORT
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_PORT
        - name: AWS_REGION
          value: "us-east-1"
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: rag-application
  namespace: rag-application
spec:
  selector:
    app: rag-application
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rag-application
  namespace: rag-application
spec:
  to:
    kind: Service
    name: rag-application
  port:
    targetPort: 8000
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
EOF

Step 6.4: Verify Deployment

# Check deployment status
oc get deployment rag-application -n rag-application
oc get pods -n rag-application -l app=rag-application

# Get application URL
export RAG_APP_URL=$(oc get route rag-application -n rag-application -o jsonpath='{.spec.host}')
echo "RAG Application URL: https://$RAG_APP_URL"

# Test health endpoint
curl https://$RAG_APP_URL/health

# View application logs
oc logs -f deployment/rag-application -n rag-application

Testing and Validation

End-to-End Testing

Test 1: Document Ingestion and Processing

# Upload test documents to S3
cat > test-doc-1.txt <<EOF
Red Hat OpenShift is an enterprise Kubernetes platform that provides
a complete application platform for developing and deploying containerized
applications. It includes integrated CI/CD, monitoring, and developer tools.
EOF

cat > test-doc-2.txt <<EOF
Amazon Bedrock is a fully managed service that offers foundation models
from leading AI companies through a single API. It provides access to
models like Claude, Llama, and Stable Diffusion for various use cases.
EOF

# Upload to S3
aws s3 cp test-doc-1.txt s3://$BUCKET_NAME/raw-documents/
aws s3 cp test-doc-2.txt s3://$BUCKET_NAME/raw-documents/

# Trigger Glue crawler
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION

# Wait and run ETL job
sleep 120
aws glue start-job-run --job-name rag-document-processor --region $AWS_REGION

# Check processed documents
sleep 60
aws s3 ls s3://$BUCKET_NAME/processed-documents/

Test 2: Embedding Generation and Vector Storage

Create a job to process documents into Milvus:

# Create embedding job
cat <<EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: embed-documents
  namespace: rag-application
spec:
  template:
    spec:
      serviceAccountName: bedrock-sa
      containers:
      - name: embedder
        image: python:3.11-slim
        env:
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_HOST
        - name: MILVUS_PORT
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_PORT
        - name: AWS_REGION
          value: "us-east-1"
        - name: BUCKET_NAME
          value: "$BUCKET_NAME"
        command:
        - /bin/bash
        - -c
        - |
          pip install pymilvus boto3
          python3 <<PYTHON
          import boto3
          import json
          import os
          from pymilvus import connections, Collection

          # Connect to services
          s3 = boto3.client('s3')
          bedrock = boto3.client('bedrock-runtime', region_name=os.environ['AWS_REGION'])

          connections.connect(
              host=os.environ['MILVUS_HOST'],
              port=os.environ['MILVUS_PORT']
          )
          collection = Collection('rag_documents')

          # Get processed documents
          bucket = os.environ['BUCKET_NAME']
          response = s3.list_objects_v2(Bucket=bucket, Prefix='processed-documents/')

          for obj in response.get('Contents', []):
              if obj['Key'].endswith('.json'):
                  # Read document chunk
                  doc = json.loads(s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read())

                  # Generate embedding
                  embed_response = bedrock.invoke_model(
                      modelId='amazon.titan-embed-text-v2:0',
                      body=json.dumps({
                          'inputText': doc['chunk_text'],
                          'dimensions': 1024,
                          'normalize': True
                      })
                  )

                  embedding = json.loads(embed_response['body'].read())['embedding']

                  # Insert into Milvus
                  collection.insert([
                      [doc['chunk_id']],
                      [embedding],
                      [doc['chunk_text']],
                      [doc['metadata']]
                  ])

                  print(f"Inserted: {doc['chunk_id']}")

          collection.flush()
          print(f"Total entities in collection: {collection.num_entities}")
          PYTHON
      restartPolicy: Never
  backoffLimit: 3
EOF

# Monitor job
oc logs job/embed-documents -n rag-application -f

Test 3: RAG Query

# Test RAG query endpoint
curl -X POST "https://$RAG_APP_URL/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is Red Hat OpenShift?",
    "top_k": 3,
    "max_tokens": 500
  }' | jq .

# Test another query
curl -X POST "https://$RAG_APP_URL/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Tell me about Amazon Bedrock foundation models",
    "top_k": 3,
    "max_tokens": 500
  }' | jq .
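
The same endpoint can be called from Python with httpx, which is already in requirements.txt. A small client sketch, assuming RAG_APP_URL is exported as in Step 6.4:

# Sketch: query the RAG API from Python instead of curl.
import os
import httpx

url = f"https://{os.environ['RAG_APP_URL']}/query"
payload = {"query": "What is Red Hat OpenShift?", "top_k": 3, "max_tokens": 500}

response = httpx.post(url, json=payload, timeout=120.0)
response.raise_for_status()

data = response.json()
print(data["answer"])
for source in data["sources"]:
    print(source["chunk_id"], source["score"])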

Performance Testing

# Install Apache Bench for load testing
sudo yum install httpd-tools -y

# Create query payload
cat > query-payload.json <<EOF
{
  "query": "What are the benefits of using OpenShift?",
  "top_k": 5
}
EOF

# Run load test (100 requests, 10 concurrent)
ab -n 100 -c 10 -p query-payload.json \
  -T application/json \
  "https://$RAG_APP_URL/query"

Resource Cleanup

To avoid ongoing AWS charges, follow these steps to clean up all resources created during this implementation.

Step 1: Delete OpenShift Resources

# Delete RAG application
oc delete deployment rag-application -n rag-application
oc delete service rag-application -n rag-application
oc delete route rag-application -n rag-application

# Delete Milvus
helm uninstall milvus -n milvus
helm uninstall milvus-operator -n milvus
oc delete pvc --all -n milvus

# Delete RHOAI
oc delete datasciencecluster default-dsc
oc delete subscription rhods-operator -n redhat-ods-operator

# Delete projects/namespaces
oc delete project rag-application
oc delete project milvus
oc delete project redhat-ods-applications
oc delete project redhat-ods-operator
oc delete project redhat-ods-monitoring

Step 2: Delete ROSA Cluster

# Delete ROSA cluster (takes ~10-15 minutes)
rosa delete cluster --cluster=$CLUSTER_NAME --yes

# Wait for cluster deletion to complete
rosa logs uninstall --cluster=$CLUSTER_NAME --watch

# Verify cluster is deleted
rosa list clusters

Step 3: Delete AWS Glue Resources

# Delete Glue job
aws glue delete-job --job-name rag-document-processor --region $AWS_REGION

# Delete Glue crawler
aws glue delete-crawler --name rag-document-crawler --region $AWS_REGION

# Delete Glue database
aws glue delete-database --name rag_documents_db --region $AWS_REGION

# Delete Glue IAM role
aws iam delete-role-policy --role-name AWSGlueServiceRole-RAG --policy-name S3Access
aws iam detach-role-policy --role-name AWSGlueServiceRole-RAG --policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
aws iam delete-role --role-name AWSGlueServiceRole-RAG

Step 4: Delete S3 Bucket and Contents

# Delete all objects in bucket
aws s3 rm s3://$BUCKET_NAME --recursive --region $AWS_REGION

# Delete bucket
aws s3 rb s3://$BUCKET_NAME --region $AWS_REGION

echo "S3 bucket deleted: $BUCKET_NAME"

Step 5: Delete VPC Endpoint

# Delete VPC endpoint for Bedrock
aws ec2 delete-vpc-endpoints --vpc-endpoint-ids $BEDROCK_VPC_ENDPOINT --region $AWS_REGION

# Delete security group
aws ec2 delete-security-group --group-id $VPC_ENDPOINT_SG --region $AWS_REGION

echo "VPC endpoint and security group deleted"

Step 6: Delete IAM Resources

# Detach policy from Bedrock role
aws iam detach-role-policy \
  --role-name rosa-bedrock-access \
  --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy

# Delete Bedrock role
aws iam delete-role --role-name rosa-bedrock-access

# Delete Bedrock policy
aws iam delete-policy \
  --policy-arn arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/BedrockInvokePolicy

echo "IAM roles and policies deleted"

Step 7: Clean Up Local Files

# Remove temporary files
rm -f bedrock-policy.json
rm -f trust-policy.json
rm -f glue-trust-policy.json
rm -f glue-s3-policy.json
rm -f glue-etl-script.py
rm -f sample-document.txt
rm -f test-doc-1.txt
rm -f test-doc-2.txt
rm -f query-payload.json
rm -f milvus-values.yaml
rm -rf rag-app/

echo "Local temporary files cleaned up"

Verification

# Verify ROSA cluster is deleted
rosa list clusters

# Verify S3 bucket is deleted
aws s3 ls | grep $BUCKET_NAME

# Verify VPC endpoints are deleted
aws ec2 describe-vpc-endpoints --region $AWS_REGION | grep $BEDROCK_VPC_ENDPOINT

# Verify IAM roles are deleted
aws iam list-roles | grep -E "rosa-bedrock-access|AWSGlueServiceRole-RAG"

echo "Cleanup verification complete"
