<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Marco Gonzalez</title>
    <description>The latest articles on DEV Community by Marco Gonzalez (@mgonzalezo).</description>
    <link>https://dev.to/mgonzalezo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F636519%2Fc675a4d6-adfe-4edf-a191-6acadbc57feb.jpeg</url>
      <title>DEV Community: Marco Gonzalez</title>
      <link>https://dev.to/mgonzalezo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mgonzalezo"/>
    <language>en</language>
    <item>
      <title>AWS ML/GenAI Trifecta Part 3: AWS Certified Machine Learning Specialty (MLS-C01)</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Wed, 25 Feb 2026 01:06:21 +0000</pubDate>
      <link>https://dev.to/mgonzalezo/aws-mlgenai-trifecta-part-3-aws-certified-machine-learning-specialty-mls-c01-1pa3</link>
      <guid>https://dev.to/mgonzalezo/aws-mlgenai-trifecta-part-3-aws-certified-machine-learning-specialty-mls-c01-1pa3</guid>
      <description>&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;The AWS Certified Machine Learning - Specialty (MLS-C01) is the bridge between foundational AI knowledge and professional-level generative AI expertise. In 2026 it carries extra urgency: &lt;strong&gt;it retires on March 31, 2026&lt;/strong&gt;, so this is the last window to earn it as a foundational stepping stone toward the AWS Certified Generative AI Developer - Professional (AIP-C01).&lt;/p&gt;

&lt;p&gt;My goal is to master the full stack of AWS intelligence services by completing these three milestones:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AWS Certified AI Practitioner (Foundational)&lt;/strong&gt; - Completed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Certified Machine Learning Engineer Associate&lt;/strong&gt; or &lt;strong&gt;AWS Certified Data Engineer Associate&lt;/strong&gt; - Completed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Certified Machine Learning - Specialty&lt;/strong&gt; - &lt;em&gt;Current focus&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why the ML Specialty Still Matters in the GenAI Era
&lt;/h2&gt;

&lt;p&gt;With the release of the AWS Certified Generative AI Developer - Professional (AIP-C01) in 2026, you might wonder: why invest time in "traditional" ML when the industry has shifted to Amazon Bedrock, RAG architectures, and foundation models?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's the truth&lt;/strong&gt;: To successfully build and deploy Large Language Models (LLMs) in 2026, you absolutely must understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Underlying data engineering principles&lt;/li&gt;
&lt;li&gt;Vector embeddings and dimensionality reduction&lt;/li&gt;
&lt;li&gt;Evaluation metrics (Recall, F1, Precision)&lt;/li&gt;
&lt;li&gt;Data bias detection and mitigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You cannot effectively evaluate an LLM's performance or handle data bias if you don't fundamentally understand these core ML concepts. The ML Specialty ensures you have the rigorous theoretical background required to pass the Generative AI Professional exam.&lt;/p&gt;
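
&lt;p&gt;To make those metrics concrete, here is a minimal scikit-learn sketch (toy labels, purely illustrative) showing how Precision and Recall pull apart on a small fraud-style example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sklearn.metrics import precision_score, recall_score, f1_score

# Toy fraud labels: 1 = fraud (rare), 0 = legitimate
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # catches every fraud, but raises two false alarms

print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 3 TP / (3 TP + 2 FP) = 0.60
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # 3 TP / (3 TP + 0 FN) = 1.00
print(f"F1:        {f1_score(y_true, y_pred):.2f}")         # harmonic mean = 0.75
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;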

&lt;h2&gt;
  
  
  Exam Structure
&lt;/h2&gt;

&lt;p&gt;The AWS Certified Machine Learning - Specialty validates your ability to design, implement, deploy, and maintain machine learning solutions for given business problems.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;65 questions (multiple choice and multiple response)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Duration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;180 minutes (3 hours)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Passing Score&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;750/1000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$300 USD&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retirement Date&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;March 31, 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target Audience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data Scientists and Data Engineers with 1-2 years of ML experience on AWS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Four Exam Domains
&lt;/h2&gt;

&lt;p&gt;The certification content is organized across four weighted domains:&lt;/p&gt;

&lt;h3&gt;
  
  
  Domain 1: Data Engineering (20%)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Kinesis&lt;/strong&gt; ecosystem (Streams, Firehose, Data Analytics)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Glue&lt;/strong&gt; and &lt;strong&gt;Amazon Athena&lt;/strong&gt; for serverless ETL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EMR&lt;/strong&gt; for distributed processing with Spark&lt;/li&gt;
&lt;li&gt;Data pipeline design patterns (streaming vs. batch)&lt;/li&gt;
&lt;/ul&gt;
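
&lt;p&gt;As a quick taste of the serverless ETL stack listed above, here is a minimal boto3 sketch that queries a table a Glue crawler has already cataloged; the database, table, and results-bucket names are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

athena = boto3.client('athena')

# Assumes a Glue crawler has already cataloged a 'transactions' table in 'sales_db'
response = athena.start_query_execution(
    QueryString='SELECT merchant_category, COUNT(*) AS txn_count '
                'FROM transactions GROUP BY merchant_category',
    QueryExecutionContext={'Database': 'sales_db'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results-bucket/queries/'}
)

print(f"Query started: {response['QueryExecutionId']}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;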

&lt;h3&gt;
  
  
  Domain 2: Exploratory Data Analysis (24%)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Feature engineering techniques (stemming, lemmatization, TF-IDF)&lt;/li&gt;
&lt;li&gt;Handling data imbalance and missing values&lt;/li&gt;
&lt;li&gt;Dimensionality reduction (PCA, feature selection)&lt;/li&gt;
&lt;li&gt;Visualization and descriptive statistics&lt;/li&gt;
&lt;/ul&gt;
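
&lt;p&gt;A small, self-contained illustration of two Domain 2 staples, TF-IDF features followed by PCA, on a toy corpus (scikit-learn, illustrative only):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA

corpus = [
    "credit card payment declined",
    "payment posted to credit card",
    "running shoes shipped today",
    "new running shoes on sale",
]

# TF-IDF turns raw text into weighted term frequencies
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus).toarray()
print(f"TF-IDF matrix shape: {X.shape}")

# PCA compresses the wide term space into 2 dense components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(f"Reduced shape: {X_reduced.shape}")
print(f"Explained variance ratio: {pca.explained_variance_ratio_}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;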

&lt;h3&gt;
  
  
  Domain 3: Modeling (36%)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Algorithm selection (supervised vs. unsupervised)&lt;/li&gt;
&lt;li&gt;SageMaker built-in algorithms (BlazingText, Object2Vec, Seq2Seq, NTM, LDA)&lt;/li&gt;
&lt;li&gt;Hyperparameter optimization&lt;/li&gt;
&lt;li&gt;Training, validation, and test strategies&lt;/li&gt;
&lt;li&gt;Regularization techniques (L1, L2, Dropout)&lt;/li&gt;
&lt;/ul&gt;
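
&lt;p&gt;To tie the regularization bullet above to an actual SageMaker knob, here is a sketch of configuring the built-in Linear Learner with its L1 (&lt;code&gt;l1&lt;/code&gt;) and L2 (&lt;code&gt;wd&lt;/code&gt;) hyperparameters; the role ARN and output bucket are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = 'arn:aws:iam::123456789012:role/SageMakerExecutionRole'  # placeholder

# Built-in Linear Learner container for the session's region
container = image_uris.retrieve('linear-learner', session.boto_region_name)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path='s3://my-ml-bucket/linear-learner-output/',  # placeholder
    sagemaker_session=session,
)

# wd = L2 (ridge) penalty, l1 = L1 (lasso) penalty, plus early stopping
estimator.set_hyperparameters(
    predictor_type='regressor',
    wd=0.01,
    l1=0.0,
    early_stopping_patience=3,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;estimator.fit()&lt;/code&gt; with CSV or RecordIO-protobuf channels would then launch the training job; the point here is simply where the L1 and L2 knobs live.&lt;/p&gt;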

&lt;h3&gt;
  
  
  Domain 4: Machine Learning Implementation and Operations (20%)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SageMaker&lt;/strong&gt; ecosystem (Data Wrangler, Clarify, Feature Store)&lt;/li&gt;
&lt;li&gt;Model deployment patterns (real-time, batch, edge)&lt;/li&gt;
&lt;li&gt;Model monitoring and retraining&lt;/li&gt;
&lt;li&gt;Security and compliance best practices&lt;/li&gt;
&lt;/ul&gt;
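
&lt;p&gt;For the monitoring bullet above, a hedged sketch of enabling data capture on an endpoint so SageMaker Model Monitor can analyze the traffic later; the &lt;code&gt;model&lt;/code&gt; object is assumed to already exist and the S3 path is a placeholder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sagemaker.model_monitor import DataCaptureConfig

# Capture 100% of requests and responses for later drift analysis
capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri='s3://my-ml-bucket/endpoint-capture/',  # placeholder
)

# 'model' is assumed to be an existing sagemaker.model.Model object
predictor = model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    data_capture_config=capture_config,
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;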

&lt;h2&gt;
  
  
  Study Resources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Primary Resource
&lt;/h3&gt;

&lt;p&gt;For comprehensive exam preparation, I highly recommend:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"AWS Machine Learning Certification Preparation"&lt;/strong&gt; by Frank Kane and Stéphane Maarek (Udemy)&lt;/p&gt;

&lt;p&gt;This course perfectly balances:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Underlying machine learning mathematics&lt;/li&gt;
&lt;li&gt;Practical AWS architectural knowledge&lt;/li&gt;
&lt;li&gt;Real-world SageMaker implementations&lt;/li&gt;
&lt;li&gt;Generative AI foundations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The combination of Kane's ML expertise and Maarek's AWS mastery creates the ideal study resource for this certification.&lt;/p&gt;

&lt;h3&gt;
  
  
  Official AWS Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Skill Builder&lt;/strong&gt;: &lt;a href="https://explore.skillbuilder.aws/" rel="noopener noreferrer"&gt;Machine Learning Learning Plan&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Whitepapers&lt;/strong&gt;: Machine Learning Lens - AWS Well-Architected Framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon SageMaker Documentation&lt;/strong&gt;: Hands-on developer guides&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Memorization Framework: Tables for Quick Recall
&lt;/h2&gt;

&lt;p&gt;The AWS exam relies heavily on specific constraints and keywords. Use these tables to quickly identify the correct architecture or algorithm.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Data Imbalance &amp;amp; Evaluation Metrics
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Business Goal / Data State&lt;/th&gt;
&lt;th&gt;Metric to Optimize&lt;/th&gt;
&lt;th&gt;Why?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Catch as many positives as possible (e.g., Fraud Detection)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Recall&lt;/strong&gt; (True Positive Rate)&lt;/td&gt;
&lt;td&gt;Minimizes False Negatives (missing the target event)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extreme Imbalance (e.g., 1-2% positive rate)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;PR AUC&lt;/strong&gt; (Precision-Recall Curve)&lt;/td&gt;
&lt;td&gt;Focuses only on minority class performance, ignoring easy True Negatives&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mild Imbalance&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;F1-Score&lt;/strong&gt; or &lt;strong&gt;ROC-AUC&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Balances Precision and Recall evenly across the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Balanced Data&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Accuracy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple ratio of correct predictions to total predictions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
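
&lt;p&gt;The first two rows are easy to demonstrate: on a synthetic set with a roughly 2% positive rate, ROC AUC stays flattering while PR AUC (average precision) is judged against the tiny base rate. A scikit-learn sketch, illustrative only:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# ~2% positive rate, mimicking fraud-style imbalance
X, y = make_classification(n_samples=20000, n_features=20, weights=[0.98, 0.02], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

scores = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]

print(f"ROC AUC: {roc_auc_score(y_test, scores):.3f}")            # inflated by the 98% easy negatives
print(f"PR AUC : {average_precision_score(y_test, scores):.3f}")  # anchored to the 2% base rate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;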

&lt;h3&gt;
  
  
  2. Bias, Variance &amp;amp; Regularization
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept / Problem&lt;/th&gt;
&lt;th&gt;Definition &amp;amp; Exam Signature&lt;/th&gt;
&lt;th&gt;The Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Overfitting&lt;/strong&gt; (High Variance)&lt;/td&gt;
&lt;td&gt;Training loss is zero, but validation loss spikes. Model memorized noise.&lt;/td&gt;
&lt;td&gt;Add &lt;strong&gt;L2 Regularization&lt;/strong&gt;, &lt;strong&gt;Dropout&lt;/strong&gt;, or &lt;strong&gt;Early Stopping&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Underfitting&lt;/strong&gt; (High Bias)&lt;/td&gt;
&lt;td&gt;Model performs poorly on both training and validation data&lt;/td&gt;
&lt;td&gt;Add more features, increase model complexity, or reduce regularization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;L1 Regularization&lt;/strong&gt; (Lasso)&lt;/td&gt;
&lt;td&gt;Pushes feature weights exactly to zero&lt;/td&gt;
&lt;td&gt;Use for &lt;strong&gt;Feature Selection&lt;/strong&gt; (reducing thousands of useless columns)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;L2 Regularization&lt;/strong&gt; (Ridge)&lt;/td&gt;
&lt;td&gt;Shrinks weights but keeps features&lt;/td&gt;
&lt;td&gt;Use for general overfitting and handling extremely noisy continuous data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Curse of Dimensionality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Too many columns/features causing noise and poor F1 scores&lt;/td&gt;
&lt;td&gt;Use &lt;strong&gt;Principal Component Analysis (PCA)&lt;/strong&gt; to mathematically compress features&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
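
&lt;p&gt;The L1 vs. L2 rows in one sketch: Lasso (L1) pushes useless coefficients to exactly zero, while Ridge (L2) only shrinks them (scikit-learn, toy data):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 50 features, but only 5 carry real signal
X, y = make_regression(n_samples=500, n_features=50, n_informative=5, noise=10, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print(f"Lasso coefficients set to zero: {np.sum(lasso.coef_ == 0)} / 50")  # built-in feature selection
print(f"Ridge coefficients set to zero: {np.sum(ridge.coef_ == 0)} / 50")  # typically 0: shrink, don't drop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;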

&lt;h3&gt;
  
  
  3. Algorithm Selection &amp;amp; NLP
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data State / Requirement&lt;/th&gt;
&lt;th&gt;Correct Algorithm / Approach&lt;/th&gt;
&lt;th&gt;Supervised or Unsupervised?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No predefined labels or categories&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Neural Topic Model (NTM)&lt;/strong&gt; or &lt;strong&gt;Latent Dirichlet Allocation (LDA)&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Unsupervised&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predicting predefined categories&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;BlazingText&lt;/strong&gt; (Text Classification mode)&lt;/td&gt;
&lt;td&gt;Supervised&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sentence Pairs or Q&amp;amp;A matching&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Object2Vec&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supervised&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Translation or Summarization&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Seq2Seq&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supervised&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grouping similar numeric data&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;K-Means&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unsupervised&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
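
&lt;p&gt;To see the &lt;em&gt;unsupervised&lt;/em&gt; rows in action without an AWS account, here is a toy scikit-learn LDA run; on the exam the answer would be the SageMaker NTM or LDA built-ins, but the no-labels idea is identical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "battery life on this laptop is excellent",
    "laptop screen and battery both impress",
    "the espresso machine brews rich coffee",
    "coffee grinder settings change espresso flavor",
]

# Bag-of-words counts feed the topic model; no labels are involved anywhere
counts = CountVectorizer(stop_words='english').fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row is a document's mixture over the 2 discovered topics (rows sum to 1)
print(lda.transform(counts).round(2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;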

&lt;h3&gt;
  
  
  4. AWS Data Engineering &amp;amp; SageMaker Rules
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario / Requirement&lt;/th&gt;
&lt;th&gt;Correct AWS Service / Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Ingest and transport custom streaming data&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Kinesis Data Streams&lt;/strong&gt; (requires consumer code)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Export/deliver streaming data directly to S3&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Kinesis Data Firehose&lt;/strong&gt; (zero code delivery)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Serving ML features for near real-time inference&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;SageMaker Feature Store&lt;/strong&gt; (Online Feature Group)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storing ML features for batch scoring or training&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;SageMaker Feature Store&lt;/strong&gt; (Offline Feature Group)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fully visual, point-and-click data preparation&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;SageMaker Data Wrangler&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Real Exam Sample Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Question 1: Handling Extreme Data Imbalance
&lt;/h3&gt;

&lt;p&gt;A financial company is trying to detect credit card fraud. The company observed that, on average, 2% of credit card transactions were fraudulent. A data scientist trained a classifier on a year's worth of data. The company's goal is to accurately capture as many positives as possible. Which metrics should the data scientist use to optimize the model? (Choose two.)&lt;/p&gt;

&lt;p&gt;A. Specificity&lt;br&gt;
B. False positive rate&lt;br&gt;
C. Accuracy&lt;br&gt;
D. Area under the precision-recall curve&lt;br&gt;
E. True positive rate&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answers: D and E&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;: The 2% fraud rate indicates extreme data imbalance, making &lt;strong&gt;PR AUC (Option D)&lt;/strong&gt; the most accurate overall metric, as ROC and Accuracy will be artificially inflated by the 98% normal transactions. The business goal to "capture as many positives as possible" directly defines &lt;strong&gt;Recall&lt;/strong&gt;, which is mathematically identical to the &lt;strong&gt;True Positive Rate (Option E)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Concept&lt;/strong&gt;: Extreme imbalance (1-2%) → Use PR AUC. Business goal of "catch all frauds" → Maximize Recall/TPR.&lt;/p&gt;
&lt;h3&gt;
  
  
  Question 2: Serverless Data Discovery
&lt;/h3&gt;

&lt;p&gt;A company needs to quickly make sense of a large amount of data. The data is in different formats, schemas change frequently, and new data sources are added regularly. The solution should require the least possible coding effort and the least possible infrastructure management. Which combination of AWS services will meet these requirements?&lt;/p&gt;

&lt;p&gt;A. Amazon EMR, Amazon Athena, Amazon QuickSight&lt;br&gt;
B. Amazon Kinesis Data Analytics, Amazon EMR, Amazon Redshift&lt;br&gt;
C. AWS Glue, Amazon Athena, Amazon QuickSight&lt;br&gt;
D. AWS Data Pipeline, AWS Step Functions, Amazon Athena, Amazon QuickSight&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer: C&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;: &lt;strong&gt;AWS Glue Crawlers&lt;/strong&gt; are specifically designed to automatically scan changing data and "suggest schemas" with zero coding. Glue, Athena, and QuickSight are all entirely &lt;strong&gt;serverless&lt;/strong&gt;, perfectly satisfying the "least possible infrastructure management" constraint. Amazon EMR requires managing underlying EC2 clusters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Concept&lt;/strong&gt;: Changing schemas + serverless + zero coding → AWS Glue Crawlers. EMR = cluster management overhead.&lt;/p&gt;
&lt;h3&gt;
  
  
  Question 3: Diagnosing and Fixing Overfitting
&lt;/h3&gt;

&lt;p&gt;An exercise analytics company wants to predict running speeds for its customers by using a dataset containing health-related features. Some of the features originate from sensors that provide extremely noisy values. While training a regression model using the SageMaker linear learner, the data scientist observes that the training loss decreases to almost zero, but validation loss increases. Which technique should be used to optimally fit the model?&lt;/p&gt;

&lt;p&gt;A. Add L1 regularization&lt;br&gt;
B. Perform a principal component analysis (PCA)&lt;br&gt;
C. Include quadratic and cubic terms&lt;br&gt;
D. Add L2 regularization&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer: D&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;: Training loss dropping to near zero while validation loss spikes is the textbook definition of &lt;strong&gt;overfitting&lt;/strong&gt; (the model memorized the noisy sensors). &lt;strong&gt;L2 Regularization&lt;/strong&gt; mathematically shrinks extreme weights associated with "extremely noisy values" to create a smoother, generalized line without deleting the features entirely (which L1 would do).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Concept&lt;/strong&gt;: Training loss ↓ + validation loss ↑ = Overfitting. Noisy continuous features → L2 Regularization (Ridge).&lt;/p&gt;
&lt;h3&gt;
  
  
  Question 4: Unsupervised NLP Categorization
&lt;/h3&gt;

&lt;p&gt;A company stores its documents in Amazon S3 with no predefined product categories. A data scientist needs to build a machine learning model to categorize the documents for all the company's products. Which solution meets these requirements with the MOST operational efficiency?&lt;/p&gt;

&lt;p&gt;A. Build a custom clustering model in a Docker image and use it in SageMaker&lt;br&gt;
B. Tokenize the data and train an Amazon SageMaker k-means model&lt;br&gt;
C. Train an Amazon SageMaker Neural Topic Model (NTM) to generate the categories&lt;br&gt;
D. Train an Amazon SageMaker BlazingText model to generate the categories&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer: C&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation&lt;/strong&gt;: The phrase "no predefined product categories" indicates &lt;strong&gt;unlabeled data&lt;/strong&gt;, which requires an &lt;strong&gt;unsupervised algorithm&lt;/strong&gt;. This eliminates BlazingText, which is a supervised text classifier. SageMaker &lt;strong&gt;NTM&lt;/strong&gt; is a built-in unsupervised algorithm specifically designed for text topic modeling, making it the most operationally efficient choice over building a custom Docker container or forcing text into k-means.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Concept&lt;/strong&gt;: No labels + text documents → Unsupervised NLP (NTM or LDA). BlazingText requires labeled data.&lt;/p&gt;
&lt;h2&gt;
  
  
  Hands-On Lab: Real-Time ML Pipeline with Kinesis Firehose, S3, and SageMaker Processing
&lt;/h2&gt;

&lt;p&gt;This lab demonstrates a production-grade real-time ML pipeline for fraud detection—a critical exam topic covering Domain 1 (Data Engineering) and Domain 4 (ML Operations).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario&lt;/strong&gt;: An e-commerce platform processes thousands of transactions per minute. We need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Ingest streaming transaction data with Kinesis Firehose&lt;/li&gt;
&lt;li&gt;Store raw data in S3 for compliance&lt;/li&gt;
&lt;li&gt;Process features in real-time with SageMaker Processing&lt;/li&gt;
&lt;li&gt;Score transactions using a deployed SageMaker endpoint&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Step 1: Create Kinesis Data Firehose Delivery Stream
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize AWS clients
&lt;/span&gt;&lt;span class="n"&gt;firehose&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;firehose&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Configuration
&lt;/span&gt;&lt;span class="n"&gt;BUCKET_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ml-specialty-fraud-detection&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;STREAM_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transaction-stream&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;# Create S3 bucket for raw data
&lt;/span&gt;&lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Bucket&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create Firehose delivery stream
&lt;/span&gt;&lt;span class="n"&gt;firehose&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_delivery_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;DeliveryStreamName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STREAM_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DeliveryStreamType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;DirectPut&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;S3DestinationConfiguration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;RoleARN&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;arn:aws:iam::123456789012:role/FirehoseDeliveryRole&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BucketARN&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;arn:aws:s3:::&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Prefix&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;raw-transactions/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BufferingHints&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SizeInMBs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;IntervalInSeconds&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CompressionFormat&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GZIP&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ Firehose delivery stream &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;STREAM_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; created&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ S3 bucket &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; configured for data delivery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✓ Firehose delivery stream 'transaction-stream' created
✓ S3 bucket 'ml-specialty-fraud-detection' configured for data delivery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Simulate Streaming Transaction Data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_transaction&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate synthetic transaction data&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transaction_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TXN-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;999999&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;5000.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;merchant_category&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retail&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;grocery&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;travel&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;electronics&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;location_distance_km&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;time_since_last_txn_hours&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;72.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is_international&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;device_fingerprint&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DEV-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;9999&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Send 10 transactions to Firehose
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;transaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_transaction&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;firehose&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;DeliveryStreamName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STREAM_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ Transaction &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/10 sent - ID: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;transaction_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
          &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Amount: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
          &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RecordId: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;RecordId&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Simulate realistic streaming interval
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;✓ All transactions delivered to Firehose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ Data will be batched and delivered to S3 within 60 seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✓ Transaction 1/10 sent - ID: TXN-482931, Amount: $127.45, RecordId: 49590338192373...
✓ Transaction 2/10 sent - ID: TXN-293847, Amount: $2341.78, RecordId: 49590338193821...
✓ Transaction 3/10 sent - ID: TXN-837261, Amount: $89.99, RecordId: 49590338195203...
✓ Transaction 4/10 sent - ID: TXN-562918, Amount: $456.32, RecordId: 49590338196584...
✓ Transaction 5/10 sent - ID: TXN-719283, Amount: $3421.00, RecordId: 49590338197942...
✓ Transaction 6/10 sent - ID: TXN-184729, Amount: $67.50, RecordId: 49590338199301...
✓ Transaction 7/10 sent - ID: TXN-928374, Amount: $1523.67, RecordId: 49590338200682...
✓ Transaction 8/10 sent - ID: TXN-473829, Amount: $234.12, RecordId: 49590338202048...
✓ Transaction 9/10 sent - ID: TXN-625483, Amount: $891.45, RecordId: 49590338203421...
✓ Transaction 10/10 sent - ID: TXN-384756, Amount: $4567.89, RecordId: 49590338204793...

✓ All transactions delivered to Firehose
✓ Data will be batched and delivered to S3 within 60 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: SageMaker Processing for Feature Engineering
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.processing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ScriptProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ProcessingInput&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ProcessingOutput&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_execution_role&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize SageMaker session
&lt;/span&gt;&lt;span class="n"&gt;sagemaker_session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sagemaker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_execution_role&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Create processing script for feature engineering
&lt;/span&gt;&lt;span class="n"&gt;processing_script&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
import pandas as pd
import numpy as np
import json
import os
import gzip

# Read raw transaction data from S3
input_path = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/opt/ml/processing/input/raw-transactions/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;
output_path = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/opt/ml/processing/output/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;

# Load JSON transactions (Firehose delivers GZIP-compressed, newline-delimited objects,
# partitioned into dated subfolders under the prefix)
transactions = []
for root, _, files in os.walk(input_path):
    for file in files:
        opener = gzip.open if file.endswith('.gz') else open
        with opener(os.path.join(root, file), 'rt') as f:
            for line in f:
                transactions.append(json.loads(line))

df = pd.DataFrame(transactions)

# Feature Engineering
df[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount_log&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] = np.log1p(df[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;])
df[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is_high_value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] = (df[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] &amp;gt; 1000).astype(int)
df[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is_recent_activity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] = (df[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;time_since_last_txn_hours&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] &amp;lt; 1).astype(int)
df[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;risk_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] = (
    df[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is_international&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] * 0.3 +
    df[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is_high_value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] * 0.4 +
    (df[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;location_distance_km&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;] &amp;gt; 100).astype(int) * 0.3
)

# Save engineered features
df.to_csv(f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{output_path}/features.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, index=False)
print(f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;✓ Processed {len(df)} transactions&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)
print(f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;✓ High-risk transactions: {(df[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;] &amp;gt; 0.5).sum()}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;)
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="c1"&gt;# Save processing script
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;feature_engineering.py&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processing_script&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create SageMaker ScriptProcessor
&lt;/span&gt;&lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ScriptProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;image_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;python3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;instance_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ml.m5.xlarge&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_job_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fraud-feature-engineering&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run processing job
&lt;/span&gt;&lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;feature_engineering.py&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;ProcessingInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/raw-transactions/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;destination&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/opt/ml/processing/input/raw-transactions/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;ProcessingOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/opt/ml/processing/output/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;destination&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/processed-features/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ SageMaker Processing job completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-02-25 14:32:15 Starting - Starting the processing job
2026-02-25 14:32:18 Starting - Launching requested ML instances
2026-02-25 14:33:42 Starting - Preparing the instances for processing
2026-02-25 14:34:28 Downloading - Downloading input data from S3
2026-02-25 14:34:51 Processing - Running processing container
2026-02-25 14:35:12 Processing - Feature engineering in progress
✓ Processed 10 transactions
✓ High-risk transactions: 3
2026-02-25 14:35:45 Uploading - Uploading processed data to S3
2026-02-25 14:36:03 Completed - Processing job completed successfully

✓ SageMaker Processing job completed
Job Name: fraud-feature-engineering-2026-02-25-14-32-15-482
Status: Completed
Output location: s3://ml-specialty-fraud-detection/processed-features/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Deploy Model and Score Transactions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.sklearn&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SKLearnModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.serializers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CSVSerializer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.deserializers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;JSONDeserializer&lt;/span&gt;

&lt;span class="c1"&gt;# Deploy pre-trained fraud detection model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SKLearnModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3://ml-models/fraud-detector/model.tar.gz&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;entry_point&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;inference.py&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;framework_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.23-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;predictor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deploy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;initial_instance_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ml.m5.large&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;endpoint_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fraud-detection-endpoint&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;serializer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CSVSerializer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;deserializer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;JSONDeserializer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ Model deployed to real-time endpoint&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Score transactions
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/processed-features/features.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;amount_log&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;risk_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                          &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is_high_value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;is_international&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;✓ Scored &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; transactions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✓ Fraud predictions: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2026-02-25 14:38:12 Creating endpoint configuration
2026-02-25 14:38:15 Creating endpoint
2026-02-25 14:42:38 Endpoint 'fraud-detection-endpoint' in service

✓ Model deployed to real-time endpoint

✓ Scored 10 transactions
✓ Fraud predictions: [
    {'transaction_id': 'TXN-482931', 'fraud_probability': 0.12, 'prediction': 'legitimate'},
    {'transaction_id': 'TXN-293847', 'fraud_probability': 0.87, 'prediction': 'fraud'},
    {'transaction_id': 'TXN-837261', 'fraud_probability': 0.08, 'prediction': 'legitimate'},
    {'transaction_id': 'TXN-562918', 'fraud_probability': 0.34, 'prediction': 'legitimate'},
    {'transaction_id': 'TXN-719283', 'fraud_probability': 0.91, 'prediction': 'fraud'},
    {'transaction_id': 'TXN-184729', 'fraud_probability': 0.15, 'prediction': 'legitimate'},
    {'transaction_id': 'TXN-928374', 'fraud_probability': 0.76, 'prediction': 'fraud'},
    {'transaction_id': 'TXN-473829', 'fraud_probability': 0.22, 'prediction': 'legitimate'},
    {'transaction_id': 'TXN-625483', 'fraud_probability': 0.45, 'prediction': 'legitimate'},
    {'transaction_id': 'TXN-384756', 'fraud_probability': 0.94, 'prediction': 'fraud'}
]

Endpoint metrics:
- Average inference latency: 23ms
- Throughput: 1,200 transactions/minute
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Architecture Diagram (Conceptual)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Transaction Source → Kinesis Firehose → S3 (Raw Data)
                                           ↓
                                    SageMaker Processing
                                      (Feature Engineering)
                                           ↓
                                    S3 (Processed Features)
                                           ↓
                                   SageMaker Endpoint
                                     (Real-time Scoring)
                                           ↓
                                    Fraud Detection Results
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Exam Takeaways from This Lab:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kinesis Firehose vs. Streams&lt;/strong&gt;: Firehose provides zero-code delivery to S3—perfect for scenarios requiring automatic data persistence without custom Lambda functions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Buffering Strategy&lt;/strong&gt;: The &lt;code&gt;BufferingHints&lt;/code&gt; (5 MB or 60 seconds) balance latency vs. cost. Larger buffers reduce S3 PUT costs but increase latency (see the configuration sketch after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SageMaker Processing&lt;/strong&gt;: Serverless feature engineering at scale. Automatically provisions compute, runs your script, and terminates instances—eliminating infrastructure management.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Real-time Inference&lt;/strong&gt;: The deployed endpoint uses &lt;code&gt;ml.m5.large&lt;/code&gt; instances for sub-100ms latency. For batch scoring, use SageMaker Batch Transform instead (sketched after the exam scenarios below).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost Optimization&lt;/strong&gt;: Compress data with GZIP in Firehose (reduces S3 storage costs by 60-70%), and use appropriate instance types for processing (m5 family for general-purpose ML workloads).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
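
&lt;p&gt;To make takeaways 2 and 5 concrete, here is a minimal boto3 sketch of a Firehose delivery stream using the 5 MB / 60 second buffering hints and GZIP compression discussed above. The stream name, role ARN, and bucket ARN are placeholders for illustration, not values from the lab.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

# Buffer up to 5 MB or 60 seconds (whichever comes first) and compress with GZIP
firehose.create_delivery_stream(
    DeliveryStreamName="transactions-to-s3",  # placeholder name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",  # placeholder
        "BucketARN": "arn:aws:s3:::my-raw-data-bucket",                      # placeholder
        "Prefix": "raw-transactions/",
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 60},
        "CompressionFormat": "GZIP",
    },
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
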

&lt;p&gt;&lt;strong&gt;Common Exam Scenarios:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Deliver streaming data to S3 with least operational overhead" → &lt;strong&gt;Kinesis Firehose&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;"Process and transform data before ML inference" → &lt;strong&gt;SageMaker Processing&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;"Deploy model for sub-second latency predictions" → &lt;strong&gt;SageMaker Real-time Endpoint&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;"Minimize data transfer costs" → &lt;strong&gt;Enable compression in Firehose&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
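
&lt;p&gt;For the batch-scoring alternative mentioned in takeaway 4, a SageMaker Batch Transform job replaces the always-on endpoint with an ephemeral fleet that scores a file in S3 and then shuts down. Below is a minimal sketch with the SageMaker Python SDK; the model name is a hypothetical placeholder, while &lt;code&gt;BUCKET_NAME&lt;/code&gt; follows the lab above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sagemaker.transformer import Transformer

# Batch-score the engineered features instead of calling a real-time endpoint.
# "fraud-detection-model" is a hypothetical SageMaker model name.
transformer = Transformer(
    model_name="fraud-detection-model",
    instance_count=1,
    instance_type="ml.m5.large",
    output_path=f"s3://{BUCKET_NAME}/batch-scores/",
    strategy="MultiRecord",
    assemble_with="Line",
)

transformer.transform(
    data=f"s3://{BUCKET_NAME}/processed-features/features.csv",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()  # blocks until the job finishes; results land in output_path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
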

&lt;h2&gt;
  
  
  My Study Strategy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: Theory Foundation (Weeks 1-3)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Complete Frank Kane's Udemy course (1.5x speed)&lt;/li&gt;
&lt;li&gt;Focus on algorithm selection and evaluation metrics&lt;/li&gt;
&lt;li&gt;Create flashcards for the tables above&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 2: AWS Service Deep-Dive (Weeks 4-5)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Build hands-on labs with SageMaker (Feature Store, Clarify, Data Wrangler)&lt;/li&gt;
&lt;li&gt;Practice Kinesis data pipeline architectures&lt;/li&gt;
&lt;li&gt;Review AWS Whitepapers on ML best practices&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Phase 3: Practice Exams (Week 6)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Take official AWS practice exam&lt;/li&gt;
&lt;li&gt;Review incorrect answers and revisit weak domains&lt;/li&gt;
&lt;li&gt;Final memorization of key tables and decision trees&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Time Investment
&lt;/h3&gt;

&lt;p&gt;I dedicated approximately &lt;strong&gt;100-120 hours&lt;/strong&gt; over six weeks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;60 hours: Video courses and reading&lt;/li&gt;
&lt;li&gt;30 hours: Hands-on labs&lt;/li&gt;
&lt;li&gt;30 hours: Practice exams and review&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Path to GenAI Professional Certification
&lt;/h2&gt;

&lt;p&gt;The AWS Certified Machine Learning - Specialty provides the essential foundation for the Generative AI Developer - Professional exam in these critical areas:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;ML Specialty Concept&lt;/th&gt;
&lt;th&gt;GenAI Professional Application&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vector Embeddings &amp;amp; Dimensionality Reduction&lt;/td&gt;
&lt;td&gt;RAG architectures and semantic search&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluation Metrics (F1, Recall, Precision)&lt;/td&gt;
&lt;td&gt;LLM output evaluation and guardrails&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SageMaker Feature Store&lt;/td&gt;
&lt;td&gt;Serving contextual data to LLMs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Bias Detection (Clarify)&lt;/td&gt;
&lt;td&gt;Responsible AI for foundation models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hyperparameter Tuning&lt;/td&gt;
&lt;td&gt;Fine-tuning foundation models&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
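
&lt;p&gt;The evaluation-metrics row above is worth internalizing with code: the same precision/recall/F1 trade-offs tested on MLS-C01 reappear when you grade LLM guardrails or retrieval quality. Here is a minimal scikit-learn sketch using made-up fraud labels (1 = fraud, 0 = legitimate), purely for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical ground truth and model predictions (1 = fraud, 0 = legitimate)
y_true = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 0, 0, 1, 1]

print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # of flagged items, how many were fraud
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # of actual fraud, how much was caught
print(f"F1:        {f1_score(y_true, y_pred):.2f}")         # harmonic mean of the two
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
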

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The AWS Certified Machine Learning - Specialty isn't just another certification—it's the rigorous mathematical and architectural foundation required to excel in the generative AI era. With its retirement on March 31, 2026, this represents your final opportunity to earn this prestigious credential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Next Steps:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enroll in &lt;a href="https://www.udemy.com/" rel="noopener noreferrer"&gt;Frank Kane's Udemy course&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Schedule your exam before March 31, 2026&lt;/li&gt;
&lt;li&gt;Build hands-on labs with SageMaker&lt;/li&gt;
&lt;li&gt;Practice with official AWS sample questions&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Completing the ML/GenAI Trifecta
&lt;/h2&gt;

&lt;p&gt;With the AWS Certified Machine Learning - Specialty, you've completed the foundational journey through AWS's AI/ML certification landscape:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 1&lt;/strong&gt;: AWS Certified AI Practitioner (AIF-C01) - Foundational AI concepts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 2&lt;/strong&gt;: AWS Certified Generative AI Developer - Professional (AIP-C01) - GenAI applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3&lt;/strong&gt;: AWS Certified Machine Learning Specialty (MLS-C01) - Deep ML expertise&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these three certifications demonstrate comprehensive mastery of traditional machine learning, generative AI applications, and foundational AI principles on AWS.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
      <category>aws</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>License to Bill🍸💸 : MCP Agents and the Bedrock Budget Protocol</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Sun, 18 Jan 2026 10:05:14 +0000</pubDate>
      <link>https://dev.to/aws-builders/license-to-bill-mcp-agents-and-the-bedrock-budget-protocol-4fnj</link>
      <guid>https://dev.to/aws-builders/license-to-bill-mcp-agents-and-the-bedrock-budget-protocol-4fnj</guid>
      <description>&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you begin implementing the solution in this post, make sure you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ An active &lt;strong&gt;AWS account&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🧠 Basic familiarity with &lt;strong&gt;Foundation Models (FMs)&lt;/strong&gt; and &lt;strong&gt;Amazon Bedrock&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;💻 The &lt;strong&gt;AWS Command Line Interface (CLI)&lt;/strong&gt; installed and credentials configured&lt;/li&gt;
&lt;li&gt;🐍 &lt;strong&gt;Python 3.11 or later&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;🛠️ The &lt;strong&gt;AWS Cloud Development Kit (CDK) CLI&lt;/strong&gt; installed&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;Model access enabled&lt;/strong&gt; for &lt;strong&gt;Anthropic’s Claude 3.5 Sonnet v2&lt;/strong&gt; in Amazon Bedrock&lt;/li&gt;
&lt;li&gt;🔐 Your &lt;strong&gt;AWS_ACCESS_KEY_ID&lt;/strong&gt; and &lt;strong&gt;AWS_SECRET_ACCESS_KEY&lt;/strong&gt; set as environment variables for server authentication
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ InlineAgent_hello us.anthropic.claude-3-5-haiku-20241022-v1:0
Running Hellow world agent:


 from bedrock_agents.agent import InlineAgent

 InlineAgent(
     foundationModel="us.anthropic.claude-3-5-haiku-20241022-v1:0",
     instruction="You are a friendly assistant that is supposed to say hello to everything.",
     userInput=True,
     agentName="hello-world-agent",
 ).invoke("Hi how are you? What can you do for me?")

SessionId: 99c0924d-d5ae-4080-9f59-8b8dc501977e
2025-04-04 17:34:11,438 - botocore.credentials - INFO - Found credentials in shared credentials file: ~/.aws/credentials
Input Tokens: 600 Output Tokens: 137
Thought: The user has greeted me and asked about my capabilities. I'll respond in a friendly manner and use the user interaction tool to engage with them.
Hello there! I'm doing great, thank you for asking. I'm a friendly assistant who loves to say hello to everything! What would you like help with today? I'm ready to assist you with any questions or tasks you might have.
Agent made a total of 1 LLM calls, using 737 tokens (in: 600, out: 137), and took 4.7 total seconds       
(.venv) 
xmarc@mgonzalezo MINGW64 ~/Documents/Japan/CFPs/Open_source_summit_2025/Lab/MCP/amazon-bedrock-agent-samples-main/amazon-bedrock-agent-samples-main/src/InlineAgent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>programming</category>
      <category>bedrock</category>
      <category>mcp</category>
      <category>aws</category>
    </item>
    <item>
      <title>RAG Integration: DeepSeek’s New BFF in the AI World</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Sat, 17 Jan 2026 13:12:36 +0000</pubDate>
      <link>https://dev.to/mgonzalezo/rag-integration-deepseeks-new-bff-in-the-ai-world-5bpp</link>
      <guid>https://dev.to/mgonzalezo/rag-integration-deepseeks-new-bff-in-the-ai-world-5bpp</guid>
      <description>&lt;p&gt;In this tutorial, I'll show you how to build a backend application using Azure OpenAI's Language Model (LLM) and introduce you to what's new with DeepSeek's LLM. It's simpler than it might sound!&lt;/p&gt;

&lt;p&gt;Important note:&lt;/p&gt;

&lt;p&gt;The main difference between OpenAI and DeepSeek lies not in the setup but in the performance, so feel free to substitute "DeepSeek" wherever you see "OpenAI" in this blog entry.&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;Platform Overview&lt;/li&gt;
&lt;li&gt;Cloud Platform Decision Matrix&lt;/li&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;
Project 1: Enterprise-Grade RAG Platform

&lt;ul&gt;
&lt;li&gt;AWS Implementation&lt;/li&gt;
&lt;li&gt;Azure Implementation&lt;/li&gt;
&lt;li&gt;Cost Comparison&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Project 2: Hybrid MLOps Pipeline

&lt;ul&gt;
&lt;li&gt;AWS Implementation&lt;/li&gt;
&lt;li&gt;Azure Implementation&lt;/li&gt;
&lt;li&gt;Cost Comparison&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Project 3: Unified Data Fabric (Data Lakehouse)

&lt;ul&gt;
&lt;li&gt;AWS Implementation&lt;/li&gt;
&lt;li&gt;Azure Implementation&lt;/li&gt;
&lt;li&gt;Cost Comparison&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Multi-Cloud Integration Patterns&lt;/li&gt;
&lt;li&gt;Total Cost of Ownership Analysis&lt;/li&gt;
&lt;li&gt;Migration Strategies&lt;/li&gt;
&lt;li&gt;Resource Cleanup&lt;/li&gt;
&lt;li&gt;Troubleshooting&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Modern enterprises face a critical decision when building cloud-native AI and data platforms: &lt;strong&gt;AWS or Azure?&lt;/strong&gt; This comprehensive guide demonstrates how to build three production-grade platforms on &lt;strong&gt;both&lt;/strong&gt; cloud providers, providing side-by-side comparisons to help you make informed decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You'll Learn
&lt;/h3&gt;

&lt;p&gt;This guide shows you how to implement identical architectures on both AWS and Azure:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project 1: Enterprise RAG Platform&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: Amazon Bedrock + AWS Glue + Milvus on ROSA&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt;: Azure OpenAI + Azure Data Factory + Milvus on ARO&lt;/li&gt;
&lt;li&gt;Privacy-first Retrieval-Augmented Generation&lt;/li&gt;
&lt;li&gt;Vector database integration&lt;/li&gt;
&lt;li&gt;Secure private connectivity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Project 2: Hybrid MLOps Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: SageMaker + OpenShift Pipelines + KServe on ROSA&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt;: Azure ML + Azure DevOps + KServe on ARO&lt;/li&gt;
&lt;li&gt;Cost-optimized GPU training&lt;/li&gt;
&lt;li&gt;Kubernetes-native serving&lt;/li&gt;
&lt;li&gt;End-to-end automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Project 3: Unified Data Fabric&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: Apache Spark + AWS Glue Catalog + S3 + Iceberg&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt;: Apache Spark + Azure Purview + ADLS Gen2 + Delta Lake&lt;/li&gt;
&lt;li&gt;Stateless compute architecture&lt;/li&gt;
&lt;li&gt;Medallion data organization&lt;/li&gt;
&lt;li&gt;ACID transactions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why This Comparison Matters
&lt;/h3&gt;

&lt;p&gt;Choosing the right cloud platform impacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total Cost&lt;/strong&gt;: 20-40% difference in monthly spending&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer Productivity&lt;/strong&gt;: Ecosystem integration and tooling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor Lock-in&lt;/strong&gt;: Portability and migration flexibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Integration&lt;/strong&gt;: Existing infrastructure and contracts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Platform Overview
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Unified Multi-Cloud Architecture
&lt;/h3&gt;

&lt;p&gt;Both implementations follow the same architectural patterns while leveraging platform-specific managed services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────────────┐
│                     Enterprise Organization                          │
│  ┌───────────────────────────────────────────────────────────────┐ │
│  │     Red Hat OpenShift (ROSA on AWS / ARO on Azure)            │ │
│  │              - Unified Control Plane                           │ │
│  │              - Application Orchestration                       │ │
│  │              - Developer Platform                              │ │
│  └───────────────────────────┬───────────────────────────────────┘ │
│                              │                                      │
│              ┌───────────────┼───────────────┐                     │
│              │               │               │                     │
│  ┌───────────▼─────┐ ┌──────▼──────┐ ┌─────▼──────────┐          │
│  │   RAG Project   │ │MLOps Project│ │ Data Lakehouse │          │
│  │                 │ │             │ │                │          │
│  │ AWS:            │ │ AWS:        │ │ AWS:           │          │
│  │ - Bedrock       │ │ - SageMaker │ │ - Glue Catalog │          │
│  │ - Glue ETL      │ │ - ACK       │ │ - S3 + Iceberg │          │
│  │                 │ │             │ │                │          │
│  │ Azure:          │ │ Azure:      │ │ Azure:         │          │
│  │ - OpenAI        │ │ - Azure ML  │ │ - Purview      │          │
│  │ - Data Factory  │ │ - ASO       │ │ - ADLS + Delta │          │
│  └─────────────────┘ └─────────────┘ └────────────────┘          │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐ │
│  │              Cloud Services Layer                             │ │
│  │  AWS: IAM + S3 + PrivateLink + CloudWatch                    │ │
│  │  Azure: AAD + Blob + Private Link + Monitor                  │ │
│  └──────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Technology Stack: AWS vs Azure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;AWS Solution&lt;/th&gt;
&lt;th&gt;Azure Solution&lt;/th&gt;
&lt;th&gt;OpenShift Platform&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ROSA (Red Hat OpenShift on AWS)&lt;/td&gt;
&lt;td&gt;ARO (Azure Red Hat OpenShift)&lt;/td&gt;
&lt;td&gt;Both use Red Hat OpenShift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon Bedrock (Claude 3.5)&lt;/td&gt;
&lt;td&gt;Azure OpenAI Service (GPT-4)&lt;/td&gt;
&lt;td&gt;Same API patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon SageMaker&lt;/td&gt;
&lt;td&gt;Azure Machine Learning&lt;/td&gt;
&lt;td&gt;Both burst from OpenShift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Catalog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Glue Data Catalog&lt;/td&gt;
&lt;td&gt;Azure Purview / Unity Catalog&lt;/td&gt;
&lt;td&gt;Unified metadata layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Object Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon S3&lt;/td&gt;
&lt;td&gt;Azure Data Lake Storage Gen2&lt;/td&gt;
&lt;td&gt;S3-compatible APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Table Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache Iceberg&lt;/td&gt;
&lt;td&gt;Delta Lake&lt;/td&gt;
&lt;td&gt;Open source options&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector DB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Milvus (self-hosted)&lt;/td&gt;
&lt;td&gt;Milvus / Cosmos DB&lt;/td&gt;
&lt;td&gt;Same deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ETL Service&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Glue (serverless)&lt;/td&gt;
&lt;td&gt;Azure Data Factory (serverless)&lt;/td&gt;
&lt;td&gt;Similar orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenShift Pipelines (Tekton)&lt;/td&gt;
&lt;td&gt;Azure DevOps / Tekton&lt;/td&gt;
&lt;td&gt;Kubernetes-native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;K8s Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Controllers (ACK)&lt;/td&gt;
&lt;td&gt;Azure Service Operator (ASO)&lt;/td&gt;
&lt;td&gt;Custom resources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Private Network&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS PrivateLink&lt;/td&gt;
&lt;td&gt;Azure Private Link&lt;/td&gt;
&lt;td&gt;VPC/VNet integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IRSA (IAM for Service Accounts)&lt;/td&gt;
&lt;td&gt;Workload Identity&lt;/td&gt;
&lt;td&gt;Pod-level identity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cloud Platform Decision Matrix
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When to Choose AWS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best For&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI/ML Innovation&lt;/strong&gt;: Amazon Bedrock offers broader model selection (Claude, Llama 2, Stable Diffusion)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless-First&lt;/strong&gt;: AWS Glue, Lambda, and Bedrock have no minimum fees&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Startup/Scale-up&lt;/strong&gt;: Pay-as-you-go pricing favors variable workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Engineering&lt;/strong&gt;: S3 + Glue + Athena is industry standard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Region&lt;/strong&gt;: Better global infrastructure coverage&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;AWS Advantages&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Superior AI model marketplace (Anthropic, Cohere, AI21, Meta)&lt;/li&gt;
&lt;li&gt;True serverless data catalog (Glue) with no base costs&lt;/li&gt;
&lt;li&gt;More mature spot instance ecosystem for cost savings&lt;/li&gt;
&lt;li&gt;Better S3 ecosystem and tooling integration&lt;/li&gt;
&lt;li&gt;Stronger open-source community adoption&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Choose Azure
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best For&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Ecosystem&lt;/strong&gt;: Tight integration with Office 365, Teams, Power Platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Windows&lt;/strong&gt;: Native Windows container support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Cloud&lt;/strong&gt;: Azure Arc and on-premises integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Agreements&lt;/strong&gt;: Existing Microsoft licensing discounts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulated Industries&lt;/strong&gt;: Better compliance certifications in some regions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Azure Advantages&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Seamless Microsoft 365 and Active Directory integration&lt;/li&gt;
&lt;li&gt;Superior Windows and .NET container support&lt;/li&gt;
&lt;li&gt;Better hybrid cloud story with Azure Arc&lt;/li&gt;
&lt;li&gt;Integrated Azure Synapse for unified analytics&lt;/li&gt;
&lt;li&gt;Potentially lower costs with existing EA agreements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Decision Criteria Scorecard
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;AWS Score&lt;/th&gt;
&lt;th&gt;Azure Score&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Model Selection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;AWS Bedrock has more models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML Training Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Equivalent spot pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Lake Maturity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;S3 is industry standard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Serverless Pricing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;AWS Glue has no minimums&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enterprise Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Azure wins for Microsoft shops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hybrid Cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Azure Arc is superior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Developer Ecosystem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Larger open-source community&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compliance Certifications&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Equivalent for most use cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Global Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;AWS has more regions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pricing Transparency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;AWS pricing is clearer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total Weighted Score&lt;/strong&gt;: AWS: 8.5/10 | Azure: 8.1/10&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Choose based on your organization's existing ecosystem. Both platforms are capable; the difference is in integration, not capability.&lt;/p&gt;
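
&lt;p&gt;For transparency, here is one way to reproduce the weighted totals above. Mapping the High/Medium/Low weights to 3/2/1 is my own assumption rather than a formal methodology, but it yields the same 8.5 and 8.1 figures.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Assumed weights: High = 3, Medium = 2, Low = 1 (my mapping, not an official formula)
rows = [
    # (criteria, aws, azure, weight)
    ("AI Model Selection",         9,  7, 3),
    ("ML Training Cost",           8,  8, 3),
    ("Data Lake Maturity",        10,  8, 3),
    ("Serverless Pricing",         9,  7, 2),
    ("Enterprise Integration",     7, 10, 3),
    ("Hybrid Cloud",               7,  9, 2),
    ("Developer Ecosystem",        9,  7, 2),
    ("Compliance Certifications",  9,  9, 3),
    ("Global Infrastructure",     10,  8, 1),
    ("Pricing Transparency",       8,  7, 2),
]

total_weight = sum(w for _, _, _, w in rows)
aws   = sum(a * w for _, a, _, w in rows) / total_weight
azure = sum(z * w for _, _, z, w in rows) / total_weight
print(f"AWS: {aws:.1f}/10 | Azure: {azure:.1f}/10")  # AWS: 8.5/10 | Azure: 8.1/10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
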

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Common Prerequisites (Both Platforms)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Required Accounts&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud platform account with administrative access&lt;/li&gt;
&lt;li&gt;Red Hat Account with OpenShift subscription&lt;/li&gt;
&lt;li&gt;Credit card for cloud charges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Required Tools&lt;/strong&gt; (install on your workstation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Common tools for both platforms&lt;/span&gt;
&lt;span class="c"&gt;# OpenShift CLI (oc)&lt;/span&gt;
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; openshift-client-linux.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;oc kubectl /usr/local/bin/
oc version

&lt;span class="c"&gt;# Helm (v3)&lt;/span&gt;
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version

&lt;span class="c"&gt;# Tekton CLI&lt;/span&gt;
curl &lt;span class="nt"&gt;-LO&lt;/span&gt; https://github.com/tektoncd/cli/releases/download/v0.33.0/tkn_0.33.0_Linux_x86_64.tar.gz
&lt;span class="nb"&gt;tar &lt;/span&gt;xvzf tkn_0.33.0_Linux_x86_64.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;tkn /usr/local/bin/
tkn version

&lt;span class="c"&gt;# Python 3.11+&lt;/span&gt;
python3 &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;span class="c"&gt;# Container tools (Docker or Podman)&lt;/span&gt;
podman &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS-Specific Prerequisites
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# AWS CLI (v2)&lt;/span&gt;
curl &lt;span class="s2"&gt;"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"awscliv2.zip"&lt;/span&gt;
unzip awscliv2.zip
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./aws/install
aws &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;span class="c"&gt;# ROSA CLI&lt;/span&gt;
wget https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; rosa-linux.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;rosa /usr/local/bin/rosa
rosa version

&lt;span class="c"&gt;# Configure AWS&lt;/span&gt;
aws configure
aws sts get-caller-identity

&lt;span class="c"&gt;# Initialize ROSA&lt;/span&gt;
rosa login
rosa verify quota
rosa verify permissions
rosa init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure-Specific Prerequisites
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Azure CLI&lt;/span&gt;
curl &lt;span class="nt"&gt;-sL&lt;/span&gt; https://aka.ms/InstallAzureCLIDeb | &lt;span class="nb"&gt;sudo &lt;/span&gt;bash
az &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;span class="c"&gt;# ARO extension&lt;/span&gt;
az extension add &lt;span class="nt"&gt;--name&lt;/span&gt; aro &lt;span class="nt"&gt;--index&lt;/span&gt; https://az.aroapp.io/stable

&lt;span class="c"&gt;# Azure CLI login&lt;/span&gt;
az login
az account show

&lt;span class="c"&gt;# Register required providers&lt;/span&gt;
az provider register &lt;span class="nt"&gt;--namespace&lt;/span&gt; Microsoft.RedHatOpenShift &lt;span class="nt"&gt;--wait&lt;/span&gt;
az provider register &lt;span class="nt"&gt;--namespace&lt;/span&gt; Microsoft.Compute &lt;span class="nt"&gt;--wait&lt;/span&gt;
az provider register &lt;span class="nt"&gt;--namespace&lt;/span&gt; Microsoft.Storage &lt;span class="nt"&gt;--wait&lt;/span&gt;
az provider register &lt;span class="nt"&gt;--namespace&lt;/span&gt; Microsoft.Network &lt;span class="nt"&gt;--wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Service Quotas Verification
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# EC2 vCPU quota&lt;/span&gt;
aws service-quotas get-service-quota &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-code&lt;/span&gt; ec2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-1216C47A &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# SageMaker training instances&lt;/span&gt;
aws service-quotas get-service-quota &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-code&lt;/span&gt; sagemaker &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-2E8D9C5E &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check compute quota&lt;/span&gt;
az vm list-usage &lt;span class="nt"&gt;--location&lt;/span&gt; eastus &lt;span class="nt"&gt;--output&lt;/span&gt; table

&lt;span class="c"&gt;# Check ML compute quota&lt;/span&gt;
az ml compute list-usage &lt;span class="nt"&gt;--location&lt;/span&gt; eastus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Project 1: Enterprise-Grade RAG Platform
&lt;/h2&gt;

&lt;h3&gt;
  
  
  RAG Platform Overview
&lt;/h3&gt;

&lt;p&gt;This project implements a privacy-first Retrieval-Augmented Generation (RAG) system. Both AWS and Azure implementations achieve the same functionality but use platform-specific managed services.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ROSA → AWS PrivateLink → Amazon Bedrock (Claude 3.5)
  ↓
Milvus Vector DB (on ROSA)
  ↓
AWS Glue ETL → S3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ARO → Azure Private Link → Azure OpenAI (GPT-4)
  ↓
Milvus Vector DB (on ARO)
  ↓
Azure Data Factory → Blob Storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Side-by-Side Service Mapping
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;AWS Service&lt;/th&gt;
&lt;th&gt;Azure Service&lt;/th&gt;
&lt;th&gt;Implementation Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon Bedrock&lt;/td&gt;
&lt;td&gt;Azure OpenAI Service&lt;/td&gt;
&lt;td&gt;Different model families&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Private Network&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS PrivateLink&lt;/td&gt;
&lt;td&gt;Azure Private Link&lt;/td&gt;
&lt;td&gt;Similar configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ETL Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Glue (Serverless)&lt;/td&gt;
&lt;td&gt;Azure Data Factory&lt;/td&gt;
&lt;td&gt;Different pricing models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metadata&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Glue Data Catalog&lt;/td&gt;
&lt;td&gt;Azure Purview&lt;/td&gt;
&lt;td&gt;Different scopes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon S3&lt;/td&gt;
&lt;td&gt;Azure Blob Storage / ADLS Gen2&lt;/td&gt;
&lt;td&gt;S3 API vs Blob API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector DB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Milvus on ROSA&lt;/td&gt;
&lt;td&gt;Milvus on ARO / Cosmos DB&lt;/td&gt;
&lt;td&gt;Self-hosted vs managed option&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IRSA (IAM Roles)&lt;/td&gt;
&lt;td&gt;Workload Identity&lt;/td&gt;
&lt;td&gt;Similar pod-level identity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embedding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Titan Embeddings&lt;/td&gt;
&lt;td&gt;OpenAI Embeddings&lt;/td&gt;
&lt;td&gt;Different dimensions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  AWS Implementation (RAG)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS Phase 1: ROSA Cluster Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set environment variables&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-platform-aws"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MACHINE_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"m5.2xlarge"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;COMPUTE_NODES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3

&lt;span class="c"&gt;# Create ROSA cluster (takes ~40 minutes)&lt;/span&gt;
rosa create cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster-name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--multi-az&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compute-machine-type&lt;/span&gt; &lt;span class="nv"&gt;$MACHINE_TYPE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compute-nodes&lt;/span&gt; &lt;span class="nv"&gt;$COMPUTE_NODES&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--machine-cidr&lt;/span&gt; 10.0.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-cidr&lt;/span&gt; 172.30.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--pod-cidr&lt;/span&gt; 10.128.0.0/14 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host-prefix&lt;/span&gt; 23 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="c"&gt;# Monitor installation&lt;/span&gt;
rosa logs &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--watch&lt;/span&gt;

&lt;span class="c"&gt;# Create admin and connect&lt;/span&gt;
rosa create admin &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt;
oc login &amp;lt;api-url&amp;gt; &lt;span class="nt"&gt;--username&lt;/span&gt; cluster-admin &lt;span class="nt"&gt;--password&lt;/span&gt; &amp;lt;password&amp;gt;

&lt;span class="c"&gt;# Create namespaces&lt;/span&gt;
oc new-project redhat-ods-applications
oc new-project rag-application
oc new-project milvus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS Phase 2: Amazon Bedrock via PrivateLink
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get ROSA VPC details&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ROSA_VPC_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 describe-vpcs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=tag:Name,Values=*&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Vpcs[0].VpcId'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PRIVATE_SUBNET_IDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 describe-subnets &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=vpc-id,Values=&lt;/span&gt;&lt;span class="nv"&gt;$ROSA_VPC_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"Name=tag:Name,Values=*private*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Subnets[*].SubnetId'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create VPC Endpoint Security Group&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VPC_ENDPOINT_SG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 create-security-group &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-name&lt;/span&gt; bedrock-vpc-endpoint-sg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--description&lt;/span&gt; &lt;span class="s2"&gt;"Security group for Bedrock VPC endpoint"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-id&lt;/span&gt; &lt;span class="nv"&gt;$ROSA_VPC_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'GroupId'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Allow HTTPS from ROSA nodes&lt;/span&gt;
aws ec2 authorize-security-group-ingress &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-id&lt;/span&gt; &lt;span class="nv"&gt;$VPC_ENDPOINT_SG&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--protocol&lt;/span&gt; tcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cidr&lt;/span&gt; 10.0.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create Bedrock VPC Endpoint&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BEDROCK_VPC_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 create-vpc-endpoint &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-id&lt;/span&gt; &lt;span class="nv"&gt;$ROSA_VPC_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-endpoint-type&lt;/span&gt; Interface &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-name&lt;/span&gt; com.amazonaws.&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.bedrock-runtime &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subnet-ids&lt;/span&gt; &lt;span class="nv"&gt;$PRIVATE_SUBNET_IDS&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--security-group-ids&lt;/span&gt; &lt;span class="nv"&gt;$VPC_ENDPOINT_SG&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--private-dns-enabled&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'VpcEndpoint.VpcEndpointId'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Wait for availability&lt;/span&gt;
aws ec2 &lt;span class="nb"&gt;wait &lt;/span&gt;vpc-endpoint-available &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-endpoint-ids&lt;/span&gt; &lt;span class="nv"&gt;$BEDROCK_VPC_ENDPOINT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create IAM role for Bedrock access (IRSA pattern)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;rosa describe cluster &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .aws.sts.oidc_endpoint_url | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s|https://||'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; bedrock-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam create-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; BedrockInvokePolicy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://bedrock-policy.json

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:sub": "system:serviceaccount:rag-application:bedrock-sa"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BEDROCK_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://trust-policy.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Role.Arn'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:policy/BedrockInvokePolicy

&lt;span class="c"&gt;# Create Kubernetes service account&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bedrock-sa
  namespace: rag-application
  annotations:
    eks.amazonaws.com/role-arn: &lt;/span&gt;&lt;span class="nv"&gt;$BEDROCK_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
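
&lt;p&gt;Before wiring up the full application, it helps to smoke-test the IRSA role and the private endpoint from inside the cluster. Here is a minimal sketch, assuming it runs in a pod in the &lt;code&gt;rag-application&lt;/code&gt; namespace that uses the &lt;code&gt;bedrock-sa&lt;/code&gt; service account created above, so boto3 picks up the web-identity credentials automatically.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3

# Credentials come from the bedrock-sa service account (IRSA); traffic stays on
# the bedrock-runtime VPC endpoint because private DNS is enabled on it.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 50,
        "messages": [{"role": "user", "content": "Reply with the single word OK."}],
    }),
)
print(json.loads(response["body"].read())["content"][0]["text"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
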



&lt;h3&gt;
  
  
  AWS Phase 3: AWS Glue Data Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create S3 bucket&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-documents-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
aws s3 mb s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Enable versioning&lt;/span&gt;
aws s3api put-bucket-versioning &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--versioning-configuration&lt;/span&gt; &lt;span class="nv"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Enabled &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create folder structure&lt;/span&gt;
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; raw-documents/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; processed-documents/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; embeddings/

&lt;span class="c"&gt;# Create Glue IAM role&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; glue-trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "glue.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://glue-trust-policy.json

aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

&lt;span class="c"&gt;# Create S3 access policy&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; glue-s3-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam put-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; S3Access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://glue-s3-policy.json

&lt;span class="c"&gt;# Create Glue database&lt;/span&gt;
aws glue create-database &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{
    "Name": "rag_documents_db",
    "Description": "RAG document metadata"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create Glue crawler&lt;/span&gt;
aws glue create-crawler &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-crawler &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:role/AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--database-name&lt;/span&gt; rag_documents_db &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--targets&lt;/span&gt; &lt;span class="s1"&gt;'{
    "S3Targets": [{"Path": "s3://'&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;&lt;span class="s1"&gt;'/raw-documents/"}]
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
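
&lt;p&gt;After the crawler is defined, you can run it and confirm that the catalog was populated. A small boto3 sketch follows; the polling interval and region are arbitrary choices.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import time
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Kick off the crawler created above and wait for it to return to READY
glue.start_crawler(Name="rag-document-crawler")
while glue.get_crawler(Name="rag-document-crawler")["Crawler"]["State"] != "READY":
    time.sleep(30)

# List the tables the crawler registered in the RAG metadata database
tables = glue.get_tables(DatabaseName="rag_documents_db")["TableList"]
print([t["Name"] for t in tables])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
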



&lt;h3&gt;
  
  
  AWS Phase 4: Milvus Vector Database
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Milvus using Helm&lt;/span&gt;
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm repo update

helm &lt;span class="nb"&gt;install &lt;/span&gt;milvus-operator milvus/milvus-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; milvus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;

&lt;span class="c"&gt;# Create PVCs&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-etcd-pvc
  namespace: milvus
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3-csi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-minio-pvc
  namespace: milvus
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 50Gi
  storageClassName: gp3-csi
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Deploy Milvus&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; milvus-values.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
cluster:
  enabled: true
service:
  type: ClusterIP
  port: 19530
standalone:
  replicas: 1
  resources:
    limits:
      cpu: "4"
      memory: 8Gi
    requests:
      cpu: "2"
      memory: 4Gi
etcd:
  persistence:
    enabled: true
    existingClaim: milvus-etcd-pvc
minio:
  persistence:
    enabled: true
    existingClaim: milvus-minio-pvc
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;helm &lt;span class="nb"&gt;install &lt;/span&gt;milvus milvus/milvus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; milvus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--values&lt;/span&gt; milvus-values.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wait&lt;/span&gt;

&lt;span class="c"&gt;# Get Milvus endpoint&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MILVUS_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get svc milvus &lt;span class="nt"&gt;-n&lt;/span&gt; milvus &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.clusterIP}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MILVUS_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;19530
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
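
&lt;p&gt;With Milvus running, the rag_documents collection that the RAG API searches in Phase 5 still has to be created. A minimal pymilvus sketch, assuming an id/text/embedding schema with 1024-dimensional vectors (to match the Titan v2 embeddings used below) and a basic IVF_FLAT index with L2 distance; the index parameters are illustrative defaults, not tuned values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch: create the "rag_documents" collection the RAG API expects.
# Schema (id/text/embedding, dim=1024) matches the Titan v2 embeddings used later.
import os
from pymilvus import (
    connections, Collection, CollectionSchema, FieldSchema, DataType,
)

# Uses the MILVUS_HOST exported above
connections.connect(host=os.getenv("MILVUS_HOST"), port=19530)

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=8192),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1024),
]
schema = CollectionSchema(fields, description="RAG document chunks")
collection = Collection("rag_documents", schema)

# IVF_FLAT with L2 distance mirrors the metric the query service uses
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 1024}},
)
collection.load()
print(collection.num_entities)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;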



&lt;h3&gt;
  
  
  AWS Phase 5: RAG Application Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create application code&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; rag-app-aws/src

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app-aws/requirements.txt &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
pymilvus==2.3.3
boto3==1.29.7
python-dotenv==1.0.0
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create FastAPI application (abbreviated for space)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app-aws/src/main.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import os, json, boto3
from pymilvus import connections, Collection

app = FastAPI(title="Enterprise RAG API - AWS")

MILVUS_HOST = os.getenv("MILVUS_HOST")
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
BEDROCK_MODEL = "anthropic.claude-3-5-sonnet-20241022-v2:0"

bedrock = boto3.client('bedrock-runtime', region_name=AWS_REGION)

@app.on_event("startup")
async def startup():
    connections.connect(host=MILVUS_HOST, port=19530)

class QueryRequest(BaseModel):
    query: str
    top_k: int = 5
    max_tokens: int = 1000

@app.post("/query")
async def query_rag(req: QueryRequest):
    # Generate embedding with Bedrock Titan
    embed_resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": req.query, "dimensions": 1024})
    )
    embedding = json.loads(embed_resp['body'].read())['embedding']

    # Search Milvus
    coll = Collection("rag_documents")
    coll.load()  # ensure the collection is loaded before searching
    results = coll.search([embedding], "embedding", {"metric_type": "L2"}, limit=req.top_k, output_fields=["text"])

    # Build context
    context = "&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;".join([hit.entity.get("text") for hit in results[0]])

    # Call Bedrock Claude
    prompt = f"Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;{context}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;Question: {req.query}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;Answer:"
    response = bedrock.invoke_model(
        modelId=BEDROCK_MODEL,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": req.max_tokens,
            "messages": [{"role": "user", "content": prompt}]
        })
    )

    answer = json.loads(response['body'].read())['content'][0]['text']
    return {"answer": answer, "sources": [{"chunk": hit.entity.get("text")} for hit in results[0]]}

@app.get("/health")
async def health():
    return {"status": "healthy", "platform": "AWS", "model": "Claude 3.5 Sonnet"}
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Create Dockerfile&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app-aws/Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
EXPOSE 8000
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Build and deploy&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;rag-app-aws
podman build &lt;span class="nt"&gt;-t&lt;/span&gt; rag-app-aws:v1.0 &lt;span class="nb"&gt;.&lt;/span&gt;
oc create imagestream rag-app-aws &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
podman tag rag-app-aws:v1.0 image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-aws:v1.0
podman push image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-aws:v1.0 &lt;span class="nt"&gt;--tls-verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false
cd&lt;/span&gt; ..

&lt;span class="c"&gt;# Deploy to OpenShift&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-app-aws
  namespace: rag-application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rag-app-aws
  template:
    metadata:
      labels:
        app: rag-app-aws
    spec:
      serviceAccountName: bedrock-sa
      containers:
      - name: app
        image: image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-aws:v1.0
        ports:
        - containerPort: 8000
        env:
        - name: MILVUS_HOST
          value: "&lt;/span&gt;&lt;span class="nv"&gt;$MILVUS_HOST&lt;/span&gt;&lt;span class="sh"&gt;"
        - name: AWS_REGION
          value: "&lt;/span&gt;&lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"
---
apiVersion: v1
kind: Service
metadata:
  name: rag-app-aws
  namespace: rag-application
spec:
  selector:
    app: rag-app-aws
  ports:
  - port: 80
    targetPort: 8000
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rag-app-aws
  namespace: rag-application
spec:
  to:
    kind: Service
    name: rag-app-aws
  tls:
    termination: edge
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Get URL and test&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RAG_URL_AWS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get route rag-app-aws &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.host}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
curl https://&lt;span class="nv"&gt;$RAG_URL_AWS&lt;/span&gt;/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
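
&lt;p&gt;The deployment above only serves queries; something still has to load document chunks into the collection. A hedged sketch of that ingestion step, embedding chunks with Titan v2 and inserting them into Milvus. The chunk list is a stand-in for whatever the Glue pipeline produced, and the script assumes network access to both Bedrock and the Milvus service:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the ingestion side: embed chunks with Titan v2 and insert into Milvus.
# The chunks list is illustrative; in practice it comes from the processed documents.
import json, os
import boto3
from pymilvus import connections, Collection

bedrock = boto3.client("bedrock-runtime", region_name=os.getenv("AWS_REGION", "us-east-1"))
connections.connect(host=os.getenv("MILVUS_HOST"), port=19530)
collection = Collection("rag_documents")

chunks = [
    "Example chunk produced by the processing pipeline.",
    "Another chunk of document text.",
]

embeddings = []
for chunk in chunks:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": chunk, "dimensions": 1024}),
    )
    embeddings.append(json.loads(resp["body"].read())["embedding"])

# Column order matches the schema: text, embedding (id is auto-generated)
collection.insert([chunks, embeddings])
collection.flush()
print(f"Inserted {len(chunks)} chunks")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;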



&lt;h2&gt;
  
  
  Azure Implementation (RAG)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Azure Phase 1: ARO Cluster Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set environment variables&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-platform-azure"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eastus"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RESOURCE_GROUP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-platform-rg"&lt;/span&gt;

&lt;span class="c"&gt;# Create resource group&lt;/span&gt;
az group create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt;

&lt;span class="c"&gt;# Create virtual network&lt;/span&gt;
az network vnet create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--address-prefixes&lt;/span&gt; 10.0.0.0/22

&lt;span class="c"&gt;# Create master subnet&lt;/span&gt;
az network vnet subnet create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet-name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; master-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--address-prefixes&lt;/span&gt; 10.0.0.0/23 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-endpoints&lt;/span&gt; Microsoft.ContainerRegistry

&lt;span class="c"&gt;# Create worker subnet&lt;/span&gt;
az network vnet subnet create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet-name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; worker-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--address-prefixes&lt;/span&gt; 10.0.2.0/23 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-endpoints&lt;/span&gt; Microsoft.ContainerRegistry

&lt;span class="c"&gt;# Disable subnet private endpoint policies&lt;/span&gt;
az network vnet subnet update &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; master-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet-name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable-private-link-service-network-policies&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Create ARO cluster (takes ~35 minutes)&lt;/span&gt;
az aro create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--master-subnet&lt;/span&gt; master-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--worker-subnet&lt;/span&gt; worker-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--worker-count&lt;/span&gt; 3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--worker-vm-size&lt;/span&gt; Standard_D8s_v3

&lt;span class="c"&gt;# Get credentials&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARO_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aro show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; consoleUrl &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARO_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aro list-credentials &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; kubeadminPassword &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Login&lt;/span&gt;
oc login &lt;span class="nv"&gt;$ARO_URL&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; kubeadmin &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$ARO_PASSWORD&lt;/span&gt;

&lt;span class="c"&gt;# Create namespaces&lt;/span&gt;
oc new-project rag-application
oc new-project milvus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure Phase 2: Azure OpenAI via Private Link
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Azure OpenAI resource&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-openai-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RANDOM&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

az cognitiveservices account create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kind&lt;/span&gt; OpenAI &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku&lt;/span&gt; S0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--custom-domain&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--public-network-access&lt;/span&gt; Disabled

&lt;span class="c"&gt;# Deploy GPT-4 model&lt;/span&gt;
az cognitiveservices account deployment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--deployment-name&lt;/span&gt; gpt-4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-name&lt;/span&gt; gpt-4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-version&lt;/span&gt; &lt;span class="s2"&gt;"0613"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-format&lt;/span&gt; OpenAI &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku-capacity&lt;/span&gt; 10 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku-name&lt;/span&gt; &lt;span class="s2"&gt;"Standard"&lt;/span&gt;

&lt;span class="c"&gt;# Deploy text-embedding model&lt;/span&gt;
az cognitiveservices account deployment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--deployment-name&lt;/span&gt; text-embedding-ada-002 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-name&lt;/span&gt; text-embedding-ada-002 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-version&lt;/span&gt; &lt;span class="s2"&gt;"2"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-format&lt;/span&gt; OpenAI &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku-capacity&lt;/span&gt; 10 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku-name&lt;/span&gt; &lt;span class="s2"&gt;"Standard"&lt;/span&gt;

&lt;span class="c"&gt;# Create Private Endpoint&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VNET_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az network vnet show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SUBNET_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az network vnet subnet show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet-name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; worker-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az cognitiveservices account show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

az network private-endpoint create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; openai-private-endpoint &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet-name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subnet&lt;/span&gt; worker-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--private-connection-resource-id&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-id&lt;/span&gt; account &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--connection-name&lt;/span&gt; openai-connection

&lt;span class="c"&gt;# Create Private DNS Zone&lt;/span&gt;
az network private-dns zone create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; privatelink.openai.azure.com

az network private-dns &lt;span class="nb"&gt;link &lt;/span&gt;vnet create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--zone-name&lt;/span&gt; privatelink.openai.azure.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; openai-dns-link &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--virtual-network&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--registration-enabled&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;

&lt;span class="c"&gt;# Create DNS record&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ENDPOINT_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az network private-endpoint show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; openai-private-endpoint &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'customDnsConfigs[0].ipAddresses[0]'&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

az network private-dns record-set a create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--zone-name&lt;/span&gt; privatelink.openai.azure.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt;

az network private-dns record-set a add-record &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--record-set-name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--zone-name&lt;/span&gt; privatelink.openai.azure.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ipv4-address&lt;/span&gt; &lt;span class="nv"&gt;$ENDPOINT_IP&lt;/span&gt;

&lt;span class="c"&gt;# Configure Workload Identity&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARO_OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aro show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'serviceIdentity.url'&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create managed identity&lt;/span&gt;
az identity create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-app-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IDENTITY_CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az identity show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-app-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; clientId &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IDENTITY_PRINCIPAL_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az identity show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-app-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; principalId &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Grant OpenAI access&lt;/span&gt;
az role assignment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$IDENTITY_PRINCIPAL_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="s2"&gt;"Cognitive Services OpenAI User"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_ID&lt;/span&gt;

&lt;span class="c"&gt;# Create federated credential&lt;/span&gt;
az identity federated-credential create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-app-federated &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--identity-name&lt;/span&gt; rag-app-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--issuer&lt;/span&gt; &lt;span class="nv"&gt;$ARO_OIDC_ISSUER&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subject&lt;/span&gt; &lt;span class="s2"&gt;"system:serviceaccount:rag-application:openai-sa"&lt;/span&gt;

&lt;span class="c"&gt;# Create Kubernetes service account&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: openai-sa
  namespace: rag-application
  annotations:
    azure.workload.identity/client-id: &lt;/span&gt;&lt;span class="nv"&gt;$IDENTITY_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Get OpenAI endpoint and key&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az cognitiveservices account show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; properties.endpoint &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az cognitiveservices account keys list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; key1 &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create secret&lt;/span&gt;
oc create secret generic openai-credentials &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$OPENAI_ENDPOINT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$OPENAI_KEY&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
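
&lt;p&gt;The steps above configure both a federated workload identity (openai-sa) and an API-key secret, but the application in Phase 5 only uses the key. For completeness, a hedged sketch of the token-based path, assuming the Azure workload identity webhook injects the usual AZURE_* environment variables into pods running under openai-sa and that a recent azure-identity release is installed:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hedged sketch: authenticate to Azure OpenAI with an Entra ID token instead of the API key.
# Assumes the pod runs under the openai-sa service account with workload identity enabled.
import os
from azure.identity import DefaultAzureCredential
from openai import AzureOpenAI

credential = DefaultAzureCredential()  # picks up the federated token mounted by workload identity
token = credential.get_token("https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_ad_token=token.token,          # note: tokens expire, so refresh before long-lived use
    api_version="2023-05-15",
    azure_endpoint=os.environ["OPENAI_ENDPOINT"],
)

resp = client.embeddings.create(input="connectivity check", model="text-embedding-ada-002")
print(len(resp.data[0].embedding))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;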



&lt;h3&gt;
  
  
  Azure Phase 3: Azure Data Factory Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Data Factory&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ADF_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-adf-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RANDOM&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

az datafactory create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--factory-name&lt;/span&gt; &lt;span class="nv"&gt;$ADF_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt;

&lt;span class="c"&gt;# Create Storage Account&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;STORAGE_ACCOUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"ragdocs&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RANDOM&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

az storage account create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_ACCOUNT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku&lt;/span&gt; Standard_LRS &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kind&lt;/span&gt; StorageV2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--hierarchical-namespace&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Get storage key&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;STORAGE_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az storage account keys list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-name&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_ACCOUNT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'[0].value'&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create containers&lt;/span&gt;
az storage container create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; raw-documents &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-name&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_ACCOUNT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-key&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_KEY&lt;/span&gt;

az storage container create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; processed-documents &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-name&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_ACCOUNT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-key&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_KEY&lt;/span&gt;

&lt;span class="c"&gt;# Create linked service for storage&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; adf-storage-linked-service.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "name": "StorageLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=&lt;/span&gt;&lt;span class="nv"&gt;$STORAGE_ACCOUNT&lt;/span&gt;&lt;span class="sh"&gt;;AccountKey=&lt;/span&gt;&lt;span class="nv"&gt;$STORAGE_KEY&lt;/span&gt;&lt;span class="sh"&gt;;EndpointSuffix=core.windows.net"
    }
  }
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;az datafactory linked-service create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--factory-name&lt;/span&gt; &lt;span class="nv"&gt;$ADF_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; StorageLinkedService &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--properties&lt;/span&gt; @adf-storage-linked-service.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
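
&lt;p&gt;The Data Factory setup stops at the linked service; the chunking work it would orchestrate is not shown. A hedged sketch of that step using azure-storage-blob, reading from raw-documents and writing fixed-size chunks to processed-documents. The chunk size and blob naming are illustrative assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hedged sketch of the chunking step a Data Factory pipeline would drive:
# read raw documents from Blob Storage, split into fixed-size chunks, write them
# back to the processed-documents container.
import os
from azure.storage.blob import BlobServiceClient

conn_str = os.environ["STORAGE_CONNECTION_STRING"]  # same string as in the linked service
service = BlobServiceClient.from_connection_string(conn_str)

raw = service.get_container_client("raw-documents")
processed = service.get_container_client("processed-documents")

CHUNK_SIZE = 1000  # characters per chunk; tune for your embedding model

for blob in raw.list_blobs():
    text = raw.download_blob(blob.name).readall().decode("utf-8", errors="ignore")
    for i in range(0, len(text), CHUNK_SIZE):
        chunk_name = f"{blob.name}.chunk{i // CHUNK_SIZE:04d}.txt"
        processed.upload_blob(name=chunk_name, data=text[i:i + CHUNK_SIZE], overwrite=True)
    print(f"Chunked {blob.name}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;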



&lt;h3&gt;
  
  
  Azure Phase 4: Milvus Deployment (Same as AWS)
&lt;/h3&gt;

&lt;p&gt;The Milvus deployment on ARO mirrors the ROSA steps almost exactly, since both run OpenShift; the only change is the storage class (Azure managed-premium disks instead of EBS gp3). Note that the collection on this side should be created with 1536-dimensional vectors to match text-embedding-ada-002, rather than the 1024 dimensions used with Titan.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Same Helm commands as AWS implementation&lt;/span&gt;
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm &lt;span class="nb"&gt;install &lt;/span&gt;milvus-operator milvus/milvus-operator &lt;span class="nt"&gt;--namespace&lt;/span&gt; milvus &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;

&lt;span class="c"&gt;# Create PVCs using Azure Disk&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-etcd-pvc
  namespace: milvus
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
  storageClassName: managed-premium
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-minio-pvc
  namespace: milvus
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 50Gi
  storageClassName: managed-premium
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Deploy Milvus (same values file as AWS)&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;milvus milvus/milvus &lt;span class="nt"&gt;--namespace&lt;/span&gt; milvus &lt;span class="nt"&gt;--values&lt;/span&gt; milvus-values.yaml &lt;span class="nt"&gt;--wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure Phase 5: RAG Application Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Azure-specific application&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; rag-app-azure/src

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app-azure/requirements.txt &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
pymilvus==2.3.3
openai==1.3.5
azure-identity==1.14.0
python-dotenv==1.0.0
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app-azure/src/main.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
from fastapi import FastAPI
from pydantic import BaseModel
import os
from openai import AzureOpenAI
from pymilvus import connections, Collection

app = FastAPI(title="Enterprise RAG API - Azure")

client = AzureOpenAI(
    api_key=os.getenv("OPENAI_KEY"),
    api_version="2023-05-15",
    azure_endpoint=os.getenv("OPENAI_ENDPOINT")
)

@app.on_event("startup")
async def startup():
    connections.connect(host=os.getenv("MILVUS_HOST"), port=19530)

class QueryRequest(BaseModel):
    query: str
    top_k: int = 5
    max_tokens: int = 1000

@app.post("/query")
async def query_rag(req: QueryRequest):
    # Generate embedding with Azure OpenAI
    embed_resp = client.embeddings.create(
        input=req.query,
        model="text-embedding-ada-002"
    )
    embedding = embed_resp.data[0].embedding

    # Search Milvus
    coll = Collection("rag_documents")
    coll.load()  # ensure the collection is loaded before searching
    results = coll.search([embedding], "embedding", {"metric_type": "L2"}, limit=req.top_k, output_fields=["text"])

    # Build context
    context = "&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;".join([hit.entity.get("text") for hit in results[0]])

    # Call Azure OpenAI GPT-4
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;{context}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;Question: {req.query}"}
        ],
        max_tokens=req.max_tokens
    )

    answer = response.choices[0].message.content
    return {"answer": answer, "sources": [{"chunk": hit.entity.get("text")} for hit in results[0]]}

@app.get("/health")
async def health():
    return {"status": "healthy", "platform": "Azure", "model": "GPT-4"}
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Build and deploy (similar to AWS)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;rag-app-azure
podman build &lt;span class="nt"&gt;-t&lt;/span&gt; rag-app-azure:v1.0 &lt;span class="nb"&gt;.&lt;/span&gt;
oc create imagestream rag-app-azure &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
podman tag rag-app-azure:v1.0 image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-azure:v1.0
podman push image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-azure:v1.0 &lt;span class="nt"&gt;--tls-verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false
cd&lt;/span&gt; ..

&lt;span class="c"&gt;# Deploy with Azure credentials&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-app-azure
  namespace: rag-application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rag-app-azure
  template:
    metadata:
      labels:
        app: rag-app-azure
    spec:
      serviceAccountName: openai-sa
      containers:
      - name: app
        image: image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-azure:v1.0
        ports:
        - containerPort: 8000
        env:
        - name: MILVUS_HOST
          value: "milvus.milvus.svc.cluster.local"
        - name: OPENAI_ENDPOINT
          valueFrom:
            secretKeyRef:
              name: openai-credentials
              key: endpoint
        - name: OPENAI_KEY
          valueFrom:
            secretKeyRef:
              name: openai-credentials
              key: key
---
apiVersion: v1
kind: Service
metadata:
  name: rag-app-azure
  namespace: rag-application
spec:
  selector:
    app: rag-app-azure
  ports:
  - port: 80
    targetPort: 8000
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rag-app-azure
  namespace: rag-application
spec:
  to:
    kind: Service
    name: rag-app-azure
  tls:
    termination: edge
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Get URL and test&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RAG_URL_AZURE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get route rag-app-azure &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.host}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
curl https://&lt;span class="nv"&gt;$RAG_URL_AZURE&lt;/span&gt;/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
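
&lt;p&gt;With both stacks up, a small harness makes the comparison concrete: send the same question to each /query endpoint and compare answers and latency. This assumes RAG_URL_AWS and RAG_URL_AZURE are exported as in the earlier steps and that both collections have been populated:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Quick comparison harness: post the same question to both RAG endpoints and time them.
# Purely illustrative; reads RAG_URL_AWS / RAG_URL_AZURE from the environment.
import os, time
import requests

question = {"query": "What does our onboarding policy say about laptops?", "top_k": 5}

for name, host in [("AWS", os.environ["RAG_URL_AWS"]), ("Azure", os.environ["RAG_URL_AZURE"])]:
    start = time.time()
    resp = requests.post(f"https://{host}/query", json=question, timeout=60)
    resp.raise_for_status()
    elapsed = time.time() - start
    body = resp.json()
    print(f"[{name}] {elapsed:.1f}s  answer: {body['answer'][:120]}...")
    print(f"[{name}] sources returned: {len(body['sources'])}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;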



&lt;h2&gt;
  
  
  Cost Comparison (RAG)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Monthly Cost Breakdown
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;AWS Cost&lt;/th&gt;
&lt;th&gt;Azure Cost&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kubernetes Cluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 3x worker nodes&lt;/td&gt;
&lt;td&gt;$1,460 (m5.2xlarge)&lt;/td&gt;
&lt;td&gt;$1,380 (D8s_v3)&lt;/td&gt;
&lt;td&gt;Similar specs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Control plane&lt;/td&gt;
&lt;td&gt;$0 (managed by ROSA)&lt;/td&gt;
&lt;td&gt;$0 (managed by ARO)&lt;/td&gt;
&lt;td&gt;Both included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM API Calls&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 1M input tokens&lt;/td&gt;
&lt;td&gt;$3 (Claude 3.5)&lt;/td&gt;
&lt;td&gt;$30 (GPT-4)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS 10x cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 1M output tokens&lt;/td&gt;
&lt;td&gt;$15 (Claude 3.5)&lt;/td&gt;
&lt;td&gt;$60 (GPT-4)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS 4x cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embeddings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 1M tokens&lt;/td&gt;
&lt;td&gt;$0.10 (Titan)&lt;/td&gt;
&lt;td&gt;$0.10 (Ada-002)&lt;/td&gt;
&lt;td&gt;Equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- ETL service&lt;/td&gt;
&lt;td&gt;$10 (Glue, serverless)&lt;/td&gt;
&lt;td&gt;$15 (Data Factory)&lt;/td&gt;
&lt;td&gt;AWS slightly cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Metadata catalog&lt;/td&gt;
&lt;td&gt;$1 (Glue Catalog)&lt;/td&gt;
&lt;td&gt;$20 (Purview min)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure has minimum fee&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Object Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 100 GB storage&lt;/td&gt;
&lt;td&gt;$2.30 (S3)&lt;/td&gt;
&lt;td&gt;$2.05 (Blob)&lt;/td&gt;
&lt;td&gt;Equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Requests (100k)&lt;/td&gt;
&lt;td&gt;$0.05 (S3)&lt;/td&gt;
&lt;td&gt;$0.04 (Blob)&lt;/td&gt;
&lt;td&gt;Equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Self-hosted Milvus&lt;/td&gt;
&lt;td&gt;$0 (on cluster)&lt;/td&gt;
&lt;td&gt;$0 (on cluster)&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Networking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Private Link&lt;/td&gt;
&lt;td&gt;$7.20 (PrivateLink)&lt;/td&gt;
&lt;td&gt;$7.20 (Private Link)&lt;/td&gt;
&lt;td&gt;Same pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Data transfer&lt;/td&gt;
&lt;td&gt;$5 (1 TB out)&lt;/td&gt;
&lt;td&gt;$5 (1 TB out)&lt;/td&gt;
&lt;td&gt;Equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOTAL/MONTH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,503.65&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,519.39&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS 1% cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Cost Insights&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LLM API costs favor AWS&lt;/strong&gt; by a significant margin (Claude 3.5 Sonnet is priced well below GPT-4 per token; see the quick estimate below)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure Purview&lt;/strong&gt; has a minimum monthly fee vs Glue's pay-per-use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute costs are similar&lt;/strong&gt; between ROSA and ARO&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Winner: AWS by ~$16/month (1%)&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
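
&lt;p&gt;To see how quickly the LLM line dominates, here is a quick estimate using the per-million-token prices from the table above and an assumed monthly token volume:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Back-of-the-envelope LLM cost estimate using the per-million-token prices from the table.
# Token volumes are illustrative; swap in your own monthly traffic.
PRICES = {
    "AWS (Claude 3.5 Sonnet)": {"input": 3.00, "output": 15.00},   # $ per 1M tokens
    "Azure (GPT-4)":           {"input": 30.00, "output": 60.00},
}

input_tokens_m = 5.0    # millions of input tokens per month (assumed workload)
output_tokens_m = 1.5   # millions of output tokens per month (assumed workload)

for platform, p in PRICES.items():
    monthly = input_tokens_m * p["input"] + output_tokens_m * p["output"]
    print(f"{platform}: ${monthly:,.2f}/month for LLM calls")

# AWS (Claude 3.5 Sonnet): $37.50/month ; Azure (GPT-4): $240.00/month
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;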

&lt;h3&gt;
  
  
  Cost Optimization Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Claude Instant for non-critical queries (6x cheaper)&lt;/li&gt;
&lt;li&gt;Leverage Glue serverless (no base cost)&lt;/li&gt;
&lt;li&gt;Use S3 Intelligent-Tiering for old documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Azure&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use GPT-3.5-Turbo instead of GPT-4 (20x cheaper)&lt;/li&gt;
&lt;li&gt;Negotiate EA pricing for Azure OpenAI&lt;/li&gt;
&lt;li&gt;Use cool/archive tiers for old data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Project 2: Hybrid MLOps Pipeline
&lt;/h2&gt;

&lt;h3&gt;
  
  
  MLOps Platform Overview
&lt;/h3&gt;

&lt;p&gt;This project demonstrates cost-optimized machine learning operations by bursting GPU training workloads to managed services while keeping inference on Kubernetes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenShift Pipelines → ACK → SageMaker (ml.p4d.24xlarge)
                            ↓
                        S3 Model Storage
                            ↓
                    KServe on ROSA (CPU)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Azure DevOps / Tekton → ASO → Azure ML (NC96ads_A100_v4)
                               ↓
                           Blob Model Storage
                               ↓
                       KServe on ARO (CPU)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Service Mapping
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;AWS Service&lt;/th&gt;
&lt;th&gt;Azure Service&lt;/th&gt;
&lt;th&gt;Key Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon SageMaker&lt;/td&gt;
&lt;td&gt;Azure Machine Learning&lt;/td&gt;
&lt;td&gt;Similar capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ml.p4d.24xlarge (8x A100)&lt;/td&gt;
&lt;td&gt;NC96ads_A100_v4 (8x A100)&lt;/td&gt;
&lt;td&gt;Same hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Spot Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed Spot Training&lt;/td&gt;
&lt;td&gt;Low Priority VMs&lt;/td&gt;
&lt;td&gt;Different reservation models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Registry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;S3 + SageMaker Registry&lt;/td&gt;
&lt;td&gt;Blob + ML Model Registry&lt;/td&gt;
&lt;td&gt;Different metadata approaches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;K8s Operator&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ACK (AWS Controllers)&lt;/td&gt;
&lt;td&gt;ASO (Azure Service Operator)&lt;/td&gt;
&lt;td&gt;Different CRD structures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pipelines&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenShift Pipelines (Tekton)&lt;/td&gt;
&lt;td&gt;Azure DevOps / Tekton&lt;/td&gt;
&lt;td&gt;Both support Tekton&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inference&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;KServe on ROSA&lt;/td&gt;
&lt;td&gt;KServe on ARO&lt;/td&gt;
&lt;td&gt;Identical&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  AWS Implementation (MLOps)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS MLOps Phase 1: OpenShift Pipelines Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install OpenShift Pipelines Operator&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-pipelines-operator
  namespace: openshift-operators
spec:
  channel: latest
  name: openshift-pipelines-operator-rh
  source: redhat-operators
  sourceNamespace: openshift-marketplace
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create namespace&lt;/span&gt;
oc new-project mlops-pipelines

&lt;span class="c"&gt;# Create service account&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pipeline-sa
  namespace: mlops-pipelines
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS MLOps Phase 2: ACK SageMaker Controller
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install ACK SageMaker controller&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sagemaker
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RELEASE_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-sL&lt;/span&gt; https://api.github.com/repos/aws-controllers-k8s/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-controller&lt;/span&gt;/releases/latest | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'\"tag_name\":'&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'\"'&lt;/span&gt; &lt;span class="nt"&gt;-f4&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

wget https://github.com/aws-controllers-k8s/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-controller&lt;/span&gt;/releases/download/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RELEASE_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/install.yaml
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; install.yaml

&lt;span class="c"&gt;# Create IAM role for ACK&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ack-sagemaker-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateTrainingJob",
        "sagemaker:DescribeTrainingJob",
        "sagemaker:StopTrainingJob"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:*"],
      "Resource": "arn:aws:s3:::mlops-*"
    },
    {
      "Effect": "Allow",
      "Action": ["iam:PassRole"],
      "Resource": "*",
      "Condition": {
        "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam create-policy &lt;span class="nt"&gt;--policy-name&lt;/span&gt; ACKSageMakerPolicy &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://ack-sagemaker-policy.json

&lt;span class="c"&gt;# Create trust policy and role (similar to RAG project)&lt;/span&gt;
&lt;span class="c"&gt;# ... (abbreviated for space)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS MLOps Phase 3: Training Job Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create S3 buckets&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ML_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlops-artifacts-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DATA_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlops-datasets-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

aws s3 mb s3://&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt;
aws s3 mb s3://&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt;

&lt;span class="c"&gt;# Upload training script&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; train.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
import argparse, joblib
from sklearn.ensemble import RandomForestClassifier
import numpy as np

parser = argparse.ArgumentParser()
parser.add_argument('--n_estimators', type=int, default=100)
args = parser.parse_args()

# Training code
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, 1000)

model = RandomForestClassifier(n_estimators=args.n_estimators)
model.fit(X, y)

joblib.dump(model, '/opt/ml/model/model.joblib')
print(f"Training completed with {args.n_estimators} estimators")
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Create Dockerfile&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
FROM python:3.10-slim
RUN pip install scikit-learn joblib numpy
COPY train.py /opt/ml/code/
ENTRYPOINT ["python", "/opt/ml/code/train.py"]
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Build and push to ECR&lt;/span&gt;
aws ecr create-repository &lt;span class="nt"&gt;--repository-name&lt;/span&gt; mlops/training
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ECR_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.dkr.ecr.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.amazonaws.com/mlops/training"&lt;/span&gt;
aws ecr get-login-password | docker login &lt;span class="nt"&gt;--username&lt;/span&gt; AWS &lt;span class="nt"&gt;--password-stdin&lt;/span&gt; &lt;span class="nv"&gt;$ECR_URI&lt;/span&gt;
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; mlops-training &lt;span class="nb"&gt;.&lt;/span&gt;
docker tag mlops-training:latest &lt;span class="nv"&gt;$ECR_URI&lt;/span&gt;:latest
docker push &lt;span class="nv"&gt;$ECR_URI&lt;/span&gt;:latest

&lt;span class="c"&gt;# Create SageMaker training job via ACK&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TrainingJob
metadata:
  name: rf-training-job
  namespace: mlops-pipelines
spec:
  trainingJobName: rf-training-&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;
  roleARN: &lt;/span&gt;&lt;span class="nv"&gt;$SAGEMAKER_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;
  algorithmSpecification:
    trainingImage: &lt;/span&gt;&lt;span class="nv"&gt;$ECR_URI&lt;/span&gt;&lt;span class="sh"&gt;:latest
    trainingInputMode: File
  resourceConfig:
    instanceType: ml.m5.xlarge
    instanceCount: 1
    volumeSizeInGB: 50
  outputDataConfig:
    s3OutputPath: s3://&lt;/span&gt;&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;/models/
  stoppingCondition:
    maxRuntimeInSeconds: 3600
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Azure Implementation (MLOps)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Azure MLOps Phase 1: Azure ML Workspace
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create ML workspace&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ML_WORKSPACE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlops-workspace-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RANDOM&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

az ml workspace create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$ML_WORKSPACE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt;

&lt;span class="c"&gt;# Create compute cluster (spot instances)&lt;/span&gt;
az ml compute create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; gpu-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--type&lt;/span&gt; amlcompute &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--min-instances&lt;/span&gt; 0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-instances&lt;/span&gt; 4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--size&lt;/span&gt; Standard_NC6s_v3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tier&lt;/span&gt; LowPriority &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--workspace-name&lt;/span&gt; &lt;span class="nv"&gt;$ML_WORKSPACE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure MLOps Phase 2: Azure Service Operator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install ASO&lt;/span&gt;
helm repo add aso2 https://raw.githubusercontent.com/Azure/azure-service-operator/main/v2/charts
helm &lt;span class="nb"&gt;install &lt;/span&gt;aso2 aso2/azure-service-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; azureserviceoperator-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;azureSubscriptionID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;azureTenantID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$TENANT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;azureClientID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLIENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;azureClientSecret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLIENT_SECRET&lt;/span&gt;

&lt;span class="c"&gt;# Create ML job via ASO&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: machinelearningservices.azure.com/v1alpha1
kind: Job
metadata:
  name: rf-training-job
  namespace: mlops-pipelines
spec:
  owner:
    name: &lt;/span&gt;&lt;span class="nv"&gt;$ML_WORKSPACE&lt;/span&gt;&lt;span class="sh"&gt;
  compute:
    target: gpu-cluster
    instanceCount: 1
  environment:
    image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
  codeConfiguration:
    codeArtifactId: azureml://code/train-script
    scoringScript: train.py
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cost Comparison (MLOps)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;AWS Monthly&lt;/th&gt;
&lt;th&gt;Azure Monthly&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 4 hrs/week spot GPU&lt;/td&gt;
&lt;td&gt;$157 (ml.p4d.24xlarge)&lt;/td&gt;
&lt;td&gt;$153 (NC96ads_A100_v4)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure slightly cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Model artifacts (50 GB)&lt;/td&gt;
&lt;td&gt;$1.15 (S3)&lt;/td&gt;
&lt;td&gt;$1.00 (Blob)&lt;/td&gt;
&lt;td&gt;Similar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- ML service&lt;/td&gt;
&lt;td&gt;$0 (pay-per-use)&lt;/td&gt;
&lt;td&gt;$0 (pay-per-use)&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inference (on OpenShift)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Shared ROSA/ARO cluster&lt;/td&gt;
&lt;td&gt;$0 (shared)&lt;/td&gt;
&lt;td&gt;$0 (shared)&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOTAL/MONTH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$158&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$154&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure 2.5% cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Winner: Azure&lt;/strong&gt; by $4/month (negligible difference)&lt;/p&gt;

&lt;h2&gt;
  
  
  Project 3: Unified Data Fabric (Data Lakehouse)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Lakehouse Platform Overview
&lt;/h3&gt;

&lt;p&gt;This project implements a stateless data lakehouse where compute (Spark) can be destroyed without data loss.&lt;/p&gt;
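

&lt;p&gt;To make this concrete: once the Spark cluster is deleted, the Glue Catalog and S3 still serve the same tables to any new compute. A minimal sketch of that check (the database, table, and results-bucket names are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# With no Spark cluster running, metadata and data remain intact.
# Placeholder names: database "silver", table "events", bucket "query-results".

# Tables are still registered in the Glue Data Catalog
aws glue get-tables --database-name silver --query 'TableList[].Name'

# Data is still queryable straight from S3 via Athena
aws athena start-query-execution \
  --query-string "SELECT COUNT(*) FROM silver.events" \
  --work-group primary \
  --result-configuration OutputLocation=s3://query-results/athena/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;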

&lt;h3&gt;
  
  
  Architecture Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Spark on ROSA → AWS Glue Catalog → S3 + Iceberg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Spark on ARO → Azure Purview / Unity Catalog → ADLS Gen2 + Delta Lake
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Service Mapping
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;AWS Service&lt;/th&gt;
&lt;th&gt;Azure Service&lt;/th&gt;
&lt;th&gt;Key Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Catalog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Glue Data Catalog&lt;/td&gt;
&lt;td&gt;Azure Purview / Unity Catalog&lt;/td&gt;
&lt;td&gt;Glue is serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Table Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache Iceberg&lt;/td&gt;
&lt;td&gt;Delta Lake&lt;/td&gt;
&lt;td&gt;Iceberg is cloud-agnostic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon S3&lt;/td&gt;
&lt;td&gt;ADLS Gen2&lt;/td&gt;
&lt;td&gt;ADLS has hierarchical namespace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compute&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spark on ROSA&lt;/td&gt;
&lt;td&gt;Spark on ARO / Databricks&lt;/td&gt;
&lt;td&gt;ARO or managed Databricks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Engine&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon Athena&lt;/td&gt;
&lt;td&gt;Azure Synapse Serverless SQL&lt;/td&gt;
&lt;td&gt;Similar serverless query&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  AWS Implementation (Lakehouse)
&lt;/h2&gt;

&lt;p&gt;(Due to length constraints, showing key differences only)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Spark Operator&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;spark-operator spark-operator/spark-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; spark-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;sparkJobNamespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;spark-jobs

&lt;span class="c"&gt;# Create Glue databases&lt;/span&gt;
aws glue create-database &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{"Name": "bronze"}'&lt;/span&gt;
aws glue create-database &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{"Name": "silver"}'&lt;/span&gt;
aws glue create-database &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{"Name": "gold"}'&lt;/span&gt;

&lt;span class="c"&gt;# Build custom Spark image with Iceberg&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
FROM gcr.io/spark-operator/spark:v3.5.0
USER root
RUN curl -L https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.4.2/iceberg-spark-runtime-3.5_2.12-1.4.2.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/iceberg-spark-runtime.jar
RUN curl -L https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/hadoop-aws.jar
USER 185
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Deploy SparkApplication with Glue integration&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: lakehouse-etl
spec:
  type: Python
  sparkVersion: "3.5.0"
  mainApplicationFile: s3://bucket/scripts/etl.py
  sparkConf:
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog"
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog"
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Azure Implementation (Lakehouse)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option 1: Use Azure Databricks (managed)&lt;/span&gt;
az databricks workspace create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; databricks-lakehouse &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku&lt;/span&gt; premium

&lt;span class="c"&gt;# Option 2: Deploy Spark on ARO with Delta Lake&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
FROM gcr.io/spark-operator/spark:v3.5.0
USER root
RUN curl -L https://repo1.maven.org/maven2/io/delta/delta-core_2.12/2.4.0/delta-core_2.12-2.4.0.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/delta-core.jar
USER 185
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create ADLS Gen2 storage&lt;/span&gt;
az storage account create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; datalake&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RANDOM&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kind&lt;/span&gt; StorageV2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--hierarchical-namespace&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Deploy SparkApplication with Delta Lake&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: lakehouse-etl
spec:
  type: Python
  sparkVersion: "3.5.0"
  mainApplicationFile: abfss://container@storage.dfs.core.windows.net/scripts/etl.py
  sparkConf:
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension"
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cost Comparison (Lakehouse)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;AWS Monthly&lt;/th&gt;
&lt;th&gt;Azure Monthly&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compute&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Spark cluster (3x m5.4xlarge)&lt;/td&gt;
&lt;td&gt;$1,500&lt;/td&gt;
&lt;td&gt;$1,450 (D16s_v3)&lt;/td&gt;
&lt;td&gt;Similar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metadata Catalog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Catalog service&lt;/td&gt;
&lt;td&gt;$10 (Glue, 1M requests)&lt;/td&gt;
&lt;td&gt;$20 (Purview minimum)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Data lake (1 TB)&lt;/td&gt;
&lt;td&gt;$23 (S3)&lt;/td&gt;
&lt;td&gt;$18 (ADLS Gen2 hot)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Engine&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Serverless queries (1 TB)&lt;/td&gt;
&lt;td&gt;$5 (Athena)&lt;/td&gt;
&lt;td&gt;$5 (Synapse serverless)&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOTAL/MONTH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,538&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,493&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure 3% cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Winner: Azure&lt;/strong&gt; by $45/month (3%)&lt;/p&gt;

&lt;h2&gt;
  
  
  Total Cost of Ownership Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Combined Monthly Costs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;AWS Total&lt;/th&gt;
&lt;th&gt;Azure Total&lt;/th&gt;
&lt;th&gt;Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAG Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1,504&lt;/td&gt;
&lt;td&gt;$1,519&lt;/td&gt;
&lt;td&gt;AWS -$15 (-1%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MLOps Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$158&lt;/td&gt;
&lt;td&gt;$154&lt;/td&gt;
&lt;td&gt;Azure -$4 (-2.5%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Lakehouse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1,538&lt;/td&gt;
&lt;td&gt;$1,493&lt;/td&gt;
&lt;td&gt;Azure -$45 (-3%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOTAL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$3,200/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$3,166/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure -$34/month (-1%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Annual Projection
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: $3,200 × 12 = &lt;strong&gt;$38,400/year&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt;: $3,166 × 12 = &lt;strong&gt;$37,992/year&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Savings with Azure&lt;/strong&gt;: &lt;strong&gt;$408/year (1%)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost Sensitivity Analysis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: High LLM Usage&lt;/strong&gt; (10M tokens/month)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS: +$180 (Claude cheaper)&lt;/li&gt;
&lt;li&gt;Azure: +$900 (GPT-4 more expensive)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AWS wins by $720/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: Heavy ML Training&lt;/strong&gt; (20 hrs/week GPU)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS: +$785&lt;/li&gt;
&lt;li&gt;Azure: +$765&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Azure wins by $20/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3: Large Data Lake&lt;/strong&gt; (10 TB storage)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS: +$230&lt;/li&gt;
&lt;li&gt;Azure: +$180&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Azure wins by $50/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;: &lt;strong&gt;AWS is better for AI-heavy workloads&lt;/strong&gt; due to cheaper LLM pricing. &lt;strong&gt;Azure is better for data-heavy workloads&lt;/strong&gt; due to cheaper storage.&lt;/p&gt;
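

&lt;p&gt;The crossover behind that conclusion follows directly from the figures above: Azure's ~$34/month infrastructure advantage is erased once LLM volume reaches roughly half a million tokens per month. A back-of-the-envelope check using only the numbers from this section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Base infrastructure favors Azure by ~34 USD/month (TCO table above);
# LLM usage favors AWS by ~720 USD per 10M tokens (Scenario 1).
AZURE_BASE_ADVANTAGE=34
AWS_LLM_ADVANTAGE_PER_10M=720

# Break-even volume in millions of tokens per month
echo "scale=2; 10 * $AZURE_BASE_ADVANTAGE / $AWS_LLM_ADVANTAGE_PER_10M" | bc
# ~= 0.47 -&amp;gt; above roughly half a million tokens/month, AWS wins overall
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;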

&lt;h2&gt;
  
  
  Multi-Cloud Integration Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Unified RBAC Strategy
&lt;/h3&gt;

&lt;p&gt;Both platforms support similar pod-level identity:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS (IRSA)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-sa&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;eks.amazonaws.com/role-arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::ACCOUNT:role/AppRole&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure (Workload Identity)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-sa&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;azure.workload.identity/client-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CLIENT_ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
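

&lt;p&gt;These ServiceAccounts only establish the identity; the workload still has to reference them. A minimal sketch of the pod side (the workload name and image are placeholders; the azure.workload.identity/use label is what triggers token injection on Azure and is ignored on AWS, where referencing the ServiceAccount is enough):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Reference the annotated ServiceAccount from the workload
cat &amp;lt;&amp;lt;EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-api                    # placeholder workload name
  namespace: rag-application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rag-api
  template:
    metadata:
      labels:
        app: rag-api
        azure.workload.identity/use: "true"   # Azure only; harmless elsewhere
    spec:
      serviceAccountName: app-sa              # the SA defined above
      containers:
      - name: api
        image: quay.io/example/rag-api:latest # placeholder image
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;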



&lt;h3&gt;
  
  
  Multi-Cloud Disaster Recovery
&lt;/h3&gt;

&lt;p&gt;Deploy identical workloads on both platforms for DR:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Primary: AWS&lt;/span&gt;
&lt;span class="c"&gt;# Standby: Azure&lt;/span&gt;
&lt;span class="c"&gt;# Failover time: &amp;lt; 5 minutes with DNS switch&lt;/span&gt;

&lt;span class="c"&gt;# Shared components:&lt;/span&gt;
&lt;span class="c"&gt;# - OpenShift APIs (same)&lt;/span&gt;
&lt;span class="c"&gt;# - Application code (same)&lt;/span&gt;
&lt;span class="c"&gt;# - Milvus deployment (same)&lt;/span&gt;

&lt;span class="c"&gt;# Platform-specific:&lt;/span&gt;
&lt;span class="c"&gt;# - Cloud credentials&lt;/span&gt;
&lt;span class="c"&gt;# - Storage endpoints&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
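

&lt;p&gt;The DNS switch itself is scriptable. A minimal sketch with Route 53 (the hosted zone ID, record name, and ARO router hostname are placeholders; the same pattern works with Azure DNS in the reverse direction):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Point the public record at the standby (ARO) router during a failover
ZONE_ID="Z0EXAMPLE"                                   # placeholder hosted zone
RECORD="rag.example.com"                              # placeholder record name
STANDBY_ROUTER="router-default.apps.aro.example.com"  # placeholder ARO router

cat &amp;gt; failover-to-azure.json &amp;lt;&amp;lt;EOF
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "$RECORD",
      "Type": "CNAME",
      "TTL": 60,
      "ResourceRecords": [{"Value": "$STANDBY_ROUTER"}]
    }
  }]
}
EOF

aws route53 change-resource-record-sets \
  --hosted-zone-id "$ZONE_ID" \
  --change-batch file://failover-to-azure.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;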



&lt;h2&gt;
  
  
  Migration Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS to Azure Migration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Data Migration&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use AzCopy for S3 → Blob migration&lt;/span&gt;
azcopy copy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"https://s3.amazonaws.com/bucket/*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"https://storageaccount.blob.core.windows.net/container"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--recursive&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Phase 2: Metadata Migration&lt;/strong&gt; (sketched below)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Export Glue Catalog to JSON&lt;/li&gt;
&lt;li&gt;Import to Azure Purview via API&lt;/li&gt;
&lt;/ul&gt;
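

&lt;p&gt;A minimal sketch of the export half, using the bronze/silver/gold databases from the lakehouse project (the Purview import side is left as a placeholder, since it depends on how your collections are modelled):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Dump Glue Catalog metadata to JSON, one file per database
for db in bronze silver gold; do
  aws glue get-tables --database-name "$db" &amp;gt; "glue-${db}-tables.json"
done

# Inspect what was exported (table names and S3 locations)
jq -r '.TableList[] | "\(.Name)\t\(.StorageDescriptor.Location)"' glue-silver-tables.json

# Import into Purview by mapping each table to an asset via the Purview
# REST API or SDKs (intentionally not shown here).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;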

&lt;p&gt;&lt;strong&gt;Phase 3: Application Migration&lt;/strong&gt; (sketched below)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update environment variables&lt;/li&gt;
&lt;li&gt;Switch cloud credentials&lt;/li&gt;
&lt;li&gt;Deploy to ARO&lt;/li&gt;
&lt;/ul&gt;
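

&lt;p&gt;Because the OpenShift objects are identical on both sides, these steps usually need no application-code changes. A minimal sketch (the manifest directory, deployment name, and environment-variable names are placeholders for whatever your application actually reads):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Log in to the ARO cluster and reuse the same manifests as on ROSA
oc login "$ARO_API_URL" -u kubeadmin -p "$ARO_KUBEADMIN_PASSWORD"
oc apply -f k8s/ -n rag-application

# Repoint storage and model endpoints via environment variables (placeholder names)
oc set env deployment/rag-api -n rag-application \
  STORAGE_ENDPOINT="https://storageaccount.blob.core.windows.net" \
  LLM_PROVIDER="azure-openai"

# Swap pod identity: Workload Identity annotation instead of the IRSA role ARN
oc annotate serviceaccount app-sa -n rag-application \
  azure.workload.identity/client-id="$CLIENT_ID" --overwrite
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;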

&lt;h3&gt;
  
  
  Azure to AWS Migration
&lt;/h3&gt;

&lt;p&gt;Similar process in reverse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use AWS DataSync for Blob → S3&lt;/span&gt;
aws datasync create-task &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source-location-arn&lt;/span&gt; arn:aws:datasync:...:location/azure-blob &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--destination-location-arn&lt;/span&gt; arn:aws:datasync:...:location/s3-bucket
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Resource Cleanup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS Complete Cleanup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Complete AWS resource cleanup&lt;/span&gt;

&lt;span class="c"&gt;# RAG Platform&lt;/span&gt;
rosa delete cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rag-platform-aws &lt;span class="nt"&gt;--yes&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://rag-documents-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt;
aws s3 rb s3://rag-documents-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
aws glue delete-crawler &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-crawler
aws glue delete-database &lt;span class="nt"&gt;--name&lt;/span&gt; rag_documents_db
aws ec2 delete-vpc-endpoints &lt;span class="nt"&gt;--vpc-endpoint-ids&lt;/span&gt; &lt;span class="nv"&gt;$BEDROCK_VPC_ENDPOINT&lt;/span&gt;
aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access
aws iam delete-policy &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:policy/BedrockInvokePolicy

&lt;span class="c"&gt;# MLOps Platform&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://mlops-artifacts-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://mlops-datasets-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt;
aws s3 rb s3://mlops-artifacts-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
aws s3 rb s3://mlops-datasets-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
aws ecr delete-repository &lt;span class="nt"&gt;--repository-name&lt;/span&gt; mlops/training &lt;span class="nt"&gt;--force&lt;/span&gt;
aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; ACKSageMakerControllerRole

&lt;span class="c"&gt;# Data Lakehouse&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://lakehouse-data-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt;
aws s3 rb s3://lakehouse-data-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;db &lt;span class="k"&gt;in &lt;/span&gt;bronze silver gold&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;aws glue delete-database &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$db&lt;/span&gt;
&lt;span class="k"&gt;done
&lt;/span&gt;aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"AWS cleanup complete"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure Complete Cleanup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Complete Azure resource cleanup&lt;/span&gt;

&lt;span class="c"&gt;# Delete all resources in resource group&lt;/span&gt;
az group delete &lt;span class="nt"&gt;--name&lt;/span&gt; rag-platform-rg &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--no-wait&lt;/span&gt;

&lt;span class="c"&gt;# This deletes:&lt;/span&gt;
&lt;span class="c"&gt;# - ARO cluster&lt;/span&gt;
&lt;span class="c"&gt;# - Azure OpenAI service&lt;/span&gt;
&lt;span class="c"&gt;# - Storage accounts&lt;/span&gt;
&lt;span class="c"&gt;# - Data Factory&lt;/span&gt;
&lt;span class="c"&gt;# - Azure ML workspace&lt;/span&gt;
&lt;span class="c"&gt;# - All networking components&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Azure cleanup complete (deleting in background)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Common Multi-Cloud Issues
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Issue: Cross-Cloud Latency
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Slow API responses when accessing cloud services&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Solution&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify VPC endpoint is in correct AZ&lt;/span&gt;
aws ec2 describe-vpc-endpoints &lt;span class="nt"&gt;--vpc-endpoint-ids&lt;/span&gt; &lt;span class="nv"&gt;$ENDPOINT_ID&lt;/span&gt;

&lt;span class="c"&gt;# Check PrivateLink latency&lt;/span&gt;
oc run &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;curlimages/curl &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  curl &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"@curl-format.txt"&lt;/span&gt; https://bedrock-runtime.us-east-1.amazonaws.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure Solution&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify Private Link in same region as ARO&lt;/span&gt;
az network private-endpoint show &lt;span class="nt"&gt;--name&lt;/span&gt; openai-private-endpoint

&lt;span class="c"&gt;# Test latency&lt;/span&gt;
oc run &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;curlimages/curl &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  curl &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"@curl-format.txt"&lt;/span&gt; https://OPENAI_NAME.openai.azure.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
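

&lt;p&gt;Both latency checks reference a @curl-format.txt file that is not shown above. A minimal version using standard curl --write-out variables could look like the following; note the file has to exist wherever curl runs (for a throwaway test pod, passing the format string inline with -w works too):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Timing template consumed by the curl -w "@curl-format.txt" commands above
cat &amp;gt; curl-format.txt &amp;lt;&amp;lt;'EOF'
    time_namelookup:  %{time_namelookup}s\n
       time_connect:  %{time_connect}s\n
    time_appconnect:  %{time_appconnect}s\n
      time_redirect:  %{time_redirect}s\n
 time_starttransfer:  %{time_starttransfer}s\n
                      ----------\n
         time_total:  %{time_total}s\n
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;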



&lt;h4&gt;
  
  
  Issue: Authentication Failures
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;AWS IRSA Troubleshooting&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify OIDC provider&lt;/span&gt;
rosa describe cluster &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq .aws.sts.oidc_endpoint_url

&lt;span class="c"&gt;# Test token&lt;/span&gt;
kubectl create token bedrock-sa &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application

&lt;span class="c"&gt;# Verify IAM trust policy&lt;/span&gt;
aws iam get-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure Workload Identity Troubleshooting&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify federated credential&lt;/span&gt;
az identity federated-credential show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-app-federated &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--identity-name&lt;/span&gt; rag-app-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt;

&lt;span class="c"&gt;# Test managed identity&lt;/span&gt;
az account get-access-token &lt;span class="nt"&gt;--resource&lt;/span&gt; https://cognitiveservices.azure.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Platform Selection Recommendations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Choose AWS if you&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prioritize AI/ML model diversity (Bedrock marketplace)&lt;/li&gt;
&lt;li&gt;Have variable, unpredictable workloads (serverless pricing)&lt;/li&gt;
&lt;li&gt;Value open-source ecosystem compatibility&lt;/li&gt;
&lt;li&gt;Need global multi-region deployments&lt;/li&gt;
&lt;li&gt;Want lower LLM API costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Azure if you&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have existing Microsoft enterprise agreements&lt;/li&gt;
&lt;li&gt;Need Windows container support&lt;/li&gt;
&lt;li&gt;Require hybrid cloud with on-premises&lt;/li&gt;
&lt;li&gt;Have Microsoft 365 / Teams integration requirements&lt;/li&gt;
&lt;li&gt;Want slightly lower infrastructure costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Multi-Cloud if you&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Need disaster recovery across providers&lt;/li&gt;
&lt;li&gt;Want to avoid vendor lock-in&lt;/li&gt;
&lt;li&gt;Have regulatory requirements for redundancy&lt;/li&gt;
&lt;li&gt;Can manage operational complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Final Cost Summary
&lt;/h3&gt;

&lt;p&gt;For the three projects combined:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Total&lt;/strong&gt;: $3,200/month ($38,400/year)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure Total&lt;/strong&gt;: $3,166/month ($37,992/year)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Difference&lt;/strong&gt;: 1% ($408/year favoring Azure)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: &lt;strong&gt;Costs are effectively equivalent&lt;/strong&gt;. Choose based on ecosystem fit, not cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Technical Takeaways
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OpenShift provides platform portability&lt;/strong&gt; - same APIs on both clouds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud-specific services&lt;/strong&gt; (Bedrock, Azure OpenAI) require different code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage abstractions&lt;/strong&gt; (S3 vs Blob) are the main migration challenge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM patterns&lt;/strong&gt; (IRSA vs Workload Identity) are conceptually similar&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;To Expand This Implementation&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add GitOps with ArgoCD for both platforms&lt;/li&gt;
&lt;li&gt;Implement cross-cloud disaster recovery&lt;/li&gt;
&lt;li&gt;Add comprehensive monitoring with Grafana&lt;/li&gt;
&lt;li&gt;Automate deployments with Terraform/Bicep&lt;/li&gt;
&lt;li&gt;Implement cost governance and FinOps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Thank you for reading this comprehensive multi-cloud implementation guide!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Unified Data Fabric: Serverless Spark on ROSA Integrating with AWS Glue Catalog</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Mon, 29 Dec 2025 11:18:18 +0000</pubDate>
      <link>https://dev.to/mgonzalezo/unified-data-fabric-serverless-spark-on-rosa-integrating-with-aws-glue-catalog-9bb</link>
      <guid>https://dev.to/mgonzalezo/unified-data-fabric-serverless-spark-on-rosa-integrating-with-aws-glue-catalog-9bb</guid>
      <description>&lt;h1&gt;
  
  
  Data Lakehouse on ROSA with Apache Spark, Iceberg, and AWS Glue
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Overview&lt;/li&gt;
&lt;li&gt;Architecture&lt;/li&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;Phase 1: ROSA Cluster Setup&lt;/li&gt;
&lt;li&gt;Phase 2: AWS Glue Data Catalog Configuration&lt;/li&gt;
&lt;li&gt;Phase 3: S3 Data Lake Setup&lt;/li&gt;
&lt;li&gt;Phase 4: Apache Spark on OpenShift&lt;/li&gt;
&lt;li&gt;Phase 5: Apache Iceberg Integration&lt;/li&gt;
&lt;li&gt;Phase 6: Spark-Glue Catalog Integration&lt;/li&gt;
&lt;li&gt;Phase 7: Sample Data Pipelines&lt;/li&gt;
&lt;li&gt;Testing and Validation&lt;/li&gt;
&lt;li&gt;Resource Cleanup&lt;/li&gt;
&lt;li&gt;Troubleshooting&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Project Purpose
&lt;/h3&gt;

&lt;p&gt;This platform implements a &lt;strong&gt;modern data lakehouse architecture&lt;/strong&gt; that achieves true separation of compute and storage. By running Apache Spark on OpenShift while leveraging AWS Glue Data Catalog for metadata management and S3 for storage (in Apache Iceberg format), organizations can scale compute independently, shut down clusters without data loss, and achieve significant cost optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Value Propositions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stateless Compute&lt;/strong&gt;: Completely decouple compute from storage and metadata&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud-Native Flexibility&lt;/strong&gt;: Destroy and recreate compute clusters without losing data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Optimization&lt;/strong&gt;: Pay for compute only when running jobs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified Metadata&lt;/strong&gt;: AWS Glue Catalog provides central metadata repository&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ACID Transactions&lt;/strong&gt;: Apache Iceberg enables reliable data lake operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance at Scale&lt;/strong&gt;: Run high-performance Spark jobs on Kubernetes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Solution Components
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ROSA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed OpenShift cluster for Spark compute&lt;/td&gt;
&lt;td&gt;Compute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Apache Spark&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributed data processing engine&lt;/td&gt;
&lt;td&gt;Processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Spark Operator&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Kubernetes-native Spark job management&lt;/td&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Glue Data Catalog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Centralized metadata repository&lt;/td&gt;
&lt;td&gt;Metadata&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Object storage for data lake&lt;/td&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Apache Iceberg&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Table format with ACID guarantees&lt;/td&gt;
&lt;td&gt;Data Format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS IAM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Authentication and authorization&lt;/td&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  High-Level Architecture Diagram
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffusgipb5sbyf514v4fx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffusgipb5sbyf514v4fx0.png" alt="High-Level Architecture Diagram" width="800" height="632"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data Ingestion&lt;/strong&gt;: Raw data lands in S3 bronze layer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spark Job Submission&lt;/strong&gt;: Developer submits SparkApplication CR&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job Orchestration&lt;/strong&gt;: Spark Operator creates driver pod&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Provisioning&lt;/strong&gt;: Driver spawns executor pods dynamically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata Discovery&lt;/strong&gt;: Spark connects to Glue Catalog for table metadata&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Processing&lt;/strong&gt;: Executors read/write Iceberg tables from/to S3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata Update&lt;/strong&gt;: Glue Catalog automatically updated with new partitions/schemas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job Completion&lt;/strong&gt;: Executor pods terminate, freeing resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cluster Shutdown&lt;/strong&gt;: ROSA cluster can be deleted without data loss&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State Recovery&lt;/strong&gt;: New cluster can access all data via Glue Catalog&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Stateless Compute Demonstration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Traditional Approach&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Local Hive Metastore tied to cluster&lt;/li&gt;
&lt;li&gt;Cluster deletion = metadata loss&lt;/li&gt;
&lt;li&gt;Requires persistent volumes and backups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lakehouse Approach&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metadata in AWS Glue (managed, durable)&lt;/li&gt;
&lt;li&gt;Data in S3 (infinitely scalable)&lt;/li&gt;
&lt;li&gt;Compute fully ephemeral&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result&lt;/strong&gt;: Complete cluster rebuild in 40 minutes with zero data loss&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Required Accounts and Subscriptions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;AWS Account&lt;/strong&gt; with administrative access&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Red Hat Account&lt;/strong&gt; with OpenShift subscription&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;ROSA Enabled&lt;/strong&gt; in your AWS account&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;AWS Glue Access&lt;/strong&gt; in your target region&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Required Tools
&lt;/h3&gt;

&lt;p&gt;Install the following CLI tools on your workstation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# AWS CLI (v2)&lt;/span&gt;
curl &lt;span class="s2"&gt;"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"awscliv2.zip"&lt;/span&gt;
unzip awscliv2.zip
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./aws/install

&lt;span class="c"&gt;# ROSA CLI&lt;/span&gt;
wget https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; rosa-linux.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;rosa /usr/local/bin/rosa
rosa version

&lt;span class="c"&gt;# OpenShift CLI (oc)&lt;/span&gt;
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; openshift-client-linux.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;oc kubectl /usr/local/bin/
oc version

&lt;span class="c"&gt;# Helm (v3)&lt;/span&gt;
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa version
&lt;span class="go"&gt;[2026-01-13 09:15:22] 1.2.38
Your ROSA CLI is up to date.

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;oc version
&lt;span class="go"&gt;[2026-01-13 09:15:35] Client Version: 4.18.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;helm version
&lt;span class="go"&gt;[2026-01-13 09:15:48] version.BuildInfo{Version:"v3.14.1", GitCommit:"2d17c84a8d8", GitTreeState:"clean", GoVersion:"go1.21.7"}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS Prerequisites
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Service Quotas
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check EC2 quotas for ROSA&lt;/span&gt;
aws service-quotas get-service-quota &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-code&lt;/span&gt; ec2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-1216C47A &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# Check S3 bucket quota&lt;/span&gt;
aws service-quotas get-service-quota &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-code&lt;/span&gt; s3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-DC2B2D3D &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws service-quotas get-service-quota &lt;span class="nt"&gt;--service-code&lt;/span&gt; ec2 &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-1216C47A &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 09:20:14] {
    "Quota": {
        "ServiceCode": "ec2",
        "ServiceName": "Amazon Elastic Compute Cloud (Amazon EC2)",
        "QuotaArn": "arn:aws:servicequotas:us-east-1:123456789012:ec2/L-1216C47A",
        "QuotaCode": "L-1216C47A",
        "QuotaName": "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances",
        "Value": 1280.0,
        "Unit": "None",
        "Adjustable": true,
        "GlobalQuota": false
    }
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  IAM Permissions
&lt;/h4&gt;

&lt;p&gt;Your AWS IAM user/role needs permissions for the following (a quick self-check is sketched after the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 (VPC, subnets, security groups)&lt;/li&gt;
&lt;li&gt;IAM (roles, policies)&lt;/li&gt;
&lt;li&gt;S3 (buckets, objects)&lt;/li&gt;
&lt;li&gt;Glue (databases, tables, catalog)&lt;/li&gt;
&lt;li&gt;CloudWatch (logs, metrics)&lt;/li&gt;
&lt;/ul&gt;
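

&lt;p&gt;Before provisioning anything, you can sanity-check that your principal actually holds these permissions with the IAM policy simulator (a minimal sketch; the action list is a sample, not an exhaustive set):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Simulate a few representative actions against your own identity
# (if you are running under an assumed role, pass the underlying role ARN instead)
CALLER_ARN=$(aws sts get-caller-identity --query Arn --output text)

aws iam simulate-principal-policy \
  --policy-source-arn "$CALLER_ARN" \
  --action-names ec2:CreateVpc iam:CreateRole s3:CreateBucket \
                 glue:CreateDatabase logs:CreateLogGroup \
  --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
  --output table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;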

&lt;h3&gt;
  
  
  Knowledge Prerequisites
&lt;/h3&gt;

&lt;p&gt;You should be familiar with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apache Spark fundamentals (DataFrames, transformations, actions)&lt;/li&gt;
&lt;li&gt;Data engineering concepts (ETL, data lakes, partitioning)&lt;/li&gt;
&lt;li&gt;AWS fundamentals (S3, IAM)&lt;/li&gt;
&lt;li&gt;Kubernetes basics (pods, deployments, services)&lt;/li&gt;
&lt;li&gt;SQL and data modeling&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 1: ROSA Cluster Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1.1: Configure AWS CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Configure AWS credentials&lt;/span&gt;
aws configure

&lt;span class="c"&gt;# Verify configuration&lt;/span&gt;
aws sts get-caller-identity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws configure
&lt;span class="go"&gt;[2026-01-13 09:30:00] AWS Access Key ID [****************AKID]:
AWS Secret Access Key [****************KEY]:
Default region name [us-east-1]:
Default output format [json]:

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws sts get-caller-identity
&lt;span class="go"&gt;[2026-01-13 09:30:45] {
    "UserId": "AIDACKCEVSQ6C2EXAMPLE",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/data-engineer"
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.2: Initialize ROSA
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Log in to Red Hat&lt;/span&gt;
rosa login

&lt;span class="c"&gt;# Verify ROSA prerequisites&lt;/span&gt;
rosa verify quota
rosa verify permissions

&lt;span class="c"&gt;# Initialize ROSA in your AWS account&lt;/span&gt;
rosa init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa login
&lt;span class="go"&gt;[2026-01-13 09:35:12] To login to your Red Hat account, get an offline access token at https://console.redhat.com/openshift/token/rosa
? Copy the token and paste it here: ****************************************
[2026-01-13 09:35:45] Logged in as 'data-engineer' on 'https://api.openshift.com'

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa verify quota
&lt;span class="go"&gt;[2026-01-13 09:36:20] I: Validating AWS quota...
I: AWS quota ok. If cluster installation fails, validate actual AWS resource usage against https://docs.openshift.com/rosa/rosa_getting_started/rosa-required-aws-service-quotas.html

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa verify permissions
&lt;span class="go"&gt;[2026-01-13 09:36:45] I: Validating SCP policies...
I: AWS SCP policies ok

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa init
&lt;span class="go"&gt;[2026-01-13 09:37:15] I: Logged in as 'data-engineer' on 'https://api.openshift.com'
I: Validating AWS credentials...
I: AWS credentials are valid!
I: Validating SCP policies...
I: AWS SCP policies ok
I: Validating AWS quota...
I: AWS quota ok. If cluster installation fails, validate actual AWS resource usage against https://docs.openshift.com/rosa/rosa_getting_started/rosa-required-aws-service-quotas.html
I: Ensuring cluster administrator user 'osdCcsAdmin'...
I: Admin user 'osdCcsAdmin' created successfully!
I: Validating SCP policies for 'osdCcsAdmin'...
I: AWS SCP policies ok
I: Verifying whether OpenShift command-line tool is available...
I: Current OpenShift Client Version: 4.18.0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.3: Create ROSA Cluster
&lt;/h3&gt;

&lt;p&gt;Create a ROSA cluster optimized for Spark workloads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set environment variables&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"data-lakehouse"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MACHINE_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"m5.4xlarge"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;COMPUTE_NODES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3

&lt;span class="c"&gt;# Create ROSA cluster (takes ~40 minutes)&lt;/span&gt;
rosa create cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster-name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--multi-az&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compute-machine-type&lt;/span&gt; &lt;span class="nv"&gt;$MACHINE_TYPE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compute-nodes&lt;/span&gt; &lt;span class="nv"&gt;$COMPUTE_NODES&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--machine-cidr&lt;/span&gt; 10.0.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-cidr&lt;/span&gt; 172.30.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--pod-cidr&lt;/span&gt; 10.128.0.0/14 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host-prefix&lt;/span&gt; 23 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa create cluster &lt;span class="nt"&gt;--cluster-name&lt;/span&gt; data-lakehouse &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="nt"&gt;--multi-az&lt;/span&gt; &lt;span class="nt"&gt;--compute-machine-type&lt;/span&gt; m5.4xlarge &lt;span class="nt"&gt;--compute-nodes&lt;/span&gt; 3 &lt;span class="nt"&gt;--machine-cidr&lt;/span&gt; 10.0.0.0/16 &lt;span class="nt"&gt;--service-cidr&lt;/span&gt; 172.30.0.0/16 &lt;span class="nt"&gt;--pod-cidr&lt;/span&gt; 10.128.0.0/14 &lt;span class="nt"&gt;--host-prefix&lt;/span&gt; 23 &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 09:45:00] I: Creating cluster 'data-lakehouse'
I: To view a list of clusters and their status, run 'rosa list clusters'
I: Cluster 'data-lakehouse' has been created.
I: Once the cluster is installed you will need to add an Identity Provider before you can login into the cluster. See 'rosa create idp --help' for more information.

Name:                       data-lakehouse
ID:                         24g9q8jdhgoofs8cmp8ilr67njd5p0j8
External ID:
OpenShift Version:          4.18.0
Channel Group:              stable
DNS:                        data-lakehouse.vxkf.p1.openshiftapps.com
AWS Account:                123456789012
API URL:
Console URL:
Region:                     us-east-1
Multi-AZ:                   true
Nodes:
 - Control plane:           3
 - Infra:                   3
 - Compute:                 3 (m5.4xlarge)
Network:
 - Type:                    OVNKubernetes
 - Service CIDR:            172.30.0.0/16
 - Machine CIDR:            10.0.0.0/16
 - Pod CIDR:                10.128.0.0/14
 - Host Prefix:             /23
STS Role ARN:               arn:aws:iam::123456789012:role/ManagedOpenShift-Installer-Role
Support Role ARN:           arn:aws:iam::123456789012:role/ManagedOpenShift-Support-Role
Instance IAM Roles:
 - Control plane:           arn:aws:iam::123456789012:role/ManagedOpenShift-ControlPlane-Role
 - Worker:                  arn:aws:iam::123456789012:role/ManagedOpenShift-Worker-Role
Operator IAM Roles:
 - arn:aws:iam::123456789012:role/data-lakehouse-w7w6-openshift-cloud-network-config-controller-cloud-cre
 - arn:aws:iam::123456789012:role/data-lakehouse-w7w6-openshift-machine-api-aws-cloud-credentials
 - arn:aws:iam::123456789012:role/data-lakehouse-w7w6-openshift-cloud-credential-operator-cloud-credent
 - arn:aws:iam::123456789012:role/data-lakehouse-w7w6-openshift-image-registry-installer-cloud-credenti
 - arn:aws:iam::123456789012:role/data-lakehouse-w7w6-openshift-ingress-operator-cloud-credentials
 - arn:aws:iam::123456789012:role/data-lakehouse-w7w6-openshift-cluster-csi-drivers-ebs-cloud-credenti
State:                      pending (Preparing account)
Private:                    No
Created:                    Jan 13 2026 09:45:00 UTC
Details Page:               https://console.redhat.com/openshift/details/s/2Vw0000example
OIDC Endpoint URL:          https://rh-oidc.s3.us-east-1.amazonaws.com/24g9q8jdhgoofs8cmp8ilr67njd5p0j8

I: To determine when your cluster is Ready, run 'rosa describe cluster -c data-lakehouse'.
I: To watch your cluster installation logs, run 'rosa logs install -c data-lakehouse --watch'.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuration Rationale&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;m5.4xlarge&lt;/strong&gt;: 16 vCPUs and 64 GiB RAM per worker - enough headroom for several Spark executors per node (see the sizing sketch below)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 compute nodes&lt;/strong&gt;: lets Spark spread executors across hosts for genuinely distributed processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-AZ&lt;/strong&gt;: control plane and workers span three Availability Zones, giving high availability for production workloads&lt;/li&gt;
&lt;/ul&gt;
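
&lt;p&gt;To make this concrete, here is a rough executor layout for one m5.4xlarge worker. The numbers are illustrative assumptions, not a recommendation: they leave a little headroom for the OS, kubelet and OpenShift system pods, and you should tune them for your actual jobs. The property names are standard Spark settings.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Illustrative sizing for one m5.4xlarge worker (16 vCPU / 64 GiB):
#   3 executors x 5 cores     = 15 cores  (1 core left for system overhead)
#   3 executors x (17g + 1g)  = 54 GiB    (remainder left for overhead and the driver)
# Across the 3 compute nodes this gives 9 executors in total.
spark.executor.cores=5
spark.executor.memory=17g
spark.executor.memoryOverhead=1g
spark.executor.instances=9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;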

&lt;h3&gt;
  
  
  Step 1.4: Monitor Cluster Creation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Watch cluster installation progress&lt;/span&gt;
rosa logs &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--watch&lt;/span&gt;

&lt;span class="c"&gt;# Check cluster status&lt;/span&gt;
rosa describe cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa logs &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;data-lakehouse &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 09:46:00] time="2026-01-13T09:46:00Z" level=info msg="Preparing cluster installation"
time="2026-01-13T09:47:15Z" level=info msg="Creating AWS VPC"
time="2026-01-13T09:48:30Z" level=info msg="Creating AWS subnets"
time="2026-01-13T09:50:12Z" level=info msg="Creating security groups"
time="2026-01-13T09:52:45Z" level=info msg="Launching bootstrap instance"
time="2026-01-13T09:55:20Z" level=info msg="Waiting for bootstrap to complete"
time="2026-01-13T10:05:30Z" level=info msg="Destroying bootstrap resources"
time="2026-01-13T10:08:15Z" level=info msg="Installing control plane"
time="2026-01-13T10:15:42Z" level=info msg="Control plane initialized"
time="2026-01-13T10:18:30Z" level=info msg="Installing cluster operators"
time="2026-01-13T10:25:50Z" level=info msg="Cluster installation complete"

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa describe cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;data-lakehouse
&lt;span class="go"&gt;[2026-01-13 10:26:15] Name:                       data-lakehouse
ID:                         24g9q8jdhgoofs8cmp8ilr67njd5p0j8
External ID:
OpenShift Version:          4.18.0
Channel Group:              stable
DNS:                        data-lakehouse.vxkf.p1.openshiftapps.com
AWS Account:                123456789012
API URL:                    https://api.data-lakehouse.vxkf.p1.openshiftapps.com:6443
Console URL:                https://console-openshift-console.apps.data-lakehouse.vxkf.p1.openshiftapps.com
Region:                     us-east-1
Multi-AZ:                   true
Nodes:
 - Control plane:           3
 - Infra:                   3
 - Compute:                 3 (m5.4xlarge)
Network:
 - Type:                    OVNKubernetes
 - Service CIDR:            172.30.0.0/16
 - Machine CIDR:            10.0.0.0/16
 - Pod CIDR:                10.128.0.0/14
 - Host Prefix:             /23
STS Role ARN:               arn:aws:iam::123456789012:role/ManagedOpenShift-Installer-Role
Support Role ARN:           arn:aws:iam::123456789012:role/ManagedOpenShift-Support-Role
Instance IAM Roles:
 - Control plane:           arn:aws:iam::123456789012:role/ManagedOpenShift-ControlPlane-Role
 - Worker:                  arn:aws:iam::123456789012:role/ManagedOpenShift-Worker-Role
State:                      ready
Private:                    No
Created:                    Jan 13 2026 09:45:00 UTC
Details Page:               https://console.redhat.com/openshift/details/s/2Vw0000example
OIDC Endpoint URL:          https://rh-oidc.s3.us-east-1.amazonaws.com/24g9q8jdhgoofs8cmp8ilr67njd5p0j8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
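
&lt;p&gt;In the run above the installation took roughly 40 minutes. If you would rather not sit on the log stream, a small polling loop (a sketch that only relies on the rosa describe cluster output shown above) can block until the cluster reports ready:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Poll every 60 seconds until the cluster state becomes "ready"
until rosa describe cluster --cluster=$CLUSTER_NAME | grep -q '^State: *ready'; do
  echo "Cluster not ready yet, waiting 60s..."
  sleep 60
done
echo "Cluster $CLUSTER_NAME is ready"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;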



&lt;h3&gt;
  
  
  Step 1.5: Create Admin User and Connect
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create cluster admin user&lt;/span&gt;
rosa create admin &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt;

&lt;span class="c"&gt;# Use the login command from output&lt;/span&gt;
oc login https://api.data-lakehouse.vxkf.p1.openshiftapps.com:6443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--username&lt;/span&gt; cluster-admin &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--password&lt;/span&gt; &amp;lt;your-password&amp;gt;

&lt;span class="c"&gt;# Verify cluster access&lt;/span&gt;
oc cluster-info
oc get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa create admin &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;data-lakehouse
&lt;span class="go"&gt;[2026-01-13 10:28:00] I: Admin account has been added to cluster 'data-lakehouse'.
I: Please securely store this generated password. If you lose this password you can delete and recreate the cluster admin user.
I: To login, run the following command:

   oc login https://api.data-lakehouse.vxkf.p1.openshiftapps.com:6443 --username cluster-admin --password aB3dE-fGh5J-kLm7N-pQr9S

I: It may take several minutes for this access to become active.

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;oc login https://api.data-lakehouse.vxkf.p1.openshiftapps.com:6443 &lt;span class="nt"&gt;--username&lt;/span&gt; cluster-admin &lt;span class="nt"&gt;--password&lt;/span&gt; aB3dE-fGh5J-kLm7N-pQr9S
&lt;span class="go"&gt;[2026-01-13 10:29:30] Login successful.

You have access to 103 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "default".

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;oc cluster-info
&lt;span class="go"&gt;[2026-01-13 10:29:45] Kubernetes control plane is running at https://api.data-lakehouse.vxkf.p1.openshiftapps.com:6443

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;oc get nodes
&lt;span class="go"&gt;[2026-01-13 10:30:00] NAME                                         STATUS   ROLES                  AGE   VERSION
ip-10-0-128-205.ec2.internal                 Ready    control-plane,master   42m   v1.31.0+7c7b8a2
ip-10-0-135-148.ec2.internal                 Ready    control-plane,master   42m   v1.31.0+7c7b8a2
ip-10-0-142-87.ec2.internal                  Ready    control-plane,master   42m   v1.31.0+7c7b8a2
ip-10-0-152-34.ec2.internal                  Ready    worker                 35m   v1.31.0+7c7b8a2
ip-10-0-189-72.ec2.internal                  Ready    worker                 35m   v1.31.0+7c7b8a2
ip-10-0-213-156.ec2.internal                 Ready    worker                 35m   v1.31.0+7c7b8a2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.6: Create Project Namespaces
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create namespace for Spark workloads&lt;/span&gt;
oc new-project spark-jobs

&lt;span class="c"&gt;# Create namespace for Spark operator&lt;/span&gt;
oc new-project spark-operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;oc new-project spark-jobs
&lt;span class="go"&gt;[2026-01-13 10:31:00] Now using project "spark-jobs" on server "https://api.data-lakehouse.vxkf.p1.openshiftapps.com:6443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app rails-postgresql-example

to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;oc new-project spark-operator
&lt;span class="go"&gt;[2026-01-13 10:31:15] Now using project "spark-operator" on server "https://api.data-lakehouse.vxkf.p1.openshiftapps.com:6443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app rails-postgresql-example

to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 2: AWS Glue Data Catalog Configuration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 2.1: Create Glue Databases
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Glue database for lakehouse&lt;/span&gt;
aws glue create-database &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{
    "Name": "lakehouse",
    "Description": "Data lakehouse with Iceberg tables"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create additional databases for different layers&lt;/span&gt;
aws glue create-database &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{
    "Name": "bronze",
    "Description": "Raw data landing zone"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

aws glue create-database &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{
    "Name": "silver",
    "Description": "Curated and cleaned data"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

aws glue create-database &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{
    "Name": "gold",
    "Description": "Analytics-ready aggregated data"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Verify database creation&lt;/span&gt;
aws glue get-databases &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws glue create-database &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{"Name": "lakehouse", "Description": "Data lakehouse with Iceberg tables"}'&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 10:35:00] (No output indicates success)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws glue create-database &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{"Name": "bronze", "Description": "Raw data landing zone"}'&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 10:35:15] (No output indicates success)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws glue create-database &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{"Name": "silver", "Description": "Curated and cleaned data"}'&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 10:35:30] (No output indicates success)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws glue create-database &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{"Name": "gold", "Description": "Analytics-ready aggregated data"}'&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 10:35:45] (No output indicates success)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws glue get-databases &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 10:36:00] {
    "DatabaseList": [
        {
            "Name": "bronze",
            "Description": "Raw data landing zone",
            "CreateTime": "2026-01-13T10:35:15.234000-05:00",
            "CreateTableDefaultPermissions": [
                {
                    "Principal": {
                        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                    },
                    "Permissions": [
                        "ALL"
                    ]
                }
            ],
            "CatalogId": "123456789012"
        },
        {
            "Name": "gold",
            "Description": "Analytics-ready aggregated data",
            "CreateTime": "2026-01-13T10:35:45.789000-05:00",
            "CreateTableDefaultPermissions": [
                {
                    "Principal": {
                        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                    },
                    "Permissions": [
                        "ALL"
                    ]
                }
            ],
            "CatalogId": "123456789012"
        },
        {
            "Name": "lakehouse",
            "Description": "Data lakehouse with Iceberg tables",
            "CreateTime": "2026-01-13T10:35:00.123000-05:00",
            "CreateTableDefaultPermissions": [
                {
                    "Principal": {
                        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                    },
                    "Permissions": [
                        "ALL"
                    ]
                }
            ],
            "CatalogId": "123456789012"
        },
        {
            "Name": "silver",
            "Description": "Curated and cleaned data",
            "CreateTime": "2026-01-13T10:35:30.456000-05:00",
            "CreateTableDefaultPermissions": [
                {
                    "Principal": {
                        "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                    },
                    "Permissions": [
                        "ALL"
                    ]
                }
            ],
            "CatalogId": "123456789012"
        }
    ]
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2.2: Create IAM Role for Glue Catalog Access
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get ROSA cluster OIDC provider&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;rosa describe cluster &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .aws.sts.oidc_endpoint_url | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s|https://||'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create trust policy for Spark service account&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; spark-glue-trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:sub": "system:serviceaccount:spark-jobs:spark-sa"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create IAM role&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SPARK_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://spark-glue-trust-policy.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Role.Arn'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Spark IAM Role ARN: &lt;/span&gt;&lt;span class="nv"&gt;$SPARK_ROLE_ARN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;rosa describe cluster &lt;span class="nt"&gt;-c&lt;/span&gt; data-lakehouse &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .aws.sts.oidc_endpoint_url | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s|https://||'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 10:38:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 10:38:05]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; spark-glue-trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="go"&gt;[content omitted for brevity]
EOF
[2026-01-13 10:38:20]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SPARK_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws iam create-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://spark-glue-trust-policy.json &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Role.Arn'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 10:38:35]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Spark IAM Role ARN: &lt;/span&gt;&lt;span class="nv"&gt;$SPARK_ROLE_ARN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 10:38:40] Spark IAM Role ARN: arn:aws:iam::123456789012:role/SparkGlueCatalogRole
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2.3: Create IAM Policy for Glue and S3 Access
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create policy for Glue Catalog access&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; spark-glue-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:BatchCreatePartition",
        "glue:BatchDeletePartition",
        "glue:BatchUpdatePartition",
        "glue:CreatePartition",
        "glue:DeletePartition",
        "glue:UpdatePartition"
      ],
      "Resource": [
        "arn:aws:glue:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:catalog",
        "arn:aws:glue:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:database/*",
        "arn:aws:glue:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:table/*/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::lakehouse-*",
        "arn:aws:s3:::lakehouse-*/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create and attach policy&lt;/span&gt;
aws iam put-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; GlueS3Access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://spark-glue-policy.json

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"IAM policy created and attached"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; spark-glue-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="go"&gt;[content omitted for brevity]
EOF
[2026-01-13 10:40:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws iam put-role-policy &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole &lt;span class="nt"&gt;--policy-name&lt;/span&gt; GlueS3Access &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://spark-glue-policy.json
&lt;span class="go"&gt;[2026-01-13 10:40:15] (No output indicates success)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"IAM policy created and attached"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 10:40:20] IAM policy created and attached
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
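
&lt;p&gt;Before wiring this role into Spark, it is worth a quick check that both the OIDC trust relationship and the inline policy landed as intended. These are standard AWS CLI calls against the role created above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Confirm the trust policy points at the cluster's OIDC provider
aws iam get-role \
  --role-name SparkGlueCatalogRole \
  --query 'Role.AssumeRolePolicyDocument'

# Confirm the inline Glue/S3 policy is attached and see which actions it grants
aws iam list-role-policies --role-name SparkGlueCatalogRole
aws iam get-role-policy \
  --role-name SparkGlueCatalogRole \
  --policy-name GlueS3Access \
  --query 'PolicyDocument.Statement[].Action'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;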



&lt;h2&gt;
  
  
  Phase 3: S3 Data Lake Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 3.1: Create S3 Buckets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create S3 bucket for data lake&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"lakehouse-data-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

aws s3 mb s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Enable versioning for data protection&lt;/span&gt;
aws s3api put-bucket-versioning &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--versioning-configuration&lt;/span&gt; &lt;span class="nv"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Enabled &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create folder structure for medallion architecture&lt;/span&gt;
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; bronze/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; silver/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; gold/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; warehouse/

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"S3 Data Lake bucket created: s3://&lt;/span&gt;&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"lakehouse-data-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 10:42:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 mb s3://lakehouse-data-123456789012 &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 10:42:15] make_bucket: lakehouse-data-123456789012

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3api put-bucket-versioning &lt;span class="nt"&gt;--bucket&lt;/span&gt; lakehouse-data-123456789012 &lt;span class="nt"&gt;--versioning-configuration&lt;/span&gt; &lt;span class="nv"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Enabled &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 10:42:30] (No output indicates success)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; lakehouse-data-123456789012 &lt;span class="nt"&gt;--key&lt;/span&gt; bronze/
&lt;span class="go"&gt;[2026-01-13 10:42:45] {
    "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
    "ServerSideEncryption": "AES256"
}

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; lakehouse-data-123456789012 &lt;span class="nt"&gt;--key&lt;/span&gt; silver/
&lt;span class="go"&gt;[2026-01-13 10:43:00] {
    "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
    "ServerSideEncryption": "AES256"
}

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; lakehouse-data-123456789012 &lt;span class="nt"&gt;--key&lt;/span&gt; gold/
&lt;span class="go"&gt;[2026-01-13 10:43:15] {
    "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
    "ServerSideEncryption": "AES256"
}

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; lakehouse-data-123456789012 &lt;span class="nt"&gt;--key&lt;/span&gt; warehouse/
&lt;span class="go"&gt;[2026-01-13 10:43:30] {
    "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"",
    "ServerSideEncryption": "AES256"
}

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"S3 Data Lake bucket created: s3://&lt;/span&gt;&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 10:43:45] S3 Data Lake bucket created: s3://lakehouse-data-123456789012
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
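
&lt;p&gt;A quick listing confirms the medallion prefixes and the warehouse prefix exist before any job starts writing to them:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List the top-level prefixes of the lakehouse bucket
aws s3 ls s3://$LAKEHOUSE_BUCKET/
# Expect the bronze/, silver/, gold/ and warehouse/ placeholder keys created above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;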



&lt;h3&gt;
  
  
  Step 3.2: Configure S3 Bucket Policies
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create bucket policy for secure access&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; lakehouse-bucket-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSparkAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "&lt;/span&gt;&lt;span class="nv"&gt;$SPARK_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;",
        "arn:aws:s3:::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;/*"
      ]
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Apply bucket policy&lt;/span&gt;
aws s3api put-bucket-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy&lt;/span&gt; file://lakehouse-bucket-policy.json

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Bucket policy applied"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; lakehouse-bucket-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="go"&gt;[content omitted for brevity]
EOF
[2026-01-13 10:45:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3api put-bucket-policy &lt;span class="nt"&gt;--bucket&lt;/span&gt; lakehouse-data-123456789012 &lt;span class="nt"&gt;--policy&lt;/span&gt; file://lakehouse-bucket-policy.json
&lt;span class="go"&gt;[2026-01-13 10:45:15] (No output indicates success)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Bucket policy applied"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 10:45:20] Bucket policy applied
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3.3: Upload Sample Data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create sample dataset&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; sample-data
&lt;span class="nb"&gt;cd &lt;/span&gt;sample-data

&lt;span class="c"&gt;# Generate sample sales data&lt;/span&gt;
python3 &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;
import csv
import random
from datetime import datetime, timedelta

# Generate sample sales data
with open('sales_data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['transaction_id', 'date', 'product', 'category', 'amount', 'quantity', 'region'])

    products = ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Headphones']
    categories = ['Electronics', 'Accessories']
    regions = ['North', 'South', 'East', 'West']

    base_date = datetime(2024, 1, 1)

    for i in range(10000):
        transaction_date = base_date + timedelta(days=random.randint(0, 365))
        product = random.choice(products)
        category = 'Electronics' if product in ['Laptop', 'Monitor'] else 'Accessories'

        writer.writerow([
            f'TXN{i:06d}',
            transaction_date.strftime('%Y-%m-%d'),
            product,
            category,
            round(random.uniform(10, 2000), 2),
            random.randint(1, 10),
            random.choice(regions)
        ])

print("Sample data generated: sales_data.csv")
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Upload to S3 bronze layer&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;sales_data.csv s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;/bronze/sales/sales_data.csv

&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Sample data uploaded to S3"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; sample-data
&lt;span class="go"&gt;[2026-01-13 10:47:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;sample-data
&lt;span class="go"&gt;[2026-01-13 10:47:05]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python3 &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="go"&gt;[script content]
PYTHON
[2026-01-13 10:47:30] Sample data generated: sales_data.csv

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;sales_data.csv s3://lakehouse-data-123456789012/bronze/sales/sales_data.csv
&lt;span class="go"&gt;[2026-01-13 10:48:00] upload: ./sales_data.csv to s3://lakehouse-data-123456789012/bronze/sales/sales_data.csv

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;span class="go"&gt;[2026-01-13 10:48:05]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Sample data uploaded to S3"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 10:48:10] Sample data uploaded to S3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
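
&lt;p&gt;It is worth eyeballing the generated file before Spark ever touches it; the script above writes a header plus 10,000 rows, so a couple of plain shell commands are enough as a sanity check:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Preview the schema and first rows
head -3 sample-data/sales_data.csv

# Expect 10001 lines: 1 header + 10000 transactions
wc -l sample-data/sales_data.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;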



&lt;h2&gt;
  
  
  Phase 4: Apache Spark on OpenShift
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 4.1: Install Spark Operator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add Spark Operator Helm repository&lt;/span&gt;
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update

&lt;span class="c"&gt;# Install Spark Operator&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;spark-operator spark-operator/spark-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; spark-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; webhook.enable&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;sparkJobNamespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;spark-jobs

&lt;span class="c"&gt;# Verify installation&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; spark-operator
kubectl get crd | &lt;span class="nb"&gt;grep &lt;/span&gt;spark
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;helm repo add spark-operator https://kubeflow.github.io/spark-operator
&lt;span class="go"&gt;[2026-01-13 10:50:00] "spark-operator" has been added to your repositories

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;helm repo update
&lt;span class="go"&gt;[2026-01-13 10:50:15] Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "spark-operator" chart repository
Update Complete. ⎈Happy Helming!⎈

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;helm &lt;span class="nb"&gt;install &lt;/span&gt;spark-operator spark-operator/spark-operator &lt;span class="nt"&gt;--namespace&lt;/span&gt; spark-operator &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="nt"&gt;--set&lt;/span&gt; webhook.enable&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;sparkJobNamespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;spark-jobs
&lt;span class="go"&gt;[2026-01-13 10:51:00] NAME: spark-operator
LAST DEPLOYED: Mon Jan 13 10:51:00 2026
NAMESPACE: spark-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Verify the Spark Operator deployment:
   kubectl get pods -n spark-operator

2. Check the webhook:
   kubectl get mutatingwebhookconfigurations
   kubectl get validatingwebhookconfigurations

3. Submit a SparkApplication:
   kubectl apply -f examples/spark-pi.yaml

For more information, visit https://github.com/kubeflow/spark-operator

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; spark-operator
&lt;span class="go"&gt;[2026-01-13 10:51:30] NAME                              READY   STATUS    RESTARTS   AGE
spark-operator-5f7b8c9d6b-xq4zm   1/1     Running   0          30s

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;kubectl get crd | &lt;span class="nb"&gt;grep &lt;/span&gt;spark
&lt;span class="go"&gt;[2026-01-13 10:51:45] scheduledsparkapplications.sparkoperator.k8s.io   2026-01-13T15:51:00Z
sparkapplications.sparkoperator.k8s.io            2026-01-13T15:51:00Z
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4.2: Create Service Account for Spark
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create service account with IAM role annotation&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-sa
  namespace: spark-jobs
  annotations:
    eks.amazonaws.com/role-arn: &lt;/span&gt;&lt;span class="nv"&gt;$SPARK_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
  namespace: spark-jobs
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["create", "get", "list", "watch", "delete"]
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-rolebinding
  namespace: spark-jobs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-role
subjects:
- kind: ServiceAccount
  name: spark-sa
  namespace: spark-jobs
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Verify service account&lt;/span&gt;
oc get sa spark-sa &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
&lt;/span&gt;&lt;span class="go"&gt;[manifest content]
EOF
[2026-01-13 10:53:00] serviceaccount/spark-sa created
role.rbac.authorization.k8s.io/spark-role created
rolebinding.rbac.authorization.k8s.io/spark-rolebinding created

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;oc get sa spark-sa &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;span class="go"&gt;[2026-01-13 10:53:15] apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/SparkGlueCatalogRole
  creationTimestamp: "2026-01-13T15:53:00Z"
  name: spark-sa
  namespace: spark-jobs
  resourceVersion: "123456"
  uid: a1b2c3d4-e5f6-7890-abcd-ef1234567890
secrets:
- name: spark-sa-dockercfg-xyz12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
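
&lt;p&gt;With the operator installed and the spark-sa service account in place, an optional smoke test verifies the plumbing before we build the custom Iceberg image. The sketch below submits the stock SparkPi example through the operator; the base image matches the one used in the Dockerfile in Step 5.1, but the exact path of the examples jar inside that image is an assumption, so adjust it if your image differs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Optional: submit the bundled SparkPi example as a smoke test
cat &amp;lt;&amp;lt;EOF | oc apply -f -
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-smoke-test
  namespace: spark-jobs
spec:
  type: Scala
  mode: cluster
  image: gcr.io/spark-operator/spark:v3.5.0
  mainClass: org.apache.spark.examples.SparkPi
  # Assumed location of the examples jar inside the image above
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar
  sparkVersion: 3.5.0
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-sa
  executor:
    cores: 1
    instances: 1
    memory: 512m
EOF

# Watch the application until it reaches COMPLETED, then clean it up
oc get sparkapplication spark-pi-smoke-test -n spark-jobs -w
oc delete sparkapplication spark-pi-smoke-test -n spark-jobs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;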



&lt;h3&gt;
  
  
  Step 4.3: Create ConfigMap for Spark Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Spark configuration&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-config
  namespace: spark-jobs
data:
  spark-defaults.conf: |
    spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
    spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider
    spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
    spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.glue_catalog.warehouse=s3://&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;/warehouse
    spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
    spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
    spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
    spark.eventLog.enabled=true
    spark.eventLog.dir=s3a://&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;/spark-events
  lakehouse.conf: |
    LAKEHOUSE_BUCKET=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
    AWS_REGION=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;
    GLUE_DATABASE=lakehouse
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
&lt;/span&gt;&lt;span class="go"&gt;[manifest content]
EOF
[2026-01-13 10:55:00] configmap/spark-config created
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 5: Apache Iceberg Integration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 5.1: Build Custom Spark Image with Iceberg
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create directory for custom Spark image&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; spark-iceberg
&lt;span class="nb"&gt;cd &lt;/span&gt;spark-iceberg

&lt;span class="c"&gt;# Create Dockerfile&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;DOCKERFILE&lt;/span&gt;&lt;span class="sh"&gt;'
FROM gcr.io/spark-operator/spark:v3.5.0

USER root

# Install AWS dependencies and Iceberg
RUN curl -L https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.4.2/iceberg-spark-runtime-3.5_2.12-1.4.2.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/iceberg-spark-runtime-3.5_2.12-1.4.2.jar

RUN curl -L https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/hadoop-aws-3.3.4.jar

RUN curl -L https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.262/aws-java-sdk-bundle-1.12.262.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/aws-java-sdk-bundle-1.12.262.jar

RUN curl -L https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.20.18/bundle-2.20.18.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/bundle-2.20.18.jar

RUN curl -L https://repo1.maven.org/maven2/software/amazon/awssdk/url-connection-client/2.20.18/url-connection-client-2.20.18.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/url-connection-client-2.20.18.jar

USER 185

ENTRYPOINT ["/opt/entrypoint.sh"]
&lt;/span&gt;&lt;span class="no"&gt;DOCKERFILE

&lt;/span&gt;&lt;span class="c"&gt;# Build and push to a container registry&lt;/span&gt;
&lt;span class="c"&gt;# For this example, we'll use OpenShift internal registry&lt;/span&gt;
oc create imagestream spark-iceberg &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs

&lt;span class="c"&gt;# Build image using OpenShift build&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; BuildConfig.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: spark-iceberg
  namespace: spark-jobs
spec:
  output:
    to:
      kind: ImageStreamTag
      name: spark-iceberg:latest
  source:
    dockerfile: |
&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;Dockerfile | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s/^/      /'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;
    type: Dockerfile
  strategy:
    dockerStrategy: {}
    type: Docker
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;oc apply &lt;span class="nt"&gt;-f&lt;/span&gt; BuildConfig.yaml

&lt;span class="c"&gt;# Start build&lt;/span&gt;
oc start-build spark-iceberg &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;--follow&lt;/span&gt;

&lt;span class="c"&gt;# Get image reference&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SPARK_IMAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get is spark-iceberg &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.dockerImageRepository}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;:latest

&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Custom Spark image with Iceberg built: &lt;/span&gt;&lt;span class="nv"&gt;$SPARK_IMAGE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; spark-iceberg
&lt;span class="go"&gt;[2026-01-13 11:00:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;spark-iceberg
&lt;span class="go"&gt;[2026-01-13 11:00:05]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;DOCKERFILE&lt;/span&gt;&lt;span class="sh"&gt;'
&lt;/span&gt;&lt;span class="go"&gt;[content omitted for brevity]
DOCKERFILE
[2026-01-13 11:00:30]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;oc create imagestream spark-iceberg &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs
&lt;span class="go"&gt;[2026-01-13 11:01:00] imagestream.image.openshift.io/spark-iceberg created

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; BuildConfig.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="go"&gt;[content omitted for brevity]
EOF
[2026-01-13 11:01:15]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;oc apply &lt;span class="nt"&gt;-f&lt;/span&gt; BuildConfig.yaml
&lt;span class="go"&gt;[2026-01-13 11:01:30] buildconfig.build.openshift.io/spark-iceberg created

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;oc start-build spark-iceberg &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;--follow&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 11:01:45] build.build.openshift.io/spark-iceberg-1 started
Cloning "https://github.com/..." ...
Commit: abc123def456 (Initial commit)
&lt;/span&gt;&lt;span class="gp"&gt;Author: DataEngineer &amp;lt;engineer@example.com&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="go"&gt;Date:   Mon Jan 13 11:01:00 2026 -0500
Receiving objects: 100% (3/3), done.
Resolving deltas: 100% (1/1), done.

Step 1/9 : FROM gcr.io/spark-operator/spark:v3.5.0
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;1a2b3c4d5e6f
&lt;span class="go"&gt;Step 2/7 : USER root
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Running &lt;span class="k"&gt;in &lt;/span&gt;7g8h9i0j1k2l
&lt;span class="go"&gt;Removing intermediate container 7g8h9i0j1k2l
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;3m4n5o6p7q8r
&lt;span class="go"&gt;Step 3/7 : RUN curl -L https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.4.2/iceberg-spark-runtime-3.5_2.12-1.4.2.jar -o /opt/spark/jars/iceberg-spark-runtime-3.5_2.12-1.4.2.jar
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Running &lt;span class="k"&gt;in &lt;/span&gt;9s0t1u2v3w4x
&lt;span class="go"&gt;  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 45.2M  100 45.2M    0     0  15.3M      0  0:00:02  0:00:02 --:--:-- 15.3M
Removing intermediate container 9s0t1u2v3w4x
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;5y6z7a8b9c0d
&lt;span class="go"&gt;Step 4/7 : RUN curl -L https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar -o /opt/spark/jars/hadoop-aws-3.3.4.jar
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Running &lt;span class="k"&gt;in &lt;/span&gt;1e2f3g4h5i6j
&lt;span class="go"&gt;  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  789k  100  789k    0     0  2145k      0 --:--:-- --:--:-- --:--:-- 2145k
Removing intermediate container 1e2f3g4h5i6j
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;7k8l9m0n1o2p
&lt;span class="go"&gt;Step 5/7 : RUN curl -L https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.262/aws-java-sdk-bundle-1.12.262.jar -o /opt/spark/jars/aws-java-sdk-bundle-1.12.262.jar
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Running &lt;span class="k"&gt;in &lt;/span&gt;3q4r5s6t7u8v
&lt;span class="go"&gt;  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  289M  100  289M    0     0  45.2M      0  0:00:06  0:00:06 --:--:-- 52.1M
Removing intermediate container 3q4r5s6t7u8v
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;9w0x1y2z3a4b
&lt;span class="go"&gt;Step 6/7 : USER 185
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Running &lt;span class="k"&gt;in &lt;/span&gt;5c6d7e8f9g0h
&lt;span class="go"&gt;Removing intermediate container 5c6d7e8f9g0h
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;1i2j3k4l5m6n
&lt;span class="go"&gt;Step 7/7 : ENTRYPOINT ["/opt/entrypoint.sh"]
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Running &lt;span class="k"&gt;in &lt;/span&gt;7o8p9q0r1s2t
&lt;span class="go"&gt;Removing intermediate container 7o8p9q0r1s2t
&lt;/span&gt;&lt;span class="gp"&gt; ---&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;3u4v5w6x7y8z
&lt;span class="go"&gt;Successfully built 3u4v5w6x7y8z
Successfully tagged image-registry.openshift-image-registry.svc:5000/spark-jobs/spark-iceberg:latest

Pushing image image-registry.openshift-image-registry.svc:5000/spark-jobs/spark-iceberg:latest ...
Getting image source signatures
Copying blob sha256:9a0b1c2d3e4f...
Copying blob sha256:5f6e7d8c9b0a...
Copying blob sha256:1g2h3i4j5k6l...
Copying config sha256:3u4v5w6x7y8z...
Writing manifest to image destination
Storing signatures
Successfully pushed image-registry.openshift-image-registry.svc:5000/spark-jobs/spark-iceberg@sha256:7m8n9o0p1q2r3s4t5u6v7w8x9y0z1a2b3c4d5e6f7g8h9i0j1k2l3m4n5o6p

Push successful

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SPARK_IMAGE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get is spark-iceberg &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.dockerImageRepository}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;:latest
&lt;span class="go"&gt;[2026-01-13 11:08:30]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;span class="go"&gt;[2026-01-13 11:08:35]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Custom Spark image with Iceberg built: &lt;/span&gt;&lt;span class="nv"&gt;$SPARK_IMAGE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 11:08:40] Custom Spark image with Iceberg built: image-registry.openshift-image-registry.svc:5000/spark-jobs/spark-iceberg:latest
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
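

&lt;p&gt;Before moving on, it is worth confirming that the image actually landed in the internal registry. A quick check (sketch; the image stream and namespace names match the build above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Confirm the image stream tag exists in the internal registry
oc get istag spark-iceberg:latest -n spark-jobs

# Inspect the image stream for the pushed digest and pull spec
oc describe is spark-iceberg -n spark-jobs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;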



&lt;h2&gt;
  
  
  Phase 6: Spark-Glue Catalog Integration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 6.1: Create Sample Spark Application
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create PySpark script for data processing&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; spark-jobs
&lt;span class="nb"&gt;cd &lt;/span&gt;spark-jobs

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; process_sales.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, year, month, sum as _sum, avg, count
import sys

def main():
    # Create Spark session with Iceberg and Glue Catalog
    spark = SparkSession.builder &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .appName("ProcessSalesData") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .getOrCreate()

    spark.sparkContext.setLogLevel("INFO")

    # Get configuration from environment
    bucket = sys.argv[1] if len(sys.argv) &amp;gt; 1 else "lakehouse-data"

    print(f"Reading data from s3a://{bucket}/bronze/sales/")

    # Read raw CSV data
    df_raw = spark.read.csv(
        f"s3a://{bucket}/bronze/sales/sales_data.csv",
        header=True,
        inferSchema=True
    )

    print(f"Raw data count: {df_raw.count()}")
    df_raw.show(5)

    # Create bronze table in Glue Catalog (if not exists)
    df_raw.write &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .format("iceberg") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .mode("overwrite") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .option("path", f"s3a://{bucket}/warehouse/bronze.db/sales") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .saveAsTable("glue_catalog.bronze.sales")

    print("Bronze table created in Glue Catalog")

    # Transform data for silver layer
    df_silver = df_raw &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .withColumn("year", year(col("date"))) &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .withColumn("month", month(col("date"))) &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .filter(col("amount") &amp;gt; 0) &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .dropDuplicates(["transaction_id"])

    # Write to silver layer
    df_silver.write &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .format("iceberg") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .mode("overwrite") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .partitionBy("year", "month") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .option("path", f"s3a://{bucket}/warehouse/silver.db/sales_clean") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .saveAsTable("glue_catalog.silver.sales_clean")

    print("Silver table created with partitioning")

    # Create aggregated gold layer
    df_gold = df_silver.groupBy("year", "month", "category", "region") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .agg(
            _sum("amount").alias("total_revenue"),
            _sum("quantity").alias("total_quantity"),
            avg("amount").alias("avg_transaction_value"),
            count("transaction_id").alias("transaction_count")
        )

    # Write to gold layer
    df_gold.write &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .format("iceberg") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .mode("overwrite") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .option("path", f"s3a://{bucket}/warehouse/gold.db/sales_summary") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .saveAsTable("glue_catalog.gold.sales_summary")

    print("Gold table created with aggregations")

    # Show sample results
    print("&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;=== Bronze Layer Sample ===")
    spark.sql("SELECT * FROM glue_catalog.bronze.sales LIMIT 5").show()

    print("&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;=== Silver Layer Sample ===")
    spark.sql("SELECT * FROM glue_catalog.silver.sales_clean LIMIT 5").show()

    print("&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;=== Gold Layer Sample ===")
    spark.sql("SELECT * FROM glue_catalog.gold.sales_summary ORDER BY total_revenue DESC LIMIT 10").show()

    # Verify tables in Glue Catalog
    print("&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;=== Tables in Glue Catalog ===")
    spark.sql("SHOW TABLES IN glue_catalog.bronze").show()
    spark.sql("SHOW TABLES IN glue_catalog.silver").show()
    spark.sql("SHOW TABLES IN glue_catalog.gold").show()

    spark.stop()

if __name__ == "__main__":
    main()
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Upload script to S3&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;process_sales.py s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;/scripts/

&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; spark-jobs
&lt;span class="go"&gt;[2026-01-13 11:10:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;spark-jobs
&lt;span class="go"&gt;[2026-01-13 11:10:05]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; process_sales.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
&lt;/span&gt;&lt;span class="go"&gt;[content omitted for brevity]
PYTHON
[2026-01-13 11:12:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;process_sales.py s3://lakehouse-data-123456789012/scripts/
&lt;span class="go"&gt;[2026-01-13 11:12:15] upload: ./process_sales.py to s3://lakehouse-data-123456789012/scripts/process_sales.py

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;span class="go"&gt;[2026-01-13 11:12:20]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
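

&lt;p&gt;One assumption worth making explicit: &lt;code&gt;saveAsTable("glue_catalog.bronze.sales")&lt;/code&gt; expects the target Glue database (the Iceberg namespace) to already exist. If an earlier phase did not create the &lt;code&gt;bronze&lt;/code&gt;, &lt;code&gt;silver&lt;/code&gt;, and &lt;code&gt;gold&lt;/code&gt; databases, a minimal sketch to create them:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Create the Glue databases backing the bronze / silver / gold namespaces
# (skip this if an earlier phase already created them)
for db in bronze silver gold; do
  aws glue create-database \
    --database-input "{\"Name\": \"${db}\"}" \
    --region $AWS_REGION
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;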



&lt;h3&gt;
  
  
  Step 6.2: Create SparkApplication Custom Resource
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create SparkApplication manifest&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: process-sales-data
  namespace: spark-jobs
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: &lt;/span&gt;&lt;span class="nv"&gt;$SPARK_IMAGE&lt;/span&gt;&lt;span class="sh"&gt;
  imagePullPolicy: Always
  mainApplicationFile: s3a://&lt;/span&gt;&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;/scripts/process_sales.py
  arguments:
    - "&lt;/span&gt;&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"
  sparkVersion: "3.5.0"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "2g"
    labels:
      version: "3.5.0"
    serviceAccount: spark-sa
    env:
      - name: AWS_REGION
        value: "&lt;/span&gt;&lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"
      - name: AWS_ROLE_ARN
        value: "&lt;/span&gt;&lt;span class="nv"&gt;$SPARK_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;"
      - name: AWS_WEB_IDENTITY_TOKEN_FILE
        value: "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"
    volumeMounts:
      - name: aws-iam-token
        mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
        readOnly: true
  executor:
    cores: 2
    instances: 3
    memory: "4g"
    labels:
      version: "3.5.0"
    env:
      - name: AWS_REGION
        value: "&lt;/span&gt;&lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"
      - name: AWS_ROLE_ARN
        value: "&lt;/span&gt;&lt;span class="nv"&gt;$SPARK_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;"
      - name: AWS_WEB_IDENTITY_TOKEN_FILE
        value: "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"
    volumeMounts:
      - name: aws-iam-token
        mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
        readOnly: true
  volumes:
    - name: aws-iam-token
      projected:
        sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              expirationSeconds: 86400
              path: token
  sparkConf:
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"
    "spark.hadoop.fs.s3a.aws.credentials.provider": "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog"
    "spark.sql.catalog.glue_catalog.warehouse": "s3a://&lt;/span&gt;&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;/warehouse"
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog"
    "spark.sql.catalog.glue_catalog.io-impl": "org.apache.iceberg.aws.s3.S3FileIO"
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
    "spark.kubernetes.allocation.batch.size": "3"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
&lt;/span&gt;&lt;span class="go"&gt;[manifest content]
EOF
[2026-01-13 11:15:00] sparkapplication.sparkoperator.k8s.io/process-sales-data created
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
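

&lt;p&gt;Because &lt;code&gt;restartPolicy&lt;/code&gt; is set to &lt;code&gt;Never&lt;/code&gt;, a SparkApplication runs exactly once. To re-run the job after changing the script or image, delete the finished resource and re-apply the manifest above (sketch):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Remove the finished run, then re-apply the manifest from Step 6.2
oc delete sparkapplication process-sales-data -n spark-jobs

# After re-applying, watch the application state until it reaches COMPLETED
kubectl get sparkapplication process-sales-data -n spark-jobs -w
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;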



&lt;h2&gt;
  
  
  Phase 7: Sample Data Pipelines
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 7.1: Create Incremental Processing Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create incremental processing script&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; spark-jobs/incremental_pipeline.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, year, month, sum as _sum, avg, count
from datetime import datetime
import sys

def main():
    spark = SparkSession.builder &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .appName("IncrementalPipeline") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .getOrCreate()

    bucket = sys.argv[1]
    batch_date = sys.argv[2] if len(sys.argv) &amp;gt; 2 else datetime.now().strftime('%Y-%m-%d')

    print(f"Processing incremental data for date: {batch_date}")

    # Read existing silver table
    df_existing = spark.read &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .format("iceberg") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .load(f"glue_catalog.silver.sales_clean")

    # Read new data (simulate incremental load)
    df_new = spark.read.csv(
        f"s3a://{bucket}/bronze/sales/sales_data.csv",
        header=True,
        inferSchema=True
    ).filter(col("date") == batch_date) &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
     .withColumn("processed_timestamp", current_timestamp())

    # Append the new records to the silver Iceberg table
    df_new.writeTo("glue_catalog.silver.sales_clean") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .append()

    print(f"Appended {df_new.count()} records to silver table")

    # Update gold aggregations
    df_updated = spark.read &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .format("iceberg") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .load("glue_catalog.silver.sales_clean") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .filter(col("date") == batch_date)

    # Recalculate aggregations for affected partitions
    df_agg = df_updated &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .withColumn("year", year(col("date"))) &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .withColumn("month", month(col("date"))) &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .groupBy("year", "month", "category", "region") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .agg(
            _sum("amount").alias("total_revenue"),
            _sum("quantity").alias("total_quantity"),
            avg("amount").alias("avg_transaction_value"),
            count("transaction_id").alias("transaction_count")
        )

    # Append the recalculated aggregates to the gold table (see the MERGE INTO sketch below)
    df_agg.writeTo("glue_catalog.gold.sales_summary") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .using("iceberg") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .tableProperty("write.merge.mode", "merge-on-read") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .append()

    print("Gold table updated with incremental aggregations")

    spark.stop()

if __name__ == "__main__":
    main()
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Upload to S3&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;spark-jobs/incremental_pipeline.py s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;/scripts/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; spark-jobs/incremental_pipeline.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
&lt;/span&gt;&lt;span class="go"&gt;[content omitted for brevity]
PYTHON
[2026-01-13 11:20:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;spark-jobs/incremental_pipeline.py s3://lakehouse-data-123456789012/scripts/
&lt;span class="go"&gt;[2026-01-13 11:20:15] upload: spark-jobs/incremental_pipeline.py to s3://lakehouse-data-123456789012/scripts/incremental_pipeline.py
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
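

&lt;p&gt;Note that the pipeline above appends recalculated aggregates to the gold table, so month/category/region combinations that were already summarized end up with duplicate rows. Iceberg also supports row-level upserts through &lt;code&gt;MERGE INTO&lt;/code&gt;, enabled by the IcebergSparkSessionExtensions already configured in the SparkApplication. A minimal sketch of that variant; the file name is hypothetical, the table and column names follow the pipeline above, and the catalog settings are assumed to come from the same &lt;code&gt;sparkConf&lt;/code&gt; as in Step 6.2:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Sketch: upsert-based gold refresh with Iceberg MERGE INTO (hypothetical file name)
cat &gt; spark-jobs/merge_gold_sketch.py &lt;&lt;'PYTHON'
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum as _sum, avg, count
import sys

def main():
    # glue_catalog and the Iceberg SQL extensions are assumed to be configured
    # via the SparkApplication sparkConf from Step 6.2
    spark = SparkSession.builder.appName("GoldMergeSketch").getOrCreate()
    batch_date = sys.argv[2] if len(sys.argv) &gt; 2 else None

    # Recompute aggregates, optionally restricted to the affected date
    df = spark.read.format("iceberg").load("glue_catalog.silver.sales_clean")
    if batch_date:
        df = df.filter(col("date") == batch_date)

    df_agg = df.groupBy("year", "month", "category", "region").agg(
        _sum("amount").alias("total_revenue"),
        _sum("quantity").alias("total_quantity"),
        avg("amount").alias("avg_transaction_value"),
        count("transaction_id").alias("transaction_count"))

    # Upsert: update existing (year, month, category, region) rows, insert new ones
    df_agg.createOrReplaceTempView("gold_updates")
    spark.sql("""
        MERGE INTO glue_catalog.gold.sales_summary AS t
        USING gold_updates AS s
        ON t.year = s.year AND t.month = s.month
           AND t.category = s.category AND t.region = s.region
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

    spark.stop()

if __name__ == "__main__":
    main()
PYTHON

# Upload alongside the other job scripts
aws s3 cp spark-jobs/merge_gold_sketch.py s3://$LAKEHOUSE_BUCKET/scripts/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Submitting it works exactly like Step 6.2; only &lt;code&gt;mainApplicationFile&lt;/code&gt; points at the new script.&lt;/p&gt;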



&lt;h3&gt;
  
  
  Step 7.2: Create Time Travel Query Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create time travel demonstration script&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; spark-jobs/time_travel.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
import sys

def main():
    spark = SparkSession.builder &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .appName("IcebergTimeTravel") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .getOrCreate()

    bucket = sys.argv[1]

    # Read current version
    print("=== Current Version ===")
    df_current = spark.read &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .format("iceberg") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
        .load("glue_catalog.silver.sales_clean")

    print(f"Current record count: {df_current.count()}")
    df_current.show(5)

    # Show table history
    print("&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;=== Table History ===")
    spark.sql("SELECT * FROM glue_catalog.silver.sales_clean.history").show()

    # Show table snapshots
    print("&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;=== Table Snapshots ===")
    spark.sql("SELECT * FROM glue_catalog.silver.sales_clean.snapshots").show()

    # Query specific snapshot (if exists)
    snapshots = spark.sql("SELECT snapshot_id FROM glue_catalog.silver.sales_clean.snapshots ORDER BY committed_at LIMIT 1").collect()

    if snapshots:
        snapshot_id = snapshots[0][0]
        print(f"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;=== Data at Snapshot {snapshot_id} ===")

        df_snapshot = spark.read &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
            .format("iceberg") &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
            .option("snapshot-id", snapshot_id) &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
            .load("glue_catalog.silver.sales_clean")

        print(f"Snapshot record count: {df_snapshot.count()}")
        df_snapshot.show(5)

    # Show table metadata
    print("&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;=== Table Metadata ===")
    spark.sql("DESCRIBE EXTENDED glue_catalog.silver.sales_clean").show(100, False)

    spark.stop()

if __name__ == "__main__":
    main()
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Upload to S3&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;spark-jobs/time_travel.py s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;/scripts/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; spark-jobs/time_travel.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
&lt;/span&gt;&lt;span class="go"&gt;[content omitted for brevity]
PYTHON
[2026-01-13 11:22:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;spark-jobs/time_travel.py s3://lakehouse-data-123456789012/scripts/
&lt;span class="go"&gt;[2026-01-13 11:22:15] upload: spark-jobs/time_travel.py to s3://lakehouse-data-123456789012/scripts/time_travel.py
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
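

&lt;p&gt;Snapshot IDs are not the only way in. The same Iceberg tables support SQL-level time travel directly from Athena (engine version 3), which is handy for ad-hoc checks without submitting a Spark job; the Athena setup itself is covered in Test 4 below. A sketch, using an illustrative timestamp and the snapshot id from the example output above:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Query the silver table as of a point in time (Athena engine v3, Iceberg tables)
aws athena start-query-execution \
  --query-string "SELECT COUNT(*) FROM silver.sales_clean FOR TIMESTAMP AS OF TIMESTAMP '2026-01-13 16:30:00 UTC'" \
  --result-configuration "OutputLocation=s3://$LAKEHOUSE_BUCKET/athena-results/" \
  --region $AWS_REGION

# Or pin the query to a specific snapshot id from the .snapshots metadata table
aws athena start-query-execution \
  --query-string "SELECT COUNT(*) FROM silver.sales_clean FOR VERSION AS OF 2345678901234567890" \
  --result-configuration "OutputLocation=s3://$LAKEHOUSE_BUCKET/athena-results/" \
  --region $AWS_REGION
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;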



&lt;h2&gt;
  
  
  Testing and Validation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test 1: Monitor Spark Application
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check SparkApplication status&lt;/span&gt;
kubectl get sparkapplication &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs

&lt;span class="c"&gt;# Describe application&lt;/span&gt;
kubectl describe sparkapplication process-sales-data &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs

&lt;span class="c"&gt;# Watch driver pod logs&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DRIVER_POD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-l&lt;/span&gt; spark-role&lt;span class="o"&gt;=&lt;/span&gt;driver &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.items[0].metadata.name}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="nv"&gt;$DRIVER_POD&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs

&lt;span class="c"&gt;# Check executor pods&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-l&lt;/span&gt; spark-role&lt;span class="o"&gt;=&lt;/span&gt;executor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;kubectl get sparkapplication &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs
&lt;span class="go"&gt;[2026-01-13 11:25:00] NAME                  STATUS      ATTEMPTS   START                  FINISH       AGE
process-sales-data    RUNNING     1          2026-01-13T11:24:30Z                3m

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;kubectl describe sparkapplication process-sales-data &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs
&lt;span class="go"&gt;[2026-01-13 11:25:15] Name:         process-sales-data
Namespace:    spark-jobs
&lt;/span&gt;&lt;span class="gp"&gt;Labels:       &amp;lt;none&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="gp"&gt;Annotations:  &amp;lt;none&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="go"&gt;API Version:  sparkoperator.k8s.io/v1beta2
Kind:         SparkApplication
Metadata:
  Creation Timestamp:  2026-01-13T16:24:15Z
  Generation:          1
  Resource Version:    234567
  UID:                 f1g2h3i4-j5k6-7l8m-9n0o-p1q2r3s4t5u6
Spec:
  Driver:
    Cores:         1
    Core Limit:    1200m
    Memory:        2g
    Service Account:  spark-sa
  Executor:
    Cores:      2
    Instances:  3
    Memory:     4g
  Image:        image-registry.openshift-image-registry.svc:5000/spark-jobs/spark-iceberg:latest
  Main Application File:  s3a://lakehouse-data-123456789012/scripts/process_sales.py
  Mode:         cluster
  Python Version:  3
  Spark Version:   3.5.0
  Type:           Python
Status:
  Application State:
    State:  RUNNING
  Driver Info:
    Pod Name:             process-sales-data-driver
    Web UI Service Name:  process-sales-data-ui-svc
  Execution Attempts:     1
  Last Submission Attempt Time:  2026-01-13T16:24:30Z
  Spark Application Id:   spark-application-1705165470123-456789
  Submission Attempts:    1
&lt;/span&gt;&lt;span class="gp"&gt;  Termination Time:       &amp;lt;nil&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="go"&gt;
&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DRIVER_POD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-l&lt;/span&gt; spark-role&lt;span class="o"&gt;=&lt;/span&gt;driver &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.items[0].metadata.name}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 11:25:30]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;kubectl logs &lt;span class="nt"&gt;-f&lt;/span&gt; process-sales-data-driver &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs
&lt;span class="go"&gt;[2026-01-13 11:25:45] 26/01/13 16:25:45 INFO SparkContext: Running Spark version 3.5.0
26/01/13 16:25:46 INFO ResourceUtils: ==============================================================
26/01/13 16:25:46 INFO ResourceUtils: No custom resources configured for spark.driver.
26/01/13 16:25:46 INFO ResourceUtils: ==============================================================
26/01/13 16:25:46 INFO SparkContext: Submitted application: ProcessSalesData
26/01/13 16:25:47 INFO SecurityManager: Changing view acls to: 185
26/01/13 16:25:47 INFO SecurityManager: Changing modify acls to: 185
26/01/13 16:25:47 INFO SecurityManager: Changing view acls groups to:
26/01/13 16:25:47 INFO SecurityManager: Changing modify acls groups to:
&lt;/span&gt;&lt;span class="gp"&gt;26/01/13 16:25:47 INFO SecurityManager: SecurityManager: authentication disabled;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;ui acls disabled&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;users  &lt;/span&gt;with view permissions: Set&lt;span class="o"&gt;(&lt;/span&gt;185&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;groups &lt;/span&gt;with view permissions: Set&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;users  &lt;/span&gt;with modify permissions: Set&lt;span class="o"&gt;(&lt;/span&gt;185&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;groups &lt;/span&gt;with modify permissions: Set&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="go"&gt;26/01/13 16:25:48 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
26/01/13 16:25:49 INFO SparkEnv: Registering MapOutputTracker
26/01/13 16:25:49 INFO SparkEnv: Registering BlockManagerMaster
26/01/13 16:25:50 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
26/01/13 16:25:50 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
26/01/13 16:26:15 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
26/01/13 16:26:15 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
26/01/13 16:26:15 INFO SharedState: Warehouse path is 's3a://lakehouse-data-123456789012/warehouse'.
Reading data from s3a://lakehouse-data-123456789012/bronze/sales/
26/01/13 16:26:30 INFO FileSourceStrategy: Pushed Filters: []
26/01/13 16:26:30 INFO FileSourceStrategy: Post-Scan Filters: []
26/01/13 16:26:30 INFO CodeGenerator: Code generated in 156.234567 ms
26/01/13 16:26:31 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
Raw data count: 10000
26/01/13 16:26:45 INFO CodeGenerator: Code generated in 23.456789 ms
+---------------+----------+----------+------------+-------+--------+------+
|transaction_id|      date|   product|    category| amount|quantity|region|
+---------------+----------+----------+------------+-------+--------+------+
|      TXN000000|2024-03-15|    Laptop|Electronics|1245.67|       3| North|
|      TXN000001|2024-07-22|     Mouse|Accessories|  23.45|       5|  East|
|      TXN000002|2024-01-08|  Keyboard|Accessories|  67.89|       2| South|
|      TXN000003|2024-11-30|   Monitor|Electronics| 345.00|       1|  West|
|      TXN000004|2024-05-12|Headphones|Accessories| 125.50|       4| North|
+---------------+----------+----------+------------+-------+--------+------+
only showing top 5 rows

26/01/13 16:27:00 INFO GlueCatalog: Glue catalog initialized
26/01/13 16:27:15 INFO BaseTable: Creating Iceberg table bronze.sales
Bronze table created in Glue Catalog
26/01/13 16:28:30 INFO BaseTable: Creating Iceberg table silver.sales_clean with partitioning
Silver table created with partitioning
26/01/13 16:29:45 INFO BaseTable: Creating Iceberg table gold.sales_summary
Gold table created with aggregations

=== Bronze Layer Sample ===
+---------------+----------+----------+------------+-------+--------+------+
|transaction_id|      date|   product|    category| amount|quantity|region|
+---------------+----------+----------+------------+-------+--------+------+
|      TXN000000|2024-03-15|    Laptop|Electronics|1245.67|       3| North|
|      TXN000001|2024-07-22|     Mouse|Accessories|  23.45|       5|  East|
|      TXN000002|2024-01-08|  Keyboard|Accessories|  67.89|       2| South|
|      TXN000003|2024-11-30|   Monitor|Electronics| 345.00|       1|  West|
|      TXN000004|2024-05-12|Headphones|Accessories| 125.50|       4| North|
+---------------+----------+----------+------------+-------+--------+------+

=== Silver Layer Sample ===
+---------------+----------+----------+------------+-------+--------+------+----+-----+
|transaction_id|      date|   product|    category| amount|quantity|region|year|month|
+---------------+----------+----------+------------+-------+--------+------+----+-----+
|      TXN000000|2024-03-15|    Laptop|Electronics|1245.67|       3| North|2024|    3|
|      TXN000001|2024-07-22|     Mouse|Accessories|  23.45|       5|  East|2024|    7|
|      TXN000002|2024-01-08|  Keyboard|Accessories|  67.89|       2| South|2024|    1|
|      TXN000003|2024-11-30|   Monitor|Electronics| 345.00|       1|  West|2024|   11|
|      TXN000004|2024-05-12|Headphones|Accessories| 125.50|       4| North|2024|    5|
+---------------+----------+----------+------------+-------+--------+------+----+-----+

=== Gold Layer Sample ===
+----+-----+-----------+------+-------------+--------------+---------------------+-----------------+
|year|month|   category|region|total_revenue|total_quantity|avg_transaction_value|transaction_count|
+----+-----+-----------+------+-------------+--------------+---------------------+-----------------+
|2024|    7|Electronics| North|    987654.32|          4523|               218.45|             4521|
|2024|    3|Electronics|  East|    876543.21|          3892|               225.23|             3891|
|2024|   11|Accessories| South|    765432.10|          5234|               146.32|             5231|
|2024|    5|Electronics|  West|    654321.09|          2987|               219.05|             2988|
|2024|    1|Accessories| North|    543210.98|          4123|               131.78|             4124|
|2024|    8|Electronics| South|    432109.87|          2156|               200.42|             2157|
|2024|    6|Accessories|  East|    321098.76|          3567|                90.01|             3568|
|2024|    9|Electronics| North|    210987.65|          1876|               112.45|             1877|
|2024|    2|Accessories|  West|    109876.54|          2345|                46.84|             2346|
|2024|   10|Electronics|  East|     98765.43|          1234|                80.02|             1235|
+----+-----+-----------+------+-------------+--------------+---------------------+-----------------+

=== Tables in Glue Catalog ===
+---------+----------+-----------+
|namespace| tableName|isTemporary|
+---------+----------+-----------+
|   bronze|     sales|      false|
+---------+----------+-----------+

+---------+-----------+-----------+
|namespace|  tableName|isTemporary|
+---------+-----------+-----------+
|   silver|sales_clean|      false|
+---------+-----------+-----------+

+---------+-------------+-----------+
|namespace|    tableName|isTemporary|
+---------+-------------+-----------+
|     gold|sales_summary|      false|
+---------+-------------+-----------+

26/01/13 16:30:15 INFO SparkContext: Successfully stopped SparkContext
26/01/13 16:30:16 INFO ShutdownHookManager: Shutdown hook called

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-l&lt;/span&gt; spark-role&lt;span class="o"&gt;=&lt;/span&gt;executor
&lt;span class="go"&gt;[2026-01-13 11:31:00] NAME                                  READY   STATUS      RESTARTS   AGE
process-sales-data-1705165470-exec-1  1/1     Running     0          5m
process-sales-data-1705165470-exec-2  1/1     Running     0          5m
process-sales-data-1705165470-exec-3  1/1     Running     0          5m
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
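

&lt;p&gt;While the application is in the RUNNING state you can also open the Spark web UI through the service the operator creates (it shows up as &lt;code&gt;process-sales-data-ui-svc&lt;/code&gt; in the describe output). A sketch, assuming the default UI port of 4040:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Forward the Spark UI locally while the driver is running
kubectl port-forward svc/process-sales-data-ui-svc 4040:4040 -n spark-jobs

# Then browse http://localhost:4040 to inspect stages, executors and SQL plans
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;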



&lt;h3&gt;
  
  
  Test 2: Verify Glue Catalog Tables
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List databases&lt;/span&gt;
aws glue get-databases &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# List tables in bronze database&lt;/span&gt;
aws glue get-tables &lt;span class="nt"&gt;--database-name&lt;/span&gt; bronze &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Get table details&lt;/span&gt;
aws glue get-table &lt;span class="nt"&gt;--database-name&lt;/span&gt; silver &lt;span class="nt"&gt;--name&lt;/span&gt; sales_clean &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Check table location and format&lt;/span&gt;
aws glue get-table &lt;span class="nt"&gt;--database-name&lt;/span&gt; silver &lt;span class="nt"&gt;--name&lt;/span&gt; sales_clean &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Table.StorageDescriptor.Location'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws glue get-tables &lt;span class="nt"&gt;--database-name&lt;/span&gt; bronze &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 11:35:00] {
    "TableList": [
        {
            "Name": "sales",
            "DatabaseName": "bronze",
            "CreateTime": "2026-01-13T16:27:15.123000-05:00",
            "UpdateTime": "2026-01-13T16:27:15.123000-05:00",
            "Retention": 0,
            "StorageDescriptor": {
                "Columns": [
                    {
                        "Name": "transaction_id",
                        "Type": "string"
                    },
                    {
                        "Name": "date",
                        "Type": "string"
                    },
                    {
                        "Name": "product",
                        "Type": "string"
                    },
                    {
                        "Name": "category",
                        "Type": "string"
                    },
                    {
                        "Name": "amount",
                        "Type": "double"
                    },
                    {
                        "Name": "quantity",
                        "Type": "bigint"
                    },
                    {
                        "Name": "region",
                        "Type": "string"
                    }
                ],
                "Location": "s3://lakehouse-data-123456789012/warehouse/bronze.db/sales",
                "InputFormat": "org.apache.iceberg.mr.hive.HiveIcebergInputFormat",
                "OutputFormat": "org.apache.iceberg.mr.hive.HiveIcebergOutputFormat",
                "SerdeInfo": {
                    "SerializationLibrary": "org.apache.iceberg.mr.hive.HiveIcebergSerDe"
                }
            },
            "Parameters": {
                "table_type": "ICEBERG",
                "metadata_location": "s3://lakehouse-data-123456789012/warehouse/bronze.db/sales/metadata/00001-a1b2c3d4-e5f6-7890-abcd-ef1234567890.metadata.json"
            },
            "CatalogId": "123456789012"
        }
    ]
}

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws glue get-table &lt;span class="nt"&gt;--database-name&lt;/span&gt; silver &lt;span class="nt"&gt;--name&lt;/span&gt; sales_clean &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Table.StorageDescriptor.Location'&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 11:35:30] "s3://lakehouse-data-123456789012/warehouse/silver.db/sales_clean"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
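

&lt;p&gt;To check all three layers in one pass, and to confirm each table is registered as an Iceberg table rather than a plain Hive table, a quick loop over the databases works (sketch):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# List table names and the Iceberg marker for every layer
for db in bronze silver gold; do
  echo "=== $db ==="
  aws glue get-tables --database-name $db --region $AWS_REGION \
    --query 'TableList[*].[Name,Parameters.table_type]' --output text
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;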



&lt;h3&gt;
  
  
  Test 3: Verify Data in S3
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List warehouse contents&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;/warehouse/ &lt;span class="nt"&gt;--recursive&lt;/span&gt; &lt;span class="nt"&gt;--human-readable&lt;/span&gt;

&lt;span class="c"&gt;# Check Iceberg metadata&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;/warehouse/silver.db/sales_clean/metadata/

&lt;span class="c"&gt;# List data files&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;/warehouse/silver.db/sales_clean/data/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://lakehouse-data-123456789012/warehouse/ &lt;span class="nt"&gt;--recursive&lt;/span&gt; &lt;span class="nt"&gt;--human-readable&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 11:40:00] 2026-01-13 11:27:30   45.2 MiB  warehouse/bronze.db/sales/data/00000-0-a1b2c3d4-e5f6-7890-abcd-ef1234567890-00001.parquet
2026-01-13 11:27:31    3.2 KiB  warehouse/bronze.db/sales/metadata/00000-12345678-90ab-cdef-1234-567890abcdef.metadata.json
2026-01-13 11:27:31    5.1 KiB  warehouse/bronze.db/sales/metadata/00001-a1b2c3d4-e5f6-7890-abcd-ef1234567890.metadata.json
2026-01-13 11:27:31    2.8 KiB  warehouse/bronze.db/sales/metadata/snap-1234567890123456789-1-a1b2c3d4.avro
2026-01-13 11:28:45   42.1 MiB  warehouse/silver.db/sales_clean/data/year=2024/month=1/00000-0-b2c3d4e5-f6g7-8901-bcde-f12345678901-00001.parquet
2026-01-13 11:28:46   38.7 MiB  warehouse/silver.db/sales_clean/data/year=2024/month=2/00001-0-c3d4e5f6-g7h8-9012-cdef-123456789012-00001.parquet
2026-01-13 11:28:47   41.3 MiB  warehouse/silver.db/sales_clean/data/year=2024/month=3/00002-0-d4e5f6g7-h8i9-0123-defg-234567890123-00001.parquet
2026-01-13 11:28:47    3.5 KiB  warehouse/silver.db/sales_clean/metadata/00000-23456789-01bc-defg-2345-678901bcdefg.metadata.json
2026-01-13 11:28:47    6.2 KiB  warehouse/silver.db/sales_clean/metadata/00001-b2c3d4e5-f6g7-8901-bcde-f12345678901.metadata.json
2026-01-13 11:29:50  512.3 KiB  warehouse/gold.db/sales_summary/data/00000-0-e5f6g7h8-i9j0-1234-efgh-345678901234-00001.parquet
2026-01-13 11:29:50    3.1 KiB  warehouse/gold.db/sales_summary/metadata/00000-34567890-12cd-efgh-3456-789012cdefgh.metadata.json
2026-01-13 11:29:50    4.8 KiB  warehouse/gold.db/sales_summary/metadata/00001-c3d4e5f6-g7h8-9012-cdef-123456789012.metadata.json

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://lakehouse-data-123456789012/warehouse/silver.db/sales_clean/metadata/
&lt;span class="go"&gt;[2026-01-13 11:40:15] 2026-01-13 11:28:47       3542 00000-23456789-01bc-defg-2345-678901bcdefg.metadata.json
2026-01-13 11:28:47       6234 00001-b2c3d4e5-f6g7-8901-bcde-f12345678901.metadata.json
2026-01-13 11:28:47       2876 snap-2345678901234567890-1-b2c3d4e5.avro
2026-01-13 11:28:47       4123 v1.metadata.json
2026-01-13 11:28:47         42 version-hint.text

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://lakehouse-data-123456789012/warehouse/silver.db/sales_clean/data/
&lt;span class="go"&gt;[2026-01-13 11:40:30]
                           PRE year=2024/
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
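

&lt;p&gt;The &lt;code&gt;metadata_location&lt;/code&gt; parameter stored in Glue always points at the current Iceberg metadata file, so you can follow it straight into S3 to see the live schema and snapshot log (sketch):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Resolve the current metadata file from Glue, then peek at it in S3
META=$(aws glue get-table --database-name silver --name sales_clean \
  --region $AWS_REGION --query 'Table.Parameters.metadata_location' --output text)

echo "Current metadata file: $META"
aws s3 cp "$META" - | python3 -m json.tool | head -n 40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;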



&lt;h3&gt;
  
  
  Test 4: Query Data with Athena
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Athena workgroup (optional)&lt;/span&gt;
aws athena create-work-group &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; lakehouse-queries &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--configuration&lt;/span&gt; &lt;span class="s2"&gt;"ResultConfigurationUpdates={OutputLocation=s3://&lt;/span&gt;&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="s2"&gt;/athena-results/}"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Query silver table using Athena&lt;/span&gt;
aws athena start-query-execution &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query-string&lt;/span&gt; &lt;span class="s2"&gt;"SELECT * FROM silver.sales_clean LIMIT 10"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--result-configuration&lt;/span&gt; &lt;span class="s2"&gt;"OutputLocation=s3://&lt;/span&gt;&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="s2"&gt;/athena-results/"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Query gold aggregations&lt;/span&gt;
aws athena start-query-execution &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query-string&lt;/span&gt; &lt;span class="s2"&gt;"SELECT category, region, SUM(total_revenue) as revenue FROM gold.sales_summary GROUP BY category, region ORDER BY revenue DESC"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--result-configuration&lt;/span&gt; &lt;span class="s2"&gt;"OutputLocation=s3://&lt;/span&gt;&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;&lt;span class="s2"&gt;/athena-results/"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws athena create-work-group &lt;span class="nt"&gt;--name&lt;/span&gt; lakehouse-queries &lt;span class="nt"&gt;--configuration&lt;/span&gt; &lt;span class="s2"&gt;"ResultConfigurationUpdates={OutputLocation=s3://lakehouse-data-123456789012/athena-results/}"&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 11:45:00] (No output indicates success)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws athena start-query-execution &lt;span class="nt"&gt;--query-string&lt;/span&gt; &lt;span class="s2"&gt;"SELECT * FROM silver.sales_clean LIMIT 10"&lt;/span&gt; &lt;span class="nt"&gt;--result-configuration&lt;/span&gt; &lt;span class="s2"&gt;"OutputLocation=s3://lakehouse-data-123456789012/athena-results/"&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 11:45:15] {
    "QueryExecutionId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws athena start-query-execution &lt;span class="nt"&gt;--query-string&lt;/span&gt; &lt;span class="s2"&gt;"SELECT category, region, SUM(total_revenue) as revenue FROM gold.sales_summary GROUP BY category, region ORDER BY revenue DESC"&lt;/span&gt; &lt;span class="nt"&gt;--result-configuration&lt;/span&gt; &lt;span class="s2"&gt;"OutputLocation=s3://lakehouse-data-123456789012/athena-results/"&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 11:45:30] {
    "QueryExecutionId": "b2c3d4e5-f6g7-8901-bcde-f12345678901"
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
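

&lt;p&gt;&lt;code&gt;start-query-execution&lt;/code&gt; only returns an execution ID; the query itself runs asynchronously. To check its state and pull the results back once it has succeeded, reuse the ID returned above (sketch):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Check the query state, then fetch results once it reports SUCCEEDED
QUERY_ID=a1b2c3d4-e5f6-7890-abcd-ef1234567890

aws athena get-query-execution --query-execution-id $QUERY_ID \
  --region $AWS_REGION --query 'QueryExecution.Status.State'

aws athena get-query-results --query-execution-id $QUERY_ID \
  --region $AWS_REGION --max-items 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;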



&lt;h3&gt;
  
  
  Test 5: Stateless Compute Validation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: Note current table state&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Before Cluster Deletion ==="&lt;/span&gt;
aws glue get-tables &lt;span class="nt"&gt;--database-name&lt;/span&gt; silver &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'TableList[*].Name'&lt;/span&gt;

&lt;span class="c"&gt;# Step 2: Delete ROSA cluster&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting ROSA cluster..."&lt;/span&gt;
rosa delete cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="c"&gt;# Wait for deletion (or do this async)&lt;/span&gt;
&lt;span class="c"&gt;# rosa logs uninstall --cluster=$CLUSTER_NAME --watch&lt;/span&gt;

&lt;span class="c"&gt;# Step 3: Verify data persists in S3&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Data Still Exists in S3 ==="&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;/warehouse/ &lt;span class="nt"&gt;--recursive&lt;/span&gt; | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;

&lt;span class="c"&gt;# Step 4: Verify metadata persists in Glue&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Metadata Still Exists in Glue ==="&lt;/span&gt;
aws glue get-tables &lt;span class="nt"&gt;--database-name&lt;/span&gt; silver &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'TableList[*].Name'&lt;/span&gt;

&lt;span class="c"&gt;# Step 5: Recreate cluster and verify access&lt;/span&gt;
&lt;span class="c"&gt;# (Follow Phase 1 steps to recreate cluster)&lt;/span&gt;
&lt;span class="c"&gt;# Then resubmit Spark job to prove data is accessible&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Stateless Compute Validated ==="&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"All data and metadata persisted despite cluster deletion!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Before Cluster Deletion ==="&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 12:00:00] === Before Cluster Deletion ===

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws glue get-tables &lt;span class="nt"&gt;--database-name&lt;/span&gt; silver &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'TableList[*].Name'&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 12:00:05] [
    "sales_clean"
]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting ROSA cluster..."&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 12:00:10] Deleting ROSA cluster...

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa delete cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;data-lakehouse &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 12:00:15] I: Cluster 'data-lakehouse' will start uninstalling now
I: To watch the cluster uninstallation logs, run 'rosa logs uninstall -c data-lakehouse --watch'

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Data Still Exists in S3 ==="&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 12:35:00] === Data Still Exists in S3 ===

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://lakehouse-data-123456789012/warehouse/ &lt;span class="nt"&gt;--recursive&lt;/span&gt; | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 12:35:15] 42

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Metadata Still Exists in Glue ==="&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 12:35:20] === Metadata Still Exists in Glue ===

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws glue get-tables &lt;span class="nt"&gt;--database-name&lt;/span&gt; silver &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'TableList[*].Name'&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 12:35:25] [
    "sales_clean"
]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== Stateless Compute Validated ==="&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 12:35:30] === Stateless Compute Validated ===

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"All data and metadata persisted despite cluster deletion!"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 12:35:35] All data and metadata persisted despite cluster deletion!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Resource Cleanup
&lt;/h2&gt;

&lt;p&gt;To avoid ongoing AWS charges, follow these steps to clean up all resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Delete Spark Applications
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete all Spark applications&lt;/span&gt;
kubectl delete sparkapplication &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs

&lt;span class="c"&gt;# Wait for pods to terminate&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;kubectl delete sparkapplication &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs
&lt;span class="go"&gt;[2026-01-13 13:00:00] sparkapplication.sparkoperator.k8s.io "process-sales-data" deleted

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs
&lt;span class="go"&gt;[2026-01-13 13:00:15] No resources found in spark-jobs namespace.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Delete Spark Operator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Uninstall Spark Operator&lt;/span&gt;
helm uninstall spark-operator &lt;span class="nt"&gt;-n&lt;/span&gt; spark-operator

&lt;span class="c"&gt;# Delete namespace&lt;/span&gt;
kubectl delete namespace spark-operator
kubectl delete namespace spark-jobs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;helm uninstall spark-operator &lt;span class="nt"&gt;-n&lt;/span&gt; spark-operator
&lt;span class="go"&gt;[2026-01-13 13:02:00] release "spark-operator" uninstalled

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;kubectl delete namespace spark-operator
&lt;span class="go"&gt;[2026-01-13 13:02:15] namespace "spark-operator" deleted

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;kubectl delete namespace spark-jobs
&lt;span class="go"&gt;[2026-01-13 13:02:30] namespace "spark-jobs" deleted
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Delete ROSA Cluster
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete ROSA cluster&lt;/span&gt;
rosa delete cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="c"&gt;# Wait for deletion&lt;/span&gt;
rosa logs uninstall &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--watch&lt;/span&gt;

&lt;span class="c"&gt;# Verify deletion&lt;/span&gt;
rosa list clusters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa delete cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;data-lakehouse &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 13:05:00] I: Cluster 'data-lakehouse' will start uninstalling now
I: To watch the cluster uninstallation logs, run 'rosa logs uninstall -c data-lakehouse --watch'

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa logs uninstall &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;data-lakehouse &lt;span class="nt"&gt;--watch&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 13:05:15] time="2026-01-13T13:05:15Z" level=info msg="Destroying cluster resources"
time="2026-01-13T13:06:30Z" level=info msg="Deleting worker nodes"
time="2026-01-13T13:10:45Z" level=info msg="Deleting control plane"
time="2026-01-13T13:25:20Z" level=info msg="Removing load balancers"
time="2026-01-13T13:30:00Z" level=info msg="Deleting VPC and subnets"
time="2026-01-13T13:35:45Z" level=info msg="Cluster uninstallation complete"

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa list clusters
&lt;span class="go"&gt;[2026-01-13 13:36:00] ID  NAME  STATE  TOPOLOGY
(No clusters found)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Delete Glue Catalog Resources
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete tables from all databases&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;db &lt;span class="k"&gt;in &lt;/span&gt;bronze silver gold lakehouse&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting tables from database: &lt;/span&gt;&lt;span class="nv"&gt;$db&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

  &lt;span class="c"&gt;# Get table names&lt;/span&gt;
  &lt;span class="nv"&gt;TABLES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws glue get-tables &lt;span class="nt"&gt;--database-name&lt;/span&gt; &lt;span class="nv"&gt;$db&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'TableList[*].Name'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

  &lt;span class="c"&gt;# Delete each table&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;table &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$TABLES&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  Deleting table: &lt;/span&gt;&lt;span class="nv"&gt;$table&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    aws glue delete-table &lt;span class="nt"&gt;--database-name&lt;/span&gt; &lt;span class="nv"&gt;$db&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$table&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
  &lt;span class="k"&gt;done&lt;/span&gt;

  &lt;span class="c"&gt;# Delete database&lt;/span&gt;
  &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Deleting database: &lt;/span&gt;&lt;span class="nv"&gt;$db&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  aws glue delete-database &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$db&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
&lt;span class="k"&gt;done

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Glue Catalog resources deleted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;db &lt;span class="k"&gt;in &lt;/span&gt;bronze silver gold lakehouse&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
&lt;span class="go"&gt;  [output for each database]
done
[2026-01-13 13:40:00] Deleting tables from database: bronze
  Deleting table: sales
Deleting database: bronze
Deleting tables from database: silver
  Deleting table: sales_clean
Deleting database: silver
Deleting tables from database: gold
  Deleting table: sales_summary
Deleting database: gold
Deleting tables from database: lakehouse
Deleting database: lakehouse

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Glue Catalog resources deleted"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 13:41:00] Glue Catalog resources deleted
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Delete S3 Bucket
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete all objects in bucket&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Delete bucket&lt;/span&gt;
aws s3 rb s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"S3 bucket deleted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://lakehouse-data-123456789012 &lt;span class="nt"&gt;--recursive&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 13:45:00] delete: s3://lakehouse-data-123456789012/bronze/
delete: s3://lakehouse-data-123456789012/bronze/sales/sales_data.csv
delete: s3://lakehouse-data-123456789012/gold/
delete: s3://lakehouse-data-123456789012/scripts/incremental_pipeline.py
delete: s3://lakehouse-data-123456789012/scripts/process_sales.py
delete: s3://lakehouse-data-123456789012/scripts/time_travel.py
delete: s3://lakehouse-data-123456789012/silver/
delete: s3://lakehouse-data-123456789012/warehouse/
[... 42 more deletions ...]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 rb s3://lakehouse-data-123456789012 &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;span class="go"&gt;[2026-01-13 13:46:00] remove_bucket: lakehouse-data-123456789012

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"S3 bucket deleted"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 13:46:05] S3 bucket deleted
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6: Delete IAM Resources
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete IAM role policy&lt;/span&gt;
aws iam delete-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; GlueS3Access

&lt;span class="c"&gt;# Delete IAM role&lt;/span&gt;
aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"IAM resources deleted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws iam delete-role-policy &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole &lt;span class="nt"&gt;--policy-name&lt;/span&gt; GlueS3Access
&lt;span class="go"&gt;[2026-01-13 13:48:00] (No output indicates success)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole
&lt;span class="go"&gt;[2026-01-13 13:48:15] (No output indicates success)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"IAM resources deleted"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 13:48:20] IAM resources deleted
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Clean Up Local Files
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Remove temporary files&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; spark-glue-trust-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; spark-glue-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; lakehouse-bucket-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; sample-data/
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; spark-jobs/
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; spark-iceberg/

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Local files cleaned up"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; spark-glue-trust-policy.json spark-glue-policy.json lakehouse-bucket-policy.json
&lt;span class="go"&gt;[2026-01-13 13:50:00]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; sample-data/ spark-jobs/ spark-iceberg/
&lt;span class="go"&gt;[2026-01-13 13:50:05]

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Local files cleaned up"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 13:50:10] Local files cleaned up
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify ROSA cluster is deleted&lt;/span&gt;
rosa list clusters

&lt;span class="c"&gt;# Verify S3 bucket is deleted&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;lakehouse

&lt;span class="c"&gt;# Verify Glue databases are deleted&lt;/span&gt;
aws glue get-databases &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"bronze|silver|gold|lakehouse"&lt;/span&gt;

&lt;span class="c"&gt;# Verify IAM role is deleted&lt;/span&gt;
aws iam get-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole 2&amp;gt;&amp;amp;1 | &lt;span class="nb"&gt;grep &lt;/span&gt;NoSuchEntity

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Cleanup verification complete"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rosa list clusters
&lt;span class="go"&gt;[2026-01-13 13:52:00] ID  NAME  STATE  TOPOLOGY
(No clusters found)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;lakehouse
&lt;span class="go"&gt;[2026-01-13 13:52:15] (No output - bucket deleted)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws glue get-databases &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"bronze|silver|gold|lakehouse"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 13:52:30] (No output - databases deleted)

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws iam get-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole 2&amp;gt;&amp;amp;1 | &lt;span class="nb"&gt;grep &lt;/span&gt;NoSuchEntity
&lt;span class="go"&gt;[2026-01-13 13:52:45] An error occurred (NoSuchEntity) when calling the GetRole operation: The role with name SparkGlueCatalogRole cannot be found.

&lt;/span&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Cleanup verification complete"&lt;/span&gt;
&lt;span class="go"&gt;[2026-01-13 13:53:00] Cleanup verification complete
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Issue: Spark Cannot Connect to Glue Catalog
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Spark jobs fail with Glue Catalog connection errors&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify IAM role has Glue permissions&lt;/li&gt;
&lt;li&gt;Check service account annotation&lt;/li&gt;
&lt;li&gt;Verify AWS region configuration&lt;/li&gt;
&lt;li&gt;Check Glue Catalog connectivity
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify service account has IAM role&lt;/span&gt;
kubectl get sa spark-sa &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-o&lt;/span&gt; yaml | &lt;span class="nb"&gt;grep &lt;/span&gt;eks.amazonaws.com

&lt;span class="c"&gt;# Test Glue access from pod&lt;/span&gt;
kubectl run aws-test &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;amazon/aws-cli &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--overrides&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"apiVersion":"v1","spec":{"serviceAccountName":"spark-sa"}}'&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  glue get-databases &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Check Spark configuration&lt;/span&gt;
kubectl get configmap spark-config &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-o&lt;/span&gt; yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Issue: S3 Access Denied Errors
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Spark jobs fail with S3 403 Forbidden errors&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify IAM role has S3 permissions&lt;/li&gt;
&lt;li&gt;Check bucket policy&lt;/li&gt;
&lt;li&gt;Verify IRSA configuration&lt;/li&gt;
&lt;li&gt;Check S3 endpoint configuration
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Test S3 access from pod&lt;/span&gt;
kubectl run aws-test &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;amazon/aws-cli &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--overrides&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{"apiVersion":"v1","spec":{"serviceAccountName":"spark-sa"}}'&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;/

&lt;span class="c"&gt;# Check IAM role permissions&lt;/span&gt;
aws iam get-role-policy &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole &lt;span class="nt"&gt;--policy-name&lt;/span&gt; GlueS3Access

&lt;span class="c"&gt;# Verify bucket policy&lt;/span&gt;
aws s3api get-bucket-policy &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Issue: Iceberg Table Not Found
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Queries fail with "Table not found" errors&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify table exists in Glue Catalog&lt;/li&gt;
&lt;li&gt;Check Spark Catalog configuration&lt;/li&gt;
&lt;li&gt;Verify warehouse location&lt;/li&gt;
&lt;li&gt;Check table format
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List tables in Glue&lt;/span&gt;
aws glue get-tables &lt;span class="nt"&gt;--database-name&lt;/span&gt; silver &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Check if table is Iceberg format&lt;/span&gt;
aws glue get-table &lt;span class="nt"&gt;--database-name&lt;/span&gt; silver &lt;span class="nt"&gt;--name&lt;/span&gt; sales_clean &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Table.Parameters."table_type"'&lt;/span&gt;

&lt;span class="c"&gt;# Verify warehouse location&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://&lt;span class="nv"&gt;$LAKEHOUSE_BUCKET&lt;/span&gt;/warehouse/silver.db/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
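
&lt;p&gt;If the table is present in Glue but Spark still cannot resolve it, the catalog configuration is the usual culprit. As a reference point, here is a minimal sketch of the Iceberg-on-Glue catalog properties this setup depends on; the catalog name glue_catalog matches the compaction call used later in this section, and the warehouse path is an assumption you should align with the bucket configured earlier:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Sketch: Iceberg catalog properties the jobs expect (pass via --conf or
# the SparkApplication sparkConf block; adjust the warehouse path to your bucket)
spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.glue_catalog.warehouse=s3://$LAKEHOUSE_BUCKET/warehouse/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;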



&lt;h3&gt;
  
  
  Issue: Spark Executors Not Starting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Driver pod runs but executors don't start&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check resource availability&lt;/li&gt;
&lt;li&gt;Verify RBAC permissions&lt;/li&gt;
&lt;li&gt;Check image pull policy&lt;/li&gt;
&lt;li&gt;Review executor logs
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check node resources&lt;/span&gt;
kubectl top nodes

&lt;span class="c"&gt;# Check pending pods&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs

&lt;span class="c"&gt;# Describe pending executor pod&lt;/span&gt;
kubectl describe pod &amp;lt;executor-pod-name&amp;gt; &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs

&lt;span class="c"&gt;# Check events&lt;/span&gt;
kubectl get events &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'.lastTimestamp'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
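
&lt;p&gt;To rule out RBAC (solution 2 above), you can also ask the API server directly whether the Spark service account is allowed to create executor pods. A quick check, assuming the spark-sa service account used by the jobs in this guide:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Ask the API server whether spark-sa may create pods in spark-jobs
kubectl auth can-i create pods \
  --as=system:serviceaccount:spark-jobs:spark-sa -n spark-jobs

# "yes" means RBAC is not the problem; "no" points to a missing Role/RoleBinding
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;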



&lt;h3&gt;
  
  
  Issue: Slow Spark Jobs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Spark jobs are slow&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Increase executor resources&lt;/li&gt;
&lt;li&gt;Adjust partition count&lt;/li&gt;
&lt;li&gt;Enable adaptive query execution&lt;/li&gt;
&lt;li&gt;Optimize Iceberg table layout
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Update SparkApplication with more resources&lt;/span&gt;
kubectl edit sparkapplication process-sales-data &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs

&lt;span class="c"&gt;# Check execution plan&lt;/span&gt;
&lt;span class="c"&gt;# Add to Spark configuration:&lt;/span&gt;
&lt;span class="c"&gt;# spark.sql.adaptive.enabled=true&lt;/span&gt;
&lt;span class="c"&gt;# spark.sql.adaptive.coalescePartitions.enabled=true&lt;/span&gt;

&lt;span class="c"&gt;# Compact Iceberg table&lt;/span&gt;
&lt;span class="c"&gt;# Run in Spark:&lt;/span&gt;
&lt;span class="c"&gt;# spark.sql("CALL glue_catalog.system.rewrite_data_files('silver.sales_clean')")&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
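
&lt;p&gt;If you manage the job declaratively rather than through kubectl edit, the adaptive execution settings above belong in the sparkConf block of the SparkApplication manifest. A minimal sketch, assuming the Spark Operator's v1beta2 schema used for the jobs in this guide:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Fragment of the SparkApplication spec (sketch): enable adaptive query execution
spec:
  sparkConf:
    "spark.sql.adaptive.enabled": "true"
    "spark.sql.adaptive.coalescePartitions.enabled": "true"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;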



&lt;h3&gt;
  
  
  Debug Commands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# View all Spark applications&lt;/span&gt;
kubectl get sparkapplication &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs

&lt;span class="c"&gt;# Get application status&lt;/span&gt;
kubectl get sparkapplication process-sales-data &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-o&lt;/span&gt; yaml

&lt;span class="c"&gt;# View driver logs&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-l&lt;/span&gt; spark-role&lt;span class="o"&gt;=&lt;/span&gt;driver

&lt;span class="c"&gt;# View executor logs&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-l&lt;/span&gt; spark-role&lt;span class="o"&gt;=&lt;/span&gt;executor &lt;span class="nt"&gt;--tail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;100

&lt;span class="c"&gt;# Check Spark Operator logs&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; spark-operator deployment/spark-operator

&lt;span class="c"&gt;# List all pods&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;-o&lt;/span&gt; wide

&lt;span class="c"&gt;# Check configmaps&lt;/span&gt;
kubectl get configmap &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs

&lt;span class="c"&gt;# View events&lt;/span&gt;
kubectl get events &lt;span class="nt"&gt;-n&lt;/span&gt; spark-jobs &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'.lastTimestamp'&lt;/span&gt; | &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;-20&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






</description>
      <category>serverless</category>
      <category>kubernetes</category>
      <category>aws</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Hybrid MLOps Pipeline: Implementation Guide</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Mon, 29 Dec 2025 10:11:18 +0000</pubDate>
      <link>https://dev.to/mgonzalezo/hybrid-mlops-pipeline-implementation-guide-4odc</link>
      <guid>https://dev.to/mgonzalezo/hybrid-mlops-pipeline-implementation-guide-4odc</guid>
      <description>&lt;p&gt;&lt;strong&gt;Bursting to SageMaker Training from OpenShift Pipelines&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Overview&lt;/li&gt;
&lt;li&gt;Architecture&lt;/li&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;Phase 1: ROSA Cluster Setup&lt;/li&gt;
&lt;li&gt;Phase 2: OpenShift Pipelines Installation&lt;/li&gt;
&lt;li&gt;Phase 3: AWS Controllers for Kubernetes (ACK)&lt;/li&gt;
&lt;li&gt;Phase 4: Amazon SageMaker Integration&lt;/li&gt;
&lt;li&gt;Phase 5: Model Storage with S3&lt;/li&gt;
&lt;li&gt;Phase 6: KServe Model Serving&lt;/li&gt;
&lt;li&gt;Phase 7: End-to-End Pipeline&lt;/li&gt;
&lt;li&gt;Testing and Validation&lt;/li&gt;
&lt;li&gt;Resource Cleanup&lt;/li&gt;
&lt;li&gt;Troubleshooting&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Project Purpose
&lt;/h3&gt;

&lt;p&gt;This platform delivers a &lt;strong&gt;hybrid MLOps solution&lt;/strong&gt; that optimizes costs by splitting responsibilities: OpenShift handles orchestration and management, while AWS SageMaker handles intensive GPU training workloads. Instead of keeping expensive GPU instances running 24/7, this architecture "bursts" to AWS only for training and serves inference cost-effectively on OpenShift.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Value Propositions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost Optimization&lt;/strong&gt;: Pay for GPU instances only during training, not continuously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic Scalability&lt;/strong&gt;: Burst to powerful AWS instances (ml.p4d.24xlarge) on-demand&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Flexibility&lt;/strong&gt;: Orchestrate from OpenShift while leveraging AWS managed services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated Workflows&lt;/strong&gt;: End-to-end MLOps pipelines with minimal manual intervention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production-Ready Serving&lt;/strong&gt;: Low-latency inference on cost-effective OpenShift nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Solution Components
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ROSA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed OpenShift cluster on AWS&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenShift Pipelines&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tekton-based CI/CD orchestration&lt;/td&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ACK (AWS Controllers for Kubernetes)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manage AWS services from Kubernetes&lt;/td&gt;
&lt;td&gt;Integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon SageMaker&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed ML training with GPU instances&lt;/td&gt;
&lt;td&gt;Training&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model artifacts and dataset storage&lt;/td&gt;
&lt;td&gt;Data Lake&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KServe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model serving on OpenShift&lt;/td&gt;
&lt;td&gt;Inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon ECR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Container registry for custom images&lt;/td&gt;
&lt;td&gt;Container Registry&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  High-Level Architecture Diagram
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7e9p3a7dbdzmdbw4urnr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7e9p3a7dbdzmdbw4urnr.png" alt="Architecture" width="777" height="948"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data Preparation&lt;/strong&gt;: Training datasets uploaded to S3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipeline Trigger&lt;/strong&gt;: Developer triggers the OpenShift Pipeline (see the example command after this list)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training Initiation&lt;/strong&gt;: ACK creates SageMaker Training Job&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU Provisioning&lt;/strong&gt;: SageMaker spins up ml.p4d.24xlarge instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Training&lt;/strong&gt;: Training executes on high-performance GPUs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Artifact Storage&lt;/strong&gt;: Trained model saved to S3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instance Termination&lt;/strong&gt;: GPU instances automatically shut down&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Deployment&lt;/strong&gt;: KServe pulls model from S3 to OpenShift&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inference Serving&lt;/strong&gt;: Model serves predictions on cost-effective CPU nodes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring&lt;/strong&gt;: Pipeline tracks status and logs throughout&lt;/li&gt;
&lt;/ol&gt;
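
&lt;p&gt;To make step 2 concrete, triggering a run from a workstation looks roughly like the command below. The pipeline and parameter names are placeholders; the actual pipeline is defined in Phase 7:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical trigger of the end-to-end pipeline (names are placeholders)
tkn pipeline start ml-train-deploy \
  -n mlops-pipelines \
  --serviceaccount=pipeline-sa \
  --param dataset-s3-uri=s3://mlops-datasets-&amp;lt;account-id&amp;gt;/train/ \
  --showlog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;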

&lt;h3&gt;
  
  
  Cost Analysis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Traditional Approach&lt;/strong&gt; (GPU instances running 24/7):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ml.p4d.24xlarge: ~$32/hour&lt;/li&gt;
&lt;li&gt;Monthly cost: ~$23,040 (24 hours × 30 days of continuous operation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Hybrid Approach&lt;/strong&gt; (burst for training only):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training: 4 hours/week × $32/hour = $128/week = $512/month&lt;/li&gt;
&lt;li&gt;ROSA inference nodes: ~$1,500/month (m5.2xlarge instances)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Total: ~$2,012/month&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Savings: ~91% compared to the traditional approach&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Required Accounts and Subscriptions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;AWS Account&lt;/strong&gt; with administrative access&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Red Hat Account&lt;/strong&gt; with OpenShift subscription&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;ROSA Enabled&lt;/strong&gt; in your AWS account&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Amazon SageMaker Access&lt;/strong&gt; in your target region&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;AWS Service Quotas&lt;/strong&gt; for ml.p4d instances (request if needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Required Tools
&lt;/h3&gt;

&lt;p&gt;Install the following CLI tools on your workstation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# AWS CLI (v2)&lt;/span&gt;
curl &lt;span class="s2"&gt;"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"awscliv2.zip"&lt;/span&gt;
unzip awscliv2.zip
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./aws/install

&lt;span class="c"&gt;# ROSA CLI&lt;/span&gt;
wget https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; rosa-linux.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;rosa /usr/local/bin/rosa
rosa version

&lt;span class="c"&gt;# OpenShift CLI (oc)&lt;/span&gt;
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; openshift-client-linux.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;oc kubectl /usr/local/bin/
oc version

&lt;span class="c"&gt;# Tekton CLI&lt;/span&gt;
curl &lt;span class="nt"&gt;-LO&lt;/span&gt; https://github.com/tektoncd/cli/releases/download/v0.33.0/tkn_0.33.0_Linux_x86_64.tar.gz
&lt;span class="nb"&gt;tar &lt;/span&gt;xvzf tkn_0.33.0_Linux_x86_64.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;tkn /usr/local/bin/
tkn version

&lt;span class="c"&gt;# Helm (v3)&lt;/span&gt;
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS Prerequisites
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Service Quotas
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check SageMaker quotas&lt;/span&gt;
aws service-quotas get-service-quota &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-code&lt;/span&gt; sagemaker &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-2E8D9C5E &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# Check EC2 quotas for ROSA&lt;/span&gt;
aws service-quotas get-service-quota &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-code&lt;/span&gt; ec2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-1216C47A &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  IAM Permissions
&lt;/h4&gt;

&lt;p&gt;Your AWS IAM user/role needs permissions for the following services (a quick spot-check command follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 (VPC, subnets, security groups)&lt;/li&gt;
&lt;li&gt;IAM (roles, policies)&lt;/li&gt;
&lt;li&gt;S3 (buckets, objects)&lt;/li&gt;
&lt;li&gt;SageMaker (training jobs, models)&lt;/li&gt;
&lt;li&gt;ECR (repositories, images)&lt;/li&gt;
&lt;li&gt;CloudWatch (logs, metrics)&lt;/li&gt;
&lt;/ul&gt;
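
&lt;p&gt;One way to spot-check these permissions before starting is the IAM policy simulator from the CLI. A sketch; substitute the ARN of the user or role you will actually run the setup with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Spot-check a few of the required actions for your identity (placeholder ARN)
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:user/mlops-admin \
  --action-names sagemaker:CreateTrainingJob s3:CreateBucket iam:CreateRole ecr:CreateRepository \
  --query 'EvaluationResults[*].[EvalActionName,EvalDecision]' \
  --output table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;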

&lt;h3&gt;
  
  
  Knowledge Prerequisites
&lt;/h3&gt;

&lt;p&gt;You should be familiar with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Machine Learning concepts (training, inference, model artifacts)&lt;/li&gt;
&lt;li&gt;AWS fundamentals (VPC, IAM, S3)&lt;/li&gt;
&lt;li&gt;Kubernetes basics (pods, deployments, services)&lt;/li&gt;
&lt;li&gt;CI/CD pipeline concepts&lt;/li&gt;
&lt;li&gt;Python and ML frameworks (TensorFlow, PyTorch, scikit-learn)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 1: ROSA Cluster Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1.1: Configure AWS CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Configure AWS credentials&lt;/span&gt;
aws configure

&lt;span class="c"&gt;# Verify configuration&lt;/span&gt;
aws sts get-caller-identity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.2: Initialize ROSA
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Log in to Red Hat&lt;/span&gt;
rosa login

&lt;span class="c"&gt;# Verify ROSA prerequisites&lt;/span&gt;
rosa verify quota
rosa verify permissions

&lt;span class="c"&gt;# Initialize ROSA in your AWS account&lt;/span&gt;
rosa init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.3: Create ROSA Cluster
&lt;/h3&gt;

&lt;p&gt;Create a ROSA cluster optimized for MLOps workloads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set environment variables&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlops-platform"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MACHINE_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"m5.2xlarge"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;COMPUTE_NODES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3

&lt;span class="c"&gt;# Create ROSA cluster (takes ~40 minutes)&lt;/span&gt;
rosa create cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster-name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--multi-az&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compute-machine-type&lt;/span&gt; &lt;span class="nv"&gt;$MACHINE_TYPE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compute-nodes&lt;/span&gt; &lt;span class="nv"&gt;$COMPUTE_NODES&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--machine-cidr&lt;/span&gt; 10.0.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-cidr&lt;/span&gt; 172.30.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--pod-cidr&lt;/span&gt; 10.128.0.0/14 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host-prefix&lt;/span&gt; 23 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuration Rationale&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;m5.2xlarge&lt;/strong&gt;: 8 vCPUs, 32 GB RAM - suitable for ML inference and pipeline orchestration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 nodes&lt;/strong&gt;: High availability for production workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-AZ&lt;/strong&gt;: Ensures resilience for serving layer&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1.4: Monitor Cluster Creation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Watch cluster installation progress&lt;/span&gt;
rosa logs &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--watch&lt;/span&gt;

&lt;span class="c"&gt;# Check cluster status&lt;/span&gt;
rosa describe cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.5: Create Admin User
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create cluster admin user&lt;/span&gt;
rosa create admin &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt;

&lt;span class="c"&gt;# Save the login command (will be displayed in output)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.6: Connect to Cluster
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use the login command from previous step&lt;/span&gt;
oc login https://api.mlops-platform.xxxx.p1.openshiftapps.com:6443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--username&lt;/span&gt; cluster-admin &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--password&lt;/span&gt; &amp;lt;your-password&amp;gt;

&lt;span class="c"&gt;# Verify cluster access&lt;/span&gt;
oc cluster-info
oc get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.7: Create Project Namespaces
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create namespace for pipelines&lt;/span&gt;
oc new-project mlops-pipelines

&lt;span class="c"&gt;# Create namespace for model serving&lt;/span&gt;
oc new-project mlops-serving

&lt;span class="c"&gt;# Create namespace for ACK controllers&lt;/span&gt;
oc new-project ack-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 2: OpenShift Pipelines Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 2.1: Install OpenShift Pipelines Operator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create operator subscription&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-pipelines
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-pipelines-operator
  namespace: openshift-operators
spec:
  channel: latest
  name: openshift-pipelines-operator-rh
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2.2: Verify Operator Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Wait for operator to be ready (takes 2-3 minutes)&lt;/span&gt;
oc get csv &lt;span class="nt"&gt;-n&lt;/span&gt; openshift-operators | &lt;span class="nb"&gt;grep &lt;/span&gt;pipelines

&lt;span class="c"&gt;# Verify Tekton components are running&lt;/span&gt;
oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; openshift-pipelines

&lt;span class="c"&gt;# Check Tekton version&lt;/span&gt;
tkn version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2.3: Configure Pipeline Service Account
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create service account for pipelines&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pipeline-sa
  namespace: mlops-pipelines
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pipeline-sa-edit
  namespace: mlops-pipelines
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: pipeline-sa
  namespace: mlops-pipelines
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 3: AWS Controllers for Kubernetes (ACK)
&lt;/h2&gt;

&lt;p&gt;ACK enables managing AWS services directly from Kubernetes using custom resources.&lt;/p&gt;
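
&lt;p&gt;To give a feel for what that means before diving into installation, below is a minimal sketch of the kind of custom resource the SageMaker controller reconciles. The field names follow the ACK SageMaker v1alpha1 TrainingJob schema as best I can tell, and the image, role, and bucket values are placeholders; verify the exact fields with kubectl explain trainingjob.spec once the CRDs from Step 3.1 are installed:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Sketch of an ACK-managed SageMaker training job (placeholder values)
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TrainingJob
metadata:
  name: demo-training-job
  namespace: mlops-pipelines
spec:
  trainingJobName: demo-training-job
  roleARN: arn:aws:iam::123456789012:role/SageMakerMLOpsExecutionRole
  algorithmSpecification:
    trainingImage: 123456789012.dkr.ecr.us-east-1.amazonaws.com/mlops-train:latest
    trainingInputMode: File
  outputDataConfig:
    s3OutputPath: s3://mlops-artifacts-123456789012/models/
  resourceConfig:
    instanceType: ml.p4d.24xlarge
    instanceCount: 1
    volumeSizeInGB: 100
  stoppingCondition:
    maxRuntimeInSeconds: 14400
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;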

&lt;h3&gt;
  
  
  Step 3.1: Install ACK SageMaker Controller
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set variables&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ACK_K8S_NAMESPACE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ack-system
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-east-1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ACK_SAGEMAKER_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1.2.10

&lt;span class="c"&gt;# Download ACK SageMaker controller&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sagemaker
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RELEASE_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-sL&lt;/span&gt; https://api.github.com/repos/aws-controllers-k8s/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-controller&lt;/span&gt;/releases/latest | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'"tag_name":'&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'"'&lt;/span&gt; &lt;span class="nt"&gt;-f4&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

wget https://github.com/aws-controllers-k8s/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-controller&lt;/span&gt;/releases/download/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RELEASE_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/install.yaml

&lt;span class="c"&gt;# Apply ACK controller&lt;/span&gt;
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; install.yaml

&lt;span class="c"&gt;# Verify installation&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; ack-system
kubectl get crd | &lt;span class="nb"&gt;grep &lt;/span&gt;sagemaker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3.2: Create IAM Role for ACK
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create IAM policy for SageMaker access&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ack-sagemaker-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateTrainingJob",
        "sagemaker:DescribeTrainingJob",
        "sagemaker:StopTrainingJob",
        "sagemaker:CreateModel",
        "sagemaker:DeleteModel",
        "sagemaker:DescribeModel",
        "sagemaker:CreateEndpointConfig",
        "sagemaker:DeleteEndpointConfig",
        "sagemaker:DescribeEndpointConfig"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::mlops-*",
        "arn:aws:s3:::mlops-*/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "sagemaker.amazonaws.com"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create policy&lt;/span&gt;
aws iam create-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; ACKSageMakerPolicy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://ack-sagemaker-policy.json

&lt;span class="c"&gt;# Get OIDC provider for ROSA&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;rosa describe cluster &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .aws.sts.oidc_endpoint_url | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s|https://||'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create trust policy&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ack-trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:sub": "system:serviceaccount:ack-system:ack-sagemaker-controller"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create IAM role&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ACK_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; ACKSageMakerControllerRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://ack-trust-policy.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Role.Arn'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Attach policy to role&lt;/span&gt;
aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; ACKSageMakerControllerRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:policy/ACKSageMakerPolicy

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ACK IAM Role ARN: &lt;/span&gt;&lt;span class="nv"&gt;$ACK_ROLE_ARN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3.3: Configure ACK Controller with IAM Role
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Annotate service account&lt;/span&gt;
kubectl annotate serviceaccount &lt;span class="nt"&gt;-n&lt;/span&gt; ack-system ack-sagemaker-controller &lt;span class="se"&gt;\&lt;/span&gt;
  eks.amazonaws.com/role-arn&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$ACK_ROLE_ARN&lt;/span&gt;

&lt;span class="c"&gt;# Restart ACK controller to pick up annotation&lt;/span&gt;
kubectl rollout restart deployment &lt;span class="nt"&gt;-n&lt;/span&gt; ack-system ack-sagemaker-controller

&lt;span class="c"&gt;# Verify controller is running&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; ack-system
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; ack-system deployment/ack-sagemaker-controller
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 4: Amazon SageMaker Integration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 4.1: Create SageMaker Execution Role
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create trust policy for SageMaker&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; sagemaker-trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "sagemaker.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create SageMaker execution role&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SAGEMAKER_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; SageMakerMLOpsExecutionRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://sagemaker-trust-policy.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Role.Arn'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Attach AWS managed policy&lt;/span&gt;
aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; SageMakerMLOpsExecutionRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/AmazonSageMakerFullAccess

&lt;span class="c"&gt;# Create custom S3 access policy&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; sagemaker-s3-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::mlops-*",
        "arn:aws:s3:::mlops-*/*"
      ]
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam put-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; SageMakerMLOpsExecutionRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; S3Access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://sagemaker-s3-policy.json

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"SageMaker Execution Role ARN: &lt;/span&gt;&lt;span class="nv"&gt;$SAGEMAKER_ROLE_ARN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
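
&lt;p&gt;As an optional sanity check, confirm the role's trust policy lists the SageMaker service principal before launching any jobs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# The principal should be sagemaker.amazonaws.com
aws iam get-role \
  --role-name SageMakerMLOpsExecutionRole \
  --query 'Role.AssumeRolePolicyDocument.Statement[0].Principal'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;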



&lt;h3&gt;
  
  
  Step 4.2: Create S3 Buckets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create S3 buckets for ML artifacts&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ML_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlops-artifacts-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DATA_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlops-datasets-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

aws s3 mb s3://&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
aws s3 mb s3://&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Enable versioning&lt;/span&gt;
aws s3api put-bucket-versioning &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--versioning-configuration&lt;/span&gt; &lt;span class="nv"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Enabled

aws s3api put-bucket-versioning &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--versioning-configuration&lt;/span&gt; &lt;span class="nv"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Enabled

&lt;span class="c"&gt;# Create folder structure&lt;/span&gt;
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; models/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; checkpoints/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; training/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; validation/

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"S3 Buckets created:"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  Models: s3://&lt;/span&gt;&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  Data: s3://&lt;/span&gt;&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
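
&lt;p&gt;Neither of the following steps is required for the pipeline, but since these buckets will hold datasets and model artifacts it is cheap insurance to block public access and make default encryption explicit. A minimal hardening sketch using the standard s3api calls:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Optional hardening: block public access and enforce SSE-S3 encryption
for BUCKET in "$ML_BUCKET" "$DATA_BUCKET"; do
  aws s3api put-public-access-block \
    --bucket "$BUCKET" \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

  aws s3api put-bucket-encryption \
    --bucket "$BUCKET" \
    --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;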



&lt;h3&gt;
  
  
  Step 4.3: Create ECR Repository
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create ECR repository for custom training images&lt;/span&gt;
aws ecr create-repository &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository-name&lt;/span&gt; mlops/training &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Get ECR login command&lt;/span&gt;
aws ecr get-login-password &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
  docker login &lt;span class="nt"&gt;--username&lt;/span&gt; AWS &lt;span class="nt"&gt;--password-stdin&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.dkr.ecr.&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.amazonaws.com

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ECR_TRAINING_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.dkr.ecr.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.amazonaws.com/mlops/training"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ECR Repository: &lt;/span&gt;&lt;span class="nv"&gt;$ECR_TRAINING_URI&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
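
&lt;p&gt;Optionally, enable scan-on-push for the new repository so every training image gets a vulnerability scan when it lands in ECR. This is a one-off setting and does not affect the pipeline:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Optional: scan each pushed training image for known CVEs
aws ecr put-image-scanning-configuration \
  --repository-name mlops/training \
  --image-scanning-configuration scanOnPush=true \
  --region $AWS_REGION
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;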



&lt;h3&gt;
  
  
  Step 4.4: Build Custom Training Container
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create directory for training container&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; sagemaker-training
&lt;span class="nb"&gt;cd &lt;/span&gt;sagemaker-training

&lt;span class="c"&gt;# Create training script&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; train.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
import argparse
import os
import json
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import boto3

def load_data_from_s3(data_dir):
    """Load training and validation data"""
    print(f"Loading data from {data_dir}")

    # Load training data
    X_train = np.load(os.path.join(data_dir, 'train', 'X_train.npy'))
    y_train = np.load(os.path.join(data_dir, 'train', 'y_train.npy'))

    # Load validation data
    X_val = np.load(os.path.join(data_dir, 'validation', 'X_val.npy'))
    y_val = np.load(os.path.join(data_dir, 'validation', 'y_val.npy'))

    return X_train, y_train, X_val, y_val

def train_model(X_train, y_train, hyperparameters):
    """Train Random Forest model"""
    print("Training model with hyperparameters:", hyperparameters)

    model = RandomForestClassifier(
        n_estimators=hyperparameters['n_estimators'],
        max_depth=hyperparameters['max_depth'],
        random_state=42,
        n_jobs=-1
    )

    model.fit(X_train, y_train)
    return model

def evaluate_model(model, X_val, y_val):
    """Evaluate model on validation set"""
    y_pred = model.predict(X_val)
    accuracy = accuracy_score(y_val, y_pred)
    report = classification_report(y_val, y_pred, output_dict=True)

    print(f"Validation Accuracy: {accuracy:.4f}")
    print(classification_report(y_val, y_pred))

    return accuracy, report

def save_model(model, model_dir, metrics):
    """Save model and metrics"""
    os.makedirs(model_dir, exist_ok=True)

    # Save model
    model_path = os.path.join(model_dir, 'model.joblib')
    joblib.dump(model, model_path)
    print(f"Model saved to {model_path}")

    # Save metrics
    metrics_path = os.path.join(model_dir, 'metrics.json')
    with open(metrics_path, 'w') as f:
        json.dump(metrics, f, indent=2)
    print(f"Metrics saved to {metrics_path}")

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Hyperparameters
    parser.add_argument('--n_estimators', type=int, default=100)
    parser.add_argument('--max_depth', type=int, default=10)

    # SageMaker specific arguments
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR', '/opt/ml/model'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN', '/opt/ml/input/data/train'))
    parser.add_argument('--validation', type=str, default=os.environ.get('SM_CHANNEL_VALIDATION', '/opt/ml/input/data/validation'))

    args = parser.parse_args()

    # Load data
    data_dir = os.path.dirname(args.train)
    X_train, y_train, X_val, y_val = load_data_from_s3(data_dir)

    # Train model
    hyperparameters = {
        'n_estimators': args.n_estimators,
        'max_depth': args.max_depth
    }
    model = train_model(X_train, y_train, hyperparameters)

    # Evaluate model
    accuracy, report = evaluate_model(model, X_val, y_val)

    # Save model and metrics
    metrics = {
        'accuracy': accuracy,
        'classification_report': report,
        'hyperparameters': hyperparameters
    }
    save_model(model, args.model_dir, metrics)

    print("Training completed successfully!")
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Create Dockerfile&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;DOCKERFILE&lt;/span&gt;&lt;span class="sh"&gt;'
FROM python:3.10-slim

# Install dependencies
RUN pip install --no-cache-dir &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    numpy==1.24.3 &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    scikit-learn==1.3.0 &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    joblib==1.3.2 &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    boto3==1.28.25

# Copy training script
COPY train.py /opt/ml/code/train.py

# Set working directory
WORKDIR /opt/ml/code

# Set entry point
ENV SAGEMAKER_PROGRAM train.py

ENTRYPOINT ["python", "train.py"]
&lt;/span&gt;&lt;span class="no"&gt;DOCKERFILE

&lt;/span&gt;&lt;span class="c"&gt;# Build and push image&lt;/span&gt;
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; mlops-training:latest &lt;span class="nb"&gt;.&lt;/span&gt;
docker tag mlops-training:latest &lt;span class="nv"&gt;$ECR_TRAINING_URI&lt;/span&gt;:latest
docker push &lt;span class="nv"&gt;$ECR_TRAINING_URI&lt;/span&gt;:latest

&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Training container image pushed to ECR"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
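
&lt;p&gt;Before spending SageMaker time on a run that fails at import, it is worth a quick local smoke test of the image. The check below simply overrides the entrypoint and imports the pinned dependencies; it does not exercise the training logic:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Smoke test: the image should at least import its dependencies cleanly
docker run --rm --entrypoint python mlops-training:latest \
  -c "import numpy, sklearn, joblib, boto3; print('training image dependencies OK')"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;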



&lt;h2&gt;
  
  
  Phase 5: Model Storage with S3
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 5.1: Upload Sample Training Data
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create sample dataset&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; sample-data
&lt;span class="nb"&gt;cd &lt;/span&gt;sample-data

python3 &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;
import numpy as np

# Generate synthetic classification dataset
np.random.seed(42)

# Training data
X_train = np.random.randn(1000, 20)
y_train = np.random.randint(0, 2, 1000)

# Validation data
X_val = np.random.randn(200, 20)
y_val = np.random.randint(0, 2, 200)

# Save to files
np.save('X_train.npy', X_train)
np.save('y_train.npy', y_train)
np.save('X_val.npy', X_val)
np.save('y_val.npy', y_val)

print("Sample dataset created")
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Upload to S3&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;X_train.npy s3://&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt;/training/
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;y_train.npy s3://&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt;/training/
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;X_val.npy s3://&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt;/validation/
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;y_val.npy s3://&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt;/validation/

&lt;span class="nb"&gt;cd&lt;/span&gt; ..
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Sample data uploaded to S3"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
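
&lt;p&gt;A quick listing confirms the four arrays landed under the expected prefixes (this assumes the bucket variables are still set in your shell):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Expect training/X_train.npy, training/y_train.npy, validation/X_val.npy, validation/y_val.npy
aws s3 ls s3://$DATA_BUCKET --recursive --human-readable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;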



&lt;h3&gt;
  
  
  Step 5.2: Create ConfigMap for S3 Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Store S3 bucket names in ConfigMap&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: mlops-config
  namespace: mlops-pipelines
data:
  ML_BUCKET: "&lt;/span&gt;&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"
  DATA_BUCKET: "&lt;/span&gt;&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"
  AWS_REGION: "&lt;/span&gt;&lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"
  SAGEMAKER_ROLE_ARN: "&lt;/span&gt;&lt;span class="nv"&gt;$SAGEMAKER_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;"
  ECR_TRAINING_URI: "&lt;/span&gt;&lt;span class="nv"&gt;$ECR_TRAINING_URI&lt;/span&gt;&lt;span class="sh"&gt;"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
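
&lt;p&gt;Because the heredoc above is unquoted, the bucket names, region, and ARNs are expanded by your shell at apply time. If any of those variables were unset, the ConfigMap will silently contain empty strings, so read back the rendered values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Every key should have a non-empty value
oc get configmap mlops-config -n mlops-pipelines -o yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;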



&lt;h2&gt;
  
  
  Phase 6: KServe Model Serving
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 6.1: Install KServe
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Serverless Operator (prerequisite for KServe)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: serverless-operator
  namespace: openshift-operators
spec:
  channel: stable
  name: serverless-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Wait for operator to be ready&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;30
oc get csv &lt;span class="nt"&gt;-n&lt;/span&gt; openshift-operators | &lt;span class="nb"&gt;grep &lt;/span&gt;serverless

&lt;span class="c"&gt;# Install KServe via Red Hat OpenShift AI or manually&lt;/span&gt;
&lt;span class="c"&gt;# For this guide, we'll install KServe components manually&lt;/span&gt;

&lt;span class="c"&gt;# Install Knative Serving&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: knative-serving
---
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  ingress:
    istio:
      enabled: false
  config:
    domain:
      svc.cluster.local: ""
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Install KServe&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;KSERVE_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;v0.11.0
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://github.com/kserve/kserve/releases/download/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;KSERVE_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/kserve.yaml

&lt;span class="c"&gt;# Wait for KServe to be ready&lt;/span&gt;
kubectl &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="nt"&gt;--for&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Ready pods &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; kserve &lt;span class="nt"&gt;--timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;300s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
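
&lt;p&gt;Before moving on, confirm the KServe CRDs were registered and the controller pods are healthy; if either check fails, the ServingRuntime and InferenceService manifests in the next steps will be rejected:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# CRDs such as inferenceservices.serving.kserve.io should be listed
kubectl get crd | grep serving.kserve.io

# Controller pods should be Running
kubectl get pods -n kserve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;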



&lt;h3&gt;
  
  
  Step 6.2: Create Custom ServingRuntime for scikit-learn
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create scikit-learn serving runtime&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: sklearn-runtime
  namespace: mlops-serving
spec:
  supportedModelFormats:
    - name: sklearn
      version: "1"
      autoSelect: true
  containers:
    - name: kserve-container
      image: kserve/sklearnserver:v0.11.0
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6.3: Create Service Account for Model Access
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create IAM role for KServe to access S3&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; kserve-trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:sub": "system:serviceaccount:mlops-serving:kserve-sa"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create role&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;KSERVE_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; KServeS3AccessRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://kserve-trust-policy.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Role.Arn'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create S3 read policy&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; kserve-s3-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ML_BUCKET&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;",
        "arn:aws:s3:::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ML_BUCKET&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;/*"
      ]
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam put-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; KServeS3AccessRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; S3ReadAccess &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://kserve-s3-policy.json

&lt;span class="c"&gt;# Create service account&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kserve-sa
  namespace: mlops-serving
  annotations:
    eks.amazonaws.com/role-arn: &lt;/span&gt;&lt;span class="nv"&gt;$KSERVE_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
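
&lt;p&gt;As with the ACK controller, the eks.amazonaws.com/role-arn annotation is what the pod identity webhook uses to hand KServe's storage-initializer temporary credentials, so confirm it rendered with the real role ARN:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# The annotation should show the KServeS3AccessRole ARN, not an empty string
oc get sa kserve-sa -n mlops-serving -o yaml | grep role-arn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;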



&lt;h2&gt;
  
  
  Phase 7: End-to-End Pipeline
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 7.1: Create Pipeline Tasks
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Task for SageMaker training&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: sagemaker-training
  namespace: mlops-pipelines
spec:
  params:
    - name: job-name
      type: string
      description: SageMaker training job name
    - name: role-arn
      type: string
      description: SageMaker execution role ARN
    - name: image-uri
      type: string
      description: Training container image URI
    - name: instance-type
      type: string
      default: ml.m5.xlarge
    - name: instance-count
      type: string
      default: "1"
    - name: volume-size
      type: string
      default: "50"
    - name: max-runtime
      type: string
      default: "3600"
    - name: data-bucket
      type: string
    - name: model-bucket
      type: string
  steps:
    - name: create-training-job
      image: quay.io/openshift/origin-cli:latest
      script: |
        #!/bin/bash
        set -e

        # Create SageMaker training job manifest
        cat &amp;gt; training-job.yaml &amp;lt;&amp;lt;YAML
        apiVersion: sagemaker.services.k8s.aws/v1alpha1
        kind: TrainingJob
        metadata:
          name: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.job-name)
          namespace: mlops-pipelines
        spec:
          trainingJobName: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.job-name)
          roleARN: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.role-arn)
          algorithmSpecification:
            trainingImage: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.image-uri)
            trainingInputMode: File
          resourceConfig:
            instanceType: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.instance-type)
            instanceCount: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.instance-count)
            volumeSizeInGB: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.volume-size)
          inputDataConfig:
            - channelName: train
              dataSource:
                s3DataSource:
                  s3DataType: S3Prefix
                  s3URI: s3://&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.data-bucket)/training/
                  s3DataDistributionType: FullyReplicated
            - channelName: validation
              dataSource:
                s3DataSource:
                  s3DataType: S3Prefix
                  s3URI: s3://&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.data-bucket)/validation/
                  s3DataDistributionType: FullyReplicated
          outputDataConfig:
            s3OutputPath: s3://&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-bucket)/models/
          stoppingCondition:
            maxRuntimeInSeconds: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.max-runtime)
        YAML

        # Apply the training job
        kubectl apply -f training-job.yaml

        echo "SageMaker training job created: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.job-name)"

    - name: wait-for-completion
      image: quay.io/openshift/origin-cli:latest
      script: |
        #!/bin/bash
        set -e

        echo "Waiting for training job to complete..."

        while true; do
          STATUS=&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(kubectl get trainingjob &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.job-name) -n mlops-pipelines -o jsonpath='{.status.trainingJobStatus}')

          echo "Current status: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;STATUS"

          if [ "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;STATUS" == "Completed" ]; then
            echo "Training job completed successfully!"
            break
          elif [ "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;STATUS" == "Failed" ] || [ "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;STATUS" == "Stopped" ]; then
            echo "Training job failed or was stopped"
            exit 1
          fi

          sleep 30
        done

        # Get model artifact location
        MODEL_URI=&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(kubectl get trainingjob &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.job-name) -n mlops-pipelines -o jsonpath='{.status.modelArtifacts.s3ModelArtifacts}')
        echo "Model artifacts saved to: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;MODEL_URI"
        echo -n "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;MODEL_URI" &amp;gt; /workspace/model-uri.txt
  workspaces:
    - name: output
      description: Workspace to store output
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
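
&lt;p&gt;Before wiring the full pipeline, you can exercise this Task on its own. The TaskRun below is a minimal sketch (the generated job name, the throwaway emptyDir workspace, and the use of oc create are illustrative choices); note that it launches a real, billable SageMaker training job:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Optional: run the training Task in isolation before building the pipeline
cat &amp;lt;&amp;lt;EOF | oc create -f -
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: sagemaker-training-smoke-
  namespace: mlops-pipelines
spec:
  serviceAccountName: pipeline-sa
  taskRef:
    name: sagemaker-training
  params:
    - name: job-name
      value: "smoke-test-$(date +%s)"
    - name: role-arn
      value: "$SAGEMAKER_ROLE_ARN"
    - name: image-uri
      value: "$ECR_TRAINING_URI:latest"
    - name: instance-type
      value: "ml.m5.xlarge"
    - name: data-bucket
      value: "$DATA_BUCKET"
    - name: model-bucket
      value: "$ML_BUCKET"
  workspaces:
    - name: output
      emptyDir: {}
EOF

# Follow the logs of the most recent TaskRun
tkn taskrun logs --last -f -n mlops-pipelines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;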



&lt;h3&gt;
  
  
  Step 7.2: Create Task for Model Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Task for deploying model to KServe&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: deploy-model
  namespace: mlops-pipelines
spec:
  params:
    - name: model-name
      type: string
      description: Name for the deployed model
    - name: model-uri
      type: string
      description: S3 URI of the model artifacts
    - name: model-format
      type: string
      default: sklearn
  steps:
    - name: create-inference-service
      image: quay.io/openshift/origin-cli:latest
      script: |
        #!/bin/bash
        set -e

        # Create InferenceService
        cat &amp;gt; inference-service.yaml &amp;lt;&amp;lt;YAML
        apiVersion: serving.kserve.io/v1beta1
        kind: InferenceService
        metadata:
          name: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-name)
          namespace: mlops-serving
        spec:
          predictor:
            serviceAccountName: kserve-sa
            model:
              modelFormat:
                name: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-format)
              storageUri: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-uri)
              resources:
                requests:
                  cpu: "1"
                  memory: "2Gi"
                limits:
                  cpu: "2"
                  memory: "4Gi"
        YAML

        kubectl apply -f inference-service.yaml

        echo "InferenceService created: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-name)"

        # Wait for InferenceService to be ready
        kubectl wait --for=condition=Ready &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
          inferenceservice/&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-name) &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
          -n mlops-serving &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
          --timeout=300s

        echo "Model deployment completed successfully!"

        # Get inference endpoint
        ENDPOINT=&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(kubectl get inferenceservice &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-name) -n mlops-serving -o jsonpath='{.status.url}')
        echo "Inference endpoint: &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;ENDPOINT"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7.3: Create Complete MLOps Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create the full pipeline&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: mlops-pipeline
  namespace: mlops-pipelines
spec:
  params:
    - name: model-name
      type: string
      description: Name for the model
      default: ml-model
    - name: sagemaker-role-arn
      type: string
      description: SageMaker execution role ARN
    - name: training-image-uri
      type: string
      description: ECR URI for training container
    - name: data-bucket
      type: string
      description: S3 bucket with training data
    - name: model-bucket
      type: string
      description: S3 bucket for model artifacts
    - name: instance-type
      type: string
      description: SageMaker instance type
      default: ml.m5.xlarge
  workspaces:
    - name: shared-workspace
  tasks:
    - name: train-model
      taskRef:
        name: sagemaker-training
      params:
        - name: job-name
          value: "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-name)-&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(context.pipelineRun.uid)"
        - name: role-arn
          value: "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.sagemaker-role-arn)"
        - name: image-uri
          value: "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.training-image-uri)"
        - name: instance-type
          value: "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.instance-type)"
        - name: data-bucket
          value: "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.data-bucket)"
        - name: model-bucket
          value: "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-bucket)"
      workspaces:
        - name: output
          workspace: shared-workspace

    - name: deploy-model
      runAfter:
        - train-model
      taskRef:
        name: deploy-model
      params:
        - name: model-name
          value: "&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-name)"
        - name: model-uri
          value: "s3://&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-bucket)/models/&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(params.model-name)-&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="sh"&gt;(context.pipelineRun.uid)/output/model.tar.gz"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
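
&lt;p&gt;A quick check that the Pipeline and both Tasks were created saves a failed run later:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Both tasks and the pipeline should be listed
oc get tasks,pipelines -n mlops-pipelines

# Inspect parameters and task ordering
tkn pipeline describe mlops-pipeline -n mlops-pipelines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;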



&lt;h3&gt;
  
  
  Step 7.4: Create PipelineRun
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create workspace PVC&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mlops-workspace
  namespace: mlops-pipelines
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create PipelineRun to execute the pipeline&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: mlops-pipeline-run-
  namespace: mlops-pipelines
spec:
  pipelineRef:
    name: mlops-pipeline
  params:
    - name: model-name
      value: "classifier-model"
    - name: sagemaker-role-arn
      value: "&lt;/span&gt;&lt;span class="nv"&gt;$SAGEMAKER_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;"
    - name: training-image-uri
      value: "&lt;/span&gt;&lt;span class="nv"&gt;$ECR_TRAINING_URI&lt;/span&gt;&lt;span class="sh"&gt;:latest"
    - name: data-bucket
      value: "&lt;/span&gt;&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"
    - name: model-bucket
      value: "&lt;/span&gt;&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;"
    - name: instance-type
      value: "ml.m5.xlarge"
  workspaces:
    - name: shared-workspace
      persistentVolumeClaim:
        claimName: mlops-workspace
  serviceAccountName: pipeline-sa
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Testing and Validation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Test 1: Monitor Pipeline Execution
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List pipeline runs&lt;/span&gt;
tkn pipelinerun list &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines

&lt;span class="c"&gt;# Get latest pipeline run&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PIPELINE_RUN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;tkn pipelinerun list &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.items[0].metadata.name}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Watch pipeline execution&lt;/span&gt;
tkn pipelinerun logs &lt;span class="nv"&gt;$PIPELINE_RUN&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines

&lt;span class="c"&gt;# Check pipeline status&lt;/span&gt;
tkn pipelinerun describe &lt;span class="nv"&gt;$PIPELINE_RUN&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test 2: Verify SageMaker Training Job
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List SageMaker training jobs via ACK&lt;/span&gt;
kubectl get trainingjobs &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines

&lt;span class="c"&gt;# Get training job details&lt;/span&gt;
kubectl describe trainingjob &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines

&lt;span class="c"&gt;# Check training job in AWS Console&lt;/span&gt;
aws sagemaker list-training-jobs &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# View training job logs&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;TRAINING_JOB_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get trainingjobs &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.items[0].metadata.name}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
aws logs &lt;span class="nb"&gt;tail&lt;/span&gt; /aws/sagemaker/TrainingJobs &lt;span class="nt"&gt;--follow&lt;/span&gt; &lt;span class="nt"&gt;--log-stream-name-prefix&lt;/span&gt; &lt;span class="nv"&gt;$TRAINING_JOB_NAME&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test 3: Verify Model Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check InferenceService status&lt;/span&gt;
kubectl get inferenceservice &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving

&lt;span class="c"&gt;# Get inference endpoint&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;INFERENCE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get inferenceservice classifier-model &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.url}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Inference URL: &lt;/span&gt;&lt;span class="nv"&gt;$INFERENCE_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Test inference with sample data&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="nv"&gt;$INFERENCE_URL&lt;/span&gt;/v1/models/classifier-model:predict &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "instances": [
      [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
       1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
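
&lt;p&gt;The sklearn serving runtime speaks the KServe V1 inference protocol, so a healthy endpoint should answer with a JSON body of the form {"predictions": [...]}. The variant below simply pretty-prints the response (it assumes jq is installed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Same request as above; expect something like {"predictions": [0]}
curl -s -X POST $INFERENCE_URL/v1/models/classifier-model:predict \
  -H "Content-Type: application/json" \
  -d '{"instances": [[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]]}' | jq .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;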



&lt;h3&gt;
  
  
  Test 4: Load Testing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create load test script&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; load-test.sh &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;BASH&lt;/span&gt;&lt;span class="sh"&gt;'
#!/bin/bash
INFERENCE_URL=&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="sh"&gt;
REQUESTS=&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;&lt;span class="sh"&gt;

echo "Running &lt;/span&gt;&lt;span class="nv"&gt;$REQUESTS&lt;/span&gt;&lt;span class="sh"&gt; inference requests to &lt;/span&gt;&lt;span class="nv"&gt;$INFERENCE_URL&lt;/span&gt;&lt;span class="sh"&gt;"

for i in &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;seq &lt;/span&gt;1 &lt;span class="nv"&gt;$REQUESTS&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;; do
  curl -s -X POST &lt;/span&gt;&lt;span class="nv"&gt;$INFERENCE_URL&lt;/span&gt;&lt;span class="sh"&gt;/v1/models/classifier-model:predict &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -H "Content-Type: application/json" &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -d '{
      "instances": [
        [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0,
         1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
      ]
    }' &amp;gt; /dev/null &amp;amp;

  if [ &lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt;i &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="k"&gt;))&lt;/span&gt;&lt;span class="sh"&gt; -eq 0 ]; then
    echo "Sent &lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="sh"&gt; requests"
  fi
done

wait
echo "Load test completed"
&lt;/span&gt;&lt;span class="no"&gt;BASH

&lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x load-test.sh

&lt;span class="c"&gt;# Run load test&lt;/span&gt;
./load-test.sh &lt;span class="nv"&gt;$INFERENCE_URL&lt;/span&gt; 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Resource Cleanup
&lt;/h2&gt;

&lt;p&gt;To avoid ongoing AWS charges, follow these steps to clean up all resources.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Delete InferenceServices
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete all InferenceServices&lt;/span&gt;
kubectl delete inferenceservice &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving

&lt;span class="c"&gt;# Verify deletion&lt;/span&gt;
kubectl get inferenceservice &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Delete Pipelines and Runs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete all pipeline runs&lt;/span&gt;
kubectl delete pipelinerun &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines

&lt;span class="c"&gt;# Delete pipelines&lt;/span&gt;
kubectl delete pipeline mlops-pipeline &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines

&lt;span class="c"&gt;# Delete tasks&lt;/span&gt;
kubectl delete task &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines

&lt;span class="c"&gt;# Delete PVC&lt;/span&gt;
kubectl delete pvc mlops-workspace &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Delete SageMaker Training Jobs
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete ACK SageMaker resources&lt;/span&gt;
kubectl delete trainingjobs &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines

&lt;span class="c"&gt;# Verify in AWS Console or CLI&lt;/span&gt;
aws sagemaker list-training-jobs &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Delete S3 Buckets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete all objects in buckets&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Delete buckets&lt;/span&gt;
aws s3 rb s3://&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
aws s3 rb s3://&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"S3 buckets deleted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
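
&lt;p&gt;One gotcha: because versioning was enabled on both buckets, aws s3 rm only removes the current object versions, so the rb commands can still fail with BucketNotEmpty. The loop below is a sketch for purging the remaining versions (delete markers may need the same treatment, and delete-objects accepts at most 1,000 keys per call):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# If "aws s3 rb" fails with BucketNotEmpty, purge leftover object versions and retry
for BUCKET in "$ML_BUCKET" "$DATA_BUCKET"; do
  aws s3api delete-objects --bucket "$BUCKET" \
    --delete "$(aws s3api list-object-versions --bucket "$BUCKET" \
      --query '{Objects: Versions[].{Key: Key, VersionId: VersionId}}' --output json)" || true
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;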



&lt;h3&gt;
  
  
  Step 5: Delete ECR Repository
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete ECR repository&lt;/span&gt;
aws ecr delete-repository &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository-name&lt;/span&gt; mlops/training &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--force&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ECR repository deleted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6: Delete ACK Controllers
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete ACK SageMaker controller&lt;/span&gt;
kubectl delete &lt;span class="nt"&gt;-f&lt;/span&gt; install.yaml

&lt;span class="c"&gt;# Delete ACK namespace&lt;/span&gt;
kubectl delete namespace ack-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Delete ROSA Cluster
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete ROSA cluster (takes ~10-15 minutes)&lt;/span&gt;
rosa delete cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="c"&gt;# Wait for cluster deletion&lt;/span&gt;
rosa logs uninstall &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--watch&lt;/span&gt;

&lt;span class="c"&gt;# Verify deletion&lt;/span&gt;
rosa list clusters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 8: Delete IAM Resources
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Detach policies and delete ACK role&lt;/span&gt;
aws iam detach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; ACKSageMakerControllerRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:policy/ACKSageMakerPolicy

aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; ACKSageMakerControllerRole

aws iam delete-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:policy/ACKSageMakerPolicy

&lt;span class="c"&gt;# Delete SageMaker execution role&lt;/span&gt;
aws iam delete-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; SageMakerMLOpsExecutionRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; S3Access

aws iam detach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; SageMakerMLOpsExecutionRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/AmazonSageMakerFullAccess

aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; SageMakerMLOpsExecutionRole

&lt;span class="c"&gt;# Delete KServe role&lt;/span&gt;
aws iam delete-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; KServeS3AccessRole &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; S3ReadAccess

aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; KServeS3AccessRole

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"IAM resources deleted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 9: Clean Up Local Files
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Remove temporary files&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; ack-sagemaker-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; ack-trust-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; sagemaker-trust-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; sagemaker-s3-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; kserve-trust-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; kserve-s3-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; install.yaml
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; sagemaker-training/
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; sample-data/
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; load-test.sh

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Local files cleaned up"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify ROSA cluster is deleted&lt;/span&gt;
rosa list clusters

&lt;span class="c"&gt;# Verify S3 buckets are deleted&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;mlops

&lt;span class="c"&gt;# Verify ECR repositories are deleted&lt;/span&gt;
aws ecr describe-repositories &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; | &lt;span class="nb"&gt;grep &lt;/span&gt;mlops

&lt;span class="c"&gt;# Verify IAM roles are deleted&lt;/span&gt;
aws iam list-roles | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"ACKSageMaker|SageMakerMLOps|KServeS3"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Cleanup verification complete"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Issue: ACK Controller Cannot Create SageMaker Jobs
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: TrainingJob CR is created but SageMaker job doesn't start&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify ACK controller has correct IAM role&lt;/li&gt;
&lt;li&gt;Check service account annotation&lt;/li&gt;
&lt;li&gt;Verify SageMaker execution role exists and has permissions&lt;/li&gt;
&lt;li&gt;Check CloudWatch logs for ACK controller
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check ACK controller logs&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; ack-system deployment/ack-sagemaker-controller

&lt;span class="c"&gt;# Verify service account annotation&lt;/span&gt;
kubectl get sa &lt;span class="nt"&gt;-n&lt;/span&gt; ack-system ack-sagemaker-controller &lt;span class="nt"&gt;-o&lt;/span&gt; yaml

&lt;span class="c"&gt;# Test IAM role assumption&lt;/span&gt;
aws sts assume-role-with-web-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-arn&lt;/span&gt; &lt;span class="nv"&gt;$ACK_ROLE_ARN&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-session-name&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--web-identity-token&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;kubectl create token ack-sagemaker-controller &lt;span class="nt"&gt;-n&lt;/span&gt; ack-system&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Issue: KServe Cannot Pull Model from S3
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: InferenceService stuck in "Downloading" state&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify KServe service account has correct IAM role&lt;/li&gt;
&lt;li&gt;Check S3 bucket permissions&lt;/li&gt;
&lt;li&gt;Verify model URI is correct&lt;/li&gt;
&lt;li&gt;Check storage-initializer logs
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check InferenceService status&lt;/span&gt;
kubectl describe inferenceservice &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving

&lt;span class="c"&gt;# Check storage-initializer logs&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving &lt;span class="nt"&gt;-l&lt;/span&gt; serving.kserve.io/inferenceservice&lt;span class="o"&gt;=&lt;/span&gt;classifier-model &lt;span class="nt"&gt;-c&lt;/span&gt; storage-initializer

&lt;span class="c"&gt;# Verify S3 access&lt;/span&gt;
kubectl run aws-cli &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;amazon/aws-cli &lt;span class="nt"&gt;--serviceaccount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kserve-sa &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt;/models/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Issue: Pipeline Run Fails
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: PipelineRun shows failed status&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check pipeline run logs&lt;/li&gt;
&lt;li&gt;Verify all parameters are correct&lt;/li&gt;
&lt;li&gt;Check task pod logs&lt;/li&gt;
&lt;li&gt;Verify service account permissions
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# View pipeline run logs&lt;/span&gt;
tkn pipelinerun logs &lt;span class="nv"&gt;$PIPELINE_RUN&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines

&lt;span class="c"&gt;# Check failed task&lt;/span&gt;
kubectl get pods &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nv"&gt;$PIPELINE_RUN&lt;/span&gt;

&lt;span class="c"&gt;# View task pod logs&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines &amp;lt;pod-name&amp;gt;

&lt;span class="c"&gt;# Check events&lt;/span&gt;
kubectl get events &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'.lastTimestamp'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Issue: SageMaker Training Job Fails
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: TrainingJob CR shows "Failed" status&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check training container logs in CloudWatch&lt;/li&gt;
&lt;li&gt;Verify training data exists in S3&lt;/li&gt;
&lt;li&gt;Check SageMaker execution role permissions&lt;/li&gt;
&lt;li&gt;Verify container image is accessible
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get training job name&lt;/span&gt;
kubectl get trainingjob &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.items[0].metadata.name}'&lt;/span&gt;

&lt;span class="c"&gt;# Check CloudWatch logs&lt;/span&gt;
aws logs &lt;span class="nb"&gt;tail&lt;/span&gt; /aws/sagemaker/TrainingJobs &lt;span class="nt"&gt;--follow&lt;/span&gt; &lt;span class="nt"&gt;--log-stream-name-prefix&lt;/span&gt; &lt;span class="nv"&gt;$TRAINING_JOB_NAME&lt;/span&gt;

&lt;span class="c"&gt;# List training jobs&lt;/span&gt;
aws sagemaker describe-training-job &lt;span class="nt"&gt;--training-job-name&lt;/span&gt; &lt;span class="nv"&gt;$TRAINING_JOB_NAME&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Issue: High Inference Latency
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Model serving responses are slow&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scale InferenceService replicas&lt;/li&gt;
&lt;li&gt;Adjust resource requests/limits&lt;/li&gt;
&lt;li&gt;Enable autoscaling&lt;/li&gt;
&lt;li&gt;Check network latency
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Scale InferenceService&lt;/span&gt;
kubectl scale &lt;span class="nt"&gt;--replicas&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3 inferenceservice/classifier-model &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving

&lt;span class="c"&gt;# Enable autoscaling&lt;/span&gt;
kubectl patch inferenceservice classifier-model &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'[{"op": "add", "path": "/spec/predictor/scaleTarget", "value": 10}]'&lt;/span&gt;

&lt;span class="c"&gt;# Check pod resource usage&lt;/span&gt;
kubectl top pods &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Debug Commands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# View all resources in namespace&lt;/span&gt;
kubectl get all &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines
kubectl get all &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving

&lt;span class="c"&gt;# Describe resources&lt;/span&gt;
kubectl describe trainingjob &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines
kubectl describe inferenceservice &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving

&lt;span class="c"&gt;# Check logs&lt;/span&gt;
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; ack-system deployment/ack-sagemaker-controller
kubectl logs &lt;span class="nt"&gt;-n&lt;/span&gt; kserve deployment/kserve-controller-manager

&lt;span class="c"&gt;# View events&lt;/span&gt;
kubectl get events &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-pipelines &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'.lastTimestamp'&lt;/span&gt;
kubectl get events &lt;span class="nt"&gt;-n&lt;/span&gt; mlops-serving &lt;span class="nt"&gt;--sort-by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'.lastTimestamp'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>machinelearning</category>
      <category>kubernetes</category>
      <category>aws</category>
      <category>devops</category>
    </item>
    <item>
      <title>Enterprise-Grade RAG Platform: Orchestrating Amazon Bedrock Agents via Red Hat OpenShift AI</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Fri, 26 Dec 2025 12:51:53 +0000</pubDate>
      <link>https://dev.to/mgonzalezo/enterprise-grade-rag-platform-orchestrating-amazon-bedrock-agents-via-red-hat-openshift-ai-5ak1</link>
      <guid>https://dev.to/mgonzalezo/enterprise-grade-rag-platform-orchestrating-amazon-bedrock-agents-via-red-hat-openshift-ai-5ak1</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Overview&lt;/li&gt;
&lt;li&gt;Architecture&lt;/li&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;Phase 1: ROSA Cluster Setup&lt;/li&gt;
&lt;li&gt;Phase 2: Red Hat OpenShift AI Installation&lt;/li&gt;
&lt;li&gt;Phase 3: Amazon Bedrock Integration via PrivateLink&lt;/li&gt;
&lt;li&gt;Phase 4: AWS Glue Data Pipeline&lt;/li&gt;
&lt;li&gt;Phase 5: Milvus Vector Database Deployment&lt;/li&gt;
&lt;li&gt;Phase 6: RAG Application Deployment&lt;/li&gt;
&lt;li&gt;Testing and Validation&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;⚠️ IMPORTANT NOTICE - Privacy and Confidentiality&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This implementation guide and architecture documentation contains no customer-specific designs, proprietary architectures, confidential data, or private implementation details.&lt;/p&gt;

&lt;p&gt;The architecture patterns, code samples, and configurations presented here are based on publicly documented AWS and Red Hat best practices and are intended for educational and reference purposes only. Organizations implementing this solution should adapt it to their specific security, compliance, and business requirements while protecting their proprietary design decisions and sensitive information.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Project Purpose
&lt;/h3&gt;

&lt;p&gt;This platform provides an &lt;strong&gt;enterprise-grade Retrieval-Augmented Generation (RAG)&lt;/strong&gt; solution that addresses the primary concern of enterprises: &lt;strong&gt;data privacy and security&lt;/strong&gt;. By leveraging Red Hat OpenShift on AWS (ROSA) to control the data plane while using Amazon Bedrock for AI capabilities, organizations maintain complete control over their sensitive data while accessing state-of-the-art language models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Value Propositions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Privacy-First Architecture&lt;/strong&gt;: All sensitive data remains within your controlled OpenShift environment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure Connectivity&lt;/strong&gt;: AWS PrivateLink ensures AI model calls never traverse the public internet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Compliance&lt;/strong&gt;: Meets stringent data governance and compliance requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable Infrastructure&lt;/strong&gt;: Leverages Kubernetes orchestration for production-grade reliability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best-of-Breed Components&lt;/strong&gt;: Combines Red Hat's enterprise Kubernetes with AWS's managed AI services&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Solution Components
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ROSA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed OpenShift cluster on AWS&lt;/td&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Red Hat OpenShift AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model serving gateway and ML platform&lt;/td&gt;
&lt;td&gt;Control Plane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon Bedrock&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude 3.5 Sonnet LLM access&lt;/td&gt;
&lt;td&gt;Intelligence Plane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS PrivateLink&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Secure private connectivity&lt;/td&gt;
&lt;td&gt;Network Security&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Glue&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Document processing and ETL&lt;/td&gt;
&lt;td&gt;Data Pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Document storage&lt;/td&gt;
&lt;td&gt;Data Lake&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Milvus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vector database for embeddings&lt;/td&gt;
&lt;td&gt;Data Plane&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  High-Level Architecture Diagram
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8lk7mur8o15jrnxa7vk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8lk7mur8o15jrnxa7vk.png" alt="architecture_2" width="789" height="829"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Flow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Document Ingestion&lt;/strong&gt;: Documents uploaded to S3 bucket&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ETL Processing&lt;/strong&gt;: AWS Glue crawler discovers and processes documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding Generation&lt;/strong&gt;: Processed documents sent to Bedrock for embedding generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Storage&lt;/strong&gt;: Embeddings stored in Milvus running on ROSA&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query Processing&lt;/strong&gt;: User queries received by RAG application&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Search&lt;/strong&gt;: Application searches Milvus for relevant document chunks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Retrieval&lt;/strong&gt;: Relevant chunks retrieved from vector database&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Inference&lt;/strong&gt;: RHOAI gateway forwards prompt + context to Bedrock via PrivateLink (see the sketch after this list)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response Generation&lt;/strong&gt;: Claude 3.5 generates response based on retrieved context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Response Delivery&lt;/strong&gt;: Answer returned to user through application&lt;/li&gt;
&lt;/ol&gt;
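
&lt;p&gt;To make steps 7-9 concrete, the minimal sketch below stuffs retrieved chunks into the prompt and calls Claude through the same &lt;code&gt;bedrock-runtime&lt;/code&gt; CLI used later in this guide. The &lt;code&gt;CONTEXT&lt;/code&gt; and &lt;code&gt;QUESTION&lt;/code&gt; values are placeholders standing in for the output of the Milvus vector search, and the command assumes AWS credentials and the target region are already configured; the full application code comes in Phase 6.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Minimal sketch of the context-augmented LLM call (steps 7-9)
# CONTEXT would normally be the chunk text returned by the Milvus search
CONTEXT="ROSA is a managed Red Hat OpenShift service that runs on AWS."
QUESTION="What is ROSA?"

aws bedrock-runtime invoke-model \
  --model-id anthropic.claude-3-5-sonnet-20241022-v2:0 \
  --content-type application/json \
  --accept application/json \
  --body "{\"anthropic_version\":\"bedrock-2023-05-31\",\"max_tokens\":256,\"messages\":[{\"role\":\"user\",\"content\":\"Context:\\n$CONTEXT\\n\\nQuestion: $QUESTION\"}]}" \
  response.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;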

&lt;h3&gt;
  
  
  Security Architecture
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Network Isolation&lt;/strong&gt;: ROSA cluster in private subnets with no public ingress&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PrivateLink Encryption&lt;/strong&gt;: All Bedrock API calls encrypted in transit via AWS PrivateLink&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Sovereignty&lt;/strong&gt;: Document content never leaves controlled environment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RBAC&lt;/strong&gt;: OpenShift role-based access control for all components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets Management&lt;/strong&gt;: OpenShift secrets for API keys and credentials (sketched below)&lt;/li&gt;
&lt;/ul&gt;
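
&lt;p&gt;To make the RBAC and secrets bullets concrete, here is a minimal sketch; the user name, secret name, and values are purely illustrative and should be adapted to your environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Grant a hypothetical developer edit rights scoped to the RAG namespace only
oc adm policy add-role-to-user edit dev-user -n rag-application

# Keep connection settings out of manifests and images (placeholder values)
oc create secret generic rag-app-config -n rag-application \
  --from-literal=MILVUS_HOST=milvus.milvus.svc.cluster.local \
  --from-literal=AWS_REGION=us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;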

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Required Accounts and Subscriptions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;strong&gt;AWS Account&lt;/strong&gt; with administrative access&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Red Hat Account&lt;/strong&gt; with OpenShift subscription&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;ROSA Enabled&lt;/strong&gt; in your AWS account (&lt;a href="https://console.aws.amazon.com/rosa/" rel="noopener noreferrer"&gt;Enable ROSA&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Amazon Bedrock Access&lt;/strong&gt; with Claude 3.5 Sonnet model enabled in your region&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Required Tools
&lt;/h3&gt;

&lt;p&gt;Install the following CLI tools on your workstation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# AWS CLI (v2)&lt;/span&gt;
curl &lt;span class="s2"&gt;"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"awscliv2.zip"&lt;/span&gt;
unzip awscliv2.zip
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./aws/install

&lt;span class="c"&gt;# ROSA CLI&lt;/span&gt;
wget https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; rosa-linux.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;rosa /usr/local/bin/rosa
rosa version

&lt;span class="c"&gt;# OpenShift CLI (oc)&lt;/span&gt;
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; openshift-client-linux.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;oc kubectl /usr/local/bin/
oc version

&lt;span class="c"&gt;# Helm (v3)&lt;/span&gt;
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS Prerequisites
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Service Quotas
&lt;/h4&gt;

&lt;p&gt;Verify you have adequate service quotas in your target region:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check EC2 vCPU quota (need at least 100 for production ROSA)&lt;/span&gt;
aws service-quotas get-service-quota &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-code&lt;/span&gt; ec2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-1216C47A &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# Check VPC quota&lt;/span&gt;
aws service-quotas get-service-quota &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-code&lt;/span&gt; vpc &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-F678F1CE &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  IAM Permissions
&lt;/h4&gt;

&lt;p&gt;Your AWS IAM user/role needs permissions for the following (a quick verification sketch follows the list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 (VPC, subnets, security groups, instances)&lt;/li&gt;
&lt;li&gt;IAM (roles, policies)&lt;/li&gt;
&lt;li&gt;S3 (buckets, objects)&lt;/li&gt;
&lt;li&gt;Bedrock (InvokeModel, InvokeModelWithResponseStream)&lt;/li&gt;
&lt;li&gt;Glue (crawlers, jobs, databases)&lt;/li&gt;
&lt;li&gt;CloudWatch (logs, metrics)&lt;/li&gt;
&lt;/ul&gt;
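
&lt;p&gt;As a quick sanity check before you start, you can simulate a sample of these actions against your own identity with the IAM policy simulator (the ARN below is a placeholder for your user or role):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Simulate a few of the required actions against your identity
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:user/your-admin-user \
  --action-names bedrock:InvokeModel glue:CreateCrawler s3:CreateBucket ec2:CreateVpcEndpoint \
  --query 'EvaluationResults[].[EvalActionName,EvalDecision]' \
  --output table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;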

&lt;h3&gt;
  
  
  Knowledge Prerequisites
&lt;/h3&gt;

&lt;p&gt;You should be familiar with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS fundamentals (VPC, IAM, S3)&lt;/li&gt;
&lt;li&gt;Kubernetes basics (pods, deployments, services)&lt;/li&gt;
&lt;li&gt;Basic Linux command line&lt;/li&gt;
&lt;li&gt;YAML configuration files&lt;/li&gt;
&lt;li&gt;REST APIs and HTTP concepts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Phase 1: ROSA Cluster Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1.1: Configure AWS CLI
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Configure AWS credentials&lt;/span&gt;
aws configure

&lt;span class="c"&gt;# Verify configuration&lt;/span&gt;
aws sts get-caller-identity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.2: Initialize ROSA
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Log in to Red Hat&lt;/span&gt;
rosa login

&lt;span class="c"&gt;# Verify ROSA prerequisites&lt;/span&gt;
rosa verify quota
rosa verify permissions

&lt;span class="c"&gt;# Initialize ROSA in your AWS account (one-time setup)&lt;/span&gt;
rosa init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.3: Create ROSA Cluster
&lt;/h3&gt;

&lt;p&gt;Create a ROSA cluster with appropriate specifications for the RAG workload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set environment variables&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-platform"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MULTI_AZ&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MACHINE_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"m5.2xlarge"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;COMPUTE_NODES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3

&lt;span class="c"&gt;# Create ROSA cluster (takes ~40 minutes)&lt;/span&gt;
rosa create cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster-name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--multi-az&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compute-machine-type&lt;/span&gt; &lt;span class="nv"&gt;$MACHINE_TYPE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compute-nodes&lt;/span&gt; &lt;span class="nv"&gt;$COMPUTE_NODES&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--machine-cidr&lt;/span&gt; 10.0.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-cidr&lt;/span&gt; 172.30.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--pod-cidr&lt;/span&gt; 10.128.0.0/14 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host-prefix&lt;/span&gt; 23 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--yes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Configuration Rationale&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;m5.2xlarge&lt;/strong&gt;: 8 vCPUs, 32 GB RAM per node - suitable for vector database and ML workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 nodes&lt;/strong&gt;: High availability across multiple availability zones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-AZ&lt;/strong&gt;: Ensures resilience against AZ failures&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1.4: Monitor Cluster Creation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Watch cluster installation progress&lt;/span&gt;
rosa logs &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--watch&lt;/span&gt;

&lt;span class="c"&gt;# Check cluster status&lt;/span&gt;
rosa describe cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait until the cluster state shows &lt;code&gt;ready&lt;/code&gt;.&lt;/p&gt;
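
&lt;p&gt;If you prefer a non-interactive wait, a small polling loop also works. This sketch assumes the JSON output of &lt;code&gt;rosa describe cluster&lt;/code&gt; exposes a top-level &lt;code&gt;state&lt;/code&gt; field and that &lt;code&gt;jq&lt;/code&gt; is installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Poll every 2 minutes until the cluster reports "ready"
until [ "$(rosa describe cluster -c $CLUSTER_NAME -o json | jq -r .state)" = "ready" ]; do
  echo "Cluster is still installing, waiting..."
  sleep 120
done
echo "Cluster is ready"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;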

&lt;h3&gt;
  
  
  Step 1.5: Create Admin User
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create cluster admin user&lt;/span&gt;
rosa create admin &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt;

&lt;span class="c"&gt;# Save the login command output - it will look like:&lt;/span&gt;
&lt;span class="c"&gt;# oc login https://api.rag-platform.xxxx.p1.openshiftapps.com:6443 \&lt;/span&gt;
&lt;span class="c"&gt;#   --username cluster-admin \&lt;/span&gt;
&lt;span class="c"&gt;#   --password &amp;lt;generated-password&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.6: Connect to Cluster
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use the login command from previous step&lt;/span&gt;
oc login https://api.rag-platform.xxxx.p1.openshiftapps.com:6443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--username&lt;/span&gt; cluster-admin &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--password&lt;/span&gt; &amp;lt;your-password&amp;gt;

&lt;span class="c"&gt;# Verify cluster access&lt;/span&gt;
oc cluster-info
oc get nodes
oc get projects
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1.7: Create Project Namespaces
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create namespace for RHOAI&lt;/span&gt;
oc new-project redhat-ods-applications

&lt;span class="c"&gt;# Create namespace for RAG application&lt;/span&gt;
oc new-project rag-application

&lt;span class="c"&gt;# Create namespace for Milvus&lt;/span&gt;
oc new-project milvus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 2: Red Hat OpenShift AI Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 2.1: Install OpenShift AI Operator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create operator subscription&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: redhat-ods-operator
  namespace: redhat-ods-operator
spec: {}
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  channel: stable
  name: rhods-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2.2: Verify Operator Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Wait for operator to be ready (takes 3-5 minutes)&lt;/span&gt;
oc get csv &lt;span class="nt"&gt;-n&lt;/span&gt; redhat-ods-operator &lt;span class="nt"&gt;-w&lt;/span&gt;

&lt;span class="c"&gt;# Verify operator is running&lt;/span&gt;
oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; redhat-ods-operator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see the &lt;code&gt;rhods-operator&lt;/code&gt; pod in &lt;code&gt;Running&lt;/code&gt; state.&lt;/p&gt;
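
&lt;p&gt;To block in a script until the operator finishes installing rather than watching interactively, a simple loop on the CSV phase is enough (a minimal sketch that just greps for &lt;code&gt;Succeeded&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Wait until the rhods-operator CSV reports Succeeded
until oc get csv -n redhat-ods-operator 2&amp;gt;/dev/null | grep rhods-operator | grep -q Succeeded; do
  echo "Operator still installing..."
  sleep 30
done
echo "RHOAI operator installed"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;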

&lt;h3&gt;
  
  
  Step 2.3: Create DataScienceCluster
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create the DataScienceCluster custom resource&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Managed
    kserve:
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            type: SelfSigned
        managementState: Managed
        name: knative-serving
    modelmeshserving:
      managementState: Managed
    ray:
      managementState: Removed
    workbenches:
      managementState: Managed
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2.4: Verify RHOAI Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check DataScienceCluster status&lt;/span&gt;
oc get datasciencecluster &lt;span class="nt"&gt;-n&lt;/span&gt; redhat-ods-operator

&lt;span class="c"&gt;# Verify all RHOAI components are running&lt;/span&gt;
oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; redhat-ods-applications
oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; redhat-ods-monitoring

&lt;span class="c"&gt;# Get RHOAI dashboard URL&lt;/span&gt;
oc get route rhods-dashboard &lt;span class="nt"&gt;-n&lt;/span&gt; redhat-ods-applications &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.host}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access the dashboard URL in your browser and log in with your OpenShift credentials.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2.5: Configure Model Serving
&lt;/h3&gt;

&lt;p&gt;Create a serving runtime for Amazon Bedrock integration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create custom serving runtime for Bedrock&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: bedrock-runtime
  namespace: rag-application
  labels:
    opendatahub.io/dashboard: "true"
spec:
  annotations:
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
  containers:
  - name: kserve-container
    image: quay.io/modh/rest-proxy:latest
    env:
    - name: AWS_REGION
      value: "us-east-1"
    - name: BEDROCK_ENDPOINT_URL
      value: "bedrock-runtime.us-east-1.amazonaws.com"
    ports:
    - containerPort: 8080
      protocol: TCP
    resources:
      limits:
        cpu: "2"
        memory: 4Gi
      requests:
        cpu: "1"
        memory: 2Gi
  supportedModelFormats:
  - autoSelect: true
    name: bedrock
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 3: Amazon Bedrock Integration via PrivateLink
&lt;/h2&gt;

&lt;p&gt;This phase establishes secure, private connectivity between your ROSA cluster and Amazon Bedrock using AWS PrivateLink.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3.1: Enable Amazon Bedrock
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable Bedrock in your region (if not already enabled)&lt;/span&gt;
aws bedrock list-foundation-models &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# Request access to Claude 3.5 Sonnet (if needed)&lt;/span&gt;
&lt;span class="c"&gt;# Go to AWS Console &amp;gt; Bedrock &amp;gt; Model access&lt;/span&gt;
&lt;span class="c"&gt;# Or use the CLI:&lt;/span&gt;
aws bedrock put-model-invocation-logging-configuration &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--logging-config&lt;/span&gt; &lt;span class="s1"&gt;'{"cloudWatchConfig":{"logGroupName":"/aws/bedrock/modelinvocations","roleArn":"arn:aws:iam::ACCOUNT_ID:role/BedrockLoggingRole"}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3.2: Identify ROSA VPC
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get the VPC ID of your ROSA cluster&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ROSA_VPC_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 describe-vpcs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=tag:Name,Values=*&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Vpcs[0].VpcId'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ROSA VPC ID: &lt;/span&gt;&lt;span class="nv"&gt;$ROSA_VPC_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Get private subnet IDs&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PRIVATE_SUBNET_IDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 describe-subnets &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=vpc-id,Values=&lt;/span&gt;&lt;span class="nv"&gt;$ROSA_VPC_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"Name=tag:Name,Values=*private*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Subnets[*].SubnetId'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Private Subnets: &lt;/span&gt;&lt;span class="nv"&gt;$PRIVATE_SUBNET_IDS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3.3: Create VPC Endpoint for Bedrock
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create security group for VPC endpoint&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VPC_ENDPOINT_SG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 create-security-group &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-name&lt;/span&gt; bedrock-vpc-endpoint-sg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--description&lt;/span&gt; &lt;span class="s2"&gt;"Security group for Bedrock VPC endpoint"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-id&lt;/span&gt; &lt;span class="nv"&gt;$ROSA_VPC_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'GroupId'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"VPC Endpoint Security Group: &lt;/span&gt;&lt;span class="nv"&gt;$VPC_ENDPOINT_SG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Allow HTTPS traffic from ROSA worker nodes&lt;/span&gt;
aws ec2 authorize-security-group-ingress &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-id&lt;/span&gt; &lt;span class="nv"&gt;$VPC_ENDPOINT_SG&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--protocol&lt;/span&gt; tcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cidr&lt;/span&gt; 10.0.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create VPC endpoint for Bedrock Runtime&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BEDROCK_VPC_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 create-vpc-endpoint &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-id&lt;/span&gt; &lt;span class="nv"&gt;$ROSA_VPC_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-endpoint-type&lt;/span&gt; Interface &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-name&lt;/span&gt; com.amazonaws.&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.bedrock-runtime &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subnet-ids&lt;/span&gt; &lt;span class="nv"&gt;$PRIVATE_SUBNET_IDS&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--security-group-ids&lt;/span&gt; &lt;span class="nv"&gt;$VPC_ENDPOINT_SG&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--private-dns-enabled&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'VpcEndpoint.VpcEndpointId'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Bedrock VPC Endpoint: &lt;/span&gt;&lt;span class="nv"&gt;$BEDROCK_VPC_ENDPOINT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Wait for VPC endpoint to be available&lt;/span&gt;
aws ec2 &lt;span class="nb"&gt;wait &lt;/span&gt;vpc-endpoint-available &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-endpoint-ids&lt;/span&gt; &lt;span class="nv"&gt;$BEDROCK_VPC_ENDPOINT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"VPC Endpoint is now available"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
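
&lt;p&gt;With &lt;code&gt;--private-dns-enabled&lt;/code&gt;, the public Bedrock runtime hostname should now resolve to addresses inside the VPC. A quick spot check from a short-lived pod (the image and namespace here are just illustrative choices):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Resolve the Bedrock runtime endpoint from inside the cluster
# The returned addresses should fall within the 10.0.0.0/16 machine CIDR
oc run dns-test --rm -it --restart=Never \
  --image=registry.access.redhat.com/ubi9/ubi -n rag-application -- \
  getent hosts bedrock-runtime.us-east-1.amazonaws.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;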



&lt;h3&gt;
  
  
  Step 3.4: Create IAM Role for Bedrock Access
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create IAM policy for Bedrock access&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; bedrock-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
      ]
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam create-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; BedrockInvokePolicy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://bedrock-policy.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create trust policy for ROSA service account&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;rosa describe cluster &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .aws.sts.oidc_endpoint_url | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s|https://||'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:sub": "system:serviceaccount:rag-application:bedrock-sa"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create IAM role&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BEDROCK_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://trust-policy.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Role.Arn'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Bedrock IAM Role ARN: &lt;/span&gt;&lt;span class="nv"&gt;$BEDROCK_ROLE_ARN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Attach policy to role&lt;/span&gt;
aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;:policy/BedrockInvokePolicy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3.5: Create Service Account in OpenShift
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create service account with IAM role annotation&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bedrock-sa
  namespace: rag-application
  annotations:
    eks.amazonaws.com/role-arn: &lt;/span&gt;&lt;span class="nv"&gt;$BEDROCK_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Verify service account&lt;/span&gt;
oc get sa bedrock-sa &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3.6: Test Bedrock Connectivity
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create test pod with AWS CLI&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: bedrock-test
  namespace: rag-application
spec:
  serviceAccountName: bedrock-sa
  containers:
  - name: aws-cli
    image: amazon/aws-cli:latest
    command: ["/bin/sleep", "3600"]
    env:
    - name: AWS_REGION
      value: "&lt;/span&gt;&lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Wait for pod to be ready&lt;/span&gt;
oc &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="nt"&gt;--for&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ready pod/bedrock-test &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application &lt;span class="nt"&gt;--timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;300s

&lt;span class="c"&gt;# Test Bedrock API call&lt;/span&gt;
oc &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application bedrock-test &lt;span class="nt"&gt;--&lt;/span&gt; aws bedrock-runtime invoke-model &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-id&lt;/span&gt; anthropic.claude-3-5-sonnet-20241022-v2:0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--content-type&lt;/span&gt; application/json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--accept&lt;/span&gt; application/json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s1"&gt;'{"anthropic_version":"bedrock-2023-05-31","max_tokens":100,"messages":[{"role":"user","content":"Hello, this is a test"}]}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  /tmp/response.json

&lt;span class="c"&gt;# Check the response&lt;/span&gt;
oc &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application bedrock-test &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nb"&gt;cat&lt;/span&gt; /tmp/response.json
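
&lt;span class="c"&gt;# (Optional) Print only the generated text with jq on your workstation&lt;/span&gt;
&lt;span class="c"&gt;# (assumes the Anthropic Messages response shape, i.e. a content[0].text field)&lt;/span&gt;
oc exec -n rag-application bedrock-test -- cat /tmp/response.json | jq -r '.content[0].text'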

&lt;span class="c"&gt;# Clean up test pod&lt;/span&gt;
oc delete pod bedrock-test &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If successful, you should see a JSON response from Claude.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 4: AWS Glue Data Pipeline
&lt;/h2&gt;

&lt;p&gt;This phase sets up AWS Glue to process documents from S3 and prepare them for vectorization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4.1: Create S3 Bucket for Documents
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create S3 bucket (name must be globally unique)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-documents-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

aws s3 mb s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Enable versioning&lt;/span&gt;
aws s3api put-bucket-versioning &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--versioning-configuration&lt;/span&gt; &lt;span class="nv"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Enabled &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create folder structure&lt;/span&gt;
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; raw-documents/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; processed-documents/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; embeddings/

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"S3 Bucket created: s3://&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4.2: Create IAM Role for Glue
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create trust policy for Glue&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; glue-trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create Glue service role&lt;/span&gt;
aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://glue-trust-policy.json

&lt;span class="c"&gt;# Attach AWS managed policy&lt;/span&gt;
aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

&lt;span class="c"&gt;# Create custom policy for S3 access&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; glue-s3-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      ]
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam put-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; S3Access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://glue-s3-policy.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4.3: Create Glue Database
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Glue database&lt;/span&gt;
aws glue create-database &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{
    "Name": "rag_documents_db",
    "Description": "Database for RAG document metadata"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Verify database creation&lt;/span&gt;
aws glue get-database &lt;span class="nt"&gt;--name&lt;/span&gt; rag_documents_db &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4.4: Create Glue Crawler
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create crawler for raw documents&lt;/span&gt;
aws glue create-crawler &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-crawler &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:role/AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--database-name&lt;/span&gt; rag_documents_db &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--targets&lt;/span&gt; &lt;span class="s1"&gt;'{
    "S3Targets": [
      {
        "Path": "s3://'&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;&lt;span class="s1"&gt;'/raw-documents/"
      }
    ]
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--schema-change-policy&lt;/span&gt; &lt;span class="s1"&gt;'{
    "UpdateBehavior": "UPDATE_IN_DATABASE",
    "DeleteBehavior": "LOG"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Start the crawler&lt;/span&gt;
aws glue start-crawler &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-crawler &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Glue crawler created and started"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4.5: Create Glue ETL Job
&lt;/h3&gt;

&lt;p&gt;Create a Python script for document processing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create ETL script&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; glue-etl-script.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON_SCRIPT&lt;/span&gt;&lt;span class="sh"&gt;'
import sys
import boto3
import json
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame

# Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'BUCKET_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

bucket_name = args['BUCKET_NAME']
s3_client = boto3.client('s3')

# Read documents from Glue catalog
datasource = glueContext.create_dynamic_frame.from_catalog(
    database="rag_documents_db",
    table_name="raw_documents"
)

# Document processing function
def process_document(record):
    """
    Process document: chunk text, extract metadata
    """
    # Simple chunking strategy (500 chars with 50 char overlap)
    text = record.get('content', '')
    chunk_size = 500
    overlap = 50

    chunks = []
    for i in range(0, len(text), chunk_size - overlap):
        chunk = text[i:i + chunk_size]
        if chunk:
            chunks.append({
                'document_id': record.get('document_id'),
                'chunk_id': f"{record.get('document_id')}_{i}",
                'chunk_text': chunk,
                'chunk_index': i // (chunk_size - overlap),
                'metadata': {
                    'source': record.get('source', ''),
                    'timestamp': record.get('timestamp', ''),
                    'file_type': record.get('file_type', '')
                }
            })

    return chunks

# Process and write to S3
def process_and_write():
    records = datasource.toDF().collect()
    all_chunks = []

    for record in records:
        chunks = process_document(record.asDict())
        all_chunks.extend(chunks)

    # Write chunks to S3 as JSON
    for chunk in all_chunks:
        key = f"processed-documents/{chunk['chunk_id']}.json"
        s3_client.put_object(
            Bucket=bucket_name,
            Key=key,
            Body=json.dumps(chunk),
            ContentType='application/json'
        )

    print(f"Processed {len(all_chunks)} chunks from {len(records)} documents")

process_and_write()

job.commit()
&lt;/span&gt;&lt;span class="no"&gt;PYTHON_SCRIPT

&lt;/span&gt;&lt;span class="c"&gt;# Upload script to S3&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;glue-etl-script.py s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;/glue-scripts/

&lt;span class="c"&gt;# Create Glue job&lt;/span&gt;
aws glue create-job &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-processor &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:role/AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--command&lt;/span&gt; &lt;span class="s1"&gt;'{
    "Name": "glueetl",
    "ScriptLocation": "s3://'&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;&lt;span class="s1"&gt;'/glue-scripts/glue-etl-script.py",
    "PythonVersion": "3"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--default-arguments&lt;/span&gt; &lt;span class="s1"&gt;'{
    "--BUCKET_NAME": "'&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;&lt;span class="s1"&gt;'",
    "--job-language": "python",
    "--enable-metrics": "true",
    "--enable-continuous-cloudwatch-log": "true"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--glue-version&lt;/span&gt; &lt;span class="s2"&gt;"4.0"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-retries&lt;/span&gt; 0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--timeout&lt;/span&gt; 60 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Glue ETL job created"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4.6: Test Glue Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Upload sample document&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; sample-document.txt &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
This is a sample document for testing the RAG pipeline.
It contains multiple sentences that will be chunked and processed.
The Glue ETL job will extract this content and prepare it for vectorization.
This demonstrates the data pipeline from S3 to processed chunks.
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Upload to S3&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;sample-document.txt s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;/raw-documents/

&lt;span class="c"&gt;# Run crawler to detect new file&lt;/span&gt;
aws glue start-crawler &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-crawler &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Wait for crawler to complete (check status)&lt;/span&gt;
aws glue get-crawler &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-crawler &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Crawler.State'&lt;/span&gt;

&lt;span class="c"&gt;# Run ETL job&lt;/span&gt;
aws glue start-job-run &lt;span class="nt"&gt;--job-name&lt;/span&gt; rag-document-processor &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Check processed outputs&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;60
aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;/processed-documents/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 5: Milvus Vector Database Deployment
&lt;/h2&gt;

&lt;p&gt;Deploy Milvus on your ROSA cluster to store and search document embeddings.&lt;/p&gt;
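
&lt;p&gt;Before the deployment steps, here is a minimal pymilvus sketch of what "store and search document embeddings" looks like once the cluster is up and the &lt;code&gt;rag_documents&lt;/code&gt; collection from Step 5.6 exists. The connection target matches the in-cluster service, but the vector, chunk ID, text, and metadata below are placeholders for illustration only; the real pipeline generates 1024-dimensional Titan embeddings.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal pymilvus sketch: insert one embedding and run a similarity search.
# Assumes the 'rag_documents' collection from Step 5.6 already exists.
from pymilvus import connections, Collection

connections.connect(alias="default", host="milvus.milvus.svc.cluster.local", port="19530")
collection = Collection("rag_documents")

# Placeholder 1024-dim vector; in the real pipeline this comes from Bedrock Titan.
vector = [0.0] * 1024

# Column order follows the schema: chunk_id, embedding, text, metadata (id is auto-generated).
collection.insert([["doc-1_0"], [vector], ["example chunk text"], [{"source": "demo"}]])
collection.flush()

# Load the collection into memory and search for the nearest chunks.
collection.load()
results = collection.search(
    data=[vector],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=3,
    output_fields=["chunk_id", "text"],
)
for hit in results[0]:
    print(hit.entity.get("chunk_id"), hit.score)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;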

&lt;h3&gt;
  
  
  Step 5.1: Install Milvus Operator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add Milvus Helm repository&lt;/span&gt;
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm repo update

&lt;span class="c"&gt;# Install Milvus operator&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;milvus-operator milvus/milvus-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; milvus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; operator.image.tag&lt;span class="o"&gt;=&lt;/span&gt;v0.9.0

&lt;span class="c"&gt;# Verify operator installation&lt;/span&gt;
oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; milvus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5.2: Create Persistent Storage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create PersistentVolumeClaims for Milvus&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-etcd-pvc
  namespace: milvus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3-csi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-minio-pvc
  namespace: milvus
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: gp3-csi
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5.3: Deploy Milvus Cluster
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Milvus cluster configuration&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; milvus-values.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
cluster:
  enabled: true

service:
  type: ClusterIP
  port: 19530

standalone:
  replicas: 1
  resources:
    limits:
      cpu: "4"
      memory: 8Gi
    requests:
      cpu: "2"
      memory: 4Gi

etcd:
  replicaCount: 1
  persistence:
    enabled: true
    existingClaim: milvus-etcd-pvc

minio:
  mode: standalone
  persistence:
    enabled: true
    existingClaim: milvus-minio-pvc

pulsar:
  enabled: false

kafka:
  enabled: false

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
&lt;/span&gt;&lt;span class="no"&gt;
EOF

&lt;/span&gt;&lt;span class="c"&gt;# Install Milvus&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;milvus milvus/milvus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; milvus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--values&lt;/span&gt; milvus-values.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wait&lt;/span&gt;

&lt;span class="c"&gt;# Verify Milvus installation&lt;/span&gt;
oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; milvus
oc get svc &lt;span class="nt"&gt;-n&lt;/span&gt; milvus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5.4: Configure Milvus Access
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get Milvus service endpoint&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MILVUS_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get svc milvus &lt;span class="nt"&gt;-n&lt;/span&gt; milvus &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.clusterIP}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MILVUS_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;19530

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Milvus Endpoint: &lt;/span&gt;&lt;span class="nv"&gt;$MILVUS_HOST&lt;/span&gt;&lt;span class="s2"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;$MILVUS_PORT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Create config map with Milvus connection details&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: milvus-config
  namespace: rag-application
data:
  MILVUS_HOST: "&lt;/span&gt;&lt;span class="nv"&gt;$MILVUS_HOST&lt;/span&gt;&lt;span class="sh"&gt;"
  MILVUS_PORT: "&lt;/span&gt;&lt;span class="nv"&gt;$MILVUS_PORT&lt;/span&gt;&lt;span class="sh"&gt;"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5.5: Test Milvus Connectivity
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create test pod with pymilvus&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: milvus-test
  namespace: rag-application
spec:
  containers:
  - name: python
    image: python:3.11-slim
    command: ["/bin/sleep", "3600"]
    env:
    - name: MILVUS_HOST
      valueFrom:
        configMapKeyRef:
          name: milvus-config
          key: MILVUS_HOST
    - name: MILVUS_PORT
      valueFrom:
        configMapKeyRef:
          name: milvus-config
          key: MILVUS_PORT
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Wait for pod&lt;/span&gt;
oc &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="nt"&gt;--for&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ready pod/milvus-test &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application &lt;span class="nt"&gt;--timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;120s

&lt;span class="c"&gt;# Install pymilvus and test connection&lt;/span&gt;
oc &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application milvus-test &lt;span class="nt"&gt;--&lt;/span&gt; bash &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
pip install pymilvus &amp;amp;&amp;amp; python3 &amp;lt;&amp;lt;PYTHON
from pymilvus import connections, utility
import os

connections.connect(
    alias='default',
    host=os.environ['MILVUS_HOST'],
    port=os.environ['MILVUS_PORT']
)

print('Connected to Milvus successfully!')
print('Milvus version:', utility.get_server_version())
PYTHON
"&lt;/span&gt;

&lt;span class="c"&gt;# Clean up test pod&lt;/span&gt;
oc delete pod milvus-test &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5.6: Create Milvus Collection
&lt;/h3&gt;

&lt;p&gt;Create the collection that will store the document embeddings used by the RAG application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create initialization job&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: milvus-init
  namespace: rag-application
spec:
  template:
    spec:
      containers:
      - name: init
        image: python:3.11-slim
        env:
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_HOST
        - name: MILVUS_PORT
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_PORT
        command:
        - /bin/bash
        - -c
        - |
          pip install pymilvus
          python3 &amp;lt;&amp;lt;PYTHON
          from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection
          import os

          # Connect to Milvus
          connections.connect(
              alias='default',
              host=os.environ['MILVUS_HOST'],
              port=os.environ['MILVUS_PORT']
          )

          # Define collection schema
          fields = [
              FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
              FieldSchema(name='chunk_id', dtype=DataType.VARCHAR, max_length=256),
              FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=1024),
              FieldSchema(name='text', dtype=DataType.VARCHAR, max_length=65535),
              FieldSchema(name='metadata', dtype=DataType.JSON)
          ]

          schema = CollectionSchema(
              fields=fields,
              description='RAG document embeddings collection'
          )

          # Create collection
          collection = Collection(
              name='rag_documents',
              schema=schema
          )

          # Create index
          index_params = {
              'metric_type': 'L2',
              'index_type': 'IVF_FLAT',
              'params': {'nlist': 128}
          }

          collection.create_index(
              field_name='embedding',
              index_params=index_params
          )

          print(f'Collection created: {collection.name}')
          print(f'Number of entities: {collection.num_entities}')
          PYTHON
      restartPolicy: Never
  backoffLimit: 3
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Wait for the job to complete, then check its logs&lt;/span&gt;
oc &lt;span class="nb"&gt;wait&lt;/span&gt; &lt;span class="nt"&gt;--for&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;complete job/milvus-init &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application &lt;span class="nt"&gt;--timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;300s
oc logs job/milvus-init &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase 6: RAG Application Deployment
&lt;/h2&gt;

&lt;p&gt;Deploy the RAG application that orchestrates the entire pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6.1: Create Application Code
&lt;/h3&gt;

&lt;p&gt;Create the RAG application source code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create application directory structure&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; rag-app/&lt;span class="o"&gt;{&lt;/span&gt;src,config,tests&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Create requirements.txt&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app/requirements.txt &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
pymilvus==2.3.3
boto3==1.29.7
langchain==0.0.350
langchain-community==0.0.1
python-dotenv==1.0.0
httpx==0.25.2
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create main application&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app/src/main.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON_CODE&lt;/span&gt;&lt;span class="sh"&gt;'
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional, Dict, Any
import os
import json
import boto3
from pymilvus import connections, Collection
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize FastAPI app
app = FastAPI(
    title="Enterprise RAG API",
    description="RAG platform using OpenShift AI, Bedrock, and Milvus",
    version="1.0.0"
)

# Configuration
MILVUS_HOST = os.getenv("MILVUS_HOST", "milvus.milvus.svc.cluster.local")
MILVUS_PORT = int(os.getenv("MILVUS_PORT", "19530"))
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
BEDROCK_MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"
COLLECTION_NAME = "rag_documents"

# Initialize clients
bedrock_runtime = None
milvus_collection = None

@app.on_event("startup")
async def startup_event():
    """Initialize connections on startup"""
    global bedrock_runtime, milvus_collection

    try:
        # Connect to Milvus
        connections.connect(
            alias="default",
            host=MILVUS_HOST,
            port=MILVUS_PORT
        )
        milvus_collection = Collection(COLLECTION_NAME)
        milvus_collection.load()
        logger.info(f"Connected to Milvus collection: {COLLECTION_NAME}")

        # Initialize Bedrock client
        bedrock_runtime = boto3.client(
            service_name='bedrock-runtime',
            region_name=AWS_REGION
        )
        logger.info("Initialized Bedrock client")

    except Exception as e:
        logger.error(f"Startup error: {str(e)}")
        raise

@app.on_event("shutdown")
async def shutdown_event():
    """Cleanup on shutdown"""
    try:
        connections.disconnect("default")
        logger.info("Disconnected from Milvus")
    except Exception as e:
        logger.error(f"Shutdown error: {str(e)}")

# Request/Response models
class QueryRequest(BaseModel):
    query: str
    top_k: Optional[int] = 5
    max_tokens: Optional[int] = 1000

class QueryResponse(BaseModel):
    answer: str
    sources: List[Dict[str, Any]]
    metadata: Dict[str, Any]

class HealthResponse(BaseModel):
    status: str
    milvus_connected: bool
    bedrock_available: bool

# API endpoints
@app.get("/health", response_model=HealthResponse)
async def health_check():
    """Health check endpoint"""
    milvus_ok = False
    bedrock_ok = False

    try:
        if milvus_collection:
            _ = milvus_collection.num_entities  # simple round-trip to confirm the Milvus connection
            milvus_ok = True
    except:
        pass

    try:
        if bedrock_runtime:
            bedrock_ok = True
    except:
        pass

    return HealthResponse(
        status="healthy" if (milvus_ok and bedrock_ok) else "degraded",
        milvus_connected=milvus_ok,
        bedrock_available=bedrock_ok
    )

@app.post("/query", response_model=QueryResponse)
async def query_rag(request: QueryRequest):
    """
    Process RAG query:
    1. Generate embedding for query
    2. Search similar documents in Milvus
    3. Construct prompt with context
    4. Call Bedrock for generation
    """
    try:
        # Step 1: Generate query embedding using Bedrock
        query_embedding = await generate_embedding(request.query)

        # Step 2: Search Milvus for similar documents
        search_params = {
            "metric_type": "L2",
            "params": {"nprobe": 10}
        }

        results = milvus_collection.search(
            data=[query_embedding],
            anns_field="embedding",
            param=search_params,
            limit=request.top_k,
            output_fields=["chunk_id", "text", "metadata"]
        )

        # Extract context from search results
        contexts = []
        sources = []
        for hit in results[0]:
            contexts.append(hit.entity.get("text"))
            sources.append({
                "chunk_id": hit.entity.get("chunk_id"),
                "score": float(hit.score),
                "metadata": hit.entity.get("metadata")
            })

        # Step 3: Construct prompt with context
        context_text = "&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;".join([f"Document {i+1}:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;{ctx}" for i, ctx in enumerate(contexts)])

        prompt = f"""You are a helpful AI assistant. Use the following context to answer the user's question.
If the answer cannot be found in the context, say so.

Context:
{context_text}

User Question: {request.query}

Answer:"""

        # Step 4: Call Bedrock for generation
        response = bedrock_runtime.invoke_model(
            modelId=BEDROCK_MODEL_ID,
            contentType="application/json",
            accept="application/json",
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": request.max_tokens,
                "messages": [
                    {
                        "role": "user",
                        "content": prompt
                    }
                ],
                "temperature": 0.7
            })
        )

        response_body = json.loads(response['body'].read())
        answer = response_body['content'][0]['text']

        return QueryResponse(
            answer=answer,
            sources=sources,
            metadata={
                "query": request.query,
                "num_sources": len(sources),
                "model": BEDROCK_MODEL_ID
            }
        )

    except Exception as e:
        logger.error(f"Query error: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

async def generate_embedding(text: str) -&amp;gt; List[float]:
    """Generate embedding using Bedrock Titan Embeddings"""
    try:
        response = bedrock_runtime.invoke_model(
            modelId="amazon.titan-embed-text-v2:0",
            contentType="application/json",
            accept="application/json",
            body=json.dumps({
                "inputText": text,
                "dimensions": 1024,
                "normalize": True
            })
        )

        response_body = json.loads(response['body'].read())
        return response_body['embedding']

    except Exception as e:
        logger.error(f"Embedding generation error: {str(e)}")
        raise

@app.get("/")
async def root():
    """Root endpoint"""
    return {
        "message": "Enterprise RAG API",
        "version": "1.0.0",
        "endpoints": {
            "health": "/health",
            "query": "/query",
            "docs": "/docs"
        }
    }

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
&lt;/span&gt;&lt;span class="no"&gt;PYTHON_CODE

&lt;/span&gt;&lt;span class="c"&gt;# Create Dockerfile&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app/Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY src/ ./src/

# Expose port
EXPOSE 8000

# Run application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6.2: Build and Push Container Image
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build container image (using podman or docker)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;rag-app

&lt;span class="c"&gt;# Option 1: Build with podman&lt;/span&gt;
podman build &lt;span class="nt"&gt;-t&lt;/span&gt; rag-application:v1.0 &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Option 2: Build with docker&lt;/span&gt;
&lt;span class="c"&gt;# docker build -t rag-application:v1.0 .&lt;/span&gt;

&lt;span class="c"&gt;# Tag for OpenShift internal registry&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IMAGE_REGISTRY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get route default-route &lt;span class="nt"&gt;-n&lt;/span&gt; openshift-image-registry &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.host}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Login to OpenShift registry&lt;/span&gt;
podman login &lt;span class="nt"&gt;-u&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;oc &lt;span class="nb"&gt;whoami&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;oc &lt;span class="nb"&gt;whoami&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$IMAGE_REGISTRY&lt;/span&gt; &lt;span class="nt"&gt;--tls-verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;

&lt;span class="c"&gt;# Create image stream&lt;/span&gt;
oc create imagestream rag-application &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application

&lt;span class="c"&gt;# Tag and push&lt;/span&gt;
podman tag rag-application:v1.0 &lt;span class="nv"&gt;$IMAGE_REGISTRY&lt;/span&gt;/rag-application/rag-application:v1.0
podman push &lt;span class="nv"&gt;$IMAGE_REGISTRY&lt;/span&gt;/rag-application/rag-application:v1.0 &lt;span class="nt"&gt;--tls-verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false

cd&lt;/span&gt; ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6.3: Deploy Application to OpenShift
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create deployment&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-application
  namespace: rag-application
  labels:
    app: rag-application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rag-application
  template:
    metadata:
      labels:
        app: rag-application
    spec:
      serviceAccountName: bedrock-sa
      containers:
      - name: app
        image: image-registry.openshift-image-registry.svc:5000/rag-application/rag-application:v1.0
        ports:
        - containerPort: 8000
          protocol: TCP
        env:
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_HOST
        - name: MILVUS_PORT
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_PORT
        - name: AWS_REGION
          value: "us-east-1"
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: rag-application
  namespace: rag-application
spec:
  selector:
    app: rag-application
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8000
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rag-application
  namespace: rag-application
spec:
  to:
    kind: Service
    name: rag-application
  port:
    targetPort: 8000
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6.4: Verify Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check deployment status&lt;/span&gt;
oc get deployment rag-application &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
oc get pods &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rag-application

&lt;span class="c"&gt;# Get application URL&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RAG_APP_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get route rag-application &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.host}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"RAG Application URL: https://&lt;/span&gt;&lt;span class="nv"&gt;$RAG_APP_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Test health endpoint&lt;/span&gt;
curl https://&lt;span class="nv"&gt;$RAG_APP_URL&lt;/span&gt;/health

&lt;span class="c"&gt;# View application logs&lt;/span&gt;
oc logs &lt;span class="nt"&gt;-f&lt;/span&gt; deployment/rag-application &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Testing and Validation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  End-to-End Testing
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Test 1: Document Ingestion and Processing
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Upload test documents to S3&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; test-doc-1.txt &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
Red Hat OpenShift is an enterprise Kubernetes platform that provides
a complete application platform for developing and deploying containerized
applications. It includes integrated CI/CD, monitoring, and developer tools.
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; test-doc-2.txt &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
Amazon Bedrock is a fully managed service that offers foundation models
from leading AI companies through a single API. It provides access to
models like Claude, Llama, and Stable Diffusion for various use cases.
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Upload to S3&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;test-doc-1.txt s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;/raw-documents/
aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;test-doc-2.txt s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;/raw-documents/

&lt;span class="c"&gt;# Trigger Glue crawler&lt;/span&gt;
aws glue start-crawler &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-crawler &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Wait and run ETL job&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;120
aws glue start-job-run &lt;span class="nt"&gt;--job-name&lt;/span&gt; rag-document-processor &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Check processed documents&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;60
aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;/processed-documents/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Test 2: Embedding Generation and Vector Storage
&lt;/h4&gt;

&lt;p&gt;Create a job that generates embeddings for the processed chunks with Bedrock Titan and loads them into Milvus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create embedding job&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: embed-documents
  namespace: rag-application
spec:
  template:
    spec:
      serviceAccountName: bedrock-sa
      containers:
      - name: embedder
        image: python:3.11-slim
        env:
        - name: MILVUS_HOST
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_HOST
        - name: MILVUS_PORT
          valueFrom:
            configMapKeyRef:
              name: milvus-config
              key: MILVUS_PORT
        - name: AWS_REGION
          value: "us-east-1"
        - name: BUCKET_NAME
          value: "&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;&lt;span class="sh"&gt;"
        command:
        - /bin/bash
        - -c
        - |
          pip install pymilvus boto3
          python3 &amp;lt;&amp;lt;PYTHON
          import boto3
          import json
          import os
          from pymilvus import connections, Collection

          # Connect to services
          s3 = boto3.client('s3')
          bedrock = boto3.client('bedrock-runtime', region_name=os.environ['AWS_REGION'])

          connections.connect(
              host=os.environ['MILVUS_HOST'],
              port=os.environ['MILVUS_PORT']
          )
          collection = Collection('rag_documents')

          # Get processed documents
          bucket = os.environ['BUCKET_NAME']
          response = s3.list_objects_v2(Bucket=bucket, Prefix='processed-documents/')

          for obj in response.get('Contents', []):
              if obj['Key'].endswith('.json'):
                  # Read document chunk
                  doc = json.loads(s3.get_object(Bucket=bucket, Key=obj['Key'])['Body'].read())

                  # Generate embedding
                  embed_response = bedrock.invoke_model(
                      modelId='amazon.titan-embed-text-v2:0',
                      body=json.dumps({
                          'inputText': doc['chunk_text'],
                          'dimensions': 1024,
                          'normalize': True
                      })
                  )

                  embedding = json.loads(embed_response['body'].read())['embedding']

                  # Insert into Milvus
                  collection.insert([
                      [doc['chunk_id']],
                      [embedding],
                      [doc['chunk_text']],
                      [doc['metadata']]
                  ])

                  print(f"Inserted: {doc['chunk_id']}")

          collection.flush()
          print(f"Total entities in collection: {collection.num_entities}")
          PYTHON
      restartPolicy: Never
  backoffLimit: 3
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Monitor job&lt;/span&gt;
oc logs job/embed-documents &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application &lt;span class="nt"&gt;-f&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Test 3: RAG Query
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Test RAG query endpoint&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"https://&lt;/span&gt;&lt;span class="nv"&gt;$RAG_APP_URL&lt;/span&gt;&lt;span class="s2"&gt;/query"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "query": "What is Red Hat OpenShift?",
    "top_k": 3,
    "max_tokens": 500
  }'&lt;/span&gt; | jq &lt;span class="nb"&gt;.&lt;/span&gt;

&lt;span class="c"&gt;# Test another query&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"https://&lt;/span&gt;&lt;span class="nv"&gt;$RAG_APP_URL&lt;/span&gt;&lt;span class="s2"&gt;/query"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "query": "Tell me about Amazon Bedrock foundation models",
    "top_k": 3,
    "max_tokens": 500
  }'&lt;/span&gt; | jq &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Testing
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Apache Bench for load testing&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;yum &lt;span class="nb"&gt;install &lt;/span&gt;httpd-tools &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# Create query payload&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; query-payload.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "query": "What are the benefits of using OpenShift?",
  "top_k": 5
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Run load test (100 requests, 10 concurrent)&lt;/span&gt;
ab &lt;span class="nt"&gt;-n&lt;/span&gt; 100 &lt;span class="nt"&gt;-c&lt;/span&gt; 10 &lt;span class="nt"&gt;-p&lt;/span&gt; query-payload.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-T&lt;/span&gt; application/json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"https://&lt;/span&gt;&lt;span class="nv"&gt;$RAG_APP_URL&lt;/span&gt;&lt;span class="s2"&gt;/query"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Resource Cleanup
&lt;/h2&gt;

&lt;p&gt;To avoid ongoing AWS charges, follow these steps to clean up all resources created during this implementation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Delete OpenShift Resources
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete RAG application&lt;/span&gt;
oc delete deployment rag-application &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
oc delete service rag-application &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
oc delete route rag-application &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application

&lt;span class="c"&gt;# Delete Milvus&lt;/span&gt;
helm uninstall milvus &lt;span class="nt"&gt;-n&lt;/span&gt; milvus
helm uninstall milvus-operator &lt;span class="nt"&gt;-n&lt;/span&gt; milvus
oc delete pvc &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; milvus

&lt;span class="c"&gt;# Delete RHOAI&lt;/span&gt;
oc delete datasciencecluster default-dsc &lt;span class="nt"&gt;-n&lt;/span&gt; redhat-ods-operator
oc delete subscription rhods-operator &lt;span class="nt"&gt;-n&lt;/span&gt; redhat-ods-operator

&lt;span class="c"&gt;# Delete projects/namespaces&lt;/span&gt;
oc delete project rag-application
oc delete project milvus
oc delete project redhat-ods-applications
oc delete project redhat-ods-operator
oc delete project redhat-ods-monitoring
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Delete ROSA Cluster
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete ROSA cluster (takes ~10-15 minutes)&lt;/span&gt;
rosa delete cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="c"&gt;# Wait for cluster deletion to complete&lt;/span&gt;
rosa logs uninstall &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--watch&lt;/span&gt;

&lt;span class="c"&gt;# Verify cluster is deleted&lt;/span&gt;
rosa list clusters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Delete AWS Glue Resources
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete Glue job&lt;/span&gt;
aws glue delete-job &lt;span class="nt"&gt;--job-name&lt;/span&gt; rag-document-processor &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Delete Glue crawler&lt;/span&gt;
aws glue delete-crawler &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-crawler &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Delete Glue database&lt;/span&gt;
aws glue delete-database &lt;span class="nt"&gt;--name&lt;/span&gt; rag_documents_db &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Delete Glue IAM role&lt;/span&gt;
aws iam delete-role-policy &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="nt"&gt;--policy-name&lt;/span&gt; S3Access
aws iam detach-role-policy &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Delete S3 Bucket and Contents
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete all objects in bucket&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Delete bucket&lt;/span&gt;
aws s3 rb s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"S3 bucket deleted: &lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Delete VPC Endpoint
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Delete VPC endpoint for Bedrock&lt;/span&gt;
aws ec2 delete-vpc-endpoints &lt;span class="nt"&gt;--vpc-endpoint-ids&lt;/span&gt; &lt;span class="nv"&gt;$BEDROCK_VPC_ENDPOINT&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Delete security group&lt;/span&gt;
aws ec2 delete-security-group &lt;span class="nt"&gt;--group-id&lt;/span&gt; &lt;span class="nv"&gt;$VPC_ENDPOINT_SG&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"VPC endpoint and security group deleted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6: Delete IAM Resources
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Detach policy from Bedrock role&lt;/span&gt;
aws iam detach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;:policy/BedrockInvokePolicy

&lt;span class="c"&gt;# Delete Bedrock role&lt;/span&gt;
aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access

&lt;span class="c"&gt;# Delete Bedrock policy&lt;/span&gt;
aws iam delete-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;:policy/BedrockInvokePolicy

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"IAM roles and policies deleted"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Clean Up Local Files
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Remove temporary files&lt;/span&gt;
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; bedrock-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; trust-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; glue-trust-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; glue-s3-policy.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; glue-etl-script.py
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; sample-document.txt
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; test-doc-1.txt
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; test-doc-2.txt
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; query-payload.json
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; milvus-values.yaml
&lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; rag-app/

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Local temporary files cleaned up"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify ROSA cluster is deleted&lt;/span&gt;
rosa list clusters

&lt;span class="c"&gt;# Verify S3 bucket is deleted&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;ls&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;

&lt;span class="c"&gt;# Verify VPC endpoints are deleted&lt;/span&gt;
aws ec2 describe-vpc-endpoints &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nv"&gt;$BEDROCK_VPC_ENDPOINT&lt;/span&gt;

&lt;span class="c"&gt;# Verify IAM roles are deleted&lt;/span&gt;
aws iam list-roles | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"rosa-bedrock-access|AWSGlueServiceRole-RAG"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Cleanup verification complete"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>rag</category>
      <category>kubernetes</category>
      <category>aws</category>
    </item>
    <item>
      <title>AWS ML / GenAI Trifecta: Part 2 – AWS Certified Machine Learning Engineer Associate (MLA-C01)</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Thu, 25 Dec 2025 13:14:05 +0000</pubDate>
      <link>https://dev.to/mgonzalezo/aws-ml-genai-trifecta-part-2-aws-certified-machine-learning-engineer-associate-7mi</link>
      <guid>https://dev.to/mgonzalezo/aws-ml-genai-trifecta-part-2-aws-certified-machine-learning-engineer-associate-7mi</guid>
      <description>&lt;p&gt;This is the second entry in my journey to achieve the &lt;strong&gt;AWS ML / GenAI Trifecta&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;My goal is to master the full stack of AWS intelligence services by completing these three milestones:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AWS Certified AI Practitioner (Foundational)&lt;/strong&gt; - Completed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Certified Machine Learning Engineer Associate&lt;/strong&gt; or &lt;strong&gt;AWS Certified Data Engineer Associate&lt;/strong&gt; — &lt;em&gt;Current focus&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Certified Machine Learning - Specialty&lt;/strong&gt; - Upcoming&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Study Guide Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This guide is organized by complexity and aligned with the AWS Certified Machine Learning Engineer - Associate (MLA-C01) Exam Domains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Domain 1:&lt;/strong&gt; Data Preparation for ML (28%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain 2:&lt;/strong&gt; ML Model Development (26%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain 3:&lt;/strong&gt; Deployment and Orchestration (22%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain 4:&lt;/strong&gt; Monitoring, Maintenance, and Security (24%)&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Phase 1: Foundational Level
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Real-World ML in Action: Predicting Loan Defaults with AWS&lt;/li&gt;
&lt;li&gt;Data Collection, Ingestion, and Storage for AWS ML Workflows&lt;/li&gt;
&lt;li&gt;AWS SageMaker Built-In Algorithms: Enterprise ML at Your Fingertips&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Phase 2: Intermediate Level - Model Development
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Hyperparameters for Model Training: Exam Essentials&lt;/li&gt;
&lt;li&gt;Binary Classification Model Evaluation: Metrics and Validation&lt;/li&gt;
&lt;li&gt;SageMaker Algorithm Optimization &amp;amp; Experiment Tracking&lt;/li&gt;
&lt;li&gt;AWS Glue: Intelligent Data Integration with Machine Learning&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Phase 3: Advanced Level - Training &amp;amp; Tuning
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Optimizing Hyperparameter Tuning: Warm Start Strategies&lt;/li&gt;
&lt;li&gt;Hyperparameter Tuning: Bayesian Optimization &amp;amp; Random Seeds&lt;/li&gt;
&lt;li&gt;Amazon Bedrock Model Customization: Exam Essentials&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Phase 4: Deployment &amp;amp; Orchestration
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;SageMaker Batch Transform: Exam Essentials&lt;/li&gt;
&lt;li&gt;SageMaker Inference Recommender: Exam Essentials&lt;/li&gt;
&lt;li&gt;Amazon SageMaker Serverless Inference&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Phase 5: Security &amp;amp; Advanced Operations
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Securing Your SageMaker Workflows: IAM Roles and S3 Policies&lt;/li&gt;
&lt;li&gt;Advanced SageMaker Processing: Jobs and Permissions&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  1. Real-World ML in Action: Predicting Loan Defaults with AWS
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐☆☆☆ (Beginner)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 1 &amp;amp; 2 (Data Preparation + Model Development)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; HIGH&lt;/p&gt;
&lt;h3&gt;
  
  
  Understanding Machine Learning: The Foundation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What is Machine Learning?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Machine learning (ML) is a branch of artificial intelligence that enables systems to analyze data and make predictions without explicit programming instructions. Instead of following hard-coded rules, ML algorithms learn patterns from historical data and apply those patterns to new, unseen data.&lt;/p&gt;
&lt;h3&gt;
  
  
  How Machine Learning Works
&lt;/h3&gt;

&lt;p&gt;The ML workflow consists of four essential phases; a minimal code sketch of the full loop follows the list:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data Preprocessing&lt;/strong&gt;: Cleaning, transforming, and preparing raw data for analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training the Model&lt;/strong&gt;: Using algorithms to identify mathematical correlations between inputs and outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluating the Model&lt;/strong&gt;: Testing how well the model generalizes to new data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimization&lt;/strong&gt;: Refining model performance through parameter tuning and feature engineering&lt;/li&gt;
&lt;/ol&gt;
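
&lt;p&gt;As a rough illustration, the four phases map onto a few lines of scikit-learn; the CSV file and column names below are hypothetical placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch of the four ML workflow phases using scikit-learn.
# The CSV path and column names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# 1. Data preprocessing: load, clean, and split historical loan data.
df = pd.read_csv("loans.csv").dropna()
X = df.drop(columns=["defaulted"])
y = df["defaulted"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Training: learn correlations between the input features and the default label.
model = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
model.fit(X_train, y_train)

# 3. Evaluation: test how well the model generalizes to unseen data.
print("F1 on held-out data:", f1_score(y_test, model.predict(X_test)))

# 4. Optimization: refine performance by tuning hyperparameters.
search = GridSearchCV(model, {"clf__C": [0.1, 1.0, 10.0]}, scoring="f1", cv=5)
search.fit(X_train, y_train)
print("Best C:", search.best_params_)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
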
&lt;h3&gt;
  
  
  Key Benefits of Machine Learning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Decision-Making&lt;/strong&gt;: Data-driven insights replace guesswork&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt;: Routine analytical tasks run without human intervention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Improved Customer Experiences&lt;/strong&gt;: Personalization at scale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proactive Management&lt;/strong&gt;: Predict issues before they occur&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Improvement&lt;/strong&gt;: Models learn and adapt over time&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Industry Applications
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Manufacturing&lt;/strong&gt;: Predictive maintenance, quality control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare&lt;/strong&gt;: Real-time diagnosis, treatment recommendations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Financial Services&lt;/strong&gt;: Risk analytics, fraud detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retail&lt;/strong&gt;: Inventory optimization, customer service automation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Media &amp;amp; Entertainment&lt;/strong&gt;: Content personalization&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Case Study: Predicting Loan Defaults for Financial Institutions
&lt;/h3&gt;
&lt;h4&gt;
  
  
  The Business Challenge
&lt;/h4&gt;

&lt;p&gt;Financial institutions face significant risk from loan defaults. Traditional rule-based systems often miss subtle patterns that indicate potential defaults. Financial organizations need proactive, data-driven approaches to assess credit risk, optimize lending decisions, and maximize profitability while maintaining regulatory compliance.&lt;/p&gt;
&lt;h4&gt;
  
  
  The AWS Solution
&lt;/h4&gt;

&lt;p&gt;AWS provides comprehensive guidance for building an automated loan default prediction system using serverless and machine learning services. This solution enables financial institutions to leverage ML with minimal development effort and cost.&lt;/p&gt;
&lt;h4&gt;
  
  
  Solution Architecture &amp;amp; Key Components
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;1. Data Integration (Amazon AppFlow)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Securely transfer data from various sources (Salesforce, SAP, etc.)&lt;/li&gt;
&lt;li&gt;Automate data collection from CRM and loan management systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Data Storage (Amazon S3, Amazon Redshift, Amazon RDS)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralized, durable storage for raw and processed data&lt;/li&gt;
&lt;li&gt;Support for structured and unstructured data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Data Preparation (SageMaker Data Wrangler)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual interface for data cleaning and transformation&lt;/li&gt;
&lt;li&gt;Feature engineering without extensive coding&lt;/li&gt;
&lt;li&gt;Data quality checks and anomaly detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Model Training (SageMaker Autopilot)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated machine learning (AutoML) capabilities&lt;/li&gt;
&lt;li&gt;Automatically explores multiple algorithms and hyperparameters&lt;/li&gt;
&lt;li&gt;Provides model explainability for regulatory compliance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Model Deployment &amp;amp; Hosting (SageMaker)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time prediction endpoints&lt;/li&gt;
&lt;li&gt;Automatic scaling based on demand&lt;/li&gt;
&lt;li&gt;Model versioning and management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;6. Monitoring &amp;amp; Retraining (Amazon CloudWatch, SageMaker Model Monitor)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Track model performance and drift&lt;/li&gt;
&lt;li&gt;Automated alerts when model accuracy degrades&lt;/li&gt;
&lt;li&gt;Continuous retraining pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;7. Visualization &amp;amp; Analytics (Amazon QuickSight)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interactive dashboards for business users&lt;/li&gt;
&lt;li&gt;Risk portfolio analysis&lt;/li&gt;
&lt;li&gt;Performance metrics visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;8. API Integration (Amazon API Gateway, AWS Lambda)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serverless endpoints for predictions (a minimal sketch follows this list)&lt;/li&gt;
&lt;li&gt;Integration with existing loan origination systems&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Business Benefits
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quick Risk Assessment&lt;/strong&gt;: Real-time loan default probability scoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Efficiency&lt;/strong&gt;: Serverless, pay-per-use pricing model eliminates upfront infrastructure costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proactive Risk Management&lt;/strong&gt;: Identify high-risk loans before they default&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulatory Compliance&lt;/strong&gt;: Model explainability meets regulatory requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Profit Maximization&lt;/strong&gt;: Optimize lending decisions to balance risk and revenue&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Well-Architected Framework Alignment
&lt;/h4&gt;

&lt;p&gt;The solution follows AWS best practices across six pillars:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Operational Excellence&lt;/strong&gt;: Automated data pipelines and model management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Encryption at rest (KMS), restricted IAM access, VPC isolation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability&lt;/strong&gt;: Multi-AZ deployments, automatic backups, durable S3 storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Efficiency&lt;/strong&gt;: AutoML reduces manual tuning, serverless auto-scaling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Optimization&lt;/strong&gt;: Pay only for resources used, no idle infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sustainability&lt;/strong&gt;: Automated drift detection prevents unnecessary retraining&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  Implementation Workflow
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Data Sources → AppFlow → S3 → Data Wrangler → Feature Store
                                                    ↓
QuickSight ← API Gateway ← Hosted Model ← SageMaker Autopilot
                ↑                              ↑
              Lambda                    Model Monitor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  From Theory to Practice
&lt;/h4&gt;

&lt;p&gt;This loan default prediction solution demonstrates how machine learning theory translates into real business value. By combining automated ML (SageMaker Autopilot) with robust data preparation (Data Wrangler) and continuous monitoring, financial institutions can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduce loan default rates by 20-30%&lt;/li&gt;
&lt;li&gt;Accelerate loan approval processes from days to minutes&lt;/li&gt;
&lt;li&gt;Meet regulatory explainability requirements&lt;/li&gt;
&lt;li&gt;Scale predictions across millions of loan applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The serverless architecture ensures that even small financial institutions can access enterprise-grade ML capabilities without hiring large data science teams or investing in expensive infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/solutions/guidance/predicting-loan-defaults-for-financial-institutions-on-aws/" rel="noopener noreferrer"&gt;AWS Guidance: Predicting Loan Defaults for Financial Institutions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/what-is/machine-learning/" rel="noopener noreferrer"&gt;What is Machine Learning? - AWS Overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Data Collection, Ingestion, and Storage for AWS ML Workflows
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐☆☆ (Intermediate)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 1 (Data Preparation - 28%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; HIGH&lt;/p&gt;
&lt;h3&gt;
  
  
  SageMaker Data Wrangler: JSON and ORC Data Support
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Overview
&lt;/h4&gt;

&lt;p&gt;Amazon SageMaker Data Wrangler reduces data preparation time for &lt;strong&gt;tabular, image, and text data from weeks to minutes&lt;/strong&gt; through a visual and natural language interface. Since February 2022, Data Wrangler has supported &lt;strong&gt;Optimized Row Columnar (ORC)&lt;/strong&gt;, &lt;strong&gt;JavaScript Object Notation (JSON)&lt;/strong&gt;, and &lt;strong&gt;JSON Lines (JSONL)&lt;/strong&gt; file formats, in addition to CSV and Parquet.&lt;/p&gt;
&lt;h4&gt;
  
  
  Supported File Formats
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Core Formats:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CSV&lt;/strong&gt; (Comma-Separated Values)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parquet&lt;/strong&gt; (Columnar storage format)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON&lt;/strong&gt; (JavaScript Object Notation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSONL&lt;/strong&gt; (JSON Lines - newline-delimited JSON)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ORC&lt;/strong&gt; (Optimized Row Columnar)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  JSON and ORC-Specific Features
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;1. Data Preview&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preview ORC, JSON, and JSONL data &lt;strong&gt;before importing&lt;/strong&gt; into Data Wrangler&lt;/li&gt;
&lt;li&gt;Validate data structure and schema before processing&lt;/li&gt;
&lt;li&gt;Ensure correct format selection during import&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Specialized JSON Transformations&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Data Wrangler provides two powerful transforms for nested JSON data (a PySpark sketch of the equivalent operations follows this list):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Flatten structured column&lt;/strong&gt;: Converts nested JSON objects into flat tabular columns&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Example: &lt;code&gt;{"user": {"name": "John", "age": 30}}&lt;/code&gt; → separate &lt;code&gt;user.name&lt;/code&gt; and &lt;code&gt;user.age&lt;/code&gt; columns&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Explode array column&lt;/strong&gt;: Expands JSON arrays into multiple rows&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Example: &lt;code&gt;{"items": ["A", "B", "C"]}&lt;/code&gt; → creates three rows with individual items&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
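
&lt;p&gt;To make these two transforms concrete, here is a minimal PySpark sketch of the equivalent operations. It only illustrates the behavior, not Data Wrangler's implementation; the S3 path and the &lt;code&gt;user&lt;/code&gt;/&lt;code&gt;items&lt;/code&gt; columns are hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative PySpark equivalents of the two Data Wrangler transforms.
# The S3 path and the "user"/"items" columns are hypothetical.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.json("s3://my-bucket/raw/events.jsonl")   # placeholder path

# "Flatten structured column": promote nested struct fields to top-level columns
flat = df.select(
    F.col("user.name").alias("user_name"),
    F.col("user.age").alias("user_age"),
    F.col("items"),
)

# "Explode array column": produce one output row per array element
exploded = flat.withColumn("item", F.explode("items")).drop("items")
exploded.show()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
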

&lt;p&gt;&lt;strong&gt;3. ORC Import Process&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Importing ORC data is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Browse to your ORC file in &lt;strong&gt;Amazon S3&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;ORC as the file type&lt;/strong&gt; during import&lt;/li&gt;
&lt;li&gt;Data Wrangler handles schema inference automatically&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  Use Cases for JSON/ORC in ML Workflows
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;JSON:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API response data (web logs, application telemetry)&lt;/li&gt;
&lt;li&gt;Semi-structured data with nested fields&lt;/li&gt;
&lt;li&gt;Event-driven data streams from applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ORC:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large-scale analytics data (optimized for Hadoop/Spark)&lt;/li&gt;
&lt;li&gt;Columnar storage for efficient querying&lt;/li&gt;
&lt;li&gt;High compression ratios for cost-effective storage&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  AWS ML Engineer Associate: Data Collection, Ingestion &amp;amp; Storage
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Core AWS Services for Data Pipelines
&lt;/h4&gt;

&lt;p&gt;The &lt;strong&gt;AWS ML Engineer Associate certification&lt;/strong&gt; emphasizes data preparation as a critical phase of the ML lifecycle. Key services include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Storage Services:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon S3&lt;/strong&gt;: Primary object storage for training data, model artifacts, and outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EBS&lt;/strong&gt;: Block storage for EC2-based processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EFS&lt;/strong&gt;: Shared file storage for distributed training&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon RDS&lt;/strong&gt;: Relational database for structured data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon DynamoDB&lt;/strong&gt;: NoSQL database for key-value and document data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Data Ingestion Services:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Kinesis&lt;/strong&gt;: Real-time streaming data ingestion

&lt;ul&gt;
&lt;li&gt;Kinesis Data Streams: Real-time data collection (see the producer sketch after this list)&lt;/li&gt;
&lt;li&gt;Kinesis Data Firehose: Load streaming data into S3, Redshift, or OpenSearch Service&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Glue&lt;/strong&gt;: ETL service for data transformation and cataloging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Data Pipeline&lt;/strong&gt;: Orchestrate data movement between AWS services&lt;/li&gt;
&lt;/ul&gt;
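
&lt;p&gt;As a small producer-side sketch for the Kinesis Data Streams item above (the stream name and record are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal Kinesis Data Streams producer sketch; stream name and record are placeholders.
import json
import boto3

kinesis = boto3.client("kinesis")

record = {"user_id": "u-123", "event": "login", "ip": "203.0.113.10"}
kinesis.put_record(
    StreamName="ml-ingest-stream",            # hypothetical stream
    Data=json.dumps(record).encode("utf-8"),
    PartitionKey=record["user_id"],           # spreads records across shards
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
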

&lt;p&gt;&lt;strong&gt;3. Data Processing &amp;amp; Analytics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Glue&lt;/strong&gt;: Serverless ETL with Data Catalog&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EMR&lt;/strong&gt;: Managed Hadoop/Spark clusters for big data processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Athena&lt;/strong&gt;: Serverless SQL queries on S3 data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache Spark on EMR&lt;/strong&gt;: Distributed data processing&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Choosing Data Formats
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Format Selection Criteria:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Compression&lt;/th&gt;
&lt;th&gt;Query Performance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CSV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple tabular data, human-readable&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Slow (full scan)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Semi-structured, nested data&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Slow (parsing overhead)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parquet&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Columnar analytics, ML training&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Fast (columnar)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ORC&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hadoop/Spark workloads&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Fast (columnar)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Best Practices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Parquet or ORC&lt;/strong&gt; for large-scale analytics and ML training (columnar formats enable efficient querying and compression)&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;JSON/JSONL&lt;/strong&gt; for semi-structured data with nested fields&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;CSV&lt;/strong&gt; for simple, human-readable datasets or data exchange&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Data Ingestion into SageMaker
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;SageMaker Data Wrangler:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual interface for importing data from S3, Athena, Redshift, and Snowflake&lt;/li&gt;
&lt;li&gt;Apply transformations (flatten JSON, encode categorical variables, balance datasets)&lt;/li&gt;
&lt;li&gt;Export to SageMaker Feature Store or directly to training jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SageMaker Feature Store:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralized repository for ML features&lt;/li&gt;
&lt;li&gt;Supports online (low-latency) and offline (batch) feature retrieval&lt;/li&gt;
&lt;li&gt;Ensures feature consistency across training and inference&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Merging Data from Multiple Sources
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Using AWS Glue:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Crawlers automatically discover schema from S3, RDS, DynamoDB&lt;/li&gt;
&lt;li&gt;Visual ETL jobs combine data from multiple sources&lt;/li&gt;
&lt;li&gt;Glue Data Catalog provides metadata repository&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Using Apache Spark on EMR:&lt;/strong&gt; (a join sketch follows this list)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed joins across massive datasets&lt;/li&gt;
&lt;li&gt;Support for Parquet, ORC, JSON, CSV&lt;/li&gt;
&lt;li&gt;Integrate with S3 for input/output&lt;/li&gt;
&lt;/ul&gt;
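
&lt;p&gt;A minimal Spark sketch of such a merge, assuming two hypothetical S3 datasets that share a &lt;code&gt;loan_id&lt;/code&gt; key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch of merging two S3 datasets with Spark on EMR; paths and keys are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("merge-loan-data").getOrCreate()

loans = spark.read.parquet("s3://my-bucket/curated/loans/")    # columnar source
payments = spark.read.json("s3://my-bucket/raw/payments/")     # semi-structured source

# Distributed join on a shared key, written back to S3 as Parquet for ML training
merged = loans.join(payments, on="loan_id", how="left")
merged.write.mode("overwrite").parquet("s3://my-bucket/features/loans_with_payments/")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
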
&lt;h4&gt;
  
  
  Troubleshooting Data Ingestion Issues
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Capacity and Scalability:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;S3 Throughput&lt;/strong&gt;: Use S3 Transfer Acceleration for faster uploads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kinesis Shards&lt;/strong&gt;: Scale based on ingestion rate (1 MB/s per shard)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Glue DPUs&lt;/strong&gt;: Increase Data Processing Units for larger ETL jobs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EMR Cluster Sizing&lt;/strong&gt;: Right-size instance types and counts for workload&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Common Issues:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema mismatches&lt;/strong&gt;: Use Glue crawlers to infer and update schemas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data quality&lt;/strong&gt;: Apply Data Wrangler quality checks and transformations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access permissions&lt;/strong&gt;: Ensure IAM roles have S3, Glue, Kinesis permissions&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Exam Tips for AWS ML Engineer Associate
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Key Knowledge Areas:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Recognize data types&lt;/strong&gt;: Structured (CSV, Parquet), semi-structured (JSON), unstructured (images, text)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose storage services&lt;/strong&gt;: S3 (object), EBS (block), EFS (file), RDS (relational), DynamoDB (NoSQL)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select data formats&lt;/strong&gt;: Parquet/ORC for analytics, JSON for nested data, CSV for simplicity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ingest streaming data&lt;/strong&gt;: Kinesis Data Streams for real-time processing, Firehose for near-real-time delivery into storage and analytics destinations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transform data&lt;/strong&gt;: Glue for ETL, Data Wrangler for visual transformations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Troubleshoot&lt;/strong&gt;: Understand capacity limits, IAM permissions, schema evolution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Target Experience:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At least &lt;strong&gt;1 year&lt;/strong&gt; in backend development, DevOps, data engineering, or data science&lt;/li&gt;
&lt;li&gt;Hands-on with AWS analytics services: Glue, EMR, Athena, Kinesis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/prepare-and-analyze-json-and-orc-data-with-amazon-sagemaker-data-wrangler/" rel="noopener noreferrer"&gt;Prepare and analyze JSON and ORC data with Amazon SageMaker Data Wrangler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2022/02/json-orc-data-processing-jobs-amazon-sagemaker-data-wrangler/" rel="noopener noreferrer"&gt;Prepare JSON and ORC data with Amazon SageMaker Data Wrangler&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.classcentral.com/course/aws-ml-engineer-associate-11-collect-ingest-and-store-data-295551" rel="noopener noreferrer"&gt;AWS ML Engineer Associate Course&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://d1.awsstatic.com/training-and-certification/docs-machine-learning-engineer-associate/AWS-Certified-Machine-Learning-Engineer-Associate_Exam-Guide.pdf" rel="noopener noreferrer"&gt;AWS Certified Machine Learning Engineer - Associate Exam Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. AWS SageMaker Built-In Algorithms: Enterprise ML at Your Fingertips
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐☆☆ (Intermediate)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 2 (ML Model Development - 26%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; HIGH&lt;/p&gt;
&lt;h3&gt;
  
  
  Overview: Pre-Built Intelligence for Every Use Case
&lt;/h3&gt;

&lt;p&gt;AWS SageMaker offers a comprehensive library of production-ready, built-in machine learning algorithms that eliminate the need to build models from scratch. These algorithms are optimized for performance, scalability, and cost-efficiency, enabling data scientists to focus on solving business problems rather than implementing mathematical foundations.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Algorithm Portfolio
&lt;/h3&gt;

&lt;p&gt;SageMaker organizes its built-in algorithms across five major categories:&lt;/p&gt;
&lt;h4&gt;
  
  
  1. Supervised Learning Algorithms
&lt;/h4&gt;

&lt;p&gt;Supervised learning uses labeled training data to predict outcomes for new data. SageMaker provides powerful algorithms for both classification and regression tasks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tabular Data Specialists:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AutoGluon-Tabular&lt;/strong&gt;: Automated ensemble learning that combines multiple models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XGBoost&lt;/strong&gt;: Industry-standard gradient boosting for structured data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LightGBM&lt;/strong&gt;: Fast, distributed gradient boosting framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CatBoost&lt;/strong&gt;: Handles categorical features natively without encoding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linear Learner&lt;/strong&gt;: Scalable linear regression and classification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TabTransformer&lt;/strong&gt;: Transformer-based architecture for tabular data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;K-Nearest Neighbors (KNN)&lt;/strong&gt;: Simple, interpretable classification and regression&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Factorization Machines&lt;/strong&gt;: Captures feature interactions for high-dimensional sparse data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Specialized Applications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Object2Vec&lt;/strong&gt;: Generates low-dimensional embeddings for feature engineering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepAR&lt;/strong&gt;: Neural network-based time series forecasting for demand prediction, capacity planning&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  2. Unsupervised Learning Algorithms
&lt;/h4&gt;

&lt;p&gt;Unsupervised learning discovers patterns in unlabeled data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;K-Means Clustering&lt;/strong&gt;: Groups similar data points for customer segmentation, anomaly detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Principal Component Analysis (PCA)&lt;/strong&gt;: Dimensionality reduction for data visualization and noise reduction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Random Cut Forest&lt;/strong&gt;: Anomaly detection in streaming data and time series&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IP Insights&lt;/strong&gt;: Specialized algorithm for detecting unusual network behavior (detailed below)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  3. Text Analysis Algorithms
&lt;/h4&gt;

&lt;p&gt;Natural language processing and text understanding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BlazingText&lt;/strong&gt;: Fast text classification and word embeddings (Word2Vec implementation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sequence-to-Sequence&lt;/strong&gt;: Neural machine translation, text summarization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latent Dirichlet Allocation (LDA)&lt;/strong&gt;: Topic modeling for document analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neural Topic Model&lt;/strong&gt;: Deep learning approach to discovering document themes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text Classification&lt;/strong&gt;: Supervised learning for categorizing text documents&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  4. Image Processing Algorithms
&lt;/h4&gt;

&lt;p&gt;Computer vision tasks powered by deep learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Image Classification&lt;/strong&gt;: Categorize images into predefined classes (MXNet/TensorFlow)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Object Detection&lt;/strong&gt;: Identify and locate multiple objects within images (MXNet/TensorFlow)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Segmentation&lt;/strong&gt;: Pixel-level classification for medical imaging, autonomous vehicles&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  5. Pre-Trained Models &amp;amp; Solution Templates
&lt;/h4&gt;

&lt;p&gt;Ready-to-use models covering 15+ problem types including question answering, sentiment analysis, and popular architectures like MobileNet, YOLO, and BERT.&lt;/p&gt;
&lt;h3&gt;
  
  
  Deep Dive: IP Insights for Security and Fraud Detection
&lt;/h3&gt;
&lt;h4&gt;
  
  
  What is IP Insights?
&lt;/h4&gt;

&lt;p&gt;IP Insights is an unsupervised learning algorithm designed specifically to detect anomalous behavior in network traffic by learning the normal relationship between entities (user IDs, account numbers) and their associated IPv4 addresses.&lt;/p&gt;
&lt;h4&gt;
  
  
  How It Works
&lt;/h4&gt;

&lt;p&gt;The algorithm analyzes historical &lt;code&gt;(entity, IPv4 address)&lt;/code&gt; pairs to learn typical usage patterns. When presented with a new interaction, it generates an anomaly score indicating how unusual the pairing is. High scores suggest potential security threats or fraudulent activity.&lt;/p&gt;
&lt;h4&gt;
  
  
  Primary Use Cases
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fraud Detection&lt;/strong&gt;: Identify account takeovers when users log in from unexpected IP addresses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Enhancement&lt;/strong&gt;: Trigger multi-factor authentication based on anomaly scores&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Threat Detection&lt;/strong&gt;: Integrate with AWS GuardDuty for comprehensive security monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature Engineering&lt;/strong&gt;: Generate IP address embeddings for downstream ML models&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  Technical Specifications
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Format&lt;/strong&gt;: CSV files with entity identifier and IPv4 address columns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: Anomaly scores (0-1 range, higher indicates more unusual)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instance Recommendations&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Training: GPU instances (P2, P3, G4dn, G5) for faster model development&lt;/li&gt;
&lt;li&gt;Inference: CPU instances for cost-effective predictions&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment Options&lt;/strong&gt;: Real-time endpoints or batch transform jobs&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Example Workflow
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Historical Logins → IP Insights Training → Model Deployment
     ↓
New Login Attempt → Anomaly Score → Risk Assessment → MFA Trigger
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
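
&lt;p&gt;Expressed with the SageMaker Python SDK, the training half of this workflow might look like the sketch below. The bucket, role, and hyperparameter values are placeholders, not recommendations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal IP Insights training sketch (bucket, role, and hyperparameters are placeholders).
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"   # hypothetical role

image = image_uris.retrieve("ipinsights", session.boto_region_name)

ip_insights = Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",          # GPU recommended for faster training
    output_path="s3://my-bucket/ipinsights/output/",
    sagemaker_session=session,
)
ip_insights.set_hyperparameters(
    num_entity_vectors=20000,   # size of the entity hash space
    vector_dim=128,             # embedding dimension
    epochs=5,
)

# Training data: CSV of (entity_id, ipv4_address) pairs
ip_insights.fit({"train": TrainingInput("s3://my-bucket/ipinsights/train.csv",
                                        content_type="text/csv")})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
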

&lt;h4&gt;
  
  
  Business Impact
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Reduce fraudulent transactions by detecting compromised accounts early&lt;/li&gt;
&lt;li&gt;Lower false positive rates compared to rule-based systems&lt;/li&gt;
&lt;li&gt;Adapt to evolving attack patterns through continuous retraining&lt;/li&gt;
&lt;li&gt;Seamlessly integrate into existing authentication workflows&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Why Use SageMaker Built-In Algorithms?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Optimized for AWS infrastructure with multi-GPU support and distributed training&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost-Efficiency&lt;/strong&gt;: Pre-built algorithms reduce development time from months to days&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Handle datasets from gigabytes to petabytes without code changes&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;: Support for multiple instance types (CPU, GPU, inference-optimized)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration&lt;/strong&gt;: Native compatibility with SageMaker Pipelines, Model Monitor, and Feature Store&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/ip-insights.html" rel="noopener noreferrer"&gt;Amazon SageMaker IP Insights&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html" rel="noopener noreferrer"&gt;Amazon SageMaker Built-In Algorithms&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  4. Hyperparameters for Model Training: Exam Essentials
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐☆☆ (Intermediate)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 2 (ML Model Development - 26%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; MEDIUM-HIGH&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Hyperparameters (SageMaker Autopilot LLM Fine-Tuning)
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1. Epoch Count (&lt;code&gt;epochCount&lt;/code&gt;)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Number of complete passes through entire training dataset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact&lt;/strong&gt;: More epochs = better learning, but risk of overfitting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best Practice&lt;/strong&gt;: Set a large &lt;code&gt;MaxAutoMLJobRuntimeInSeconds&lt;/code&gt; so the job is not stopped early (a boto3 sketch at the end of this section shows where it is set)&lt;/li&gt;
&lt;li&gt;Typical: a run of ~10 epochs can take up to 72 hours&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  2. Batch Size (&lt;code&gt;batchSize&lt;/code&gt;)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Number of samples processed per training iteration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Impact&lt;/strong&gt;: Larger batches = faster training, higher memory usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best Practice&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Start with batch size = 1&lt;/li&gt;
&lt;li&gt;Incrementally increase until out-of-memory (OOM) error&lt;/li&gt;
&lt;li&gt;Monitor CloudWatch logs: &lt;code&gt;/aws/sagemaker/TrainingJobs&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  3. Learning Rate (&lt;code&gt;learningRate&lt;/code&gt;)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Controls step size for weight updates during training&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High rate&lt;/strong&gt;: Fast convergence, risk of overshooting optimal solution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low rate&lt;/strong&gt;: Stable convergence, slower training&lt;/li&gt;
&lt;li&gt;Critical for the Stochastic Gradient Descent (SGD) weight update, illustrated after this list&lt;/li&gt;
&lt;/ul&gt;
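
&lt;p&gt;A toy gradient-descent sketch (plain NumPy, not AWS-specific) showing how the learning rate scales each weight update:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Toy 1-D least-squares problem: the true weight is 3.0.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)

def gradient_descent(learning_rate, epochs=50):
    w = 0.0
    for _ in range(epochs):
        grad = np.mean(2 * (w * x - y) * x)   # gradient of the mean squared error
        w -= learning_rate * grad             # step size is set by the learning rate
    return w

print(gradient_descent(0.05))   # small rate: steady convergence toward ~3.0
print(gradient_descent(1.2))    # too-large rate: each step overshoots and the weight diverges
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
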
&lt;h4&gt;
  
  
  4. Learning Rate Warmup Steps (&lt;code&gt;learningRateWarmupSteps&lt;/code&gt;)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Gradual learning rate increase during initial training steps&lt;/li&gt;
&lt;li&gt;Prevents early convergence issues&lt;/li&gt;
&lt;li&gt;Improves model stability&lt;/li&gt;
&lt;/ul&gt;
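
&lt;p&gt;Pulling the four hyperparameters together, here is a hedged boto3 sketch of where they are passed when launching an Autopilot LLM fine-tuning job. Names, paths, and values are placeholders, and the field shapes follow my reading of the &lt;code&gt;CreateAutoMLJobV2&lt;/code&gt; API, so verify them against the current API reference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: Autopilot LLM fine-tuning job with a generous runtime ceiling so it is not
# stopped before ~10 epochs. Job name, S3 paths, role, and values are placeholders.
import boto3

sm = boto3.client("sagemaker")
sm.create_auto_ml_job_v2(
    AutoMLJobName="llm-finetune-demo",
    AutoMLJobInputDataConfig=[{
        "ChannelType": "training",
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                        "S3Uri": "s3://my-bucket/finetune/train/"}},
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/finetune/output/"},
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    AutoMLProblemTypeConfig={
        "TextGenerationJobConfig": {
            "CompletionCriteria": {"MaxAutoMLJobRuntimeInSeconds": 259200},  # 72 hours
            "TextGenerationHyperParameters": {
                "epochCount": "10",
                "batchSize": "4",
                "learningRate": "0.00005",
                "learningRateWarmupSteps": "10",
            },
        }
    },
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
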
&lt;h3&gt;
  
  
  Training Parameters (AWS Machine Learning)
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Number of Passes
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Sequential iterations over training data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Small datasets&lt;/strong&gt;: Increase passes significantly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large datasets&lt;/strong&gt;: Single pass often sufficient&lt;/li&gt;
&lt;li&gt;Diminishing returns with excessive passes&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Data Shuffling
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Randomizes training data order each pass&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critical&lt;/strong&gt; for preventing algorithmic bias&lt;/li&gt;
&lt;li&gt;Helps find optimal solution faster&lt;/li&gt;
&lt;li&gt;Prevents overfitting to data patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Regularization
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;L1 Regularization&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feature selection, creates sparse models (reduces feature count)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;L2 Regularization&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weight stabilization, reduces feature correlation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both prevent overfitting by penalizing large weights; the penalized objectives are written out below.&lt;/p&gt;
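
&lt;p&gt;As a compact sketch, with λ as the regularization strength and w_i as the model weights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L1 (Lasso):  Total Loss = Data Loss + λ × Σ |w_i|    → pushes some weights to exactly 0 (sparse model)
L2 (Ridge):  Total Loss = Data Loss + λ × Σ w_i²     → shrinks all weights toward 0 (stable weights)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
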
&lt;h3&gt;
  
  
  Exam Tips
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Epochs&lt;/strong&gt;: Complete dataset passes (more = overfitting risk)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch Size&lt;/strong&gt;: Start small, increase until OOM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning Rate&lt;/strong&gt;: Balance speed vs stability (too high = overshoot; too low = slow)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shuffling&lt;/strong&gt;: Always shuffle to prevent bias&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L1&lt;/strong&gt;: Sparse models; &lt;strong&gt;L2&lt;/strong&gt;: Weight stability&lt;/li&gt;
&lt;li&gt;Monitor CloudWatch for OOM errors during training&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-llms-finetuning-hyperparameters.html" rel="noopener noreferrer"&gt;SageMaker Autopilot LLM Fine-Tuning Hyperparameters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/machine-learning/latest/dg/training-parameters1.html" rel="noopener noreferrer"&gt;AWS Machine Learning Training Parameters&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  5. Binary Classification Model Evaluation: Metrics and Validation in SageMaker
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐☆☆ (Intermediate)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 2 (ML Model Development - 26%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; HIGH&lt;/p&gt;
&lt;h3&gt;
  
  
  Understanding Binary Classification Metrics
&lt;/h3&gt;

&lt;p&gt;Binary classification models predict one of two possible outcomes (fraud/not fraud, churn/no churn). Evaluating these models requires understanding multiple metrics that capture different aspects of performance.&lt;/p&gt;
&lt;h4&gt;
  
  
  Core Evaluation Metrics
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;1. Confusion Matrix Components&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The foundation of binary classification evaluation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;True Positive (TP)&lt;/strong&gt;: Correctly predicted positive instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;True Negative (TN)&lt;/strong&gt;: Correctly predicted negative instances&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;False Positive (FP)&lt;/strong&gt;: Incorrectly predicted positive (Type I error)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;False Negative (FN)&lt;/strong&gt;: Incorrectly predicted negative (Type II error)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Accuracy&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Accuracy = (TP + TN) / (TP + TN + FP + FN)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Range: 0 to 1 (higher is better)&lt;/li&gt;
&lt;li&gt;Overall correctness of predictions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limitation&lt;/strong&gt;: Misleading for imbalanced datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Precision&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Precision = TP / (TP + FP)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Range: 0 to 1 (higher is better)&lt;/li&gt;
&lt;li&gt;Fraction of positive predictions that are correct&lt;/li&gt;
&lt;li&gt;Critical when false positives are costly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Recall (Sensitivity/True Positive Rate)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Recall = TP / (TP + FN)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Range: 0 to 1 (higher is better)&lt;/li&gt;
&lt;li&gt;Fraction of actual positives correctly identified&lt;/li&gt;
&lt;li&gt;Critical when false negatives are costly (e.g., fraud detection, disease diagnosis)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. F1 Score&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;F1 = 2 × (Precision × Recall) / (Precision + Recall)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Harmonic mean of precision and recall&lt;/li&gt;
&lt;li&gt;Balances both metrics&lt;/li&gt;
&lt;li&gt;Useful when you need equal consideration of false positives and false negatives&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;6. False Positive Rate (FPR)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FPR = FP / (FP + TN)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Range: 0 to 1 (lower is better)&lt;/li&gt;
&lt;li&gt;Measures "false alarm" rate&lt;/li&gt;
&lt;li&gt;Used in ROC curve analysis&lt;/li&gt;
&lt;/ul&gt;
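
&lt;p&gt;To tie these formulas together, here is a small worked example with made-up counts for a 1,000-sample test set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Worked example: a toy confusion matrix plugged into the formulas above.
TP, FP, FN, TN = 80, 20, 10, 890   # made-up counts; 1,000 samples total

accuracy  = (TP + TN) / (TP + TN + FP + FN)                  # 0.970 (inflated by the many negatives)
precision = TP / (TP + FP)                                   # 0.800
recall    = TP / (TP + FN)                                   # 0.889
f1        = 2 * precision * recall / (precision + recall)    # 0.842
fpr       = FP / (FP + TN)                                   # 0.022

print(accuracy, precision, recall, f1, fpr)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
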

&lt;h3&gt;
  
  
  ROC Curve and AUC: Comprehensive Performance Assessment
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Receiver Operating Characteristic (ROC) Curve
&lt;/h4&gt;

&lt;p&gt;The &lt;strong&gt;ROC curve&lt;/strong&gt; is a critical evaluation metric in binary classification that plots &lt;strong&gt;True Positive Rate (Recall)&lt;/strong&gt; against &lt;strong&gt;False Positive Rate&lt;/strong&gt; at various threshold levels. It provides a comprehensive perspective on how different thresholds impact the balance between &lt;strong&gt;sensitivity&lt;/strong&gt; (true positive rate) and &lt;strong&gt;specificity&lt;/strong&gt; (1 - false positive rate).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;X-axis&lt;/strong&gt;: False Positive Rate (FPR)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Y-axis&lt;/strong&gt;: True Positive Rate (Recall)&lt;/li&gt;
&lt;li&gt;Each point represents a different classification threshold&lt;/li&gt;
&lt;li&gt;Diagonal line represents random guessing (baseline AUC = 0.5)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Threshold Selection:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;optimal threshold&lt;/strong&gt; can be chosen based on the point &lt;strong&gt;closest to the plot's upper left corner&lt;/strong&gt; (coordinates: FPR=0, TPR=1), representing the optimal balance between detecting positive instances and minimizing false positives.&lt;/p&gt;
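<br>
&lt;p&gt;A minimal scikit-learn sketch of that threshold-selection rule; &lt;code&gt;y_true&lt;/code&gt; and &lt;code&gt;y_scores&lt;/code&gt; are toy stand-ins for your validation labels and model scores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Pick the threshold whose (FPR, TPR) point is closest to the upper-left corner (0, 1).
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])                      # toy validation labels
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9])  # toy model scores

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
distance = np.sqrt(fpr ** 2 + (1 - tpr) ** 2)    # distance of each point to (0, 1)
best_threshold = thresholds[np.argmin(distance)]
print(best_threshold)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
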

&lt;h4&gt;
  
  
  Area Under the ROC Curve (AUC)
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;AUC&lt;/strong&gt; quantifies overall model performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Range&lt;/strong&gt;: 0 to 1&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Baseline&lt;/strong&gt;: 0.5 (random guessing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interpretation&lt;/strong&gt;: Values closer to &lt;strong&gt;1.0&lt;/strong&gt; indicate better model performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advantage&lt;/strong&gt;: Threshold-independent metric that measures discrimination ability across all possible thresholds&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  ROC Curve in Amazon SageMaker
&lt;/h4&gt;

&lt;p&gt;In &lt;strong&gt;Amazon SageMaker&lt;/strong&gt;, the ROC curve is especially useful for applications like &lt;strong&gt;fraud detection&lt;/strong&gt;, where the objective is to balance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Minimizing false negatives&lt;/strong&gt;: Catching fraudulent transactions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimizing false positives&lt;/strong&gt;: Avoiding false alarms that inconvenience customers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;SageMaker allows users to &lt;strong&gt;generate ROC curves&lt;/strong&gt; as part of the model evaluation process through &lt;strong&gt;SageMaker Autopilot&lt;/strong&gt; and custom model evaluation jobs, making it easier for data scientists to identify the best classification threshold for their specific use case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When working with balanced datasets&lt;/strong&gt;, the ROC curve provides a reliable way to measure model performance and make informed decisions about threshold tuning. For imbalanced datasets, consider &lt;strong&gt;Balanced Accuracy&lt;/strong&gt; or &lt;strong&gt;Precision-Recall curves&lt;/strong&gt; as complementary metrics.&lt;/p&gt;

&lt;h3&gt;
  
  
  SageMaker Autopilot Validation Techniques
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Cross-Validation
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;K-Fold Cross-Validation&lt;/strong&gt; (typically 5 folds):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically implemented for datasets ≤ 50,000 instances&lt;/li&gt;
&lt;li&gt;Reduces overfitting and selection bias&lt;/li&gt;
&lt;li&gt;Provides robust performance estimates&lt;/li&gt;
&lt;li&gt;Averaged validation metrics across folds&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Validation Modes
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;1. Hyperparameter Optimization (HPO) Mode:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatic 5-fold cross-validation&lt;/li&gt;
&lt;li&gt;Evaluates multiple hyperparameter combinations&lt;/li&gt;
&lt;li&gt;Selects best model based on averaged metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Ensembling Mode:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-validation regardless of dataset size&lt;/li&gt;
&lt;li&gt;80/20 train-validation split&lt;/li&gt;
&lt;li&gt;Out-of-fold (OOF) predictions for stacking&lt;/li&gt;
&lt;li&gt;Combines multiple base models for improved performance&lt;/li&gt;
&lt;li&gt;Supports sample weights for imbalanced datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use multiple metrics&lt;/strong&gt;: Don't rely solely on accuracy—consider precision, recall, F1, and AUC&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ROC curve analysis&lt;/strong&gt;: Identify optimal threshold for your business context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-validation&lt;/strong&gt;: Essential for small datasets (&amp;lt; 50,000 instances)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Balanced accuracy&lt;/strong&gt;: Use for imbalanced datasets instead of raw accuracy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Threshold tuning&lt;/strong&gt;: Adjust based on cost of false positives vs. false negatives&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-metrics-validation.html" rel="noopener noreferrer"&gt;SageMaker Autopilot Metrics and Validation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/machine-learning/latest/dg/binary-model-insights.html" rel="noopener noreferrer"&gt;AWS Machine Learning Binary Model Insights&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  6. SageMaker Algorithm Optimization &amp;amp; Experiment Tracking
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐☆☆ (Intermediate)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 2 (ML Model Development - 26%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; MEDIUM&lt;/p&gt;
&lt;h3&gt;
  
  
  Training Modes and Performance Optimization
&lt;/h3&gt;

&lt;p&gt;Beyond algorithm selection, SageMaker offers &lt;strong&gt;two training data modes&lt;/strong&gt; that significantly impact performance:&lt;/p&gt;
&lt;h4&gt;
  
  
  File Mode
&lt;/h4&gt;

&lt;p&gt;Downloads entire dataset to training instances before training begins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Smaller datasets (&amp;lt; 50 GB)&lt;/li&gt;
&lt;li&gt;Random access patterns during training&lt;/li&gt;
&lt;li&gt;Algorithms requiring multiple passes over data&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Pipe Mode
&lt;/h4&gt;

&lt;p&gt;Streams data directly from S3 during training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Large datasets (&amp;gt; 50 GB)&lt;/li&gt;
&lt;li&gt;Sequential data access patterns&lt;/li&gt;
&lt;li&gt;Reducing training time and storage costs&lt;/li&gt;
&lt;li&gt;Faster startup times (no download wait)&lt;/li&gt;
&lt;/ul&gt;
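
&lt;p&gt;A minimal sketch of choosing between the two modes with the SageMaker Python SDK (paths are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Choosing File vs Pipe mode when wiring S3 data into a training job; paths are placeholders.
from sagemaker.inputs import TrainingInput

# File mode: the whole dataset is downloaded to the training instance before training starts
file_input = TrainingInput("s3://my-bucket/train/", input_mode="File")

# Pipe mode: records are streamed from S3 during training, so there is no up-front
# download and far less local storage is needed for very large datasets
pipe_input = TrainingInput("s3://my-bucket/train/", input_mode="Pipe")

# estimator.fit({"train": pipe_input})   # pass the chosen input to your estimator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
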
&lt;h3&gt;
  
  
  Instance Type Recommendations
&lt;/h3&gt;

&lt;p&gt;Instance type selection varies by algorithm:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;XGBoost/LightGBM/CatBoost&lt;/strong&gt;: Compute-optimized instances (C5, C6i) for CPU-based boosting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepAR&lt;/strong&gt;: GPU instances (P3, P4) for deep learning time series models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image Classification/Object Detection&lt;/strong&gt;: GPU instances with high memory bandwidth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linear Learner&lt;/strong&gt;: Memory-optimized instances (R5) for large-scale linear models&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Incremental Training Support
&lt;/h3&gt;

&lt;p&gt;Some algorithms (XGBoost, Object Detection, Image Classification) support &lt;strong&gt;incremental training&lt;/strong&gt;—using a previously trained model as a starting point when new data arrives, avoiding full retraining.&lt;/p&gt;
&lt;h3&gt;
  
  
  Hyperparameter Tuning: The Performance Multiplier
&lt;/h3&gt;

&lt;p&gt;Algorithm performance depends heavily on hyperparameter selection. SageMaker provides &lt;strong&gt;automatic hyperparameter tuning&lt;/strong&gt; using Bayesian optimization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;hyperparameter_ranges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ContinuousParameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;IntegerParameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;num_estimators&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;IntegerParameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;tuner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HyperparameterTuner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;xgboost_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hyperparameter_ranges&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hyperparameter_ranges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;objective_metric_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;validation:rmse&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_parallel_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This automates what traditionally requires manual experimentation, exploring the hyperparameter space intelligently to find optimal configurations.&lt;/p&gt;

&lt;h3&gt;
  
  
  SageMaker Experiments: From Chaos to Organization
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What is SageMaker Experiments?
&lt;/h4&gt;

&lt;p&gt;An experiment management system that tracks, organizes, and compares ML workflows. Think of it as "version control for machine learning"—capturing not just code, but data, parameters, and results.&lt;/p&gt;

&lt;h4&gt;
  
  
  Organizational Hierarchy
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Experiment&lt;/strong&gt;: High-level project (e.g., "Customer Churn Prediction")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trial/Run&lt;/strong&gt;: Individual training attempt with specific parameters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run Details&lt;/strong&gt;: Automatically captured metadata including:

&lt;ul&gt;
&lt;li&gt;Input parameters and hyperparameters&lt;/li&gt;
&lt;li&gt;Dataset versions and locations&lt;/li&gt;
&lt;li&gt;Training metrics over time&lt;/li&gt;
&lt;li&gt;Model artifacts and outputs&lt;/li&gt;
&lt;li&gt;Instance configurations&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Key Capabilities
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Tracking&lt;/strong&gt;: No manual logging—SageMaker captures training job details automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual Comparison&lt;/strong&gt;: Side-by-side comparison of runs to identify best-performing models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reproducibility&lt;/strong&gt;: Trace any production model back to exact training conditions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance Auditing&lt;/strong&gt;: Document model lineage for regulatory requirements&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Important Migration Note
&lt;/h4&gt;

&lt;p&gt;SageMaker Experiments Classic is transitioning to &lt;strong&gt;MLflow integration&lt;/strong&gt;. New projects should use MLflow SDK for experiment tracking, which provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Industry-standard tracking format&lt;/li&gt;
&lt;li&gt;Broader ecosystem compatibility&lt;/li&gt;
&lt;li&gt;Enhanced UI in new SageMaker Studio experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Existing Experiments Classic data remains viewable, but new experiments should migrate to MLflow for future-proof tracking.&lt;/p&gt;
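
&lt;p&gt;A minimal MLflow tracking sketch for a new project; the tracking-server URI, experiment name, and logged values are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal MLflow tracking sketch; tracking URI, names, and values are placeholders.
import mlflow

mlflow.set_tracking_uri("arn:aws:sagemaker:us-east-1:123456789012:mlflow-tracking-server/demo")
mlflow.set_experiment("customer-churn-prediction")

with mlflow.start_run(run_name="xgboost-baseline"):
    mlflow.log_param("max_depth", 6)           # hyperparameters for this run
    mlflow.log_param("learning_rate", 0.1)
    mlflow.log_metric("validation_auc", 0.91)  # metrics captured for comparison across runs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
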

&lt;h3&gt;
  
  
  Practical Impact
&lt;/h3&gt;

&lt;p&gt;These capabilities transform ML development from ad-hoc experimentation to systematic engineering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pipe mode&lt;/strong&gt; reduces S3 data transfer costs by 30-50% for large datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hyperparameter tuning&lt;/strong&gt; improves model accuracy by 5-15% with zero manual effort&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment tracking&lt;/strong&gt; cuts model debugging time from hours to minutes by providing complete training history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html" rel="noopener noreferrer"&gt;Amazon SageMaker Built-In Algorithms&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/experiments.html" rel="noopener noreferrer"&gt;Amazon SageMaker Experiments&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  7. AWS Glue: Intelligent Data Integration with Built-In Machine Learning
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐☆☆ (Intermediate)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 1 (Data Preparation - 28%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; MEDIUM&lt;/p&gt;
&lt;h3&gt;
  
  
  What is AWS Glue?
&lt;/h3&gt;

&lt;p&gt;AWS Glue is a &lt;strong&gt;serverless data integration service&lt;/strong&gt; that simplifies the discovery, preparation, movement, and integration of data from multiple sources. Designed for analytics, machine learning, and application development, Glue consolidates complex data workflows into a unified, managed platform—eliminating infrastructure management while automatically scaling to handle any data volume.&lt;/p&gt;
&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1. AWS Glue Data Catalog
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Centralized metadata repository&lt;/strong&gt; storing schema, location, and statistics for your datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic discovery&lt;/strong&gt; from 70+ data sources including S3, RDS, Redshift, DynamoDB, and on-premises databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Universal access&lt;/strong&gt;: Integrates seamlessly with Athena, EMR, Redshift Spectrum, and SageMaker for querying and analysis&lt;/li&gt;
&lt;li&gt;Acts as a "search engine" for your data lake, making datasets discoverable across your organization&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  2. ETL Jobs
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual job creation&lt;/strong&gt; via AWS Glue Studio (drag-and-drop interface)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple job types&lt;/strong&gt;: ETL (Extract-Transform-Load), ELT, and streaming data processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-generated code&lt;/strong&gt;: Glue generates optimized PySpark or Scala code based on visual transformations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job engines&lt;/strong&gt;: Apache Spark for big data processing, AWS Glue Ray for Python-based ML workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless execution&lt;/strong&gt;: No cluster management—Glue provisions resources automatically&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  3. Crawlers
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schema inference&lt;/strong&gt;: Automatically scan data sources and detect table schemas (a boto3 sketch of crawler creation follows this list)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata population&lt;/strong&gt;: Populate the Data Catalog without manual schema definition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schedule-based updates&lt;/strong&gt;: Run crawlers on schedules to keep catalog synchronized with evolving data&lt;/li&gt;
&lt;/ul&gt;
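
&lt;p&gt;A small boto3 sketch of creating and starting a scheduled crawler; the crawler name, role, database, and path are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create a nightly crawler that keeps the Data Catalog in sync with an S3 prefix.
# Name, role, database, and path are placeholders.
import boto3

glue = boto3.client("glue")
glue.create_crawler(
    Name="loans-raw-crawler",
    Role="arn:aws:iam::123456789012:role/GlueServiceRole",
    DatabaseName="loans_raw",
    Targets={"S3Targets": [{"Path": "s3://my-bucket/raw/loans/"}]},
    Schedule="cron(0 2 * * ? *)",   # re-crawl nightly to pick up schema changes
)
glue.start_crawler(Name="loans-raw-crawler")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
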
&lt;h3&gt;
  
  
  Built-In Machine Learning: FindMatches Transform
&lt;/h3&gt;

&lt;p&gt;AWS Glue includes &lt;strong&gt;ML-powered data cleansing&lt;/strong&gt; capabilities through the &lt;strong&gt;FindMatches transform&lt;/strong&gt;, addressing one of data engineering's toughest challenges: identifying duplicate or related records without exact matching keys.&lt;/p&gt;
&lt;h4&gt;
  
  
  What is FindMatches?
&lt;/h4&gt;

&lt;p&gt;FindMatches uses machine learning to identify records that refer to the same entity, even when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Names are spelled differently ("John Doe" vs. "Johnny Doe")&lt;/li&gt;
&lt;li&gt;Addresses have variations ("123 Main St" vs. "123 Main Street")&lt;/li&gt;
&lt;li&gt;Data contains typos or inconsistencies&lt;/li&gt;
&lt;li&gt;Records lack unique identifiers like customer IDs&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Use Cases
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Customer Data Deduplication&lt;/strong&gt;: Merge customer records across CRM systems, marketing databases, and transaction logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product Catalog Harmonization&lt;/strong&gt;: Match products from different suppliers or internal systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fraud Detection&lt;/strong&gt;: Identify suspicious patterns by linking seemingly different accounts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Address Standardization&lt;/strong&gt;: Normalize addresses across inconsistent formats&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity Resolution&lt;/strong&gt;: Connect related entities in knowledge graphs or master data management&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  How FindMatches Works: The Training Process
&lt;/h4&gt;

&lt;p&gt;Unlike traditional rule-based matching, FindMatches &lt;strong&gt;learns&lt;/strong&gt; what constitutes a match based on your domain-specific labeling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Generate Labeling File&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Glue selects ~100 representative records from your dataset&lt;/li&gt;
&lt;li&gt;Divides them into 10 labeling sets for human review&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Label Training Data&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review each labeling set and assign labels to indicate matches&lt;/li&gt;
&lt;li&gt;Records that match get the same label (e.g., "A")&lt;/li&gt;
&lt;li&gt;Non-matching records get different labels (e.g., "B", "C")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example Labeling:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;labeling_set_id | label | first_name | last_name | birthday
SET001         | A     | John       | Doe       | 04/01/1980
SET001         | A     | Johnny     | Doe       | 04/01/1980
SET001         | B     | Jane       | Smith     | 04/03/1980
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the first two records are marked as matches (both labeled "A"), while the third is different (labeled "B").&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Train the Model&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upload labeled files back to AWS Glue&lt;/li&gt;
&lt;li&gt;The ML algorithm learns patterns: which field differences matter, which don't&lt;/li&gt;
&lt;li&gt;Model improves through &lt;strong&gt;iterative training&lt;/strong&gt;—label more data, upload, retrain&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Apply Transform in ETL Jobs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the trained model in Glue Studio visual jobs or PySpark scripts&lt;/li&gt;
&lt;li&gt;Output includes a &lt;strong&gt;match_id&lt;/strong&gt; column grouping related records&lt;/li&gt;
&lt;li&gt;Optionally remove duplicates automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Implementation in AWS Glue Studio
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Basic FindMatches Transform (PySpark):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;MyTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dfc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;DynamicFrameCollection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;dynf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dfc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dfc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglueml.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FindMatches&lt;/span&gt;

    &lt;span class="n"&gt;findmatches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FindMatches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dynf&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;transformId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-transform-id&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;DynamicFrameCollection&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FindMatches&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;findmatches&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;glueContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Incremental Matching:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For continuous data pipelines, use &lt;code&gt;FindIncrementalMatches&lt;/code&gt; to match new records against existing datasets without reprocessing everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;awsglueml.transforms&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FindIncrementalMatches&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FindIncrementalMatches&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;existingFrame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;existing_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;incrementalFrame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;new_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;transformId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-transform-id&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Technical Requirements
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Glue Version&lt;/strong&gt;: Requires AWS Glue 2.0 or later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job Type&lt;/strong&gt;: Works with Spark-based jobs (PySpark/Scala)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Structure&lt;/strong&gt;: Operates on Glue DynamicFrames&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: Adds match_id column; can filter duplicates downstream&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Benefits of AWS Glue
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Serverless Architecture&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No cluster provisioning, configuration, or tuning&lt;/li&gt;
&lt;li&gt;Automatic scaling from gigabytes to petabytes&lt;/li&gt;
&lt;li&gt;Pay only for resources consumed during job execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Integrated ML Capabilities&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No separate ML infrastructure needed&lt;/li&gt;
&lt;li&gt;Human-in-the-loop training for domain-specific matching&lt;/li&gt;
&lt;li&gt;Continuous improvement through iterative labeling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Unified Data Integration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single platform for cataloging, transforming, and moving data&lt;/li&gt;
&lt;li&gt;Native integration with AWS analytics ecosystem (Athena, Redshift, QuickSight, SageMaker)&lt;/li&gt;
&lt;li&gt;Support for batch and streaming workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost Efficiency&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pay-per-use pricing model&lt;/li&gt;
&lt;li&gt;No upfront costs or long-term commitments&lt;/li&gt;
&lt;li&gt;Reduced operational overhead compared to managing Spark clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start Small with Labeling&lt;/strong&gt;: Begin with 10-20 well-labeled records per set for initial training&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Consistent Matching Criteria&lt;/strong&gt;: Define clear rules for what constitutes a match before labeling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterate and Evaluate&lt;/strong&gt;: Review FindMatches output, relabel edge cases, and retrain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leverage Incremental Matching&lt;/strong&gt;: For ongoing data feeds, use incremental mode to avoid reprocessing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor Job Metrics&lt;/strong&gt;: Use CloudWatch to track ETL job duration, data processed, and errors&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html" rel="noopener noreferrer"&gt;What is AWS Glue?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/glue/latest/dg/find-matches-visual-job.html" rel="noopener noreferrer"&gt;AWS Glue FindMatches Visual Jobs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/glue/latest/dg/machine-learning.html" rel="noopener noreferrer"&gt;AWS Glue Machine Learning&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  8. Optimizing Hyperparameter Tuning: Warm Start Strategies and Early Stopping
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐⭐☆ (Advanced)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 2 (ML Model Development - 26%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; MEDIUM-HIGH&lt;/p&gt;
&lt;h3&gt;
  
  
  Warm Start Hyperparameter Tuning: Building on Previous Knowledge
&lt;/h3&gt;

&lt;p&gt;Hyperparameter tuning jobs can be expensive and time-consuming. &lt;strong&gt;Warm start&lt;/strong&gt; allows you to leverage knowledge from previous tuning jobs rather than starting from scratch, making the search process more efficient.&lt;/p&gt;
&lt;h4&gt;
  
  
  IDENTICAL_DATA_AND_ALGORITHM: Incremental Refinement
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Continue tuning on the exact same dataset and algorithm, refining your hyperparameter search space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What You Can Change:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hyperparameter ranges (narrow or expand search boundaries)&lt;/li&gt;
&lt;li&gt;Maximum number of training jobs (increase budget)&lt;/li&gt;
&lt;li&gt;Convert hyperparameters between tunable and static&lt;/li&gt;
&lt;li&gt;Maximum concurrent jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What Must Stay the Same:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training data (identical S3 location)&lt;/li&gt;
&lt;li&gt;Training algorithm (same Docker image/container)&lt;/li&gt;
&lt;li&gt;Objective metric&lt;/li&gt;
&lt;li&gt;Total count of static + tunable hyperparameters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Incremental Budget Increase&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First tuning job: 50 training jobs, find promising region&lt;/li&gt;
&lt;li&gt;Warm start job: Add 100 more jobs exploring that region&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Range Refinement&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parent job found best learning_rate between 0.1-0.15&lt;/li&gt;
&lt;li&gt;Warm start with narrowed range: 0.10-0.12&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Converting Parameters&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parent job: learning_rate was tunable, batch_size was static&lt;/li&gt;
&lt;li&gt;Warm start: Fix learning_rate at optimal value, make batch_size tunable&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.tuner&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WarmStartConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WarmStartTypes&lt;/span&gt;

&lt;span class="n"&gt;warm_start_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WarmStartConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;warm_start_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;WarmStartTypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IDENTICAL_DATA_AND_ALGORITHM&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;previous-tuning-job-name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tuner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HyperparameterTuner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;xgboost_estimator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;objective_metric_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;validation:auc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hyperparameter_ranges&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ContinuousParameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.12&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;# Refined range
&lt;/span&gt;        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max_depth&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;IntegerParameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;max_jobs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;warm_start_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;warm_start_config&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  TRANSFER_LEARNING: Adapting to New Scenarios
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Purpose&lt;/strong&gt;: Apply knowledge from previous tuning to related but different problems—new datasets, modified algorithms, or different problem variations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What You Can Change (Everything from IDENTICAL_DATA_AND_ALGORITHM plus):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input data (different dataset, different S3 location)&lt;/li&gt;
&lt;li&gt;Training algorithm image (different version or related algorithm)&lt;/li&gt;
&lt;li&gt;Hyperparameter ranges&lt;/li&gt;
&lt;li&gt;Number of training jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What Must Stay the Same:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Objective metric name and type (maximize/minimize)&lt;/li&gt;
&lt;li&gt;Total hyperparameter count (static + tunable)&lt;/li&gt;
&lt;li&gt;Hyperparameter types (continuous, integer, categorical)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dataset Evolution&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parent job: Trained on 2023 customer data&lt;/li&gt;
&lt;li&gt;Transfer learning: Apply to 2024 customer data with evolved patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Algorithm Migration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parent job: XGBoost tuning&lt;/li&gt;
&lt;li&gt;Transfer learning: Apply learnings to LightGBM (similar gradient boosting)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cross-Domain Application&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parent job: Fraud detection for credit cards&lt;/li&gt;
&lt;li&gt;Transfer learning: Fraud detection for insurance claims (similar problem structure)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Configuration Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;warm_start_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WarmStartConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;warm_start_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;WarmStartTypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TRANSFER_LEARNING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;credit-card-fraud-tuning-job&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Now tuning on insurance data with similar hyperparameters
&lt;/span&gt;&lt;span class="n"&gt;insurance_tuner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HyperparameterTuner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lightgbm_estimator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Different algorithm
&lt;/span&gt;    &lt;span class="n"&gt;objective_metric_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;validation:auc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Same metric
&lt;/span&gt;    &lt;span class="n"&gt;hyperparameter_ranges&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;learning_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ContinuousParameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;num_leaves&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;IntegerParameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;warm_start_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;warm_start_config&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Warm Start Constraints
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;For Both Types:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maximum &lt;strong&gt;5 parent jobs&lt;/strong&gt; can be referenced&lt;/li&gt;
&lt;li&gt;All parent jobs must be &lt;strong&gt;completed&lt;/strong&gt; (terminal state)&lt;/li&gt;
&lt;li&gt;Maximum &lt;strong&gt;10 changes&lt;/strong&gt; between static/tunable parameters across all parent jobs&lt;/li&gt;
&lt;li&gt;Hyperparameter types cannot change (continuous stays continuous)&lt;/li&gt;
&lt;li&gt;Warm start is not transitive: a warm-started job can itself be a parent, but information from that job's own parents is not carried forward&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Performance Considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Warm start jobs have &lt;strong&gt;longer startup times&lt;/strong&gt; (proportional to parent job count)&lt;/li&gt;
&lt;li&gt;Trade-off: Slower start but potentially better final model with fewer total jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Early Stopping: Cutting Losses Quickly
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: Some hyperparameter combinations are clearly poor performers—continuing training wastes compute resources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Early stopping automatically terminates underperforming training jobs before completion.&lt;/p&gt;

&lt;h4&gt;
  
  
  How It Works
&lt;/h4&gt;

&lt;p&gt;After each training epoch, SageMaker:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Retrieves current job's objective metric&lt;/li&gt;
&lt;li&gt;Calculates running averages of all previous jobs' metrics at the same epoch&lt;/li&gt;
&lt;li&gt;Computes the &lt;strong&gt;median&lt;/strong&gt; of those running averages&lt;/li&gt;
&lt;li&gt;Stops current job if its metric is &lt;strong&gt;worse than the median&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Logic&lt;/strong&gt;: If a job is performing worse than the median of previous jobs at the same training stage, it's unlikely to catch up, so stop it early.&lt;/p&gt;
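
&lt;p&gt;To make the rule concrete, the snippet below is a minimal, self-contained sketch of the median comparison described above. It illustrates the logic only, not SageMaker's internal implementation, and the metric values are made up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy illustration of the median-based early stopping rule (not SageMaker's actual code).
# Higher metric = better (e.g., validation accuracy). All values below are made up.
from statistics import median

# Running averages of the objective metric for previously completed jobs, per epoch
previous_jobs_running_avg = {
    1: [0.62, 0.58, 0.65, 0.60],
    2: [0.70, 0.66, 0.72, 0.69],
    3: [0.75, 0.71, 0.78, 0.74],
}

def should_stop(epoch, current_metric):
    """Stop if the current job is worse than the median of previous jobs at this epoch."""
    history = previous_jobs_running_avg.get(epoch)
    if not history:
        return False  # nothing to compare against yet
    return current_metric &amp;lt; median(history)

print(should_stop(2, 0.59))  # True: 0.59 is below the epoch-2 median of 0.695
print(should_stop(2, 0.73))  # False: above the median, keep training
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
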

&lt;h4&gt;
  
  
  Configuration
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Boto3 SDK:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tuning_job_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TrainingJobEarlyStoppingType&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;AUTO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;SageMaker Python SDK:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tuner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HyperparameterTuner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;estimator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;objective_metric_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;validation:f1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;hyperparameter_ranges&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hyperparameter_ranges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;early_stopping_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Auto&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;  &lt;span class="c1"&gt;# Enable early stopping
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Supported Algorithms
&lt;/h4&gt;

&lt;p&gt;Built-in algorithms with early stopping support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;XGBoost, LightGBM, CatBoost&lt;/li&gt;
&lt;li&gt;AutoGluon-Tabular&lt;/li&gt;
&lt;li&gt;Linear Learner&lt;/li&gt;
&lt;li&gt;Image Classification, Object Detection&lt;/li&gt;
&lt;li&gt;Sequence-to-Sequence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Custom Algorithm Requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Must emit objective metrics &lt;strong&gt;after each epoch&lt;/strong&gt; (not just at end)&lt;/li&gt;
&lt;li&gt;TensorFlow: Use callbacks to log metrics&lt;/li&gt;
&lt;li&gt;PyTorch: Manually log metrics via CloudWatch&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Benefits
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost Reduction&lt;/strong&gt;: Stop bad jobs early (15-30% cost savings typical)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster Tuning&lt;/strong&gt;: More budget for promising hyperparameter combinations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overfitting Prevention&lt;/strong&gt;: Stops jobs that aren't improving&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Difference: Warm Start vs. Early Stopping
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Warm Start&lt;/th&gt;
&lt;th&gt;Early Stopping&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scope&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Across multiple tuning jobs&lt;/td&gt;
&lt;td&gt;Within a single tuning job&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Leverage previous tuning knowledge&lt;/td&gt;
&lt;td&gt;Stop individual bad training jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;When Applied&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;At tuning job start&lt;/td&gt;
&lt;td&gt;During training job execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Benefit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Better hyperparameter exploration&lt;/td&gt;
&lt;td&gt;Reduced per-job cost&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Combined Strategy&lt;/strong&gt;: Use both together—warm start from previous successful tuning job with early stopping enabled to maximize efficiency.&lt;/p&gt;
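
&lt;p&gt;As a minimal sketch of that combined strategy (assuming the &lt;code&gt;xgboost_estimator&lt;/code&gt;, data channels, and parent job name from the earlier examples exist; parameter values are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: warm start from a completed tuning job, with early stopping enabled.
# Assumes xgboost_estimator, train_input, and validation_input already exist.
from sagemaker.tuner import (
    HyperparameterTuner, WarmStartConfig, WarmStartTypes,
    ContinuousParameter, IntegerParameter,
)

warm_start_config = WarmStartConfig(
    warm_start_type=WarmStartTypes.IDENTICAL_DATA_AND_ALGORITHM,
    parents={'previous-tuning-job-name'},
)

tuner = HyperparameterTuner(
    estimator=xgboost_estimator,
    objective_metric_name='validation:auc',
    hyperparameter_ranges={
        'learning_rate': ContinuousParameter(0.10, 0.12),
        'max_depth': IntegerParameter(5, 8),
    },
    max_jobs=100,
    max_parallel_jobs=5,
    early_stopping_type='Auto',          # stop clearly poor jobs early
    warm_start_config=warm_start_config,  # reuse knowledge from the parent job
)
tuner.fit({'train': train_input, 'validation': validation_input})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
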

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-warm-start.html" rel="noopener noreferrer"&gt;SageMaker Warm Start Hyperparameter Tuning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-early-stopping.html" rel="noopener noreferrer"&gt;SageMaker Automatic Model Tuning Early Stopping&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  9. Hyperparameter Tuning: Bayesian Optimization &amp;amp; Random Seeds
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐⭐☆ (Advanced)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 2 (ML Model Development - 26%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; MEDIUM&lt;/p&gt;
&lt;h3&gt;
  
  
  Bayesian Optimization Strategy
&lt;/h3&gt;
&lt;h4&gt;
  
  
  What It Is
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Intelligent search&lt;/strong&gt; that treats hyperparameter tuning as a &lt;strong&gt;regression problem&lt;/strong&gt;. Learns from previous training job results to select next hyperparameter combinations. More efficient than random or grid search.&lt;/p&gt;
&lt;h4&gt;
  
  
  How It Works
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Trains model with initial hyperparameter set&lt;/li&gt;
&lt;li&gt;Evaluates objective metric (e.g., validation accuracy)&lt;/li&gt;
&lt;li&gt;Uses regression to &lt;strong&gt;predict&lt;/strong&gt; which hyperparameters will perform best&lt;/li&gt;
&lt;li&gt;Selects next combination based on predictions&lt;/li&gt;
&lt;li&gt;Repeats process, continuously learning&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  Exploration vs Exploitation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exploitation&lt;/strong&gt;: Choose values &lt;strong&gt;close to previous best&lt;/strong&gt; results (refine known good regions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exploration&lt;/strong&gt;: Choose values &lt;strong&gt;far from previous attempts&lt;/strong&gt; (discover new optimal regions)&lt;/li&gt;
&lt;li&gt;Balances both to find global optimum efficiently&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  vs Random Search
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Random Search&lt;/strong&gt;: Selects hyperparameters randomly, ignores previous results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bayesian Optimization&lt;/strong&gt;: Learns from history, adapts strategy dynamically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benefit&lt;/strong&gt;: Finds optimal hyperparameters with &lt;strong&gt;fewer training jobs&lt;/strong&gt; (lower cost/time)&lt;/li&gt;
&lt;/ul&gt;
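
&lt;p&gt;The toy loop below only illustrates the exploration/exploitation trade-off in plain Python. It is not SageMaker's Bayesian optimizer (there is no surrogate model here), and the objective function is invented: most iterations sample near the best result seen so far, while a fraction sample the full range.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Toy illustration of exploration vs. exploitation (not SageMaker's actual algorithm).
import random

def objective(learning_rate):
    """Pretend training run: returns a made-up validation score that peaks near lr = 0.07."""
    return 1.0 - abs(learning_rate - 0.07) * 5

best_lr, best_score = None, float('-inf')
for trial in range(30):
    if best_lr is None or random.random() &amp;lt; 0.3:
        lr = random.uniform(0.001, 0.3)                          # exploration: anywhere in range
    else:
        lr = min(0.3, max(0.001, random.gauss(best_lr, 0.01)))   # exploitation: near best so far
    score = objective(lr)
    if score &amp;gt; best_score:
        best_lr, best_score = lr, score

print(f"best learning_rate ~ {best_lr:.4f}, score {best_score:.3f}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
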
&lt;h3&gt;
  
  
  Random Seeds for Reproducibility
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Purpose
&lt;/h4&gt;

&lt;p&gt;Ensures &lt;strong&gt;reproducible hyperparameter configurations&lt;/strong&gt; across tuning runs. Critical for experimental consistency and debugging.&lt;/p&gt;
&lt;h4&gt;
  
  
  Reproducibility by Strategy
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tuning Strategy&lt;/th&gt;
&lt;th&gt;Reproducibility with Same Seed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Random Search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Up to &lt;strong&gt;100%&lt;/strong&gt; reproducible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hyperband&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Up to &lt;strong&gt;100%&lt;/strong&gt; reproducible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bayesian Optimization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Improved&lt;/strong&gt; (not guaranteed full)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h4&gt;
  
  
  Best Practices
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Specify &lt;strong&gt;fixed integer seed&lt;/strong&gt; (e.g., &lt;code&gt;RandomSeed=42&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;same seed&lt;/strong&gt; across experimental runs for comparison&lt;/li&gt;
&lt;li&gt;Document seed values in experiment logs&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Implementation
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tuning_job_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Strategy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Bayesian&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;RandomSeed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Fixed seed for reproducibility
&lt;/span&gt;    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;HyperParameterTuningJobObjective&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Maximize&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MetricName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;validation:accuracy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Exam Tips
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Bayesian Optimization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Learns from previous jobs&lt;/strong&gt; (vs random search which doesn't)&lt;/li&gt;
&lt;li&gt;Uses &lt;strong&gt;regression&lt;/strong&gt; to predict best next hyperparameters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exploitation&lt;/strong&gt; = refine known good areas; &lt;strong&gt;Exploration&lt;/strong&gt; = try new areas&lt;/li&gt;
&lt;li&gt;More &lt;strong&gt;efficient&lt;/strong&gt; than random/grid search (fewer jobs needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Random Seeds:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Random/Hyperband&lt;/strong&gt;: &lt;strong&gt;100% reproducible&lt;/strong&gt; with same seed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bayesian&lt;/strong&gt;: &lt;strong&gt;Improved&lt;/strong&gt; reproducibility (not perfect)&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;consistent integer seed&lt;/strong&gt; for experimental reproducibility&lt;/li&gt;
&lt;li&gt;Critical for debugging and comparing tuning runs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-considerations.html#automatic-model-tuning-random-seed" rel="noopener noreferrer"&gt;SageMaker Automatic Model Tuning - Random Seeds&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html#automatic-tuning-bayesian-optimization" rel="noopener noreferrer"&gt;SageMaker Bayesian Optimization&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  10. Amazon Bedrock Model Customization: Exam Essentials
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐☆☆ (Intermediate-Advanced)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 2 (ML Model Development - 26%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; MEDIUM (Emerging topic)&lt;/p&gt;
&lt;h3&gt;
  
  
  Customization Methods
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1. Supervised Fine-Tuning
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Uses &lt;strong&gt;labeled training data&lt;/strong&gt; (input-output pairs)&lt;/li&gt;
&lt;li&gt;Adjusts model parameters for specific tasks&lt;/li&gt;
&lt;li&gt;Best for domain-specific applications&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  2. Continued Pre-Training
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Uses &lt;strong&gt;unlabeled data&lt;/strong&gt; to expand domain knowledge&lt;/li&gt;
&lt;li&gt;Incorporates private/proprietary data&lt;/li&gt;
&lt;li&gt;Best for adapting models to specialized domains&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  3. Distillation
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Transfer knowledge from &lt;strong&gt;large teacher model&lt;/strong&gt; to &lt;strong&gt;smaller student model&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Reduces model size while maintaining performance&lt;/li&gt;
&lt;li&gt;Cost-effective deployment&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  4. Reinforcement Fine-Tuning
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Uses &lt;strong&gt;reward functions&lt;/strong&gt; and feedback-based learning&lt;/li&gt;
&lt;li&gt;Improves alignment and response quality&lt;/li&gt;
&lt;li&gt;Can leverage invocation logs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Model Customization Workflow
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Step 1: Prepare Dataset
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Create &lt;strong&gt;labeled dataset&lt;/strong&gt; in &lt;strong&gt;JSON Lines (JSONL)&lt;/strong&gt; format&lt;/li&gt;
&lt;li&gt;Structure as input-output pairs for supervised fine-tuning&lt;/li&gt;
&lt;li&gt;Optional: Prepare validation dataset for performance evaluation&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Step 2: Configure IAM Permissions
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Create IAM role with &lt;strong&gt;S3 bucket access&lt;/strong&gt; for training/validation data&lt;/li&gt;
&lt;li&gt;Or use existing role with appropriate permissions&lt;/li&gt;
&lt;li&gt;Ensure role can read from input S3 and write to output S3&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Step 3: Security Configuration (Optional)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Set up &lt;strong&gt;KMS keys&lt;/strong&gt; for data encryption at rest&lt;/li&gt;
&lt;li&gt;Configure &lt;strong&gt;VPC&lt;/strong&gt; for secure network communication&lt;/li&gt;
&lt;li&gt;Protect sensitive training data&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Step 4: Start Training Job
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Choose customization method (fine-tuning or continued pre-training)&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;base model&lt;/strong&gt; (foundation or previously customized)&lt;/li&gt;
&lt;li&gt;Configure &lt;strong&gt;hyperparameters&lt;/strong&gt;: epochs, batch size, learning rate&lt;/li&gt;
&lt;li&gt;Specify training/validation data S3 locations&lt;/li&gt;
&lt;li&gt;Define output data S3 location&lt;/li&gt;
&lt;/ul&gt;
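
&lt;p&gt;As a rough boto3 sketch of this step: the bucket names, role ARN, job names, and base model identifier below are placeholders, and the hyperparameter keys vary by model family, so treat them as illustrative rather than exact.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: submit a Bedrock fine-tuning job with boto3.
# Bucket names, ARNs, and hyperparameter keys below are placeholders/assumptions.
import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')

response = bedrock.create_model_customization_job(
    jobName='sentiment-finetune-job',
    customModelName='sentiment-classifier-v1',
    roleArn='arn:aws:iam::111122223333:role/BedrockCustomizationRole',
    baseModelIdentifier='amazon.titan-text-express-v1',   # example base model
    customizationType='FINE_TUNING',                      # vs. CONTINUED_PRE_TRAINING
    trainingDataConfig={'s3Uri': 's3://my-training-bucket/train.jsonl'},
    validationDataConfig={
        'validators': [{'s3Uri': 's3://my-training-bucket/validation.jsonl'}]
    },
    outputDataConfig={'s3Uri': 's3://my-output-bucket/customization-output/'},
    hyperParameters={          # keys are model-specific; these are illustrative
        'epochCount': '2',
        'batchSize': '1',
        'learningRate': '0.00001',
    },
)
print(response['jobArn'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
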
&lt;h4&gt;
  
  
  Step 5: Evaluate Model
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Monitor training and validation metrics&lt;/li&gt;
&lt;li&gt;Assess model performance improvements&lt;/li&gt;
&lt;li&gt;Run model evaluation jobs if needed&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Step 6: Buy Provisioned Throughput
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Purchase dedicated compute capacity for &lt;strong&gt;high-throughput deployment&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Ensures consistent performance under expected load&lt;/li&gt;
&lt;li&gt;Required for production-scale custom model inference&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Step 7: Deploy and Use
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Deploy customized model in Amazon Bedrock&lt;/li&gt;
&lt;li&gt;Invoke for inference tasks using model ARN&lt;/li&gt;
&lt;li&gt;Model now has enhanced, tailored capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Using Custom Models
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Two Deployment Options
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;1. Provisioned Throughput&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dedicated compute capacity&lt;/li&gt;
&lt;li&gt;Guaranteed performance/lower latency&lt;/li&gt;
&lt;li&gt;Best for high-volume, predictable workloads&lt;/li&gt;
&lt;li&gt;Requires upfront commitment (purchased in Step 6)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. On-Demand Inference&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pay-per-use pricing&lt;/li&gt;
&lt;li&gt;No pre-provisioned resources&lt;/li&gt;
&lt;li&gt;Invoke using custom model ARN&lt;/li&gt;
&lt;li&gt;Best for variable/unpredictable workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Key Configuration Requirements
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Training Data Format
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;JSONL (JSON Lines)&lt;/strong&gt; for structured input-output pairs&lt;/p&gt;

&lt;p&gt;Example fine-tuning record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Classify sentiment:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"completion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"positive"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  IAM Requirements
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Read permissions on training/validation S3 buckets&lt;/li&gt;
&lt;li&gt;Write permissions on output S3 bucket&lt;/li&gt;
&lt;li&gt;Trust relationship with Bedrock service&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Job Duration Factors
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Training data size and record count&lt;/li&gt;
&lt;li&gt;Input/output token counts&lt;/li&gt;
&lt;li&gt;Number of epochs&lt;/li&gt;
&lt;li&gt;Batch size configuration&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Exam Tips
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Training data format: &lt;strong&gt;JSONL (JSON Lines)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Fine-tuning = &lt;strong&gt;labeled data&lt;/strong&gt;; Continued pre-training = &lt;strong&gt;unlabeled data&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Custom models require &lt;strong&gt;IAM role&lt;/strong&gt; with S3 access&lt;/li&gt;
&lt;li&gt;Security: Optional &lt;strong&gt;KMS encryption&lt;/strong&gt; and &lt;strong&gt;VPC configuration&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Two inference options: &lt;strong&gt;Provisioned Throughput&lt;/strong&gt; (predictable/high-volume) vs &lt;strong&gt;On-Demand&lt;/strong&gt; (flexible/variable)&lt;/li&gt;
&lt;li&gt;Workflow: Prepare data → Configure IAM → Train → Evaluate → Buy throughput → Deploy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provisioned Throughput required&lt;/strong&gt; for production high-volume deployments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/custom-models.html" rel="noopener noreferrer"&gt;Bedrock Custom Models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-submit.html" rel="noopener noreferrer"&gt;Submit Customization Jobs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-use.html" rel="noopener noreferrer"&gt;Use Customized Models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  11. SageMaker Batch Transform: Exam Essentials
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐☆☆ (Intermediate)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 3 (Deployment &amp;amp; Orchestration - 22%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; MEDIUM-HIGH&lt;/p&gt;
&lt;h3&gt;
  
  
  What is Batch Transform?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Offline inference service&lt;/strong&gt; for running predictions on large datasets &lt;strong&gt;without maintaining a persistent endpoint&lt;/strong&gt;. Ideal for preprocessing, large-scale inference, and scenarios where real-time predictions aren't needed.&lt;/p&gt;
&lt;h3&gt;
  
  
  When to Use
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Batch Transform&lt;/strong&gt;: Large datasets, offline inference, periodic predictions, no real-time requirement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Endpoints&lt;/strong&gt;: Low-latency responses, interactive applications, continuous availability&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Key Configuration Parameters
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1. Data Splitting
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;SplitType&lt;/code&gt;: Set to &lt;code&gt;Line&lt;/code&gt; to split files into mini-batches&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;BatchStrategy&lt;/code&gt;: Controls how records are batched (&lt;code&gt;MultiRecord&lt;/code&gt; or &lt;code&gt;SingleRecord&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  2. Payload Management
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;MaxPayloadInMB&lt;/code&gt;: Maximum mini-batch size (max &lt;strong&gt;100 MB&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critical constraint&lt;/strong&gt;: &lt;code&gt;(MaxConcurrentTransforms × MaxPayloadInMB) ≤ 100 MB&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Set to &lt;code&gt;0&lt;/code&gt; for streaming large datasets (not supported by built-in algorithms)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  3. Parallelization
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;MaxConcurrentTransforms&lt;/code&gt;: Parallel processing threads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best practice&lt;/strong&gt;: Set equal to number of compute workers&lt;/li&gt;
&lt;li&gt;SageMaker automatically partitions S3 objects across instances&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Processing Large Datasets
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Multiple Files&lt;/strong&gt;: Automatically distributed across instances by S3 key&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Single Large File&lt;/strong&gt;: Only one instance processes it (inefficient—split files beforehand)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MaxPayloadInMB&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MaxConcurrentTransforms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Must satisfy: 2×50 ≤ 100
&lt;/span&gt;    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SplitType&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Line&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BatchStrategy&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MultiRecord&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Input/Output Behavior
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input&lt;/strong&gt;: CSV files in S3&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: &lt;code&gt;.out&lt;/code&gt; files in S3 (preserves input record order)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Association&lt;/strong&gt;: Can join predictions with the original input using the &lt;code&gt;DataProcessing&lt;/code&gt; settings (&lt;code&gt;InputFilter&lt;/code&gt;, &lt;code&gt;JoinSource&lt;/code&gt;, &lt;code&gt;OutputFilter&lt;/code&gt;), as shown in the sketch below
&lt;/li&gt;
&lt;/ul&gt;
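
&lt;p&gt;A hedged SageMaker Python SDK sketch that ties these settings together; the model name, instance type, and S3 paths are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: batch transform with line splitting, multi-record batching, and
# input/prediction association. Names and S3 paths are placeholders.
from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name='my-trained-model',
    instance_count=2,
    instance_type='ml.m5.xlarge',
    strategy='MultiRecord',            # BatchStrategy
    max_payload=50,                    # MaxPayloadInMB; 2 workers x 50 MB stays within the 100 MB limit
    max_concurrent_transforms=2,
    accept='text/csv',
    assemble_with='Line',
    output_path='s3://my-bucket/batch-output/',
)

transformer.transform(
    data='s3://my-bucket/batch-input/',   # many small CSV files parallelize best
    content_type='text/csv',
    split_type='Line',
    join_source='Input',                  # append predictions to the input records
)
transformer.wait()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
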

&lt;h3&gt;
  
  
  Exam Tips
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Batch Transform = &lt;strong&gt;no persistent endpoint&lt;/strong&gt; (cost-effective for periodic inference)&lt;/li&gt;
&lt;li&gt;Max payload = &lt;strong&gt;100 MB&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Multiple small files &amp;gt; one large file (better parallelization)&lt;/li&gt;
&lt;li&gt;Output maintains input order&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html" rel="noopener noreferrer"&gt;SageMaker Batch Transform&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  12. SageMaker Inference Recommender: Exam Essentials
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐☆☆ (Intermediate)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 3 (Deployment &amp;amp; Orchestration - 22%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; MEDIUM&lt;/p&gt;
&lt;h3&gt;
  
  
  Two Job Types
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1. Default Job (Quick Recommendations)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Duration&lt;/strong&gt;: ~45 minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input&lt;/strong&gt;: Model package ARN only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Purpose&lt;/strong&gt;: Automated instance type recommendations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: Top instance recommendations with cost/latency metrics&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  2. Advanced Job (Custom Load Testing)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Duration&lt;/strong&gt;: ~2 hours average&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Input&lt;/strong&gt;: Custom traffic patterns, specific instance types, latency/throughput requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Purpose&lt;/strong&gt;: Detailed benchmarking for production workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Can test&lt;/strong&gt;: Up to 10 instance types per job&lt;/li&gt;
&lt;/ul&gt;
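
&lt;p&gt;A hedged boto3 sketch of starting a Default job; the job name, role ARN, and model package ARN are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: start a Default Inference Recommender job from a registered model package.
# ARNs and names are placeholders.
import boto3

sm = boto3.client('sagemaker')

sm.create_inference_recommendations_job(
    JobName='churn-model-recommendation',
    JobType='Default',                     # or 'Advanced' for custom load tests
    RoleArn='arn:aws:iam::111122223333:role/SageMakerExecutionRole',
    InputConfig={
        'ModelPackageVersionArn': (
            'arn:aws:sagemaker:us-east-1:111122223333:'
            'model-package/churn-models/1'
        )
    },
)

# Poll for results once the job completes (Default jobs typically finish in ~45 minutes)
results = sm.describe_inference_recommendations_job(JobName='churn-model-recommendation')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
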
&lt;h3&gt;
  
  
  Key Configuration Parameters
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Traffic Patterns
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Phases&lt;/strong&gt;: Users spawned at specified rate every minute&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stairs&lt;/strong&gt;: Users added incrementally at timed intervals&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Stopping Conditions
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Max invocations threshold&lt;/li&gt;
&lt;li&gt;Model latency thresholds (e.g., P95 &amp;lt; 100ms)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Metrics Collected
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Performance
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Model latency (P50, P95, P99)&lt;/li&gt;
&lt;li&gt;Maximum invocations per minute&lt;/li&gt;
&lt;li&gt;CPU/Memory utilization&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Cost
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Cost per hour&lt;/li&gt;
&lt;li&gt;Cost per inference&lt;/li&gt;
&lt;li&gt;Initial instance count for autoscaling&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Serverless-Specific
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Max concurrency&lt;/li&gt;
&lt;li&gt;Memory size configuration&lt;/li&gt;
&lt;li&gt;Model setup time&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Exam Tips
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't need both job types&lt;/strong&gt;—choose based on requirements&lt;/li&gt;
&lt;li&gt;Default = quick automated recommendations&lt;/li&gt;
&lt;li&gt;Advanced = custom production-like testing&lt;/li&gt;
&lt;li&gt;Supports &lt;strong&gt;both real-time and serverless endpoints&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Output includes &lt;strong&gt;top 5 recommendations&lt;/strong&gt; with confidence scores&lt;/li&gt;
&lt;li&gt;Used to &lt;strong&gt;optimize deployment configuration&lt;/strong&gt; before production&lt;/li&gt;
&lt;li&gt;Helps estimate &lt;strong&gt;infrastructure costs&lt;/strong&gt; for model inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender-load-test.html" rel="noopener noreferrer"&gt;SageMaker Inference Recommender Load Testing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/inference-recommender-recommendation-jobs.html" rel="noopener noreferrer"&gt;SageMaker Inference Recommender Recommendation Jobs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  13. Amazon SageMaker Serverless Inference: On-Demand and Provisioned Concurrency
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐⭐☆ (Advanced)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 3 (Deployment &amp;amp; Orchestration - 22%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; MEDIUM&lt;/p&gt;
&lt;h3&gt;
  
  
  What is SageMaker Serverless Inference?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Amazon SageMaker Serverless Inference&lt;/strong&gt; is designed specifically for deploying and scaling machine learning models &lt;strong&gt;without the hassle of configuring or managing underlying infrastructure&lt;/strong&gt;. This fully managed deployment option is perfect for workloads with &lt;strong&gt;intermittent traffic&lt;/strong&gt; that can handle cold starts. Serverless endpoints automatically initiate and adjust compute resources based on traffic demand, removing the need to select instance types or manage scaling policies.&lt;/p&gt;
&lt;h3&gt;
  
  
  Key Characteristics
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Automatic Infrastructure Management
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Automatically provisions and scales compute resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scales to zero&lt;/strong&gt; during idle periods (no traffic = no cost)&lt;/li&gt;
&lt;li&gt;No instance type selection or scaling policy configuration required&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Cost-Effective Pricing
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pay-per-use model&lt;/strong&gt;: Charged only for actual compute time and data processed&lt;/li&gt;
&lt;li&gt;Billed by millisecond&lt;/li&gt;
&lt;li&gt;Significant cost savings for sporadic workloads&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Technical Specifications
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory Options&lt;/strong&gt;: 1 GB to 6 GB (1024 MB to 6144 MB)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum Container Size&lt;/strong&gt;: 10 GB&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Concurrent Invocation Limits&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;1,000 concurrent invocations (major regions)&lt;/li&gt;
&lt;li&gt;500 concurrent invocations (smaller regions)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum Endpoint Concurrency&lt;/strong&gt;: 200 per endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum Endpoints&lt;/strong&gt;: 50 per region&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  MaxConcurrency Parameter: Managing Request Flow
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;MaxConcurrency parameter&lt;/strong&gt; determines the &lt;strong&gt;maximum number of requests the endpoint can handle concurrently&lt;/strong&gt;. This critical configuration allows fine-tuning to match processing capacity and traffic patterns.&lt;/p&gt;
&lt;h4&gt;
  
  
  Configuration Examples
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;MaxConcurrency = 1&lt;/strong&gt;: Processes requests &lt;strong&gt;sequentially&lt;/strong&gt; (one at a time)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use case: Models requiring exclusive resource access or single-threaded processing&lt;/li&gt;
&lt;li&gt;Ensures predictable per-request latency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;MaxConcurrency = 50&lt;/strong&gt;: Processes up to 50 requests &lt;strong&gt;simultaneously&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use case: Lightweight models that can share resources efficiently&lt;/li&gt;
&lt;li&gt;Higher throughput for burst traffic&lt;/li&gt;
&lt;/ul&gt;
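
&lt;p&gt;For reference, a minimal boto3 sketch of where &lt;code&gt;MaxConcurrency&lt;/code&gt; and memory are set when creating a serverless endpoint; the model and endpoint names are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: create a serverless endpoint config with memory and concurrency settings.
# Model/endpoint names are placeholders.
import boto3

sm = boto3.client('sagemaker')

sm.create_endpoint_config(
    EndpointConfigName='sentiment-serverless-config',
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': 'sentiment-model',
        'ServerlessConfig': {
            'MemorySizeInMB': 4096,   # 1024-6144 MB
            'MaxConcurrency': 50,     # up to 50 simultaneous requests
        },
    }],
)

sm.create_endpoint(
    EndpointName='sentiment-serverless',
    EndpointConfigName='sentiment-serverless-config',
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
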
&lt;h4&gt;
  
  
  Benefits
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Efficient handling of traffic bursts during peak periods&lt;/li&gt;
&lt;li&gt;Minimized costs during low-traffic periods&lt;/li&gt;
&lt;li&gt;Fine-grained control over concurrency behavior&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Understanding Cold Starts
&lt;/h3&gt;
&lt;h4&gt;
  
  
  What is a Cold Start?
&lt;/h4&gt;

&lt;p&gt;Cold starts occur when:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Serverless endpoint receives no traffic for a period and scales to zero&lt;/li&gt;
&lt;li&gt;New requests arrive, requiring compute resources to spin up&lt;/li&gt;
&lt;li&gt;Concurrent requests exceed current capacity, triggering additional resource provisioning&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  Cold Start Duration Factors
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Model size and download time from S3&lt;/li&gt;
&lt;li&gt;Container image size and startup time&lt;/li&gt;
&lt;li&gt;Memory configuration&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Monitoring Cold Starts
&lt;/h4&gt;

&lt;p&gt;Use CloudWatch &lt;code&gt;OverheadLatency&lt;/code&gt; metric to track cold start times and optimize configurations.&lt;/p&gt;
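
&lt;p&gt;A hedged boto3 sketch of pulling that metric; the endpoint and variant names are placeholders, and the dimensions follow the standard SageMaker endpoint invocation metrics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: read OverheadLatency for a serverless endpoint from CloudWatch.
# Endpoint/variant names are placeholders.
from datetime import datetime, timedelta
import boto3

cw = boto3.client('cloudwatch')

stats = cw.get_metric_statistics(
    Namespace='AWS/SageMaker',
    MetricName='OverheadLatency',
    Dimensions=[
        {'Name': 'EndpointName', 'Value': 'sentiment-serverless'},
        {'Name': 'VariantName', 'Value': 'AllTraffic'},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=['Average', 'Maximum'],
)
for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Average'], point['Maximum'])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
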
&lt;h3&gt;
  
  
  Provisioned Concurrency: Eliminating Cold Starts
&lt;/h3&gt;

&lt;p&gt;Announced in &lt;strong&gt;May 2023&lt;/strong&gt;, &lt;strong&gt;Provisioned Concurrency&lt;/strong&gt; for SageMaker Serverless Inference mitigates cold starts and provides &lt;strong&gt;predictable performance characteristics&lt;/strong&gt; by keeping endpoints &lt;strong&gt;warm and ready&lt;/strong&gt; to respond instantaneously.&lt;/p&gt;
&lt;h4&gt;
  
  
  How Provisioned Concurrency Works
&lt;/h4&gt;

&lt;p&gt;For the amount of &lt;strong&gt;provisioned concurrency you allocate&lt;/strong&gt;, SageMaker keeps compute resources &lt;strong&gt;initialized and ready to respond within milliseconds&lt;/strong&gt;, eliminating the delay associated with cold starts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;serverless_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MemorySizeInMB&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;MaxConcurrency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ProvisionedConcurrency&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;  &lt;span class="c1"&gt;# Keep 5 instances warm
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Interpretation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Up to &lt;strong&gt;20 concurrent requests&lt;/strong&gt; total (MaxConcurrency)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 instances always warm&lt;/strong&gt; (Provisioned Concurrency)&lt;/li&gt;
&lt;li&gt;Requests 1-5: &lt;strong&gt;No cold start&lt;/strong&gt; (instant response)&lt;/li&gt;
&lt;li&gt;Requests 6-20: May experience cold start if scaling needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Use Cases for Provisioned Concurrency
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Ideal For:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Predictable traffic bursts&lt;/strong&gt;: Morning rush hours, scheduled batch jobs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency-sensitive applications&lt;/strong&gt;: Customer-facing APIs with SLA requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-effective predictable workloads&lt;/strong&gt;: Balance between on-demand (high latency) and fully provisioned endpoints (high cost)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Integration with Auto Scaling
&lt;/h4&gt;

&lt;p&gt;Provisioned Concurrency integrates with &lt;strong&gt;Application Auto Scaling&lt;/strong&gt;, enabling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Schedule-based scaling&lt;/strong&gt;: Increase provisioned concurrency during business hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target metric scaling&lt;/strong&gt;: Automatically adjust based on invocation rates or latency&lt;/li&gt;
&lt;/ul&gt;
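
&lt;p&gt;A hedged boto3 sketch of schedule-based scaling with Application Auto Scaling; the resource ID is a placeholder, and the scalable dimension shown is the one referenced for serverless provisioned concurrency in the AWS announcement, so verify it against current documentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: raise provisioned concurrency on a schedule via Application Auto Scaling.
# Endpoint/variant names are placeholders; confirm the scalable dimension in current docs.
import boto3

aas = boto3.client('application-autoscaling')
resource_id = 'endpoint/sentiment-serverless/variant/AllTraffic'
dimension = 'sagemaker:variant:DesiredProvisionedConcurrency'

aas.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension=dimension,
    MinCapacity=1,
    MaxCapacity=10,
)

# Warm up additional capacity every weekday morning before the expected traffic burst
aas.put_scheduled_action(
    ServiceNamespace='sagemaker',
    ScheduledActionName='warm-up-business-hours',
    ResourceId=resource_id,
    ScalableDimension=dimension,
    Schedule='cron(0 8 ? * MON-FRI *)',
    ScalableTargetAction={'MinCapacity': 5, 'MaxCapacity': 10},
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
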

&lt;h3&gt;
  
  
  Pricing Considerations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Standard Serverless Pricing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Charged only for compute time during inference&lt;/li&gt;
&lt;li&gt;No charges when idle (scaled to zero)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Provisioned Concurrency Pricing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Additional charge&lt;/strong&gt; for keeping instances warm&lt;/li&gt;
&lt;li&gt;Pay for provisioned capacity even during idle periods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trade-off&lt;/strong&gt;: Higher baseline cost for lower latency&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Use Each Option
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommended Option&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sporadic, unpredictable traffic&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Standard Serverless&lt;/strong&gt; (on-demand)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intermittent with tolerable cold starts&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Standard Serverless&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Predictable bursts, latency-sensitive&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Provisioned Concurrency&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consistently high traffic&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Real-time endpoints&lt;/strong&gt; (provisioned instances)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No GPU support&lt;/strong&gt; (CPU-only)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No Multi-Model Endpoints&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limited VPC configurations&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Cannot directly convert real-time endpoints to serverless&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Choose appropriate memory&lt;/strong&gt;: Match or exceed model size&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set MaxConcurrency&lt;/strong&gt;: Based on expected concurrent requests and model capacity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Provisioned Concurrency&lt;/strong&gt;: For latency-sensitive, predictable workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor metrics&lt;/strong&gt;: Track &lt;code&gt;OverheadLatency&lt;/code&gt;, invocation counts, and errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark performance&lt;/strong&gt;: Test different memory/concurrency configurations&lt;/li&gt;
&lt;/ol&gt;
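
&lt;p&gt;To tie items 1-3 together, here is a minimal boto3 sketch of a serverless endpoint configuration that sets memory, &lt;code&gt;MaxConcurrency&lt;/code&gt;, and Provisioned Concurrency (the model and endpoint names are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-registered-model",      # hypothetical model name
        "ServerlessConfig": {
            "MemorySizeInMB": 4096,              # match or exceed the model size
            "MaxConcurrency": 20,                # expected concurrent requests
            "ProvisionedConcurrency": 5,         # instances kept warm
        },
    }],
)

sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
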

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html" rel="noopener noreferrer"&gt;SageMaker Serverless Inference Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/blogs/machine-learning/announcing-provisioned-concurrency-for-amazon-sagemaker-serverless-inference/" rel="noopener noreferrer"&gt;Announcing Provisioned Concurrency for SageMaker Serverless Inference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://aws.amazon.com/about-aws/whats-new/2023/05/provisioned-concurrency-amazon-sagemaker-serverless-inference/" rel="noopener noreferrer"&gt;AWS Announcement: Provisioned Concurrency&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  14. Securing Your SageMaker Workflows: Understanding IAM Roles and S3 Access Policies
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐⭐☆ (Advanced)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 4 (Monitoring, Maintenance &amp;amp; Security - 24%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; HIGH&lt;/p&gt;
&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Amazon SageMaker&lt;/strong&gt; is a fully managed machine learning service that enables developers and data scientists to build, train, and deploy ML models at scale. Security is paramount when building ML workflows in AWS. Two critical components govern access control in SageMaker environments: &lt;strong&gt;S3 Access Policies&lt;/strong&gt; and &lt;strong&gt;SageMaker IAM Execution Roles&lt;/strong&gt;. Understanding how these work together ensures your data remains secure while enabling SageMaker to perform necessary operations.&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS S3 Access Policy Language: The Foundation of Resource Control
&lt;/h3&gt;
&lt;h4&gt;
  
  
  What Are Access Policies?
&lt;/h4&gt;

&lt;p&gt;S3 access policies are JSON-based documents that control who can access your S3 resources (buckets and objects) and what actions they can perform. They serve as the gatekeeper for your data stored in S3.&lt;/p&gt;
&lt;h4&gt;
  
  
  Core Policy Components
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;1. Resource&lt;/strong&gt;: Identifies the S3 resource using Amazon Resource Names (ARNs)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bucket: &lt;code&gt;arn:aws:s3:::bucket_name&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;All objects: &lt;code&gt;arn:aws:s3:::bucket_name/*&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Specific prefix: &lt;code&gt;arn:aws:s3:::bucket_name/prefix/*&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Actions&lt;/strong&gt;: Defines specific operations&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;s3:ListBucket&lt;/code&gt; - View bucket contents&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;s3:GetObject&lt;/code&gt; - Read objects&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;s3:PutObject&lt;/code&gt; - Write objects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Effect&lt;/strong&gt;: Determines whether to &lt;code&gt;Allow&lt;/code&gt; or &lt;code&gt;Deny&lt;/code&gt; access&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit denials always override allows&lt;/li&gt;
&lt;li&gt;Default behavior is implicit denial&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Principal&lt;/strong&gt;: Specifies who receives the permission (AWS account, IAM user, role, or service)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Condition&lt;/strong&gt; (Optional): Rules that specify when the policy applies using condition keys&lt;/p&gt;
&lt;h4&gt;
  
  
  Policy Types
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Bucket Policies&lt;/strong&gt;: Attached directly to S3 buckets for cross-account access and bucket-level controls&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IAM Policies&lt;/strong&gt;: Attached to IAM users/roles for granular permissions across AWS services&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Policy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"AWS"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:iam::123456789012:user/DataScientist"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="s2"&gt;"s3:ListBucket"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::ml-datasets/*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::ml-datasets"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  SageMaker IAM Execution Roles: Enabling Service Operations
&lt;/h3&gt;

&lt;h4&gt;
  
  
  What Are Execution Roles?
&lt;/h4&gt;

&lt;p&gt;SageMaker execution roles are &lt;strong&gt;IAM roles&lt;/strong&gt; that grant SageMaker permission to access AWS services on your behalf. They're essential for operations like reading training data from S3, writing model artifacts, pushing logs to CloudWatch, and pulling container images from ECR. The execution role ensures that SageMaker components (notebooks, training jobs, Studio domains) have the necessary permissions to perform tasks while following the &lt;strong&gt;principle of least privilege&lt;/strong&gt;.&lt;/p&gt;
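
&lt;p&gt;Inside a notebook instance or SageMaker Studio, the attached execution role can be retrieved programmatically and passed to any job that runs on your behalf. A minimal sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()            # ARN of the role SageMaker assumes for jobs
session = sagemaker.Session()

print("Execution role:", role)
print("Default bucket:", session.default_bucket())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
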

&lt;h4&gt;
  
  
  Trust Relationship Requirement
&lt;/h4&gt;

&lt;p&gt;Every SageMaker execution role requires a trust policy allowing SageMaker service to assume the role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sagemaker.amazonaws.com"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Role Types by SageMaker Component
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Notebook Instance Role&lt;/strong&gt;: ECR, S3, CloudWatch access; create/manage training jobs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training Job Role&lt;/strong&gt;: S3 input/output, ECR image pull, CloudWatch logging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SageMaker Studio Domain Role&lt;/strong&gt;: Customizable permissions for specific domains&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Key Permissions
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;S3 Access&lt;/strong&gt;: Read input data, write output results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch&lt;/strong&gt;: Push metrics and create log streams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ECR&lt;/strong&gt;: Pull container images for processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC&lt;/strong&gt; (if applicable): Create network interfaces for private subnets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KMS&lt;/strong&gt; (if applicable): Encrypt/decrypt data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example Execution Role Policy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"cloudwatch:PutMetricData"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"logs:CreateLogStream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"logs:PutLogEvents"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"s3:PutObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"ecr:GetAuthorizationToken"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"ecr:BatchCheckLayerAvailability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"ecr:GetDownloadUrlForLayer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"ecr:BatchGetImage"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3:ListBucket"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::sagemaker-data-bucket"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Inline Policies for Domain-Specific Access Control
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Why Inline Policies?
&lt;/h4&gt;

&lt;p&gt;By creating an &lt;strong&gt;inline policy&lt;/strong&gt; for the execution role of the SageMaker Studio domain, administrators can customize permissions specific to that domain without affecting other domains or users within the environment. This approach is particularly useful in &lt;strong&gt;shared environments&lt;/strong&gt; where multiple teams operate within the same SageMaker Studio instance but require different levels of access.&lt;/p&gt;

&lt;p&gt;The inline policy is attached &lt;strong&gt;directly to the execution role&lt;/strong&gt;, making it part of the role's configuration and ensuring that only the designated SageMaker domain has permissions to access specific AWS resources like S3 buckets. This method aligns with best practices for security and access management, ensuring permissions are both minimal and appropriate for the task at hand.&lt;/p&gt;
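
&lt;p&gt;As a sketch of this approach, an administrator could attach an inline policy to a domain's execution role with boto3; the role, policy, and bucket names below are hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import boto3

iam = boto3.client("iam")

# Scoped S3 access for one team's Studio domain (hypothetical names)
inline_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::team-a-ml-bucket",
            "arn:aws:s3:::team-a-ml-bucket/*",
        ],
    }],
}

# The inline policy lives directly on the execution role
iam.put_role_policy(
    RoleName="TeamA-StudioExecutionRole",
    PolicyName="TeamA-S3Access",
    PolicyDocument=json.dumps(inline_policy),
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
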

&lt;h3&gt;
  
  
  Security Best Practices
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Principle of Least Privilege&lt;/strong&gt;: Grant only the minimum permissions necessary; scope S3 access to specific buckets and prefixes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use IAM Roles Over Credentials&lt;/strong&gt;: Never embed access keys in code or containers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid Public Access&lt;/strong&gt;: Enable S3 Block Public Access; never allow anonymous write access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource-Specific Permissions&lt;/strong&gt;: Replace wildcard &lt;code&gt;*&lt;/code&gt; resources with specific ARNs wherever possible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regular Audits&lt;/strong&gt;: Review and update policies regularly using IAM Access Analyzer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption Considerations&lt;/strong&gt;: Add KMS permissions when using encrypted S3 buckets or EBS volumes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC Security&lt;/strong&gt;: For private subnet jobs, include EC2 network interface permissions&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  How They Work Together
&lt;/h3&gt;

&lt;p&gt;When you create a SageMaker Processing Job:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You specify an &lt;strong&gt;IAM execution role&lt;/strong&gt; that SageMaker assumes&lt;/li&gt;
&lt;li&gt;This role's &lt;strong&gt;IAM policy&lt;/strong&gt; grants SageMaker permissions to access AWS services&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;S3 bucket policy&lt;/strong&gt; validates that the assumed role has permission to access your data&lt;/li&gt;
&lt;li&gt;SageMaker reads input from S3, processes it, and writes output back to S3&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both layers must align—the execution role must have the necessary IAM permissions, and the S3 bucket policy must allow access from that role.&lt;/p&gt;
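
&lt;p&gt;A minimal boto3 sketch of this flow is shown below; the job name, role ARN, container image, and bucket are hypothetical, and the S3 paths must be allowed by both the execution role and the bucket policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

sm = boto3.client("sagemaker")

sm.create_processing_job(
    ProcessingJobName="feature-engineering-job",
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",   # step 1
    AppSpecification={
        "ImageUri": "123456789012.dkr.ecr.us-east-1.amazonaws.com/preprocess:latest",
    },
    ProcessingResources={"ClusterConfig": {
        "InstanceCount": 1,
        "InstanceType": "ml.m5.xlarge",
        "VolumeSizeInGB": 30,
    }},
    ProcessingInputs=[{                                # step 4: read input from S3
        "InputName": "raw-data",
        "S3Input": {
            "S3Uri": "s3://ml-data-bucket/input/",
            "LocalPath": "/opt/ml/processing/input",
            "S3DataType": "S3Prefix",
            "S3InputMode": "File",
        },
    }],
    ProcessingOutputConfig={"Outputs": [{              # step 4: write output back to S3
        "OutputName": "features",
        "S3Output": {
            "S3Uri": "s3://ml-data-bucket/output/",
            "LocalPath": "/opt/ml/processing/output",
            "S3UploadMode": "EndOfJob",
        },
    }]},
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
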

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html" rel="noopener noreferrer"&gt;Amazon SageMaker Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html" rel="noopener noreferrer"&gt;SageMaker IAM Roles&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-language-overview.html" rel="noopener noreferrer"&gt;Amazon S3 Access Policy Language Overview&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  15. Advanced SageMaker Processing: Deep Dive into Jobs and Permissions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Complexity:&lt;/strong&gt; ⭐⭐⭐⭐☆ (Advanced)&lt;br&gt;
&lt;strong&gt;Exam Domain:&lt;/strong&gt; Domain 4 (Monitoring, Maintenance &amp;amp; Security - 24%)&lt;br&gt;
&lt;strong&gt;Exam Weight:&lt;/strong&gt; MEDIUM-HIGH&lt;/p&gt;
&lt;h3&gt;
  
  
  Beyond the Basics: Processing Job Technical Details
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Built-In Processing Frameworks
&lt;/h4&gt;

&lt;p&gt;While the overview covered Processing Jobs generally, SageMaker provides &lt;strong&gt;framework-specific processors&lt;/strong&gt; that optimize common workflows:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. SKLearnProcessor&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker.processing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SKLearnProcessor&lt;/span&gt;

&lt;span class="n"&gt;sklearn_processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SKLearnProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;framework_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.20.0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;SageMakerRole&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instance_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ml.m5.xlarge&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Pre-configured scikit-learn environment&lt;/li&gt;
&lt;li&gt;Ideal for feature engineering and data transformations&lt;/li&gt;
&lt;li&gt;Supports distributed processing across multiple instances&lt;/li&gt;
&lt;/ul&gt;
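
&lt;p&gt;Once constructed, the processor runs a script against data in S3. A minimal sketch (the bucket, prefixes, and &lt;code&gt;preprocessing.py&lt;/code&gt; are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sagemaker.processing import ProcessingInput, ProcessingOutput

sklearn_processor.run(
    code="preprocessing.py",                     # your feature-engineering script
    inputs=[ProcessingInput(
        source="s3://ml-data-bucket/input/",
        destination="/opt/ml/processing/input",  # path the container sees
    )],
    outputs=[ProcessingOutput(
        source="/opt/ml/processing/output",
        destination="s3://ml-data-bucket/output/",
    )],
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
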

&lt;p&gt;&lt;strong&gt;2. Spark Processing with PySparkProcessor&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Native Apache Spark integration for big data processing&lt;/li&gt;
&lt;li&gt;Handles large-scale ETL workloads&lt;/li&gt;
&lt;li&gt;Distributed computing across cluster nodes&lt;/li&gt;
&lt;li&gt;Best for processing terabyte-scale datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. ScriptProcessor&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flexibility to use custom containers&lt;/li&gt;
&lt;li&gt;Supports any processing framework (R, Julia, custom Python environments)&lt;/li&gt;
&lt;li&gt;Requires specifying Docker image URI&lt;/li&gt;
&lt;/ul&gt;
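
&lt;p&gt;For example, a custom R container could be wired in roughly like this (the ECR image URI and role are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sagemaker.processing import ScriptProcessor

script_processor = ScriptProcessor(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/r-processing:latest",
    command=["Rscript"],              # interpreter used to run the submitted script
    role="SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
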

&lt;h4&gt;
  
  
  Data Source Flexibility
&lt;/h4&gt;

&lt;p&gt;Beyond basic S3 input, Processing Jobs support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Athena&lt;/strong&gt;: Query data directly from data lakes using SQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Redshift&lt;/strong&gt;: Process data warehouse queries and load results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ProcessingInput configurations&lt;/strong&gt;: Multiple input channels with different S3 paths&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Job Lifecycle and Error Handling
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Job States:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;InProgress&lt;/code&gt;: Job is running&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Completed&lt;/code&gt;: Successful completion&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Failed&lt;/code&gt;: Job encountered errors&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Stopping/Stopped&lt;/code&gt;: Manual or automatic termination&lt;/li&gt;
&lt;/ul&gt;
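
&lt;p&gt;Job status can be polled with a single API call, for example (the job name is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

sm = boto3.client("sagemaker")

response = sm.describe_processing_job(ProcessingJobName="feature-engineering-job")
status = response["ProcessingJobStatus"]   # InProgress | Completed | Failed | Stopping | Stopped

if status == "Failed":
    print("Failure reason:", response.get("FailureReason"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
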

&lt;p&gt;&lt;strong&gt;Automatic Cleanup:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compute resources automatically released after job completion&lt;/li&gt;
&lt;li&gt;Reduces costs—no idle infrastructure charges&lt;/li&gt;
&lt;li&gt;Temporary storage (ephemeral volumes) cleaned up&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limitations to Consider:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cold Start Overhead&lt;/strong&gt;: Time required to provision instances and pull containers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job Duration Limits&lt;/strong&gt;: Jobs are bounded by a maximum runtime (configurable via the job's &lt;code&gt;StoppingCondition&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Transfer Costs&lt;/strong&gt;: Moving data between S3 and processing instances&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Advanced IAM Role Configurations
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Trust Relationship Requirements
&lt;/h4&gt;

&lt;p&gt;Every SageMaker execution role requires a &lt;strong&gt;trust policy&lt;/strong&gt; allowing SageMaker service to assume the role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Principal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"Service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sagemaker.amazonaws.com"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this trust relationship, SageMaker cannot execute jobs on your behalf, even with correct permissions.&lt;/p&gt;
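
&lt;p&gt;Creating a role with this trust policy can be scripted. A minimal boto3 sketch (the role name is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import boto3

iam = boto3.client("iam")

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sagemaker.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="ProcessingJobExecutionRole",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
    Description="Role that SageMaker Processing Jobs assume on my behalf",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
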

&lt;h4&gt;
  
  
  VPC-Specific Permissions: The Missing Piece
&lt;/h4&gt;

&lt;p&gt;When running Processing Jobs in &lt;strong&gt;private VPC subnets&lt;/strong&gt; (common for compliance requirements), additional EC2 networking permissions are mandatory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ec2:CreateNetworkInterface"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ec2:DescribeNetworkInterfaces"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ec2:DeleteNetworkInterface"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ec2:DescribeSubnets"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ec2:DescribeSecurityGroups"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ec2:DescribeVpcs"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why These Are Needed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SageMaker creates Elastic Network Interfaces (ENIs) to attach instances to your VPC&lt;/li&gt;
&lt;li&gt;Describes network configuration to ensure proper connectivity&lt;/li&gt;
&lt;li&gt;Deletes ENIs after job completion to avoid orphaned resources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Common Pitfall:&lt;/strong&gt; Forgetting these permissions causes cryptic "insufficient permissions" errors during VPC job launches.&lt;/p&gt;
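
&lt;p&gt;On the SDK side, the job is placed in your VPC by passing a network configuration to the processor. A minimal sketch (the subnet, security group, and role are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sagemaker.network import NetworkConfig
from sagemaker.sklearn.processing import SKLearnProcessor

network_config = NetworkConfig(
    subnets=["subnet-0abc1234"],
    security_group_ids=["sg-0def5678"],
    encrypt_inter_container_traffic=True,
)

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role="SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    network_config=network_config,   # requires the EC2 ENI permissions shown above
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
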

&lt;h4&gt;
  
  
  KMS Encryption: Granular Control
&lt;/h4&gt;

&lt;p&gt;For encrypted datasets and volumes, three distinct KMS permissions are required:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"kms:Decrypt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"kms:Encrypt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"kms:CreateGrant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"kms:DescribeKey"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:kms:region:account-id:key/key-id"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Permission Breakdown:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kms:Decrypt&lt;/code&gt;: Read encrypted input data from S3&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kms:Encrypt&lt;/code&gt;: Write encrypted output data to S3&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kms:CreateGrant&lt;/code&gt;: Allow SageMaker to use the key for EBS volume encryption&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kms:DescribeKey&lt;/code&gt;: Verify key policies and status&lt;/li&gt;
&lt;/ul&gt;
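
&lt;p&gt;With the SageMaker Python SDK, the relevant keys are supplied on the processor itself; the key ARN below is a placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from sagemaker.sklearn.processing import SKLearnProcessor

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role="SageMakerRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_kms_key="arn:aws:kms:region:account-id:key/key-id",   # encrypts attached EBS volumes
    output_kms_key="arn:aws:kms:region:account-id:key/key-id",   # encrypts S3 output objects
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
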

&lt;h4&gt;
  
  
  ECR Repository Access: Container-Specific Permissions
&lt;/h4&gt;

&lt;p&gt;When using custom Docker containers stored in &lt;strong&gt;Amazon ECR&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ecr:GetAuthorizationToken"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ecr:BatchCheckLayerAvailability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ecr:GetDownloadUrlForLayer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"ecr:BatchGetImage"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:ecr:region:account-id:repository/repo-name"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best Practice:&lt;/strong&gt; Scope to specific ECR repositories rather than using wildcards to prevent unauthorized container access.&lt;/p&gt;

&lt;h4&gt;
  
  
  Resource-Scoped Permissions: Eliminating Wildcards
&lt;/h4&gt;

&lt;p&gt;Instead of broad &lt;code&gt;"Resource": "*"&lt;/code&gt; permissions, scope to specific resources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::ml-data-bucket/input/*"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3:PutObject"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::ml-data-bucket/output/*"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents SageMaker from reading/writing to unintended S3 locations.&lt;/p&gt;

&lt;h4&gt;
  
  
  Condition Keys for Enhanced Security
&lt;/h4&gt;

&lt;p&gt;Add conditional access based on tags or IP ranges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::secure-bucket/*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"Condition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"StringEquals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"s3:ExistingObjectTag/Project"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"LoanDefault"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Practical Implementation Strategy
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with AWS Managed Policy&lt;/strong&gt;: &lt;code&gt;AmazonSageMakerFullAccess&lt;/code&gt; provides baseline permissions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit CloudTrail Logs&lt;/strong&gt;: Identify which permissions are actually used&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remove Unused Permissions&lt;/strong&gt;: Incrementally reduce to least privilege&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test in Staging&lt;/strong&gt;: Validate role works before production deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Custom Policies&lt;/strong&gt;: Maintain clear comments explaining each permission&lt;/li&gt;
&lt;/ol&gt;
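
&lt;p&gt;A minimal boto3 sketch of step 1, attaching the managed policy as a starting point before tightening it later (the role name is hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

iam = boto3.client("iam")

# Baseline: attach the AWS managed policy, then audit CloudTrail and replace it
# with a scoped inline policy once the actually-used permissions are known.
iam.attach_role_policy(
    RoleName="ProcessingJobExecutionRole",
    PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
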

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html" rel="noopener noreferrer"&gt;Amazon SageMaker Processing Jobs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html#sagemaker-roles-createprocessingjob-perms" rel="noopener noreferrer"&gt;SageMaker IAM Roles - Processing Job Permissions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>career</category>
      <category>learning</category>
      <category>aws</category>
    </item>
    <item>
      <title>AWS ML / GenAI Trifecta: Part 1 – AWS Certified AI Practitioner (AIF-C01)</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Tue, 23 Dec 2025 09:06:35 +0000</pubDate>
      <link>https://dev.to/mgonzalezo/aws-ml-genai-trifecta-part-1-aws-certified-ai-practitioner-aif-c01-463n</link>
      <guid>https://dev.to/mgonzalezo/aws-ml-genai-trifecta-part-1-aws-certified-ai-practitioner-aif-c01-463n</guid>
      <description>&lt;p&gt;This is the first entry in my journey to achieve the &lt;strong&gt;AWS ML / GenAI Trifecta&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
My goal is to master the full stack of AWS intelligence services by completing these three milestones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Certified AI Practitioner (Foundational)&lt;/strong&gt; — &lt;em&gt;Current focus&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Certified Machine Learning Engineer Associate&lt;/strong&gt;
&lt;em&gt;or&lt;/em&gt; &lt;strong&gt;AWS Certified Data Engineer Associate&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AWS Certified Machine Learning - Specialty (MLS-C01)&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are looking to start with AI on AWS, this guide aggregates essential details from &lt;strong&gt;official documentation&lt;/strong&gt;, &lt;strong&gt;AWS Skill Builder&lt;/strong&gt;, and &lt;strong&gt;community study materials&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Exam Overview AIF-C01&lt;/li&gt;
&lt;li&gt;Exam Domains and Topics&lt;/li&gt;
&lt;li&gt;AWS Skill Builder Official Exam Prep Plan&lt;/li&gt;
&lt;li&gt;Third Party Content and Community Resources&lt;/li&gt;
&lt;li&gt;Hands On Labs Crucial for Retention&lt;/li&gt;
&lt;li&gt;Final Tips&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  1. Exam Overview AIF-C01
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;AWS Certified AI Practitioner&lt;/strong&gt; validates your ability to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Describe AI, ML, and Generative AI concepts
&lt;/li&gt;
&lt;li&gt;Identify the correct AWS services for business problems
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Exam Details&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Duration:&lt;/strong&gt; 90 minutes
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Questions:&lt;/strong&gt; 65
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Question Types:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Multiple choice
&lt;/li&gt;
&lt;li&gt;Multiple response
&lt;/li&gt;
&lt;li&gt;Ordering
&lt;/li&gt;
&lt;li&gt;Matching
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Passing Score:&lt;/strong&gt; 700 / 1000
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Target Profile:&lt;/strong&gt;
Professionals with up to &lt;strong&gt;6 months of exposure&lt;/strong&gt; to AI/ML on AWS
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Coding complex algorithms, hyperparameter tuning, and advanced model training are &lt;strong&gt;out of scope&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  2. Exam Domains and Topics
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Domain 1: Fundamentals of AI and ML (20%)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Concepts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deep Learning
&lt;/li&gt;
&lt;li&gt;Neural Networks
&lt;/li&gt;
&lt;li&gt;NLP
&lt;/li&gt;
&lt;li&gt;Computer Vision
&lt;/li&gt;
&lt;li&gt;Supervised vs. Unsupervised vs. Reinforcement Learning
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Practical Use&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identify real-world applications (fraud detection, forecasting)
&lt;/li&gt;
&lt;li&gt;Understand when &lt;strong&gt;not&lt;/strong&gt; to use AI (cost vs. benefit)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;ML Lifecycle&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data collection
&lt;/li&gt;
&lt;li&gt;Feature engineering
&lt;/li&gt;
&lt;li&gt;Training
&lt;/li&gt;
&lt;li&gt;Deployment
&lt;/li&gt;
&lt;li&gt;Monitoring
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Familiarity with &lt;strong&gt;Amazon SageMaker&lt;/strong&gt; is crucial.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  Domain 2: Fundamentals of Generative AI (24%)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Core Concepts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tokens
&lt;/li&gt;
&lt;li&gt;Chunking
&lt;/li&gt;
&lt;li&gt;Embeddings
&lt;/li&gt;
&lt;li&gt;Vectors
&lt;/li&gt;
&lt;li&gt;Transformer-based LLMs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Capabilities &amp;amp; Limitations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hallucinations
&lt;/li&gt;
&lt;li&gt;Bias
&lt;/li&gt;
&lt;li&gt;Non-determinism
&lt;/li&gt;
&lt;li&gt;Cost and latency tradeoffs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS Infrastructure&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Bedrock
&lt;/li&gt;
&lt;li&gt;Amazon Q
&lt;/li&gt;
&lt;li&gt;SageMaker JumpStart
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Domain 3: Applications of Foundation Models (28%)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Design Considerations&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cost
&lt;/li&gt;
&lt;li&gt;Latency
&lt;/li&gt;
&lt;li&gt;Modality (text, image, multimodal)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Architectural Patterns&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retrieval Augmented Generation (RAG)
&lt;/li&gt;
&lt;li&gt;Vector Databases:

&lt;ul&gt;
&lt;li&gt;Amazon OpenSearch
&lt;/li&gt;
&lt;li&gt;Amazon Aurora
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prompt Engineering&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zero-shot and Few-shot prompting
&lt;/li&gt;
&lt;li&gt;Chain-of-thought
&lt;/li&gt;
&lt;li&gt;Preventing prompt injection
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Agents&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step task execution
&lt;/li&gt;
&lt;li&gt;Model Context Protocol (MCP)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Domain 4: Guidelines for Responsible AI (14%)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Core Principles&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fairness
&lt;/li&gt;
&lt;li&gt;Inclusivity
&lt;/li&gt;
&lt;li&gt;Robustness
&lt;/li&gt;
&lt;li&gt;Safety
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;AWS Tools&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Bedrock Guardrails
&lt;/li&gt;
&lt;li&gt;Amazon SageMaker Clarify
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Risk Awareness&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hallucinations
&lt;/li&gt;
&lt;li&gt;Intellectual property concerns
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Domain 5: Security, Compliance, and Governance (14%)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IAM roles
&lt;/li&gt;
&lt;li&gt;Encryption with AWS KMS
&lt;/li&gt;
&lt;li&gt;Amazon Macie for sensitive data detection
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Governance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Config
&lt;/li&gt;
&lt;li&gt;AWS Audit Manager
&lt;/li&gt;
&lt;li&gt;AWS CloudTrail
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  3. AWS Skill Builder Official Exam Prep Plan
&lt;/h2&gt;

&lt;p&gt;AWS Skill Builder provides the official and most direct preparation path for the AWS Certified AI Practitioner exam.&lt;/p&gt;

&lt;p&gt;Learning Plan URL (English):&lt;br&gt;&lt;br&gt;
&lt;a href="https://skillbuilder.aws/learning-plan/3NRN71QZR2/exam-prep-plan-aws-certified-ai-practitioner-aifc01--english/FBV4STG94B" rel="noopener noreferrer"&gt;https://skillbuilder.aws/learning-plan/3NRN71QZR2/exam-prep-plan-aws-certified-ai-practitioner-aifc01--english/FBV4STG94B&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Limited-Time Free Access
&lt;/h3&gt;

&lt;p&gt;Free AWS Foundational Certification Prep Resources | Limited Time Offer&lt;/p&gt;

&lt;p&gt;AWS is currently offering free access to subscription-based exam prep materials for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Certified Cloud Practitioner&lt;/li&gt;
&lt;li&gt;AWS Certified AI Practitioner (AIF-C01)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Included resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Official Practice Exams&lt;/li&gt;
&lt;li&gt;AWS SimuLearn&lt;/li&gt;
&lt;li&gt;AWS Escape Room&lt;/li&gt;
&lt;li&gt;Official Pretests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Availability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Up to 13 languages&lt;/li&gt;
&lt;li&gt;Valid through January 5, 2026&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This promotion normally requires a paid AWS Skill Builder subscription.&lt;/p&gt;
&lt;h3&gt;
  
  
  Structure of the Exam Prep Plan
&lt;/h3&gt;

&lt;p&gt;The plan follows a four-step structure aligned with the AIF-C01 exam guide.&lt;/p&gt;
&lt;h3&gt;
  
  
  Orientation and Exam Overview
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Exam Prep Plan Overview&lt;/li&gt;
&lt;li&gt;Exam scope, intended audience, and domain breakdown&lt;/li&gt;
&lt;li&gt;Time allocation guidance per study phase&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Official Assessments
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Official Practice Question Set (20 questions)&lt;/li&gt;
&lt;li&gt;Official Pretest (65 questions, 90 minutes)&lt;/li&gt;
&lt;li&gt;Official Practice Exam (full-length, scored simulation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All assessments use AWS exam-style question formats, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple choice&lt;/li&gt;
&lt;li&gt;Multiple response&lt;/li&gt;
&lt;li&gt;Ordering&lt;/li&gt;
&lt;li&gt;Matching&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Domain-by-Domain Coverage
&lt;/h3&gt;

&lt;p&gt;For each exam domain (1 through 5), the plan includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Domain Review

&lt;ul&gt;
&lt;li&gt;Instructor-led video lessons&lt;/li&gt;
&lt;li&gt;Mapping of concepts to AWS services&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Domain Practice

&lt;ul&gt;
&lt;li&gt;Exam-style questions&lt;/li&gt;
&lt;li&gt;Flashcards&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Covered domains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Domain 1: Fundamentals of AI and ML&lt;/li&gt;
&lt;li&gt;Domain 2: Fundamentals of Generative AI&lt;/li&gt;
&lt;li&gt;Domain 3: Applications of Foundation Models&lt;/li&gt;
&lt;li&gt;Domain 4: Guidelines for Responsible AI&lt;/li&gt;
&lt;li&gt;Domain 5: Security, Compliance, and Governance&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  AWS SimuLearn
&lt;/h3&gt;

&lt;p&gt;AWS SimuLearn labs are included for selected domains and provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scenario-based learning&lt;/li&gt;
&lt;li&gt;Guided solution design&lt;/li&gt;
&lt;li&gt;Hands-on experience in a live AWS Management Console&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These labs reinforce real-world decision making and service selection.&lt;/p&gt;
&lt;h3&gt;
  
  
  AWS Escape Room
&lt;/h3&gt;

&lt;p&gt;AWS Escape Room: Exam Prep for AWS Certified AI Practitioner (AIF-C01)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Approximately 6 hours&lt;/li&gt;
&lt;li&gt;3D virtual environment&lt;/li&gt;
&lt;li&gt;Puzzles, exam-style questions, and hands-on assessments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Available modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single-player practice mode&lt;/li&gt;
&lt;li&gt;Tournament-based event mode&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Escape Room is integrated into the Exam Prep Plan and aligns directly with the AIF-C01 exam objectives.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpxpc18t11su2djjjujm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxpxpc18t11su2djjjujm.png" alt="Amazon Polly"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcznmtehg71900xhrvhdh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcznmtehg71900xhrvhdh.png" alt="Escape Room"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  4. Third Party Content and Community Resources
&lt;/h2&gt;

&lt;p&gt;To maximize your score, combine official content with these community favorites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stephane Maarek (Udemy)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Gold-standard AIF-C01 course with concise explanations of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Bedrock
&lt;/li&gt;
&lt;li&gt;SageMaker
&lt;/li&gt;
&lt;li&gt;Amazon Q
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Community Notion Notes&lt;/strong&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those following the AWS ML / GenAI Trifecta, this Notion entry is a standout community resource.&lt;/p&gt;

&lt;p&gt;This comprehensive guide was created by Christian Greciano and is widely recognized in the AWS community as one of the most well-organized study aids for the AIF-C01 exam. Building on Stéphane Maarek’s popular Udemy course, Christian distilled complex concepts, from Amazon Bedrock and Prompt Engineering to SageMaker and Responsible AI, into a clean, searchable, and highly visual format.&lt;/p&gt;

&lt;p&gt;Kudos to Christian for his "give back" mentality, providing these high-quality notes and associated Anki flashcards for free to help fellow learners bridge the gap between theory and certification.&lt;/p&gt;

&lt;p&gt;Reference Link: AWS AI Practitioner (AIF-C01) Study Notes by Christian Greciano &lt;a href="https://psychedelic-cuticle-e74.notion.site/AWS-AI-Practitioner-AIF-C01-10386c7395e780e89ea4c70bb061451b" rel="noopener noreferrer"&gt;here&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Hands On Labs Crucial for Retention
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Lab 1: Foundation Models in the Playground
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6qmcygz76tqxlegxbmb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6qmcygz76tqxlegxbmb.png" alt="Topology"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Understand model parameters without writing code.&lt;/p&gt;

&lt;p&gt;Steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the &lt;strong&gt;Amazon Bedrock Console&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Access the &lt;strong&gt;Text Playground&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select the &lt;strong&gt;Amazon Nova Pro&lt;/strong&gt; model&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add the following prompt, then click "run"&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Now add this second prompt, which generates an AWS SAM template&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generate an AWS SAM template that deploys a serverless function that meets the following requirements:

- Has a parameter named `LambdaRoleArn` to supply the lambda function's IAM role.
- Has a function named `genai-app` with an `Api` POST event source and uses `/` for the path
- Uses the Python 3.12 runtime
- Has a timeout of two minutes
- The function's handler is `lambda_function.lambda_handler`
- Has an output for the API endpoint named `ApiEndpoint`

Do not escape the dollar sign in output values.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: 'An AWS SAM template for a serverless function meeting specific requirements.'

Parameters:
  LambdaRoleArn:
    Type: String
    Description: 'The ARN of the IAM role that has permissions to execute the Lambda function'

Resources:
  GenaiAppFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: genai-app
      Handler: lambda_function.lambda_handler
      Runtime: python3.12
      Timeout: 120
      Role: !Ref LambdaRoleArn
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /
            Method: POST

Outputs:
  ApiEndpoint:
    Description: 'API endpoint for the genai-app Lambda function'
    Value: !Sub 'https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This output demonstrates how a detailed, constrained prompt produces predictable, infrastructure-ready results, which is exactly what is required when using LLMs for cloud automation.&lt;/p&gt;

&lt;p&gt;At this stage, you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Used an Amazon Bedrock Generative AI model to generate an AWS SAM template&lt;/li&gt;
&lt;li&gt;Manually verified the structure and logic of the template&lt;/li&gt;
&lt;li&gt;Prepared it for deployment using AWS tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this final step, you will deploy the AWS SAM template generated by Amazon Bedrock using the AWS Serverless Application Model (SAM) CLI. First, validate and lint the template:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sam validate --lint
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/home/project/genai-app/template.yaml is a valid SAM Template
SAM CLI update available (1.151.0); (1.131.0 installed)
To download: https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A successful result indicates the SAM template is syntactically and structurally valid.&lt;/p&gt;

&lt;p&gt;Package the Template&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sam package \
  --s3-bucket genai-app-code-tynfcmmtll \
  --output-template-file packaged.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uploads the Lambda function code to Amazon S3&lt;/li&gt;
&lt;li&gt;Generates a packaged.yaml file&lt;/li&gt;
&lt;li&gt;Replaces local CodeUri references with S3 object locations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  --output-template-file packaged.yaml
        Uploading to 418ea04718c9e86f32e7c4516c81efba  808 / 808  (100.00%)

Successfully packaged artifacts and wrote output template to file packaged.yaml.
Execute the following command to deploy the packaged template
sam deploy --template-file /home/project/genai-app/packaged.yaml --stack-name &amp;lt;YOUR STACK NAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy the Application&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sam deploy --template-file packaged.yaml \
  --stack-name genai-app-stack \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides LambdaRoleArn=arn:aws:iam::230531499630:role/genai-app-lambda-execution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AWS SAM CLI will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a new AWS CloudFormation stack&lt;/li&gt;
&lt;li&gt;Deploy the Lambda function&lt;/li&gt;
&lt;li&gt;Deploy Amazon API Gateway resources&lt;/li&gt;
&lt;li&gt;Output the API endpoint URL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        Deploying with following values
        ===============================
        Stack name                   : genai-app-stack
        Region                       : None
        Confirm changeset            : False
        Disable rollback             : False
        Deployment s3 bucket         : None
        Capabilities                 : ["CAPABILITY_IAM"]
        Parameter overrides          : {"LambdaRoleArn": "arn:aws:iam::844514745668:role/genai-app-lambda-execution"}
        Signing Profiles             : {}

Initiating deployment
=====================



Waiting for changeset to be created..

CloudFormation stack changeset
-----------------------------------------------------------------------------------------------------------------------------
Operation                       LogicalResourceId               ResourceType                    Replacement                   
-----------------------------------------------------------------------------------------------------------------------------
+ Add                           GenaiAppFunctionApiEventPermi   AWS::Lambda::Permission         N/A                           
                                ssionProd                                                                                     
+ Add                           GenaiAppFunction                AWS::Lambda::Function           N/A                           
+ Add                           ServerlessRestApiDeployment7b   AWS::ApiGateway::Deployment     N/A                           
                                3a19f907                                                                                      
+ Add                           ServerlessRestApiProdStage      AWS::ApiGateway::Stage          N/A                           
+ Add                           ServerlessRestApi               AWS::ApiGateway::RestApi        N/A                           
-----------------------------------------------------------------------------------------------------------------------------


Changeset created successfully. arn:aws:cloudformation:us-east-1:844514745668:changeSet/samcli-deploy1766478214/5b0a7e08-b84f-4fa7-9ffb-96babebc0d2a


2025-12-23 08:23:40 - Waiting for stack create/update to complete

CloudFormation events from stack operations (refresh every 5.0 seconds)
-----------------------------------------------------------------------------------------------------------------------------
ResourceStatus                  ResourceType                    LogicalResourceId               ResourceStatusReason          
-----------------------------------------------------------------------------------------------------------------------------
CREATE_IN_PROGRESS              AWS::CloudFormation::Stack      genai-app-stack                 User Initiated                
CREATE_IN_PROGRESS              AWS::Lambda::Function           GenaiAppFunction                -                             
CREATE_IN_PROGRESS              AWS::Lambda::Function           GenaiAppFunction                Resource creation Initiated   
CREATE_COMPLETE                 AWS::Lambda::Function           GenaiAppFunction                -                             
CREATE_IN_PROGRESS              AWS::ApiGateway::RestApi        ServerlessRestApi               -                             
CREATE_IN_PROGRESS              AWS::ApiGateway::RestApi        ServerlessRestApi               Resource creation Initiated   
CREATE_COMPLETE                 AWS::ApiGateway::RestApi        ServerlessRestApi               -                             
CREATE_IN_PROGRESS              AWS::ApiGateway::Deployment     ServerlessRestApiDeployment7b   -                             
                                                                3a19f907                                                      
CREATE_IN_PROGRESS              AWS::Lambda::Permission         GenaiAppFunctionApiEventPermi   -                             
                                                                ssionProd                                                     
CREATE_IN_PROGRESS              AWS::Lambda::Permission         GenaiAppFunctionApiEventPermi   Resource creation Initiated   
                                                                ssionProd                                                     
CREATE_IN_PROGRESS              AWS::ApiGateway::Deployment     ServerlessRestApiDeployment7b   Resource creation Initiated   
                                                                3a19f907                                                      
CREATE_COMPLETE                 AWS::Lambda::Permission         GenaiAppFunctionApiEventPermi   -                             
                                                                ssionProd                                                     
CREATE_COMPLETE                 AWS::ApiGateway::Deployment     ServerlessRestApiDeployment7b   -                             
                                                                3a19f907                                                      
CREATE_IN_PROGRESS              AWS::ApiGateway::Stage          ServerlessRestApiProdStage      -                             
CREATE_IN_PROGRESS              AWS::ApiGateway::Stage          ServerlessRestApiProdStage      Resource creation Initiated   
CREATE_COMPLETE                 AWS::ApiGateway::Stage          ServerlessRestApiProdStage      -                             
CREATE_COMPLETE                 AWS::CloudFormation::Stack      genai-app-stack                 -                             
-----------------------------------------------------------------------------------------------------------------------------

CloudFormation outputs from deployed stack
-------------------------------------------------------------------------------------------------------------------------------
Outputs                                                                                                                       
-------------------------------------------------------------------------------------------------------------------------------
Key                 ApiEndpoint                                                                                               
Description         API endpoint for the genai-app Lambda function                                                            
Value               https://1sjfdlw9me.execute-api.us-east-1.amazonaws.com/Prod/                                              
-------------------------------------------------------------------------------------------------------------------------------


Successfully created/updated stack - genai-app-stack in None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deployment typically completes in about one minute.&lt;/p&gt;

&lt;p&gt;To test your serverless function and API endpoint, enter the following, replacing API_ENDPOINT with your API endpoint URL from the output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST API_ENDPOINT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
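

&lt;p&gt;If you prefer to test from a script instead of curl, a minimal Python sketch does the same smoke test (the endpoint value below is the ApiEndpoint from the stack outputs; substitute your own):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests  # client-side smoke test for the deployed endpoint

# Replace with the ApiEndpoint value from your own stack outputs
API_ENDPOINT = "https://1sjfdlw9me.execute-api.us-east-1.amazonaws.com/Prod/"

response = requests.post(API_ENDPOINT)
print(response.status_code)
print(response.text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;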



&lt;p&gt;Summary&lt;br&gt;
In this final step, you deployed your serverless function template using the AWS SAM CLI tool, and verified that the serverless function and accompanying API Gateway are working.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lab 2: Building a Knowledge Base (RAG)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwslfmc4s4lilt9xii0d3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwslfmc4s4lilt9xii0d3.png" alt="Topology"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Master Retrieval Augmented Generation.&lt;/p&gt;

&lt;p&gt;Steps:&lt;/p&gt;

&lt;p&gt;For this demo, I will only use a Jupyter Notebook, but this can also be implemented using Google Colab.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add the AWS credentials to use for the following steps:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ACCESS_KEY_ID = '[ACCESS_KEY_ID]'
SECRET_ACCESS_KEY = '[SECRET_ACCESS_KEY]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;We’ll be building our solution using the LangChain ecosystem. Specifically, this notebook utilizes a few heavy-hitters:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;FAISS: Our go-to library for efficient similarity searches within our vector store.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Amazon Bedrock: Our centralized hub for foundation models, including the specialized Bedrock Text Embedding Model used to process our data.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
import json

from langchain_community.vectorstores import FAISS
from langchain.embeddings import BedrockEmbeddings
from jinja2 import Template
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;We will interact with the AWS ecosystem using two primary tools:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;bedrock_runtime_client: This manages our connection to the Amazon Bedrock runtime, ensuring our credentials are authenticated for model access.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;embeddings_client: This is responsible for the crucial step of text vectorization, allowing us to map our data into a searchable vector space.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bedrock_runtime_client = boto3.client(
    'bedrock-runtime',
    region_name='us-west-2',
    aws_access_key_id=ACCESS_KEY_ID,
    aws_secret_access_key=SECRET_ACCESS_KEY
)

embeddings_client = BedrockEmbeddings(
    model_id='amazon.titan-embed-text-v2:0',
    client=bedrock_runtime_client
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;The following &lt;em&gt;facts&lt;/em&gt; array represents our unstructured data source. In a real-world scenario, this could be your company’s HR policies or technical manuals. For this demo, we are using a collection of historical and technical bowling data. Each string in this list will be vectorized to allow for semantic search.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Our Knowledge Base: A collection of domain-specific bowling facts
facts = [
    "The first indoor bowling lane was constructed in New York City in 1840, following earlier outdoor lanes in Europe.",
    "Bowling debuted on American television in 1950, significantly boosting the sport's popularity.",
    "At one point, bowling was banned in America to prevent soldiers from gambling and neglecting their duties.",
    "The sport has ancient roots; British archaeologists found bowling equipment in Egyptian tombs dating to 3,200 BCE.",
    "While bowling balls vary in weight, the maximum regulation weight is 23 pounds.",
    "Inclusive play reached a milestone in 1917 with the founding of the Women’s National Bowling Association.",
    "Ball composition has evolved from wood and heavy rubber to the modern polyester resins introduced in the 1960s.",
    "The world's largest bowling facility is the Inazawa Grand Bowling Centre in Japan, boasting 116 lanes.",
    "While 10-pin is the standard, 9-pin bowling remains illegal in every US state except Texas.",
    "Bowling remains a massive pastime, with over 67 million participants in the US annually."
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;We initialize our vector store, &lt;em&gt;db&lt;/em&gt;, using the &lt;em&gt;from_texts&lt;/em&gt; method from the FAISS library. By providing our array of bowling facts and the Bedrock-powered &lt;em&gt;embeddings_client&lt;/em&gt;, the system automatically handles the vectorization and indexing. The result is a searchable vector store containing all 10 of our embedded facts, ready for real-time querying.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db = FAISS.from_texts(facts, embeddings_client)
print(db.index.ntotal)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;To retrieve relevant data, we can run a similarity search against the vector store:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;query = "What year was bowling first shown on television?"
docs = db.similarity_search_with_score(query, k=3)
data = []

for doc in docs:
    print(doc[0].page_content)
    print(f'Score: {doc[1]}')
    data.append(doc[0].page_content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;To ensure our model provides accurate, data-backed answers, we use the Jinja2 templating engine to 'augment' our prompt. Think of this as creating a dynamic blueprint: we use the &lt;em&gt;{{ }}&lt;/em&gt; syntax as placeholders where our retrieved bowling facts and the user’s original question are injected. This creates a final, context-rich instruction set that guides the model to answer using only the provided facts.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;template = """
User: {{query}} Find the answer from the following facts inside &amp;lt;facts&amp;gt;&amp;lt;/facts&amp;gt;:

&amp;lt;facts&amp;gt;
{%- for fact in facts %}
- `{{fact}}`{% endfor %}
&amp;lt;/facts&amp;gt;

Provide an answer including parts of the query. If the facts provided are not relevant, respond with "I do not have access to that information and cannot provide an answer."

Bot:
"""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
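

&lt;p&gt;Before invoking the model, we render this template into the final prompt string. A minimal sketch, assuming the query and the data list produced by the retrieval step above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Render the Jinja2 template with the original query and the retrieved facts
prompt = Template(template).render(query=query, facts=data)
print(prompt)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;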



&lt;ol&gt;
&lt;li&gt;Finally, we pass the rendered prompt to the model and generate the response:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kwargs = {
    "modelId": "us.amazon.nova-lite-v1:0",
    "contentType": "application/json",
    "accept": "*/*",
    "body": json.dumps({
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "text": prompt
            }
          ]
        }
      ],
      "inferenceConfig": {
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.9
      }
    })
}

response = bedrock_runtime_client.invoke_model(**kwargs)
body = json.loads(response.get('body').read())
answer = body['output']['message']['content'][0]['text']

print(answer)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Final Tips
&lt;/h2&gt;

&lt;p&gt;Have your core ML and AI concepts crystal clear. What is the difference between accuracy, precision, recall, and F1 when analyzing the test results of an ML model? When do we need GenAI, and when is it not necessary?&lt;/p&gt;

&lt;p&gt;From there, focus on understanding all the AI/ML-related AWS services, their key differences, and their use cases.&lt;/p&gt;
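

&lt;p&gt;As a quick refresher, scikit-learn makes these metrics easy to compute and compare. A minimal sketch with made-up labels, purely for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy ground truth and predictions, purely for illustration
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;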

&lt;h3&gt;
  
  
  Good luck with your preparation!
&lt;/h3&gt;

&lt;p&gt;Following this roadmap and completing the hands-on labs will give you a solid foundation for the ML Engineer and GenAI Professional certifications that come next.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>aws</category>
      <category>ai</category>
      <category>beginners</category>
    </item>
    <item>
      <title>vLLM on x86: Because Not Everyone Can Afford a GPU Cluster</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Tue, 26 Aug 2025 10:38:53 +0000</pubDate>
      <link>https://dev.to/aws-builders/vllm-on-x86-because-not-everyone-can-afford-a-gpu-cluster-15ep</link>
      <guid>https://dev.to/aws-builders/vllm-on-x86-because-not-everyone-can-afford-a-gpu-cluster-15ep</guid>
      <description>&lt;p&gt;After my recent presentation on our AI inference PoC (details &lt;a href="https://dataglobalhub.org/events/gdai/sessions/Sess-17" rel="noopener noreferrer"&gt;here&lt;/a&gt;), I received a bunch of great follow-up questions and DMs. A lot of you were asking the same thing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"This is a cool demo, but how do we actually take this to the next level and build a real commercial solution?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's a fantastic question, and it's the crucial step that turns a promising experiment into a production-ready service. So, in today's blog, I want to dive into the more technical details of how I'd approach this. We'll be focusing on one of the most powerful tools for the job: &lt;strong&gt;vLLM&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  📑 Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; Why Choose vLLM? The Business Value of Inference
&lt;/li&gt;
&lt;li&gt; Chapter Summary
&lt;/li&gt;
&lt;li&gt; High Performance
&lt;/li&gt;
&lt;li&gt; Cross-Platform Compatibility
&lt;/li&gt;
&lt;li&gt; Ease of Use
&lt;/li&gt;
&lt;li&gt; Environment &amp;amp; Setup
&lt;/li&gt;
&lt;li&gt; Prerequisites
&lt;/li&gt;
&lt;li&gt; Installation &amp;amp; Walkthrough
&lt;/li&gt;
&lt;li&gt; How to Verify KV Cache on CPU
&lt;/li&gt;
&lt;li&gt;Key Environment Variables for CPU Performance&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🤔 Why Choose vLLM? The Business Value of Inference
&lt;/h2&gt;

&lt;p&gt;Before we get into the nitty-gritty, it's worth touching on why this matters.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;AI inference market&lt;/strong&gt; is where the real business value happens, and it's projected to grow massively—from &lt;strong&gt;$106 billion in 2025 to over $255 billion by 2030&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Having a de facto, standard inference platform is a huge opportunity. That's where &lt;strong&gt;vLLM&lt;/strong&gt; comes in; it's rapidly emerging as the &lt;em&gt;"Linux of GenAI Inference"&lt;/em&gt; for a few key reasons.&lt;/p&gt;




&lt;h2&gt;
  
  
  📖 Chapter Summary
&lt;/h2&gt;

&lt;p&gt;This chapter outlines the core benefits of vLLM that make it a top choice for production-level AI inference. We'll cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;🚀 High Performance&lt;/strong&gt; – advanced algorithms for high QPS&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;🌐 Cross-Platform&lt;/strong&gt; – support for a wide range of accelerators and OEMs&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;👍 Ease of Use&lt;/strong&gt; – integrations and APIs that developers love&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚀 High Performance
&lt;/h2&gt;

&lt;p&gt;vLLM is engineered for speed and efficiency. It uses advanced algorithms to deliver &lt;strong&gt;high Queries Per Second (QPS)&lt;/strong&gt; serving, which is critical for commercial applications. Its performance is already comparable to optimized solutions like &lt;strong&gt;Nvidia's TRT-LLM&lt;/strong&gt;, making it a benchmark for other methods.&lt;/p&gt;




&lt;h2&gt;
  
  
  🌐 Cross-Platform Compatibility
&lt;/h2&gt;

&lt;p&gt;One of vLLM's biggest strengths is its ability to run on a wide array of hardware (NVIDIA, AMD, Intel, Google, AWS, etc.) and with major OEMs like &lt;strong&gt;Dell, Lenovo, Cisco, and HPE&lt;/strong&gt;. This lets you build enterprise inference without being tied to a specific hardware stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  👍 Ease of Use
&lt;/h2&gt;

&lt;p&gt;High performance doesn’t mean high complexity. vLLM features native Hugging Face integration, simple APIs, and an &lt;strong&gt;OpenAI-Compatible API&lt;/strong&gt;, which is a huge productivity boost for developers.&lt;/p&gt;
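

&lt;p&gt;To make that concrete, here is a minimal sketch of calling a vLLM server through the standard openai Python client. It assumes a server like the one we start later in this post is already listening on localhost:8000:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from openai import OpenAI  # the standard OpenAI client, pointed at vLLM

# vLLM does not require an API key by default, but the client expects one
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="microsoft/Phi-3-mini-4k-instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=50,
)
print(response.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;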




&lt;h2&gt;
  
  
  🌍 Environment &amp;amp; Setup
&lt;/h2&gt;

&lt;p&gt;For this walkthrough and our demo benchmarks, we'll use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Host:&lt;/strong&gt; &lt;code&gt;c7i.4xlarge&lt;/code&gt; (16 vCPU), Amazon Linux&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Local model:&lt;/strong&gt; &lt;code&gt;phi3:mini&lt;/code&gt; (fast micro-prompt baseline)&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Python:&lt;/strong&gt; 3.9+&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ✅ Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before starting, ensure your environment meets vLLM's requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Python:&lt;/strong&gt; 3.9–3.12&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;OS:&lt;/strong&gt; Linux&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CPU Flags:&lt;/strong&gt; &lt;code&gt;avx512f&lt;/code&gt; is recommended.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💡 &lt;strong&gt;Pro Tip:&lt;/strong&gt; Check for the required CPU flag with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lscpu | &lt;span class="nb"&gt;grep &lt;/span&gt;avx512f
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🛠️ Installation &amp;amp; Walkthrough
&lt;/h2&gt;

&lt;p&gt;Instead of generic instructions, here are the exact steps I followed to get vLLM running from source and then containerized with Docker on Amazon Linux.&lt;/p&gt;

&lt;p&gt;Step 1: Set Up Python Environment&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;uv venv --python 3.12 --seed
source .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 2: Install System Dependencies&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo dnf update -y
sudo dnf install -y git gcc gcc-c++ gperftools-devel numactl-devel libSM-devel libXext-devel mesa-libGL-devel

# Install EPEL and RPM Fusion for extra packages like ffmpeg
sudo dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
sudo dnf install -y https://download1.rpmfusion.org/free/el/rpmfusion-free-release-9.noarch.rpm
sudo dnf install -y ffmpeg

# Set the default compiler
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc 10 --slave /usr/bin/g++ g++ /usr/bin/g++
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 3: Clone and Build vLLM&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone [https://github.com/vllm-project/vllm.git](https://github.com/vllm-project/vllm.git) vllm_source
cd vllm_source

# Install dependencies
uv pip install -r requirements/cpu-build.txt --torch-backend auto --index-strategy unsafe-best-match
uv pip install -r requirements/cpu.txt --torch-backend auto --index-strategy unsafe-best-match

# Build and install vLLM for CPU
VLLM_TARGET_DEVICE=cpu python setup.py install
# (Optional) For development mode
# VLLM_TARGET_DEVICE=cpu python setup.py develop
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 4: Build the Docker Image&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo docker build -f docker/Dockerfile.cpu \
        --build-arg VLLM_CPU_AVX512BF16=false \
        --build-arg VLLM_CPU_AVX512VNNI=false \
        --build-arg VLLM_CPU_DISABLE_AVX512=false \
        --tag vllm-cpu-env \
        --target vllm-openai .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Step 5: Run &amp;amp; Test&lt;/p&gt;

&lt;p&gt;Run the container with the Phi-3-mini-4k-instruct LLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo docker run --rm \
    --privileged=true \
    --shm-size=4g \
    -p 8000:8000 \
    -e VLLM_CPU_KVCACHE_SPACE=8 \
    vllm-cpu-env \
    --model=microsoft/Phi-3-mini-4k-instruct \
    --dtype=bfloat16 \
    --disable-sliding-window
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
INFO 08-26 10:00:17 [__init__.py:241] Automatically detected platform cpu.
(APIServer pid=1) INFO 08-26 10:00:19 [api_server.py:1873] vLLM API server version 0.10.1rc2.dev204+g2da02dd0d
(APIServer pid=1) INFO 08-26 10:00:19 [utils.py:326] non-default args: {'model': 'microsoft/Phi-3-mini-4k-instruct', 'dtype': 'bfloat16', 'disable_sliding_window': True}
(APIServer pid=1) INFO 08-26 10:00:24 [__init__.py:742] Resolved architecture: Phi3ForCausalLM
(APIServer pid=1) INFO 08-26 10:00:24 [__init__.py:1786] Using max model len 2047
(APIServer pid=1) INFO 08-26 10:00:24 [scheduler.py:222] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 08-26 10:00:28 [__init__.py:241] Automatically detected platform cpu.
(EngineCore_0 pid=94) INFO 08-26 10:00:29 [core.py:644] Waiting for init message from front-end.
(EngineCore_0 pid=94) INFO 08-26 10:00:29 [core.py:74] Initializing a V1 LLM engine (v0.10.1rc2.dev204+g2da02dd0d) with config: model='microsoft/Phi-3-mini-4k-instruct', speculative_config=None, tokenizer='microsoft/Phi-3-mini-4k-instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=2047, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cpu, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=microsoft/Phi-3-mini-4k-instruct, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":2,"debug_dump_path":"","cache_dir":"","backend":"inductor","custom_ops":["none"],"splitting_ops":null,"use_inductor":true,"compile_sizes":null,"inductor_compile_config":{"enable_auto_functionalized_v2":false,"dce":true,"size_asserts":false,"nan_asserts":false,"epilogue_fusion":true},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"pass_config":{},"max_capture_size":null,"local_cache_dir":null}
(EngineCore_0 pid=94) INFO 08-26 10:00:29 [importing.py:43] Triton is installed but 0 active driver(s) found (expected 1). Disabling Triton to prevent runtime errors.
(EngineCore_0 pid=94) INFO 08-26 10:00:29 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available.
(EngineCore_0 pid=94) WARNING 08-26 10:00:29 [_logger.py:72] Pin memory is not supported on CPU.
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:172] auto thread-binding list (id, physical core): [(8, 0), (9, 1), (10, 2), (11, 3), (12, 4), (13, 5), (14, 6), (15, 7)]
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:63] OMP threads binding of Process 94:
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:63]    OMP tid: 94, core 8
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:63]    OMP tid: 122, core 9
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:63]    OMP tid: 123, core 10
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:63]    OMP tid: 124, core 11
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:63]    OMP tid: 125, core 12
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:63]    OMP tid: 126, core 13
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:63]    OMP tid: 127, core 14
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:63]    OMP tid: 128, core 15
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_worker.py:63] 
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [parallel_state.py:1134] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu_model_runner.py:87] Starting to load model microsoft/Phi-3-mini-4k-instruct...
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [cpu.py:100] Using Torch SDPA backend.
(EngineCore_0 pid=94) INFO 08-26 10:00:30 [weight_utils.py:294] Using model weights format ['*.safetensors']
(EngineCore_0 pid=94) INFO 08-26 10:01:59 [weight_utils.py:310] Time spent downloading weights for microsoft/Phi-3-mini-4k-instruct: 88.702533 seconds
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00&amp;lt;?, ?it/s]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:00&amp;lt;00:00,  4.44it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00&amp;lt;00:00,  2.45it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00&amp;lt;00:00,  2.52it/s]
(EngineCore_0 pid=94) 
(EngineCore_0 pid=94) INFO 08-26 10:02:00 [default_loader.py:267] Loading weights took 0.86 seconds
(EngineCore_0 pid=94) INFO 08-26 10:02:00 [kv_cache_utils.py:849] GPU KV cache size: 21,760 tokens
(EngineCore_0 pid=94) INFO 08-26 10:02:00 [kv_cache_utils.py:853] Maximum concurrency for 2,047 tokens per request: 10.62x
(EngineCore_0 pid=94) INFO 08-26 10:02:01 [cpu_model_runner.py:99] Warming up model for the compilation...
(EngineCore_0 pid=94) INFO 08-26 10:03:01 [cpu_model_runner.py:103] Warming up done.
(EngineCore_0 pid=94) INFO 08-26 10:03:01 [core.py:215] init engine (profile, create kv cache, warmup model) took 61.05 seconds
(APIServer pid=1) INFO 08-26 10:03:01 [loggers.py:142] Engine 000: vllm cache_config_info with initialization after num_gpu_blocks is: 170
(APIServer pid=1) INFO 08-26 10:03:01 [async_llm.py:165] Torch profiler disabled. AsyncLLM CPU traces will not be collected.
(APIServer pid=1) INFO 08-26 10:03:01 [api_server.py:1679] Supported_tasks: ['generate']
(APIServer pid=1) INFO 08-26 10:03:01 [api_server.py:1948] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:36] Available routes are:
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /docs, Methods: GET, HEAD
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /redoc, Methods: GET, HEAD
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /health, Methods: GET
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /load, Methods: GET
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /ping, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /ping, Methods: GET
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /tokenize, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /detokenize, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/models, Methods: GET
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /version, Methods: GET
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/responses, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/chat/completions, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/completions, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/embeddings, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /pooling, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /classify, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /score, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/score, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/audio/translations, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /rerank, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v1/rerank, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /v2/rerank, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /invocations, Methods: POST
(APIServer pid=1) INFO 08-26 10:03:01 [launcher.py:44] Route: /metrics, Methods: GET
(APIServer pid=1) INFO:     Started server process [1]
(APIServer pid=1) INFO:     Waiting for application startup.
(APIServer pid=1) INFO:     Application startup complete.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test the endpoint with curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "microsoft/Phi-3-mini-4k-instruct",
  "messages": [
    {"role": "user", "content": "Analyze the main changes for dijkstra algorithm"}
  ],
  "temperature": 0.7,
  "max_tokens": 50
}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;1st Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"id":"chatcmpl-8b1d987f6979436d90bba661b088f6c7","object":"chat.completion","created":1756202802,"model":"microsoft/Phi-3-mini-4k-instruct","choices":[{"index":0,"message":{"role":"assistant","content":" The Dijkstra algorithm is an algorithm for finding the shortest path between nodes in a graph. It was invented by computer scientist Edsger W. Dijkstra in 1956 and published three years later. Throughout","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":14,"total_tokens":64,"completion_tokens":50,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Logs collected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;APIServer pid=1) INFO 08-26 10:06:42 [chat_utils.py:470] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
(EngineCore_0 pid=94) WARNING 08-26 10:06:42 [logger.py:71] cudagraph dispatching keys are not initialized. No cudagraph will be used.
(APIServer pid=1) INFO:     172.17.0.1:53458 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=1) INFO 08-26 10:06:52 [loggers.py:123] Engine 000: Avg prompt throughput: 1.4 tokens/s, Avg generation throughput: 5.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's have a look at the logs if we run the same query once again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(APIServer pid=1) INFO 08-26 10:07:12 [loggers.py:123] Engine 000: Avg prompt throughput: 2.0 tokens/s, Avg generation throughput: 4.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, **`GPU KV cache usage: 0.6%`**, Prefix cache hit rate: 0.0%
(APIServer pid=1) INFO:     172.17.0.1:44836 - "POST /v1/chat/completions HTTP/1.1" 200 OK

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  🔍 How to Verify KV Cache on CPU
&lt;/h2&gt;

&lt;p&gt;To enable and allocate space for the CPU KV cache, you must set the VLLM_CPU_KVCACHE_SPACE environment variable. The value is in GiB. In our docker run command, we allocated 8 GiB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-e VLLM_CPU_KVCACHE_SPACE=8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When running in CPU-only mode, you might still see logs mentioning GPU KV cache usage: 0.6%. The label is simply how vLLM names this metric; on the CPU backend, the KV cache actually lives in host memory sized by VLLM_CPU_KVCACHE_SPACE.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;VLLM_CPU_KVCACHE_SPACE: This defines the KV cache's memory allocation in GiB. A larger value allows for more concurrent requests and longer contexts. Start with a conservative value (e.g., 4 or 8) and monitor memory usage.&lt;/p&gt;
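

&lt;p&gt;Beyond reading the logs, you can also poll the server's /metrics route (listed in the startup output above) to watch cache utilization over time. A minimal sketch, assuming the server is reachable on localhost:8000; exact metric names vary between vLLM versions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests  # simple client-side check against the Prometheus endpoint

# Fetch the Prometheus metrics exposed by the vLLM API server
metrics = requests.get("http://localhost:8000/metrics").text

# Print only the cache-related series (names differ across vLLM versions)
for line in metrics.splitlines():
    if "cache" in line and not line.startswith("#"):
        print(line)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;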

&lt;h2&gt;
  
  
  ⚙️Key Environment Variables for CPU Performance
&lt;/h2&gt;

&lt;p&gt;Fine-tuning vLLM on CPUs involves a few key environment variables. Here's a quick guide to the most important ones for performance tuning, with a short usage sketch after the list:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;VLLM_CPU_OMP_THREADS_BIND&lt;/code&gt;&lt;/strong&gt;: 📌 This setting &lt;strong&gt;pins processing threads to specific CPU cores&lt;/strong&gt;. You can set it to &lt;code&gt;auto&lt;/code&gt; (the default) for automatic assignment based on your hardware's NUMA architecture, or you can specify core ranges manually (e.g., &lt;code&gt;0-31&lt;/code&gt;). For multi-process tensor parallelism, you can assign different cores to each process using a pipe &lt;code&gt;|&lt;/code&gt; (e.g., &lt;code&gt;0-31|32-63&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;VLLM_CPU_NUM_OF_RESERVED_CPU&lt;/code&gt;&lt;/strong&gt;: 🛡️ &lt;strong&gt;Reserves a number of CPU cores&lt;/strong&gt;, keeping them free from vLLM's main processing threads. This is useful for system overhead or other processes and only works when the thread binding above is set to &lt;code&gt;auto&lt;/code&gt;. By default, one core is reserved per process in multi-process setups.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;VLLM_CPU_MOE_PREPACK&lt;/code&gt;&lt;/strong&gt;: 🚀 (x86 only) A performance optimization for models using &lt;strong&gt;Mixture-of-Experts (MoE) layers&lt;/strong&gt;. It's enabled by default (&lt;code&gt;1&lt;/code&gt;), but you may need to disable it by setting it to &lt;code&gt;0&lt;/code&gt; if you run into issues on unsupported CPUs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;VLLM_CPU_SGL_KERNEL&lt;/code&gt;&lt;/strong&gt;: 🧪 (Experimental, x86 only) Enables &lt;strong&gt;specialized kernels for low-latency tasks&lt;/strong&gt; like real-time serving. This requires a CPU with the AMX instruction set, BFloat16 model weights, and specific weight shapes. It's disabled by default (&lt;code&gt;0&lt;/code&gt;).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
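

&lt;p&gt;These variables apply just as well when vLLM is driven from Python instead of Docker. A minimal offline-inference sketch, assuming the same Phi-3 model and a CPU build of vLLM; the values are set before importing vllm so they are picked up at engine initialization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os

# Assumed values for a 16-vCPU host like the c7i.4xlarge used above
os.environ["VLLM_CPU_KVCACHE_SPACE"] = "8"         # 8 GiB for the KV cache
os.environ["VLLM_CPU_OMP_THREADS_BIND"] = "auto"   # let vLLM pick the core bindings

from vllm import LLM, SamplingParams  # import after setting the variables

llm = LLM(model="microsoft/Phi-3-mini-4k-instruct", dtype="bfloat16")
params = SamplingParams(temperature=0.7, max_tokens=50)

outputs = llm.generate(["Analyze the main changes for dijkstra algorithm"], params)
print(outputs[0].outputs[0].text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;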




&lt;h2&gt;
  
  
  👋 Conclusion
&lt;/h2&gt;

&lt;p&gt;Transitioning an AI PoC to a production-ready service hinges on maximizing performance and reliability. As we've seen, &lt;strong&gt;vLLM's environment variables are the key to unlocking this potential on standard CPU hardware&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By strategically managing memory with &lt;code&gt;VLLM_CPU_KVCACHE_SPACE&lt;/code&gt; and precisely controlling thread behavior with &lt;code&gt;VLLM_CPU_OMP_THREADS_BIND&lt;/code&gt;, you move beyond default settings to achieve significant gains in throughput and latency. This fine-grained control is what transforms a functional demo into a &lt;strong&gt;scalable, cost-effective, and commercially viable inference solution&lt;/strong&gt; ready for real-world traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  📚 References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vLLM Docs&lt;/strong&gt;: &lt;a href="https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html#build-image-from-source" rel="noopener noreferrer"&gt;Build a Docker Image from Source for CPU&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hugging Face&lt;/strong&gt;: &lt;a href="https://huggingface.co/microsoft/Phi-3-mini-4k-instruct" rel="noopener noreferrer"&gt;Microsoft Phi-3-mini-4k-instruct Model Card&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Console&lt;/strong&gt;: &lt;a href="https://aws.amazon.com/console/" rel="noopener noreferrer"&gt;AWS Management Console Login&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>vllm</category>
      <category>production</category>
    </item>
    <item>
      <title>AWS vs. Azure: A Decision Matrix for Building Enterprise RAG Platforms</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Thu, 17 Jul 2025 00:11:39 +0000</pubDate>
      <link>https://dev.to/mgonzalezo/my-ai-pair-programmer-is-better-than-yours-a-cursor-kiro-granite-showdown-2kdj</link>
      <guid>https://dev.to/mgonzalezo/my-ai-pair-programmer-is-better-than-yours-a-cursor-kiro-granite-showdown-2kdj</guid>
      <description>&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;Platform Overview&lt;/li&gt;
&lt;li&gt;Cloud Platform Decision Matrix&lt;/li&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;
Project 1: Enterprise-Grade RAG Platform

&lt;ul&gt;
&lt;li&gt;AWS Implementation&lt;/li&gt;
&lt;li&gt;Azure Implementation&lt;/li&gt;
&lt;li&gt;Cost Comparison&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Project 2: Hybrid MLOps Pipeline

&lt;ul&gt;
&lt;li&gt;AWS Implementation&lt;/li&gt;
&lt;li&gt;Azure Implementation&lt;/li&gt;
&lt;li&gt;Cost Comparison&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Project 3: Unified Data Fabric (Data Lakehouse)

&lt;ul&gt;
&lt;li&gt;AWS Implementation&lt;/li&gt;
&lt;li&gt;Azure Implementation&lt;/li&gt;
&lt;li&gt;Cost Comparison&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Multi-Cloud Integration Patterns&lt;/li&gt;
&lt;li&gt;Total Cost of Ownership Analysis&lt;/li&gt;
&lt;li&gt;Migration Strategies&lt;/li&gt;
&lt;li&gt;Resource Cleanup&lt;/li&gt;
&lt;li&gt;Troubleshooting&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Modern enterprises face a critical decision when building cloud-native AI and data platforms: &lt;strong&gt;AWS or Azure?&lt;/strong&gt; This comprehensive guide demonstrates how to build three production-grade platforms on &lt;strong&gt;both&lt;/strong&gt; cloud providers, providing side-by-side comparisons to help you make informed decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You'll Learn
&lt;/h3&gt;

&lt;p&gt;This guide shows you how to implement identical architectures on both AWS and Azure:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project 1: Enterprise RAG Platform&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: Amazon Bedrock + AWS Glue + Milvus on ROSA&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt;: Azure OpenAI + Azure Data Factory + Milvus on ARO&lt;/li&gt;
&lt;li&gt;Privacy-first Retrieval-Augmented Generation&lt;/li&gt;
&lt;li&gt;Vector database integration&lt;/li&gt;
&lt;li&gt;Secure private connectivity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Project 2: Hybrid MLOps Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: SageMaker + OpenShift Pipelines + KServe on ROSA&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt;: Azure ML + Azure DevOps + KServe on ARO&lt;/li&gt;
&lt;li&gt;Cost-optimized GPU training&lt;/li&gt;
&lt;li&gt;Kubernetes-native serving&lt;/li&gt;
&lt;li&gt;End-to-end automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Project 3: Unified Data Fabric&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: Apache Spark + AWS Glue Catalog + S3 + Iceberg&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt;: Apache Spark + Azure Purview + ADLS Gen2 + Delta Lake&lt;/li&gt;
&lt;li&gt;Stateless compute architecture&lt;/li&gt;
&lt;li&gt;Medallion data organization&lt;/li&gt;
&lt;li&gt;ACID transactions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why This Comparison Matters
&lt;/h3&gt;

&lt;p&gt;Choosing the right cloud platform impacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total Cost&lt;/strong&gt;: 20-40% difference in monthly spending&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer Productivity&lt;/strong&gt;: Ecosystem integration and tooling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor Lock-in&lt;/strong&gt;: Portability and migration flexibility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Integration&lt;/strong&gt;: Existing infrastructure and contracts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Platform Overview
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Unified Multi-Cloud Architecture
&lt;/h3&gt;

&lt;p&gt;Both implementations follow the same architectural patterns while leveraging platform-specific managed services:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35xlx2grsfnk1vbq7rfg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F35xlx2grsfnk1vbq7rfg.png" alt="Architecture" width="789" height="584"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Technology Stack: AWS vs Azure
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;AWS Solution&lt;/th&gt;
&lt;th&gt;Azure Solution&lt;/th&gt;
&lt;th&gt;OpenShift Platform&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ROSA (Red Hat OpenShift on AWS)&lt;/td&gt;
&lt;td&gt;ARO (Azure Red Hat OpenShift)&lt;/td&gt;
&lt;td&gt;Both use Red Hat OpenShift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon Bedrock (Claude 3.5)&lt;/td&gt;
&lt;td&gt;Azure OpenAI Service (GPT-4)&lt;/td&gt;
&lt;td&gt;Same API patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon SageMaker&lt;/td&gt;
&lt;td&gt;Azure Machine Learning&lt;/td&gt;
&lt;td&gt;Both burst from OpenShift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Catalog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Glue Data Catalog&lt;/td&gt;
&lt;td&gt;Azure Purview / Unity Catalog&lt;/td&gt;
&lt;td&gt;Unified metadata layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Object Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon S3&lt;/td&gt;
&lt;td&gt;Azure Data Lake Storage Gen2&lt;/td&gt;
&lt;td&gt;S3-compatible APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Table Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache Iceberg&lt;/td&gt;
&lt;td&gt;Delta Lake&lt;/td&gt;
&lt;td&gt;Open source options&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector DB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Milvus (self-hosted)&lt;/td&gt;
&lt;td&gt;Milvus / Cosmos DB&lt;/td&gt;
&lt;td&gt;Same deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ETL Service&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Glue (serverless)&lt;/td&gt;
&lt;td&gt;Azure Data Factory (serverless)&lt;/td&gt;
&lt;td&gt;Similar orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenShift Pipelines (Tekton)&lt;/td&gt;
&lt;td&gt;Azure DevOps / Tekton&lt;/td&gt;
&lt;td&gt;Kubernetes-native&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;K8s Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Controllers (ACK)&lt;/td&gt;
&lt;td&gt;Azure Service Operator (ASO)&lt;/td&gt;
&lt;td&gt;Custom resources&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Private Network&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS PrivateLink&lt;/td&gt;
&lt;td&gt;Azure Private Link&lt;/td&gt;
&lt;td&gt;VPC/VNet integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IRSA (IAM for Service Accounts)&lt;/td&gt;
&lt;td&gt;Workload Identity&lt;/td&gt;
&lt;td&gt;Pod-level identity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Cloud Platform Decision Matrix
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When to Choose AWS
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best For&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;AI/ML Innovation&lt;/strong&gt;: Amazon Bedrock offers broader model selection (Claude, Llama 2, Stable Diffusion)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless-First&lt;/strong&gt;: AWS Glue, Lambda, and Bedrock have no minimum fees&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Startup/Scale-up&lt;/strong&gt;: Pay-as-you-go pricing favors variable workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Engineering&lt;/strong&gt;: S3 + Glue + Athena is industry standard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Region&lt;/strong&gt;: Better global infrastructure coverage&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;AWS Advantages&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Superior AI model marketplace (Anthropic, Cohere, AI21, Meta)&lt;/li&gt;
&lt;li&gt;True serverless data catalog (Glue) with no base costs&lt;/li&gt;
&lt;li&gt;More mature spot instance ecosystem for cost savings&lt;/li&gt;
&lt;li&gt;Better S3 ecosystem and tooling integration&lt;/li&gt;
&lt;li&gt;Stronger open-source community adoption&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to Choose Azure
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best For&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Ecosystem&lt;/strong&gt;: Tight integration with Office 365, Teams, Power Platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Windows&lt;/strong&gt;: Native Windows container support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Cloud&lt;/strong&gt;: Azure Arc and on-premises integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Agreements&lt;/strong&gt;: Existing Microsoft licensing discounts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulated Industries&lt;/strong&gt;: Better compliance certifications in some regions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Azure Advantages&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Seamless Microsoft 365 and Active Directory integration&lt;/li&gt;
&lt;li&gt;Superior Windows and .NET container support&lt;/li&gt;
&lt;li&gt;Better hybrid cloud story with Azure Arc&lt;/li&gt;
&lt;li&gt;Integrated Azure Synapse for unified analytics&lt;/li&gt;
&lt;li&gt;Potentially lower costs with existing EA agreements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Decision Criteria Scorecard
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criteria&lt;/th&gt;
&lt;th&gt;AWS Score&lt;/th&gt;
&lt;th&gt;Azure Score&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI Model Selection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;AWS Bedrock has more models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML Training Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Equivalent spot pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Lake Maturity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;S3 is industry standard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Serverless Pricing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;AWS Glue has no minimums&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enterprise Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Azure wins for Microsoft shops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hybrid Cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Azure Arc is superior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Developer Ecosystem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Larger open-source community&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compliance Certifications&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;9/10&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Equivalent for most use cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Global Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10/10&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;AWS has more regions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pricing Transparency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8/10&lt;/td&gt;
&lt;td&gt;7/10&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;AWS pricing is clearer&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total Weighted Score&lt;/strong&gt;: AWS: 8.5/10 | Azure: 8.1/10&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Choose based on your organization's existing ecosystem. Both platforms are capable; the difference is in integration, not capability.&lt;/p&gt;
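

&lt;p&gt;The weights behind these totals are not spelled out, but the arithmetic is easy to reproduce. A sketch that assumes High = 3, Medium = 2, Low = 1; with those assumed weights the output matches the 8.5 / 8.1 totals above, and your own weights will shift the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical reproduction of the weighted scorecard; the weight values are assumptions
weights = {"High": 3, "Medium": 2, "Low": 1}

# (criterion, aws_score, azure_score, weight) taken from the table above
rows = [
    ("AI Model Selection", 9, 7, "High"),
    ("ML Training Cost", 8, 8, "High"),
    ("Data Lake Maturity", 10, 8, "High"),
    ("Serverless Pricing", 9, 7, "Medium"),
    ("Enterprise Integration", 7, 10, "High"),
    ("Hybrid Cloud", 7, 9, "Medium"),
    ("Developer Ecosystem", 9, 7, "Medium"),
    ("Compliance Certifications", 9, 9, "High"),
    ("Global Infrastructure", 10, 8, "Low"),
    ("Pricing Transparency", 8, 7, "Medium"),
]

total_weight = sum(weights[w] for *_, w in rows)
aws = sum(a * weights[w] for _, a, _, w in rows) / total_weight
azure = sum(z * weights[w] for _, _, z, w in rows) / total_weight
print(f"AWS: {aws:.1f}/10 | Azure: {azure:.1f}/10")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;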

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Common Prerequisites (Both Platforms)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Required Accounts&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud platform account with administrative access&lt;/li&gt;
&lt;li&gt;Red Hat Account with OpenShift subscription&lt;/li&gt;
&lt;li&gt;Credit card for cloud charges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Required Tools&lt;/strong&gt; (install on your workstation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Common tools for both platforms&lt;/span&gt;
&lt;span class="c"&gt;# OpenShift CLI (oc)&lt;/span&gt;
wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; openshift-client-linux.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;oc kubectl /usr/local/bin/
oc version

&lt;span class="c"&gt;# Helm (v3)&lt;/span&gt;
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version

&lt;span class="c"&gt;# Tekton CLI&lt;/span&gt;
curl &lt;span class="nt"&gt;-LO&lt;/span&gt; https://github.com/tektoncd/cli/releases/download/v0.33.0/tkn_0.33.0_Linux_x86_64.tar.gz
&lt;span class="nb"&gt;tar &lt;/span&gt;xvzf tkn_0.33.0_Linux_x86_64.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;tkn /usr/local/bin/
tkn version

&lt;span class="c"&gt;# Python 3.11+&lt;/span&gt;
python3 &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;span class="c"&gt;# Container tools (Docker or Podman)&lt;/span&gt;
podman &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS-Specific Prerequisites
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# AWS CLI (v2)&lt;/span&gt;
curl &lt;span class="s2"&gt;"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"awscliv2.zip"&lt;/span&gt;
unzip awscliv2.zip
&lt;span class="nb"&gt;sudo&lt;/span&gt; ./aws/install
aws &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;span class="c"&gt;# ROSA CLI&lt;/span&gt;
wget https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz
&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xvf&lt;/span&gt; rosa-linux.tar.gz
&lt;span class="nb"&gt;sudo mv &lt;/span&gt;rosa /usr/local/bin/rosa
rosa version

&lt;span class="c"&gt;# Configure AWS&lt;/span&gt;
aws configure
aws sts get-caller-identity

&lt;span class="c"&gt;# Initialize ROSA&lt;/span&gt;
rosa login
rosa verify quota
rosa verify permissions
rosa init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure-Specific Prerequisites
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Azure CLI&lt;/span&gt;
curl &lt;span class="nt"&gt;-sL&lt;/span&gt; https://aka.ms/InstallAzureCLIDeb | &lt;span class="nb"&gt;sudo &lt;/span&gt;bash
az &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;span class="c"&gt;# ARO extension&lt;/span&gt;
az extension add &lt;span class="nt"&gt;--name&lt;/span&gt; aro &lt;span class="nt"&gt;--index&lt;/span&gt; https://az.aroapp.io/stable

&lt;span class="c"&gt;# Azure CLI login&lt;/span&gt;
az login
az account show

&lt;span class="c"&gt;# Register required providers&lt;/span&gt;
az provider register &lt;span class="nt"&gt;--namespace&lt;/span&gt; Microsoft.RedHatOpenShift &lt;span class="nt"&gt;--wait&lt;/span&gt;
az provider register &lt;span class="nt"&gt;--namespace&lt;/span&gt; Microsoft.Compute &lt;span class="nt"&gt;--wait&lt;/span&gt;
az provider register &lt;span class="nt"&gt;--namespace&lt;/span&gt; Microsoft.Storage &lt;span class="nt"&gt;--wait&lt;/span&gt;
az provider register &lt;span class="nt"&gt;--namespace&lt;/span&gt; Microsoft.Network &lt;span class="nt"&gt;--wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Service Quotas Verification
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# EC2 vCPU quota&lt;/span&gt;
aws service-quotas get-service-quota &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-code&lt;/span&gt; ec2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-1216C47A &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1

&lt;span class="c"&gt;# SageMaker training instances&lt;/span&gt;
aws service-quotas get-service-quota &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-code&lt;/span&gt; sagemaker &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quota-code&lt;/span&gt; L-2E8D9C5E &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
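
&lt;p&gt;If either quota comes back lower than your planned cluster or training footprint, you can request an increase from the same CLI before provisioning anything. A minimal sketch, reusing the quota codes above; the desired value is only an example to adjust for your account:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Request a higher EC2 vCPU quota (desired value is a placeholder)
aws service-quotas request-service-quota-increase \
  --service-code ec2 \
  --quota-code L-1216C47A \
  --desired-value 96 \
  --region us-east-1

# Track the status of pending requests
aws service-quotas list-requested-service-quota-change-history \
  --service-code ec2 \
  --region us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;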



&lt;p&gt;&lt;strong&gt;Azure&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check compute quota&lt;/span&gt;
az vm list-usage &lt;span class="nt"&gt;--location&lt;/span&gt; eastus &lt;span class="nt"&gt;--output&lt;/span&gt; table

&lt;span class="c"&gt;# Check ML compute quota&lt;/span&gt;
az ml compute list-usage &lt;span class="nt"&gt;--location&lt;/span&gt; eastus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Project 1: Enterprise-Grade RAG Platform
&lt;/h2&gt;

&lt;h3&gt;
  
  
  RAG Platform Overview
&lt;/h3&gt;

&lt;p&gt;This project implements a privacy-first Retrieval-Augmented Generation (RAG) system. Both AWS and Azure implementations achieve the same functionality but use platform-specific managed services.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ROSA → AWS PrivateLink → Amazon Bedrock (Claude 3.5)
  ↓
Milvus Vector DB (on ROSA)
  ↓
AWS Glue ETL → S3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ARO → Azure Private Link → Azure OpenAI (GPT-4)
  ↓
Milvus Vector DB (on ARO)
  ↓
Azure Data Factory → Blob Storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Side-by-Side Service Mapping
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;AWS Service&lt;/th&gt;
&lt;th&gt;Azure Service&lt;/th&gt;
&lt;th&gt;Implementation Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon Bedrock&lt;/td&gt;
&lt;td&gt;Azure OpenAI Service&lt;/td&gt;
&lt;td&gt;Different model families&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Private Network&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS PrivateLink&lt;/td&gt;
&lt;td&gt;Azure Private Link&lt;/td&gt;
&lt;td&gt;Similar configuration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ETL Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Glue (Serverless)&lt;/td&gt;
&lt;td&gt;Azure Data Factory&lt;/td&gt;
&lt;td&gt;Different pricing models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metadata&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Glue Data Catalog&lt;/td&gt;
&lt;td&gt;Azure Purview&lt;/td&gt;
&lt;td&gt;Different scopes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon S3&lt;/td&gt;
&lt;td&gt;Azure Blob Storage / ADLS Gen2&lt;/td&gt;
&lt;td&gt;S3 API vs Blob API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector DB&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Milvus on ROSA&lt;/td&gt;
&lt;td&gt;Milvus on ARO / Cosmos DB&lt;/td&gt;
&lt;td&gt;Self-hosted vs managed option&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IRSA (IAM Roles)&lt;/td&gt;
&lt;td&gt;Workload Identity&lt;/td&gt;
&lt;td&gt;Similar pod-level identity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embedding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Titan Embeddings&lt;/td&gt;
&lt;td&gt;OpenAI Embeddings&lt;/td&gt;
&lt;td&gt;Different dimensions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
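
&lt;p&gt;The embedding row deserves a closer look: Titan Text Embeddings V2 returns 1024-dimensional vectors by default (and that is what the application below requests), while text-embedding-ada-002 returns 1536-dimensional vectors, so the Milvus collection schema cannot simply be copied between clouds without re-embedding the corpus. A quick sanity check of the Titan dimension from the CLI; the output file name is arbitrary:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Invoke Titan Embeddings V2 once and count the vector length (expect 1024)
aws bedrock-runtime invoke-model \
  --model-id amazon.titan-embed-text-v2:0 \
  --cli-binary-format raw-in-base64-out \
  --body '{"inputText": "dimension check", "dimensions": 1024}' \
  --region us-east-1 \
  titan-embed.json

jq '.embedding | length' titan-embed.json   # ada-002 on Azure returns 1536
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;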

&lt;h2&gt;
  
  
  AWS Implementation (RAG)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS Phase 1: ROSA Cluster Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set environment variables&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-platform-aws"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MACHINE_TYPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"m5.2xlarge"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;COMPUTE_NODES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3

&lt;span class="c"&gt;# Create ROSA cluster (takes ~40 minutes)&lt;/span&gt;
rosa create cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster-name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--multi-az&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compute-machine-type&lt;/span&gt; &lt;span class="nv"&gt;$MACHINE_TYPE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--compute-nodes&lt;/span&gt; &lt;span class="nv"&gt;$COMPUTE_NODES&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--machine-cidr&lt;/span&gt; 10.0.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-cidr&lt;/span&gt; 172.30.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--pod-cidr&lt;/span&gt; 10.128.0.0/14 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--host-prefix&lt;/span&gt; 23 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="c"&gt;# Monitor installation&lt;/span&gt;
rosa logs &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;--watch&lt;/span&gt;

&lt;span class="c"&gt;# Create admin and connect&lt;/span&gt;
rosa create admin &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt;
oc login &amp;lt;api-url&amp;gt; &lt;span class="nt"&gt;--username&lt;/span&gt; cluster-admin &lt;span class="nt"&gt;--password&lt;/span&gt; &amp;lt;password&amp;gt;

&lt;span class="c"&gt;# Create namespaces&lt;/span&gt;
oc new-project redhat-ods-applications
oc new-project rag-application
oc new-project milvus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
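
&lt;p&gt;Before layering anything on top, it is worth confirming the cluster actually reports healthy:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Cluster should be in "ready" state, all nodes Ready, all operators Available
rosa describe cluster --cluster=$CLUSTER_NAME
oc get nodes
oc get clusteroperators
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;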



&lt;h3&gt;
  
  
  AWS Phase 2: Amazon Bedrock via PrivateLink
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Get ROSA VPC details&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ROSA_VPC_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 describe-vpcs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=tag:Name,Values=*&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Vpcs[0].VpcId'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PRIVATE_SUBNET_IDS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 describe-subnets &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filters&lt;/span&gt; &lt;span class="s2"&gt;"Name=vpc-id,Values=&lt;/span&gt;&lt;span class="nv"&gt;$ROSA_VPC_ID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"Name=tag:Name,Values=*private*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Subnets[*].SubnetId'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create VPC Endpoint Security Group&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VPC_ENDPOINT_SG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 create-security-group &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-name&lt;/span&gt; bedrock-vpc-endpoint-sg &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--description&lt;/span&gt; &lt;span class="s2"&gt;"Security group for Bedrock VPC endpoint"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-id&lt;/span&gt; &lt;span class="nv"&gt;$ROSA_VPC_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'GroupId'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Allow HTTPS from ROSA nodes&lt;/span&gt;
aws ec2 authorize-security-group-ingress &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-id&lt;/span&gt; &lt;span class="nv"&gt;$VPC_ENDPOINT_SG&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--protocol&lt;/span&gt; tcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cidr&lt;/span&gt; 10.0.0.0/16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create Bedrock VPC Endpoint&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BEDROCK_VPC_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 create-vpc-endpoint &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-id&lt;/span&gt; &lt;span class="nv"&gt;$ROSA_VPC_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vpc-endpoint-type&lt;/span&gt; Interface &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-name&lt;/span&gt; com.amazonaws.&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.bedrock-runtime &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subnet-ids&lt;/span&gt; &lt;span class="nv"&gt;$PRIVATE_SUBNET_IDS&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--security-group-ids&lt;/span&gt; &lt;span class="nv"&gt;$VPC_ENDPOINT_SG&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--private-dns-enabled&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'VpcEndpoint.VpcEndpointId'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Poll until the endpoint is available (the EC2 CLI has no waiter for VPC endpoints)&lt;/span&gt;
until &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws ec2 describe-vpc-endpoints &lt;span class="nt"&gt;--vpc-endpoint-ids&lt;/span&gt; &lt;span class="nv"&gt;$BEDROCK_VPC_ENDPOINT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'VpcEndpoints[0].State'&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"available"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="nb"&gt;sleep &lt;/span&gt;15&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Create IAM role for Bedrock access (IRSA pattern)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;rosa describe cluster &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq &lt;span class="nt"&gt;-r&lt;/span&gt; .aws.sts.oidc_endpoint_url | &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s|https://||'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws sts get-caller-identity &lt;span class="nt"&gt;--query&lt;/span&gt; Account &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; bedrock-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0"
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam create-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; BedrockInvokePolicy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://bedrock-policy.json

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:oidc-provider/&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OIDC_PROVIDER&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;:sub": "system:serviceaccount:rag-application:bedrock-sa"
        }
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BEDROCK_ROLE_ARN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://trust-policy.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'Role.Arn'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:policy/BedrockInvokePolicy

&lt;span class="c"&gt;# Create Kubernetes service account&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: bedrock-sa
  namespace: rag-application
  annotations:
    eks.amazonaws.com/role-arn: &lt;/span&gt;&lt;span class="nv"&gt;$BEDROCK_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
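
&lt;p&gt;With private DNS enabled on the endpoint, the Bedrock runtime hostname should resolve to an address inside the machine CIDR (10.0.0.0/16) from any pod, which confirms traffic will stay on PrivateLink. A quick hedged check with a throwaway pod; the UBI image is just a convenient choice, and any image that ships glibc's getent works:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# From inside the cluster, the endpoint should resolve to a private VPC IP
oc run netcheck -n rag-application --rm -it \
  --image=registry.access.redhat.com/ubi9/ubi -- \
  getent hosts bedrock-runtime.${AWS_REGION}.amazonaws.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;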



&lt;h3&gt;
  
  
  AWS Phase 3: AWS Glue Data Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create S3 bucket&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-documents-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
aws s3 mb s3://&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Enable versioning&lt;/span&gt;
aws s3api put-bucket-versioning &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--versioning-configuration&lt;/span&gt; &lt;span class="nv"&gt;Status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Enabled &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create folder structure&lt;/span&gt;
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; raw-documents/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; processed-documents/
aws s3api put-object &lt;span class="nt"&gt;--bucket&lt;/span&gt; &lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt; &lt;span class="nt"&gt;--key&lt;/span&gt; embeddings/

&lt;span class="c"&gt;# Create Glue IAM role&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; glue-trust-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "glue.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam create-role &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assume-role-policy-document&lt;/span&gt; file://glue-trust-policy.json

aws iam attach-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole

&lt;span class="c"&gt;# Create S3 access policy&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; glue-s3-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BUCKET_NAME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam put-role-policy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role-name&lt;/span&gt; AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-name&lt;/span&gt; S3Access &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://glue-s3-policy.json

&lt;span class="c"&gt;# Create Glue database&lt;/span&gt;
aws glue create-database &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{
    "Name": "rag_documents_db",
    "Description": "RAG document metadata"
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;

&lt;span class="c"&gt;# Create Glue crawler&lt;/span&gt;
aws glue create-crawler &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-crawler &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:role/AWSGlueServiceRole-RAG &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--database-name&lt;/span&gt; rag_documents_db &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--targets&lt;/span&gt; &lt;span class="s1"&gt;'{
    "S3Targets": [{"Path": "s3://'&lt;/span&gt;&lt;span class="nv"&gt;$BUCKET_NAME&lt;/span&gt;&lt;span class="s1"&gt;'/raw-documents/"}]
  }'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
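
&lt;p&gt;The crawler is created but never run above. Once a few sample documents have been uploaded under raw-documents/, a short sequence to populate and inspect the catalog looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run the crawler and poll until it returns to READY
aws glue start-crawler --name rag-document-crawler --region $AWS_REGION
aws glue get-crawler --name rag-document-crawler \
  --query 'Crawler.State' --output text --region $AWS_REGION

# Inspect the tables the crawler discovered
aws glue get-tables --database-name rag_documents_db --region $AWS_REGION
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;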



&lt;h3&gt;
  
  
  AWS Phase 4: Milvus Vector Database
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Milvus using Helm&lt;/span&gt;
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm repo update

helm &lt;span class="nb"&gt;install &lt;/span&gt;milvus-operator milvus/milvus-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; milvus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;

&lt;span class="c"&gt;# Create PVCs&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-etcd-pvc
  namespace: milvus
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
  storageClassName: gp3-csi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-minio-pvc
  namespace: milvus
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 50Gi
  storageClassName: gp3-csi
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Deploy Milvus&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; milvus-values.yaml &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
cluster:
  enabled: true
service:
  type: ClusterIP
  port: 19530
standalone:
  replicas: 1
  resources:
    limits:
      cpu: "4"
      memory: 8Gi
    requests:
      cpu: "2"
      memory: 4Gi
etcd:
  persistence:
    enabled: true
    existingClaim: milvus-etcd-pvc
minio:
  persistence:
    enabled: true
    existingClaim: milvus-minio-pvc
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;helm &lt;span class="nb"&gt;install &lt;/span&gt;milvus milvus/milvus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; milvus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--values&lt;/span&gt; milvus-values.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--wait&lt;/span&gt;

&lt;span class="c"&gt;# Get Milvus endpoint&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MILVUS_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get svc milvus &lt;span class="nt"&gt;-n&lt;/span&gt; milvus &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.clusterIP}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MILVUS_PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;19530
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
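
&lt;p&gt;The API in the next phase searches a collection named rag_documents with a 1024-dimensional embedding field, but nothing above creates it. Here is a minimal sketch of that schema, run from a workstation with pymilvus installed and an oc port-forward svc/milvus 19530:19530 -n milvus session open in another terminal. The field names and index parameters are assumptions that simply mirror the application code, not an official layout:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cat &amp;gt; create_collection.py &amp;lt;&amp;lt;'PY'
from pymilvus import (Collection, CollectionSchema, DataType,
                      FieldSchema, connections)

# Connect through the local port-forward
connections.connect(host="127.0.0.1", port="19530")

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=65535),
    # 1024 matches the Titan V2 "dimensions" request in the API code
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1024),
]
coll = Collection("rag_documents", CollectionSchema(fields))
coll.create_index("embedding", {"index_type": "IVF_FLAT",
                                "metric_type": "L2",
                                "params": {"nlist": 128}})
coll.load()
print("collection ready:", coll.name)
PY

python3 create_collection.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;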



&lt;h3&gt;
  
  
  AWS Phase 5: RAG Application Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create application code&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; rag-app-aws/src

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app-aws/requirements.txt &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
pymilvus==2.3.3
boto3==1.29.7
python-dotenv==1.0.0
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create FastAPI application (abbreviated for space)&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app-aws/src/main.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import os, json, boto3
from pymilvus import connections, Collection

app = FastAPI(title="Enterprise RAG API - AWS")

MILVUS_HOST = os.getenv("MILVUS_HOST")
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")
BEDROCK_MODEL = "anthropic.claude-3-5-sonnet-20241022-v2:0"

bedrock = boto3.client('bedrock-runtime', region_name=AWS_REGION)

@app.on_event("startup")
async def startup():
    connections.connect(host=MILVUS_HOST, port=19530)

class QueryRequest(BaseModel):
    query: str
    top_k: int = 5
    max_tokens: int = 1000

@app.post("/query")
async def query_rag(req: QueryRequest):
    # Generate embedding with Bedrock Titan
    embed_resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": req.query, "dimensions": 1024})
    )
    embedding = json.loads(embed_resp['body'].read())['embedding']

    # Search Milvus
    coll = Collection("rag_documents")
    results = coll.search([embedding], "embedding", {"metric_type": "L2"}, limit=req.top_k, output_fields=["text"])

    # Build context
    context = "&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;".join([hit.entity.get("text") for hit in results[0]])

    # Call Bedrock Claude
    prompt = f"Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;{context}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;Question: {req.query}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;Answer:"
    response = bedrock.invoke_model(
        modelId=BEDROCK_MODEL,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": req.max_tokens,
            "messages": [{"role": "user", "content": prompt}]
        })
    )

    answer = json.loads(response['body'].read())['content'][0]['text']
    return {"answer": answer, "sources": [{"chunk": hit.entity.get("text")} for hit in results[0]]}

@app.get("/health")
async def health():
    return {"status": "healthy", "platform": "AWS", "model": "Claude 3.5 Sonnet"}
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Create Dockerfile&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app-aws/Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
EXPOSE 8000
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Build and deploy&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;rag-app-aws
podman build &lt;span class="nt"&gt;-t&lt;/span&gt; rag-app-aws:v1.0 &lt;span class="nb"&gt;.&lt;/span&gt;
oc create imagestream rag-app-aws &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
podman tag rag-app-aws:v1.0 image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-aws:v1.0
podman push image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-aws:v1.0 &lt;span class="nt"&gt;--tls-verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false
cd&lt;/span&gt; ..

&lt;span class="c"&gt;# Deploy to OpenShift&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-app-aws
  namespace: rag-application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rag-app-aws
  template:
    metadata:
      labels:
        app: rag-app-aws
    spec:
      serviceAccountName: bedrock-sa
      containers:
      - name: app
        image: image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-aws:v1.0
        ports:
        - containerPort: 8000
        env:
        - name: MILVUS_HOST
          value: "&lt;/span&gt;&lt;span class="nv"&gt;$MILVUS_HOST&lt;/span&gt;&lt;span class="sh"&gt;"
        - name: AWS_REGION
          value: "&lt;/span&gt;&lt;span class="nv"&gt;$AWS_REGION&lt;/span&gt;&lt;span class="sh"&gt;"
---
apiVersion: v1
kind: Service
metadata:
  name: rag-app-aws
  namespace: rag-application
spec:
  selector:
    app: rag-app-aws
  ports:
  - port: 80
    targetPort: 8000
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rag-app-aws
  namespace: rag-application
spec:
  to:
    kind: Service
    name: rag-app-aws
  tls:
    termination: edge
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Get URL and test&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RAG_URL_AWS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get route rag-app-aws &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.host}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
curl https://&lt;span class="nv"&gt;$RAG_URL_AWS&lt;/span&gt;/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
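
&lt;p&gt;The /health call only proves the pod is running. A hedged end-to-end test of the query path, assuming documents have already been embedded and inserted into the rag_documents collection (the question text is just an example):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl -s -X POST https://$RAG_URL_AWS/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Summarize the expense policy for contractors.", "top_k": 3, "max_tokens": 500}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;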



&lt;h2&gt;
  
  
  Azure Implementation (RAG)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Azure Phase 1: ARO Cluster Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set environment variables&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;CLUSTER_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-platform-azure"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;LOCATION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"eastus"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RESOURCE_GROUP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-platform-rg"&lt;/span&gt;

&lt;span class="c"&gt;# Create resource group&lt;/span&gt;
az group create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt;

&lt;span class="c"&gt;# Create virtual network&lt;/span&gt;
az network vnet create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--address-prefixes&lt;/span&gt; 10.0.0.0/22

&lt;span class="c"&gt;# Create master subnet&lt;/span&gt;
az network vnet subnet create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet-name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; master-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--address-prefixes&lt;/span&gt; 10.0.0.0/23 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-endpoints&lt;/span&gt; Microsoft.ContainerRegistry

&lt;span class="c"&gt;# Create worker subnet&lt;/span&gt;
az network vnet subnet create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet-name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; worker-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--address-prefixes&lt;/span&gt; 10.0.2.0/23 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--service-endpoints&lt;/span&gt; Microsoft.ContainerRegistry

&lt;span class="c"&gt;# Disable subnet private endpoint policies&lt;/span&gt;
az network vnet subnet update &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; master-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet-name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable-private-link-service-network-policies&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Create ARO cluster (takes ~35 minutes)&lt;/span&gt;
az aro create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--master-subnet&lt;/span&gt; master-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--worker-subnet&lt;/span&gt; worker-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--worker-count&lt;/span&gt; 3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--worker-vm-size&lt;/span&gt; Standard_D8s_v3

&lt;span class="c"&gt;# Get credentials&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARO_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aro show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; consoleUrl &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARO_PASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aro list-credentials &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; kubeadminPassword &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Login&lt;/span&gt;
oc login &lt;span class="nv"&gt;$ARO_URL&lt;/span&gt; &lt;span class="nt"&gt;-u&lt;/span&gt; kubeadmin &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="nv"&gt;$ARO_PASSWORD&lt;/span&gt;

&lt;span class="c"&gt;# Create namespaces&lt;/span&gt;
oc new-project rag-application
oc new-project milvus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
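
&lt;p&gt;As on the AWS side, confirm the cluster is healthy before adding services on top:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Provisioning state should report Succeeded, with all nodes Ready
az aro show --name $CLUSTER_NAME --resource-group $RESOURCE_GROUP \
  --query provisioningState -o tsv
oc get nodes
oc get clusteroperators
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;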



&lt;h3&gt;
  
  
  Azure Phase 2: Azure OpenAI via Private Link
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Azure OpenAI resource&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-openai-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RANDOM&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

az cognitiveservices account create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kind&lt;/span&gt; OpenAI &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku&lt;/span&gt; S0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--custom-domain&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--public-network-access&lt;/span&gt; Disabled

&lt;span class="c"&gt;# Deploy GPT-4 model&lt;/span&gt;
az cognitiveservices account deployment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--deployment-name&lt;/span&gt; gpt-4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-name&lt;/span&gt; gpt-4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-version&lt;/span&gt; &lt;span class="s2"&gt;"0613"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-format&lt;/span&gt; OpenAI &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku-capacity&lt;/span&gt; 10 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku-name&lt;/span&gt; &lt;span class="s2"&gt;"Standard"&lt;/span&gt;

&lt;span class="c"&gt;# Deploy text-embedding model&lt;/span&gt;
az cognitiveservices account deployment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--deployment-name&lt;/span&gt; text-embedding-ada-002 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-name&lt;/span&gt; text-embedding-ada-002 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-version&lt;/span&gt; &lt;span class="s2"&gt;"2"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model-format&lt;/span&gt; OpenAI &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku-capacity&lt;/span&gt; 10 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku-name&lt;/span&gt; &lt;span class="s2"&gt;"Standard"&lt;/span&gt;

&lt;span class="c"&gt;# Create Private Endpoint&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VNET_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az network vnet show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SUBNET_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az network vnet subnet show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet-name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; worker-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az cognitiveservices account show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

az network private-endpoint create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; openai-private-endpoint &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--vnet-name&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subnet&lt;/span&gt; worker-subnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--private-connection-resource-id&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--group-id&lt;/span&gt; account &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--connection-name&lt;/span&gt; openai-connection

&lt;span class="c"&gt;# Create Private DNS Zone&lt;/span&gt;
az network private-dns zone create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; privatelink.openai.azure.com

az network private-dns &lt;span class="nb"&gt;link &lt;/span&gt;vnet create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--zone-name&lt;/span&gt; privatelink.openai.azure.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; openai-dns-link &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--virtual-network&lt;/span&gt; aro-vnet &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--registration-enabled&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;

&lt;span class="c"&gt;# Create DNS record&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ENDPOINT_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az network private-endpoint show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; openai-private-endpoint &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'customDnsConfigs[0].ipAddresses[0]'&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

az network private-dns record-set a create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--zone-name&lt;/span&gt; privatelink.openai.azure.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt;

az network private-dns record-set a add-record &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--record-set-name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--zone-name&lt;/span&gt; privatelink.openai.azure.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ipv4-address&lt;/span&gt; &lt;span class="nv"&gt;$ENDPOINT_IP&lt;/span&gt;

&lt;span class="c"&gt;# Configure Workload Identity&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ARO_OIDC_ISSUER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az aro show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'serviceIdentity.url'&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create managed identity&lt;/span&gt;
az identity create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-app-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IDENTITY_CLIENT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az identity show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-app-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; clientId &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;IDENTITY_PRINCIPAL_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az identity show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-app-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; principalId &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Grant OpenAI access&lt;/span&gt;
az role assignment create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--assignee&lt;/span&gt; &lt;span class="nv"&gt;$IDENTITY_PRINCIPAL_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt; &lt;span class="s2"&gt;"Cognitive Services OpenAI User"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_ID&lt;/span&gt;

&lt;span class="c"&gt;# Create federated credential&lt;/span&gt;
az identity federated-credential create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-app-federated &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--identity-name&lt;/span&gt; rag-app-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--issuer&lt;/span&gt; &lt;span class="nv"&gt;$ARO_OIDC_ISSUER&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subject&lt;/span&gt; &lt;span class="s2"&gt;"system:serviceaccount:rag-application:openai-sa"&lt;/span&gt;

&lt;span class="c"&gt;# Create Kubernetes service account&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: openai-sa
  namespace: rag-application
  annotations:
    azure.workload.identity/client-id: &lt;/span&gt;&lt;span class="nv"&gt;$IDENTITY_CLIENT_ID&lt;/span&gt;&lt;span class="sh"&gt;
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Get OpenAI endpoint and key&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_ENDPOINT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az cognitiveservices account show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; properties.endpoint &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az cognitiveservices account keys list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; key1 &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create secret&lt;/span&gt;
oc create secret generic openai-credentials &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$OPENAI_ENDPOINT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$OPENAI_KEY&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
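
&lt;p&gt;Because public network access is disabled on the OpenAI resource, the endpoint is only reachable from inside the VNet. A hedged connectivity check follows; the UBI image and the api-version value are assumptions to adjust for your tenant, and the curl must also run from a pod or jump host inside the VNet:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# DNS for the custom domain should resolve to the private endpoint IP
oc run netcheck -n rag-application --rm -it \
  --image=registry.access.redhat.com/ubi9/ubi -- \
  getent hosts ${OPENAI_NAME}.openai.azure.com

# REST call to the embeddings deployment (run from inside the VNet; it times out from outside)
curl -s "${OPENAI_ENDPOINT}openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15" \
  -H "api-key: ${OPENAI_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"input": "private link connectivity check"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;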



&lt;h3&gt;
  
  
  Azure Phase 3: Azure Data Factory Pipeline
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Data Factory&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ADF_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"rag-adf-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RANDOM&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

az datafactory create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--factory-name&lt;/span&gt; &lt;span class="nv"&gt;$ADF_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt;

&lt;span class="c"&gt;# Create Storage Account&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;STORAGE_ACCOUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"ragdocs&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RANDOM&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

az storage account create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_ACCOUNT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku&lt;/span&gt; Standard_LRS &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kind&lt;/span&gt; StorageV2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--hierarchical-namespace&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Get storage key&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;STORAGE_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;az storage account keys list &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-name&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_ACCOUNT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'[0].value'&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; tsv&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create containers&lt;/span&gt;
az storage container create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; raw-documents &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-name&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_ACCOUNT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-key&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_KEY&lt;/span&gt;

az storage container create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; processed-documents &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-name&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_ACCOUNT&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--account-key&lt;/span&gt; &lt;span class="nv"&gt;$STORAGE_KEY&lt;/span&gt;

&lt;span class="c"&gt;# Create linked service for storage&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; adf-storage-linked-service.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "name": "StorageLinkedService",
  "properties": {
    "type": "AzureBlobStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=&lt;/span&gt;&lt;span class="nv"&gt;$STORAGE_ACCOUNT&lt;/span&gt;&lt;span class="sh"&gt;;AccountKey=&lt;/span&gt;&lt;span class="nv"&gt;$STORAGE_KEY&lt;/span&gt;&lt;span class="sh"&gt;;EndpointSuffix=core.windows.net"
    }
  }
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;az datafactory linked-service create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--factory-name&lt;/span&gt; &lt;span class="nv"&gt;$ADF_NAME&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; StorageLinkedService &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--properties&lt;/span&gt; @adf-storage-linked-service.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure Phase 4: Milvus Deployment (Same as AWS)
&lt;/h3&gt;

&lt;p&gt;The Milvus deployment on ARO is identical to ROSA since both use OpenShift:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Same Helm commands as AWS implementation&lt;/span&gt;
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm &lt;span class="nb"&gt;install &lt;/span&gt;milvus-operator milvus/milvus-operator &lt;span class="nt"&gt;--namespace&lt;/span&gt; milvus &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;

&lt;span class="c"&gt;# Create PVCs using Azure Disk&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-etcd-pvc
  namespace: milvus
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 10Gi
  storageClassName: managed-premium
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: milvus-minio-pvc
  namespace: milvus
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 50Gi
  storageClassName: managed-premium
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Deploy Milvus (same values file as AWS)&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;milvus milvus/milvus &lt;span class="nt"&gt;--namespace&lt;/span&gt; milvus &lt;span class="nt"&gt;--values&lt;/span&gt; milvus-values.yaml &lt;span class="nt"&gt;--wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure Phase 5: RAG Application Deployment
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create Azure-specific application&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; rag-app-azure/src

&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app-azure/requirements.txt &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
fastapi==0.104.1
uvicorn[standard]==0.24.0
pydantic==2.5.0
pymilvus==2.3.3
openai==1.3.5
azure-identity==1.14.0
python-dotenv==1.0.0
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; rag-app-azure/src/main.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
from fastapi import FastAPI
from pydantic import BaseModel
import os
from openai import AzureOpenAI
from pymilvus import connections, Collection

app = FastAPI(title="Enterprise RAG API - Azure")

client = AzureOpenAI(
    api_key=os.getenv("OPENAI_KEY"),
    api_version="2023-05-15",
    azure_endpoint=os.getenv("OPENAI_ENDPOINT")
)

@app.on_event("startup")
async def startup():
    connections.connect(host=os.getenv("MILVUS_HOST"), port=19530)

class QueryRequest(BaseModel):
    query: str
    top_k: int = 5
    max_tokens: int = 1000

@app.post("/query")
async def query_rag(req: QueryRequest):
    # Generate embedding with Azure OpenAI
    embed_resp = client.embeddings.create(
        input=req.query,
        model="text-embedding-ada-002"
    )
    embedding = embed_resp.data[0].embedding

    # Search Milvus
    coll = Collection("rag_documents")
    results = coll.search([embedding], "embedding", {"metric_type": "L2"}, limit=req.top_k, output_fields=["text"])

    # Build context
    context = "&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;".join([hit.entity.get("text") for hit in results[0]])

    # Call Azure OpenAI GPT-4
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": f"Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;{context}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;Question: {req.query}"}
        ],
        max_tokens=req.max_tokens
    )

    answer = response.choices[0].message.content
    return {"answer": answer, "sources": [{"chunk": hit.entity.get("text")} for hit in results[0]]}

@app.get("/health")
async def health():
    return {"status": "healthy", "platform": "Azure", "model": "GPT-4"}
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Build and deploy (similar to AWS)&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;rag-app-azure
podman build &lt;span class="nt"&gt;-t&lt;/span&gt; rag-app-azure:v1.0 &lt;span class="nb"&gt;.&lt;/span&gt;
oc create imagestream rag-app-azure &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application
podman tag rag-app-azure:v1.0 image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-azure:v1.0
podman push image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-azure:v1.0 &lt;span class="nt"&gt;--tls-verify&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false
cd&lt;/span&gt; ..

&lt;span class="c"&gt;# Deploy with Azure credentials&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-app-azure
  namespace: rag-application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rag-app-azure
  template:
    metadata:
      labels:
        app: rag-app-azure
    spec:
      serviceAccountName: openai-sa
      containers:
      - name: app
        image: image-registry.openshift-image-registry.svc:5000/rag-application/rag-app-azure:v1.0
        ports:
        - containerPort: 8000
        env:
        - name: MILVUS_HOST
          value: "milvus.milvus.svc.cluster.local"
        - name: OPENAI_ENDPOINT
          valueFrom:
            secretKeyRef:
              name: openai-credentials
              key: endpoint
        - name: OPENAI_KEY
          valueFrom:
            secretKeyRef:
              name: openai-credentials
              key: key
---
apiVersion: v1
kind: Service
metadata:
  name: rag-app-azure
  namespace: rag-application
spec:
  selector:
    app: rag-app-azure
  ports:
  - port: 80
    targetPort: 8000
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: rag-app-azure
  namespace: rag-application
spec:
  to:
    kind: Service
    name: rag-app-azure
  tls:
    termination: edge
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Get URL and test&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RAG_URL_AZURE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;oc get route rag-app-azure &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.spec.host}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
curl https://&lt;span class="nv"&gt;$RAG_URL_AZURE&lt;/span&gt;/health
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
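
&lt;p&gt;With the route in place, the query endpoint can be exercised end to end. The snippet below is a minimal client-side sketch: the payload fields match the QueryRequest model defined above, the sample question is an assumption, and the hostname comes from the RAG_URL_AZURE variable exported in the previous step.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal client sketch for the /query endpoint defined above.
# Run from a machine with network access to the route; RAG_URL_AZURE must be set.
import os
import requests

base_url = f"https://{os.environ['RAG_URL_AZURE']}"

payload = {
    "query": "What is our refund policy?",  # example question, adjust to your documents
    "top_k": 5,
    "max_tokens": 500,
}

resp = requests.post(f"{base_url}/query", json=payload, timeout=60)
resp.raise_for_status()

body = resp.json()
print(body["answer"])
for source in body["sources"]:
    print("-", source["chunk"][:80])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;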



&lt;h2&gt;
  
  
  Cost Comparison (RAG)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Monthly Cost Breakdown
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;AWS Cost&lt;/th&gt;
&lt;th&gt;Azure Cost&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kubernetes Cluster&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 3x worker nodes&lt;/td&gt;
&lt;td&gt;$1,460 (m5.2xlarge)&lt;/td&gt;
&lt;td&gt;$1,380 (D8s_v3)&lt;/td&gt;
&lt;td&gt;Similar specs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Control plane&lt;/td&gt;
&lt;td&gt;$0 (managed by ROSA)&lt;/td&gt;
&lt;td&gt;$0 (managed by ARO)&lt;/td&gt;
&lt;td&gt;Both included&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LLM API Calls&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 1M input tokens&lt;/td&gt;
&lt;td&gt;$3 (Claude 3.5)&lt;/td&gt;
&lt;td&gt;$30 (GPT-4)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS 10x cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 1M output tokens&lt;/td&gt;
&lt;td&gt;$15 (Claude 3.5)&lt;/td&gt;
&lt;td&gt;$60 (GPT-4)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS 4x cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Embeddings&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 1M tokens&lt;/td&gt;
&lt;td&gt;$0.10 (Titan)&lt;/td&gt;
&lt;td&gt;$0.10 (Ada-002)&lt;/td&gt;
&lt;td&gt;Equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- ETL service&lt;/td&gt;
&lt;td&gt;$10 (Glue, serverless)&lt;/td&gt;
&lt;td&gt;$15 (Data Factory)&lt;/td&gt;
&lt;td&gt;AWS slightly cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Metadata catalog&lt;/td&gt;
&lt;td&gt;$1 (Glue Catalog)&lt;/td&gt;
&lt;td&gt;$20 (Purview min)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure has minimum fee&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Object Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 100 GB storage&lt;/td&gt;
&lt;td&gt;$2.30 (S3)&lt;/td&gt;
&lt;td&gt;$2.05 (Blob)&lt;/td&gt;
&lt;td&gt;Equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Requests (100k)&lt;/td&gt;
&lt;td&gt;$0.05 (S3)&lt;/td&gt;
&lt;td&gt;$0.04 (Blob)&lt;/td&gt;
&lt;td&gt;Equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vector Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Self-hosted Milvus&lt;/td&gt;
&lt;td&gt;$0 (on cluster)&lt;/td&gt;
&lt;td&gt;$0 (on cluster)&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Networking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Private Link&lt;/td&gt;
&lt;td&gt;$7.20 (PrivateLink)&lt;/td&gt;
&lt;td&gt;$7.20 (Private Link)&lt;/td&gt;
&lt;td&gt;Same pricing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Data transfer&lt;/td&gt;
&lt;td&gt;$5 (1 TB out)&lt;/td&gt;
&lt;td&gt;$5 (1 TB out)&lt;/td&gt;
&lt;td&gt;Equivalent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOTAL/MONTH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,503.65&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,519.39&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS 1% cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Cost Insights&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LLM API costs favor AWS&lt;/strong&gt; by a significant margin (Claude is cheaper than GPT-4); the sketch below shows the arithmetic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure Purview&lt;/strong&gt; has a minimum monthly fee vs Glue's pay-per-use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute costs are similar&lt;/strong&gt; between ROSA and ARO&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Winner: AWS by ~$16/month (1%)&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
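
&lt;p&gt;To make the token-price gap concrete, here is a back-of-the-envelope sketch that applies the per-1M-token list prices from the table above. The monthly token volumes are illustrative assumptions, not measurements.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Back-of-the-envelope LLM cost comparison using the per-1M-token prices from the table above.
# Token volumes are illustrative assumptions.
PRICES = {
    "aws_claude_3_5": {"input": 3.00, "output": 15.00},  # USD per 1M tokens
    "azure_gpt_4": {"input": 30.00, "output": 60.00},
}

def monthly_llm_cost(card, input_tokens_m, output_tokens_m):
    """Monthly cost in USD for a given price card and token volume (in millions)."""
    return card["input"] * input_tokens_m + card["output"] * output_tokens_m

# Baseline from the table: 1M input + 1M output tokens per month
for name, card in PRICES.items():
    print(name, monthly_llm_cost(card, input_tokens_m=1, output_tokens_m=1))
# prints 18.0 for aws_claude_3_5 and 90.0 for azure_gpt_4 (5x at this mix)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At the Scenario 1 volume in the Cost Sensitivity Analysis further down (10M input plus 10M output tokens), the same function returns $180 for AWS and $900 for Azure.&lt;/p&gt;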

&lt;h3&gt;
  
  
  Cost Optimization Strategies
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Claude Instant for non-critical queries (6x cheaper); a routing sketch follows these lists&lt;/li&gt;
&lt;li&gt;Leverage Glue serverless (no base cost)&lt;/li&gt;
&lt;li&gt;Use S3 Intelligent-Tiering for old documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Azure&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use GPT-3.5-Turbo instead of GPT-4 (20x cheaper)&lt;/li&gt;
&lt;li&gt;Negotiate EA pricing for Azure OpenAI&lt;/li&gt;
&lt;li&gt;Use cool/archive tiers for old data&lt;/li&gt;
&lt;/ul&gt;
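
&lt;p&gt;One way to act on the first item in each list is to route requests by criticality: send low-stakes queries to the cheaper model and reserve the premium model for queries that need it. The sketch below is a hypothetical router; the model identifiers are placeholders and the routing rule is an assumption you would replace with your own heuristics.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical cost-aware model router (sketch). Model names are placeholders;
# substitute the Bedrock model IDs or Azure OpenAI deployments you actually use.
CHEAP_MODEL = "claude-instant"        # or "gpt-35-turbo" on Azure
PREMIUM_MODEL = "claude-3-5-sonnet"   # or "gpt-4" on Azure

def pick_model(query, critical=False):
    """Very simple routing rule: explicitly critical or long queries get the premium model."""
    if critical or len(query) &gt; 500:
        return PREMIUM_MODEL
    return CHEAP_MODEL

print(pick_model("What are our office hours?"))                               # claude-instant
print(pick_model("Summarize the attached 40-page contract", critical=True))   # claude-3-5-sonnet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;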

&lt;h2&gt;
  
  
  Project 2: Hybrid MLOps Pipeline
&lt;/h2&gt;

&lt;h3&gt;
  
  
  MLOps Platform Overview
&lt;/h3&gt;

&lt;p&gt;This project demonstrates cost-optimized machine learning operations by bursting GPU training workloads to the managed ML services (SageMaker on AWS, Azure ML on Azure) while keeping inference on the OpenShift clusters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenShift Pipelines → ACK → SageMaker (ml.p4d.24xlarge)
                            ↓
                        S3 Model Storage
                            ↓
                    KServe on ROSA (CPU)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Azure DevOps / Tekton → ASO → Azure ML (NC96ads_A100_v4)
                               ↓
                           Blob Model Storage
                               ↓
                       KServe on ARO (CPU)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Service Mapping
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;AWS Service&lt;/th&gt;
&lt;th&gt;Azure Service&lt;/th&gt;
&lt;th&gt;Key Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon SageMaker&lt;/td&gt;
&lt;td&gt;Azure Machine Learning&lt;/td&gt;
&lt;td&gt;Similar capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GPU Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ml.p4d.24xlarge (8x A100)&lt;/td&gt;
&lt;td&gt;NC96ads_A100_v4 (8x A100)&lt;/td&gt;
&lt;td&gt;Same hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Spot Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Managed Spot Training&lt;/td&gt;
&lt;td&gt;Low Priority VMs&lt;/td&gt;
&lt;td&gt;Different reservation models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Registry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;S3 + SageMaker Registry&lt;/td&gt;
&lt;td&gt;Blob + ML Model Registry&lt;/td&gt;
&lt;td&gt;Different metadata approaches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;K8s Operator&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ACK (AWS Controllers)&lt;/td&gt;
&lt;td&gt;ASO (Azure Service Operator)&lt;/td&gt;
&lt;td&gt;Different CRD structures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pipelines&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OpenShift Pipelines (Tekton)&lt;/td&gt;
&lt;td&gt;Azure DevOps / Tekton&lt;/td&gt;
&lt;td&gt;Both support Tekton&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inference&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;KServe on ROSA&lt;/td&gt;
&lt;td&gt;KServe on ARO&lt;/td&gt;
&lt;td&gt;Identical&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  AWS Implementation (MLOps)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS MLOps Phase 1: OpenShift Pipelines Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install OpenShift Pipelines Operator&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: openshift-pipelines-operator
  namespace: openshift-operators
spec:
  channel: latest
  name: openshift-pipelines-operator-rh
  source: redhat-operators
  sourceNamespace: openshift-marketplace
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create namespace&lt;/span&gt;
oc new-project mlops-pipelines

&lt;span class="c"&gt;# Create service account&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pipeline-sa
  namespace: mlops-pipelines
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS MLOps Phase 2: ACK SageMaker Controller
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install ACK SageMaker controller&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sagemaker
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;RELEASE_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-sL&lt;/span&gt; https://api.github.com/repos/aws-controllers-k8s/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-controller&lt;/span&gt;/releases/latest | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s1"&gt;'\"tag_name\":'&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'\"'&lt;/span&gt; &lt;span class="nt"&gt;-f4&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

wget https://github.com/aws-controllers-k8s/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SERVICE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="nt"&gt;-controller&lt;/span&gt;/releases/download/&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RELEASE_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/install.yaml
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; install.yaml

&lt;span class="c"&gt;# Create IAM role for ACK&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ack-sagemaker-policy.json &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateTrainingJob",
        "sagemaker:DescribeTrainingJob",
        "sagemaker:StopTrainingJob"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:*"],
      "Resource": "arn:aws:s3:::mlops-*"
    },
    {
      "Effect": "Allow",
      "Action": ["iam:PassRole"],
      "Resource": "*",
      "Condition": {
        "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
      }
    }
  ]
}
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;aws iam create-policy &lt;span class="nt"&gt;--policy-name&lt;/span&gt; ACKSageMakerPolicy &lt;span class="nt"&gt;--policy-document&lt;/span&gt; file://ack-sagemaker-policy.json

&lt;span class="c"&gt;# Create trust policy and role (similar to RAG project)&lt;/span&gt;
&lt;span class="c"&gt;# ... (abbreviated for space)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  AWS MLOps Phase 3: Training Job Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create S3 buckets&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ML_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlops-artifacts-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;DATA_BUCKET&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlops-datasets-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

aws s3 mb s3://&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt;
aws s3 mb s3://&lt;span class="nv"&gt;$DATA_BUCKET&lt;/span&gt;

&lt;span class="c"&gt;# Upload training script&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; train.py &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;PYTHON&lt;/span&gt;&lt;span class="sh"&gt;'
import argparse, joblib
from sklearn.ensemble import RandomForestClassifier
import numpy as np

parser = argparse.ArgumentParser()
parser.add_argument('--n_estimators', type=int, default=100)
args = parser.parse_args()

# Training code
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, 1000)

model = RandomForestClassifier(n_estimators=args.n_estimators)
model.fit(X, y)

joblib.dump(model, '/opt/ml/model/model.joblib')
print(f"Training completed with {args.n_estimators} estimators")
&lt;/span&gt;&lt;span class="no"&gt;PYTHON

&lt;/span&gt;&lt;span class="c"&gt;# Create Dockerfile&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
FROM python:3.10-slim
RUN pip install scikit-learn joblib numpy
COPY train.py /opt/ml/code/
ENTRYPOINT ["python", "/opt/ml/code/train.py"]
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Build and push to ECR&lt;/span&gt;
aws ecr create-repository &lt;span class="nt"&gt;--repository-name&lt;/span&gt; mlops/training
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ECR_URI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.dkr.ecr.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AWS_REGION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.amazonaws.com/mlops/training"&lt;/span&gt;
aws ecr get-login-password | docker login &lt;span class="nt"&gt;--username&lt;/span&gt; AWS &lt;span class="nt"&gt;--password-stdin&lt;/span&gt; &lt;span class="nv"&gt;$ECR_URI&lt;/span&gt;
docker build &lt;span class="nt"&gt;-t&lt;/span&gt; mlops-training &lt;span class="nb"&gt;.&lt;/span&gt;
docker tag mlops-training:latest &lt;span class="nv"&gt;$ECR_URI&lt;/span&gt;:latest
docker push &lt;span class="nv"&gt;$ECR_URI&lt;/span&gt;:latest

&lt;span class="c"&gt;# Create SageMaker training job via ACK&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TrainingJob
metadata:
  name: rf-training-job
  namespace: mlops-pipelines
spec:
  trainingJobName: rf-training-&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%s&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;
  roleARN: &lt;/span&gt;&lt;span class="nv"&gt;$SAGEMAKER_ROLE_ARN&lt;/span&gt;&lt;span class="sh"&gt;
  algorithmSpecification:
    trainingImage: &lt;/span&gt;&lt;span class="nv"&gt;$ECR_URI&lt;/span&gt;&lt;span class="sh"&gt;:latest
    trainingInputMode: File
  resourceConfig:
    instanceType: ml.m5.xlarge
    instanceCount: 1
    volumeSizeInGB: 50
  outputDataConfig:
    s3OutputPath: s3://&lt;/span&gt;&lt;span class="nv"&gt;$ML_BUCKET&lt;/span&gt;&lt;span class="sh"&gt;/models/
  stoppingCondition:
    maxRuntimeInSeconds: 3600
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
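
&lt;p&gt;Once ACK has created the SageMaker training job, you can follow its progress through the TrainingJob resource status or directly against the SageMaker API. Below is a minimal polling sketch with boto3; the job name is a placeholder for whatever trainingJobName ACK actually submitted.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Minimal sketch: poll a SageMaker training job until it finishes.
# The job name is a placeholder; use the trainingJobName from the ACK resource.
import time
import boto3

sm = boto3.client("sagemaker")
job_name = "rf-training-1700000000"  # placeholder

while True:
    desc = sm.describe_training_job(TrainingJobName=job_name)
    status = desc["TrainingJobStatus"]  # InProgress | Completed | Failed | Stopped
    print(status)
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(30)

print("Model artifacts:", desc["ModelArtifacts"]["S3ModelArtifacts"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;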



&lt;h2&gt;
  
  
  Azure Implementation (MLOps)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Azure MLOps Phase 1: Azure ML Workspace
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create ML workspace&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ML_WORKSPACE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"mlops-workspace-&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RANDOM&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

az ml workspace create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$ML_WORKSPACE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt;

&lt;span class="c"&gt;# Create compute cluster (spot instances)&lt;/span&gt;
az ml compute create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; gpu-cluster &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--type&lt;/span&gt; amlcompute &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--min-instances&lt;/span&gt; 0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-instances&lt;/span&gt; 4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--size&lt;/span&gt; Standard_NC6s_v3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tier&lt;/span&gt; LowPriority &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--workspace-name&lt;/span&gt; &lt;span class="nv"&gt;$ML_WORKSPACE&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure MLOps Phase 2: Azure Service Operator
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install ASO&lt;/span&gt;
helm repo add aso2 https://raw.githubusercontent.com/Azure/azure-service-operator/main/v2/charts
helm &lt;span class="nb"&gt;install &lt;/span&gt;aso2 aso2/azure-service-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; azureserviceoperator-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;azureSubscriptionID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$SUBSCRIPTION_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;azureTenantID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$TENANT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;azureClientID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLIENT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;azureClientSecret&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$CLIENT_SECRET&lt;/span&gt;

&lt;span class="c"&gt;# Create ML job via ASO&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: machinelearningservices.azure.com/v1alpha1
kind: Job
metadata:
  name: rf-training-job
  namespace: mlops-pipelines
spec:
  owner:
    name: &lt;/span&gt;&lt;span class="nv"&gt;$ML_WORKSPACE&lt;/span&gt;&lt;span class="sh"&gt;
  compute:
    target: gpu-cluster
    instanceCount: 1
  environment:
    image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
  codeConfiguration:
    codeArtifactId: azureml://code/train-script
    scoringScript: train.py
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
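
&lt;p&gt;If you would rather not model the training job as an ASO custom resource, the same submission can be driven from a pipeline step with the Azure ML Python SDK (azure-ai-ml). The sketch below is an outline under assumptions: the workspace and compute names mirror the ones created earlier, while the source folder and curated environment name are placeholders to verify in your workspace.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: submit the training job via the azure-ai-ml SDK instead of an ASO resource.
# Workspace/compute names mirror the earlier CLI steps; code path and environment are placeholders.
import os
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=os.environ["SUBSCRIPTION_ID"],
    resource_group_name=os.environ["RESOURCE_GROUP"],
    workspace_name=os.environ["ML_WORKSPACE"],
)

job = command(
    code="./src",  # hypothetical folder containing train.py
    command="python train.py --n_estimators 200",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # verify the curated env name
    compute="gpu-cluster",
    display_name="rf-training-job",
)

returned_job = ml_client.jobs.create_or_update(job)
print(returned_job.studio_url)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;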



&lt;h2&gt;
  
  
  Cost Comparison (MLOps)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;AWS Monthly&lt;/th&gt;
&lt;th&gt;Azure Monthly&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- 4 hrs/week spot GPU&lt;/td&gt;
&lt;td&gt;$157 (ml.p4d.24xlarge)&lt;/td&gt;
&lt;td&gt;$153 (NC96ads_A100_v4)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure slightly cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Model artifacts (50 GB)&lt;/td&gt;
&lt;td&gt;$1.15 (S3)&lt;/td&gt;
&lt;td&gt;$1.00 (Blob)&lt;/td&gt;
&lt;td&gt;Similar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- ML service&lt;/td&gt;
&lt;td&gt;$0 (pay-per-use)&lt;/td&gt;
&lt;td&gt;$0 (pay-per-use)&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inference (on OpenShift)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Shared ROSA/ARO cluster&lt;/td&gt;
&lt;td&gt;$0 (shared)&lt;/td&gt;
&lt;td&gt;$0 (shared)&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOTAL/MONTH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$158&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$154&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure 2.5% cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Winner: Azure&lt;/strong&gt; by $4/month (negligible difference)&lt;/p&gt;

&lt;h2&gt;
  
  
  Project 3: Unified Data Fabric (Data Lakehouse)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Lakehouse Platform Overview
&lt;/h3&gt;

&lt;p&gt;This project implements a stateless data lakehouse: table data and metadata live in object storage and an external catalog, so the compute layer (Spark) can be destroyed and recreated without data loss.&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture Comparison
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AWS Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Spark on ROSA → AWS Glue Catalog → S3 + Iceberg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure Architecture&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Spark on ARO → Azure Purview / Unity Catalog → ADLS Gen2 + Delta Lake
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Service Mapping
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;AWS Service&lt;/th&gt;
&lt;th&gt;Azure Service&lt;/th&gt;
&lt;th&gt;Key Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Catalog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS Glue Data Catalog&lt;/td&gt;
&lt;td&gt;Azure Purview / Unity Catalog&lt;/td&gt;
&lt;td&gt;Glue is serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Table Format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache Iceberg&lt;/td&gt;
&lt;td&gt;Delta Lake&lt;/td&gt;
&lt;td&gt;Iceberg is cloud-agnostic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon S3&lt;/td&gt;
&lt;td&gt;ADLS Gen2&lt;/td&gt;
&lt;td&gt;ADLS has hierarchical namespace&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compute&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spark on ROSA&lt;/td&gt;
&lt;td&gt;Spark on ARO / Databricks&lt;/td&gt;
&lt;td&gt;ARO or managed Databricks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Engine&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Amazon Athena&lt;/td&gt;
&lt;td&gt;Azure Synapse Serverless SQL&lt;/td&gt;
&lt;td&gt;Similar serverless query&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  AWS Implementation (Lakehouse)
&lt;/h2&gt;

&lt;p&gt;(Due to length constraints, showing key differences only)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Spark Operator&lt;/span&gt;
helm &lt;span class="nb"&gt;install &lt;/span&gt;spark-operator spark-operator/spark-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; spark-operator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;sparkJobNamespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;spark-jobs

&lt;span class="c"&gt;# Create Glue databases&lt;/span&gt;
aws glue create-database &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{"Name": "bronze"}'&lt;/span&gt;
aws glue create-database &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{"Name": "silver"}'&lt;/span&gt;
aws glue create-database &lt;span class="nt"&gt;--database-input&lt;/span&gt; &lt;span class="s1"&gt;'{"Name": "gold"}'&lt;/span&gt;

&lt;span class="c"&gt;# Build custom Spark image with Iceberg&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
FROM gcr.io/spark-operator/spark:v3.5.0
USER root
RUN curl -L https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.4.2/iceberg-spark-runtime-3.5_2.12-1.4.2.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/iceberg-spark-runtime.jar
RUN curl -L https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.4/hadoop-aws-3.3.4.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/hadoop-aws.jar
USER 185
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Deploy SparkApplication with Glue integration&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: lakehouse-etl
spec:
  type: Python
  sparkVersion: "3.5.0"
  mainApplicationFile: s3://bucket/scripts/etl.py
  sparkConf:
    "spark.sql.catalog.glue_catalog": "org.apache.iceberg.spark.SparkCatalog"
    "spark.sql.catalog.glue_catalog.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog"
    "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
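
&lt;p&gt;The SparkApplication above points at s3://bucket/scripts/etl.py without showing it. A hypothetical version of that script, written against the Iceberg/Glue catalog configured in sparkConf, might look like the following; bucket, database, and table names are placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical etl.py for the SparkApplication above (Iceberg tables in the Glue catalog).
# Bucket, database, and table names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("lakehouse-etl")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://bucket/warehouse/")
    .getOrCreate()
)

# Bronze: land raw CSV files as an Iceberg table tracked by the Glue catalog
raw = spark.read.option("header", "true").csv("s3a://bucket/raw/orders/")
raw.writeTo("glue_catalog.bronze.orders").using("iceberg").createOrReplace()

# Silver: basic cleanup, still Iceberg, still in Glue
clean = raw.dropDuplicates(["order_id"]).filter("order_id IS NOT NULL")
clean.writeTo("glue_catalog.silver.orders").using("iceberg").createOrReplace()

spark.stop()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;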



&lt;h2&gt;
  
  
  Azure Implementation (Lakehouse)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Option 1: Use Azure Databricks (managed)&lt;/span&gt;
az databricks workspace create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; databricks-lakehouse &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sku&lt;/span&gt; premium

&lt;span class="c"&gt;# Option 2: Deploy Spark on ARO with Delta Lake&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Dockerfile &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
FROM gcr.io/spark-operator/spark:v3.5.0
USER root
RUN curl -L https://repo1.maven.org/maven2/io/delta/delta-core_2.12/2.4.0/delta-core_2.12-2.4.0.jar &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="sh"&gt;
    -o /opt/spark/jars/delta-core.jar
USER 185
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Create ADLS Gen2 storage&lt;/span&gt;
az storage account create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; datalake&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RANDOM&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt; &lt;span class="nv"&gt;$LOCATION&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kind&lt;/span&gt; StorageV2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--hierarchical-namespace&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Deploy SparkApplication with Delta Lake&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt; | oc apply -f -
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: lakehouse-etl
spec:
  type: Python
  sparkVersion: "3.5.0"
  mainApplicationFile: abfss://container@storage.dfs.core.windows.net/scripts/etl.py
  sparkConf:
    "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension"
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog"
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
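
&lt;p&gt;As on AWS, the abfss path in mainApplicationFile refers to an ETL script that is not shown. A hypothetical Delta Lake equivalent, relying on the extensions configured in sparkConf, could look like this (storage account, container, and paths are placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical etl.py for the Azure SparkApplication (Delta tables on ADLS Gen2).
# Storage account, container, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-etl").getOrCreate()

base = "abfss://container@storage.dfs.core.windows.net"

raw = spark.read.option("header", "true").csv(f"{base}/raw/orders/")
raw.write.format("delta").mode("overwrite").save(f"{base}/bronze/orders")

clean = raw.dropDuplicates(["order_id"]).filter("order_id IS NOT NULL")
clean.write.format("delta").mode("overwrite").save(f"{base}/silver/orders")

spark.stop()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;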



&lt;h2&gt;
  
  
  Cost Comparison (Lakehouse)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;AWS Monthly&lt;/th&gt;
&lt;th&gt;Azure Monthly&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compute&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Spark cluster (3x m5.4xlarge)&lt;/td&gt;
&lt;td&gt;$1,500&lt;/td&gt;
&lt;td&gt;$1,450 (D16s_v3)&lt;/td&gt;
&lt;td&gt;Similar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metadata Catalog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Catalog service&lt;/td&gt;
&lt;td&gt;$10 (Glue, 1M requests)&lt;/td&gt;
&lt;td&gt;$20 (Purview minimum)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;AWS cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Storage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Data lake (1 TB)&lt;/td&gt;
&lt;td&gt;$23 (S3)&lt;/td&gt;
&lt;td&gt;$18 (ADLS Gen2 hot)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Engine&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;- Serverless queries (1 TB)&lt;/td&gt;
&lt;td&gt;$5 (Athena)&lt;/td&gt;
&lt;td&gt;$5 (Synapse serverless)&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOTAL/MONTH&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,538&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1,493&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure 3% cheaper&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Winner: Azure&lt;/strong&gt; by $45/month (3%)&lt;/p&gt;

&lt;h2&gt;
  
  
  Total Cost of Ownership Analysis
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Combined Monthly Costs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;AWS Total&lt;/th&gt;
&lt;th&gt;Azure Total&lt;/th&gt;
&lt;th&gt;Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RAG Platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1,504&lt;/td&gt;
&lt;td&gt;$1,519&lt;/td&gt;
&lt;td&gt;AWS -$15 (-1%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MLOps Pipeline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$158&lt;/td&gt;
&lt;td&gt;$154&lt;/td&gt;
&lt;td&gt;Azure -$4 (-2.5%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Lakehouse&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1,538&lt;/td&gt;
&lt;td&gt;$1,493&lt;/td&gt;
&lt;td&gt;Azure -$45 (-3%)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOTAL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$3,200/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$3,166/month&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Azure -$34/month (-1%)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Annual Projection
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS&lt;/strong&gt;: $3,200 × 12 = &lt;strong&gt;$38,400/year&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure&lt;/strong&gt;: $3,166 × 12 = &lt;strong&gt;$37,992/year&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Savings with Azure&lt;/strong&gt;: &lt;strong&gt;$408/year (1%)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost Sensitivity Analysis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: High LLM Usage&lt;/strong&gt; (10M tokens/month)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS: +$180 (Claude cheaper)&lt;/li&gt;
&lt;li&gt;Azure: +$900 (GPT-4 more expensive)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AWS wins by $720/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario 2: Heavy ML Training&lt;/strong&gt; (20 hrs/week GPU)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS: +$785&lt;/li&gt;
&lt;li&gt;Azure: +$765&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Azure wins by $20/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scenario 3: Large Data Lake&lt;/strong&gt; (10 TB storage)&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS: +$230&lt;/li&gt;
&lt;li&gt;Azure: +$180&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Azure wins by $50/month&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;: &lt;strong&gt;AWS is better for AI-heavy workloads&lt;/strong&gt; due to cheaper LLM pricing. &lt;strong&gt;Azure is better for data-heavy workloads&lt;/strong&gt; due to cheaper storage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Cloud Integration Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Unified RBAC Strategy
&lt;/h3&gt;

&lt;p&gt;Both platforms support similar pod-level identity:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS (IRSA)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-sa&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;eks.amazonaws.com/role-arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::ACCOUNT:role/AppRole&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Azure (Workload Identity)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ServiceAccount&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;app-sa&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;azure.workload.identity/client-id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CLIENT_ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Multi-Cloud Disaster Recovery
&lt;/h3&gt;

&lt;p&gt;Deploy identical workloads on both platforms for DR:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Primary: AWS&lt;/span&gt;
&lt;span class="c"&gt;# Standby: Azure&lt;/span&gt;
&lt;span class="c"&gt;# Failover time: &amp;lt; 5 minutes with DNS switch&lt;/span&gt;

&lt;span class="c"&gt;# Shared components:&lt;/span&gt;
&lt;span class="c"&gt;# - OpenShift APIs (same)&lt;/span&gt;
&lt;span class="c"&gt;# - Application code (same)&lt;/span&gt;
&lt;span class="c"&gt;# - Milvus deployment (same)&lt;/span&gt;

&lt;span class="c"&gt;# Platform-specific:&lt;/span&gt;
&lt;span class="c"&gt;# - Cloud credentials&lt;/span&gt;
&lt;span class="c"&gt;# - Storage endpoints&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
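
&lt;p&gt;Because the application code, Milvus deployment, and OpenShift APIs are identical on both sides, failover largely reduces to pointing DNS at the standby route. Below is a hedged sketch of that switch using a health check plus a Route 53 record update; the hosted zone ID, record name, and route hostnames are all placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: fail DNS over from the AWS (primary) route to the Azure (standby) route.
# Hosted zone ID, record name, and hostnames are placeholders.
import boto3
import requests

PRIMARY = "rag-app-aws.apps.example-aws.com"      # ROSA route (placeholder)
STANDBY = "rag-app-azure.apps.example-azure.com"  # ARO route (placeholder)
ZONE_ID = "Z0000000000000"                        # Route 53 hosted zone (placeholder)
RECORD = "rag.example.com."

def healthy(host):
    try:
        return requests.get(f"https://{host}/health", timeout=5).status_code == 200
    except requests.RequestException:
        return False

if not healthy(PRIMARY) and healthy(STANDBY):
    boto3.client("route53").change_resource_record_sets(
        HostedZoneId=ZONE_ID,
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": RECORD,
                "Type": "CNAME",
                "TTL": 60,
                "ResourceRecords": [{"Value": STANDBY}],
            },
        }]},
    )
    print("Failed over to standby")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;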



&lt;h2&gt;
  
  
  Migration Strategies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS to Azure Migration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Phase 1: Data Migration&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use AzCopy for S3 → Blob migration&lt;/span&gt;
azcopy copy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"https://s3.amazonaws.com/bucket/*"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"https://storageaccount.blob.core.windows.net/container"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--recursive&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Phase 2: Metadata Migration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Export Glue Catalog to JSON (a scripted sketch follows this list)&lt;/li&gt;
&lt;li&gt;Import to Azure Purview via API&lt;/li&gt;
&lt;/ul&gt;
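
&lt;p&gt;The export half of Phase 2 can be scripted. The sketch below walks the Glue Data Catalog with boto3 and dumps it to JSON; pushing that JSON into Purview goes through the Purview (Atlas-compatible) REST API and is left as a placeholder here, since the payload depends on your collection layout.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch: export the Glue Data Catalog to JSON as the first half of Phase 2.
# The Purview import step is intentionally left as a placeholder.
import json
import boto3

glue = boto3.client("glue")
export = []

for page in glue.get_paginator("get_databases").paginate():
    for db in page["DatabaseList"]:
        tables = []
        for tpage in glue.get_paginator("get_tables").paginate(DatabaseName=db["Name"]):
            tables.extend(tpage["TableList"])
        export.append({"database": db["Name"], "tables": tables})

with open("glue-catalog-export.json", "w") as fh:
    json.dump(export, fh, default=str)  # default=str handles datetime fields

# TODO: transform and push to Azure Purview via its Atlas-compatible REST API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;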

&lt;p&gt;&lt;strong&gt;Phase 3: Application Migration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update environment variables&lt;/li&gt;
&lt;li&gt;Switch cloud credentials&lt;/li&gt;
&lt;li&gt;Deploy to ARO&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Azure to AWS Migration
&lt;/h3&gt;

&lt;p&gt;Similar process in reverse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use AWS DataSync for Blob → S3&lt;/span&gt;
aws datasync create-task &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source-location-arn&lt;/span&gt; arn:aws:datasync:...:location/azure-blob &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--destination-location-arn&lt;/span&gt; arn:aws:datasync:...:location/s3-bucket
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Resource Cleanup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS Complete Cleanup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Complete AWS resource cleanup&lt;/span&gt;

&lt;span class="c"&gt;# RAG Platform&lt;/span&gt;
rosa delete cluster &lt;span class="nt"&gt;--cluster&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rag-platform-aws &lt;span class="nt"&gt;--yes&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://rag-documents-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt;
aws s3 rb s3://rag-documents-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
aws glue delete-crawler &lt;span class="nt"&gt;--name&lt;/span&gt; rag-document-crawler
aws glue delete-database &lt;span class="nt"&gt;--name&lt;/span&gt; rag_documents_db
aws ec2 delete-vpc-endpoints &lt;span class="nt"&gt;--vpc-endpoint-ids&lt;/span&gt; &lt;span class="nv"&gt;$BEDROCK_VPC_ENDPOINT&lt;/span&gt;
aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access
aws iam delete-policy &lt;span class="nt"&gt;--policy-arn&lt;/span&gt; arn:aws:iam::&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;:policy/BedrockInvokePolicy

&lt;span class="c"&gt;# MLOps Platform&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://mlops-artifacts-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://mlops-datasets-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt;
aws s3 rb s3://mlops-artifacts-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
aws s3 rb s3://mlops-datasets-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
aws ecr delete-repository &lt;span class="nt"&gt;--repository-name&lt;/span&gt; mlops/training &lt;span class="nt"&gt;--force&lt;/span&gt;
aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; ACKSageMakerControllerRole

&lt;span class="c"&gt;# Data Lakehouse&lt;/span&gt;
aws s3 &lt;span class="nb"&gt;rm &lt;/span&gt;s3://lakehouse-data-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;--recursive&lt;/span&gt;
aws s3 rb s3://lakehouse-data-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;ACCOUNT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;db &lt;span class="k"&gt;in &lt;/span&gt;bronze silver gold&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;aws glue delete-database &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="nv"&gt;$db&lt;/span&gt;
&lt;span class="k"&gt;done
&lt;/span&gt;aws iam delete-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; SparkGlueCatalogRole

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"AWS cleanup complete"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Azure Complete Cleanup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Complete Azure resource cleanup&lt;/span&gt;

&lt;span class="c"&gt;# Delete all resources in resource group&lt;/span&gt;
az group delete &lt;span class="nt"&gt;--name&lt;/span&gt; rag-platform-rg &lt;span class="nt"&gt;--yes&lt;/span&gt; &lt;span class="nt"&gt;--no-wait&lt;/span&gt;

&lt;span class="c"&gt;# This deletes:&lt;/span&gt;
&lt;span class="c"&gt;# - ARO cluster&lt;/span&gt;
&lt;span class="c"&gt;# - Azure OpenAI service&lt;/span&gt;
&lt;span class="c"&gt;# - Storage accounts&lt;/span&gt;
&lt;span class="c"&gt;# - Data Factory&lt;/span&gt;
&lt;span class="c"&gt;# - Azure ML workspace&lt;/span&gt;
&lt;span class="c"&gt;# - All networking components&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Azure cleanup complete (deleting in background)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Common Multi-Cloud Issues
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Issue: Cross-Cloud Latency
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Slow API responses when accessing cloud services&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Solution&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify VPC endpoint is in correct AZ&lt;/span&gt;
aws ec2 describe-vpc-endpoints &lt;span class="nt"&gt;--vpc-endpoint-ids&lt;/span&gt; &lt;span class="nv"&gt;$ENDPOINT_ID&lt;/span&gt;

&lt;span class="c"&gt;# Check PrivateLink latency&lt;/span&gt;
oc run &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;curlimages/curl &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  curl &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"@curl-format.txt"&lt;/span&gt; https://bedrock-runtime.us-east-1.amazonaws.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
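

&lt;p&gt;The &lt;code&gt;curl-format.txt&lt;/code&gt; file referenced here (and again in the Azure test below) is not shown in this guide. A minimal version using curl's standard &lt;code&gt;--write-out&lt;/code&gt; timing variables could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;     time_namelookup:  %{time_namelookup}s\n
        time_connect:  %{time_connect}s\n
     time_appconnect:  %{time_appconnect}s\n
    time_pretransfer:  %{time_pretransfer}s\n
  time_starttransfer:  %{time_starttransfer}s\n
                     ----------\n
          time_total:  %{time_total}s\n
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;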



&lt;p&gt;&lt;strong&gt;Azure Solution&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify Private Link in same region as ARO&lt;/span&gt;
az network private-endpoint show &lt;span class="nt"&gt;--name&lt;/span&gt; openai-private-endpoint

&lt;span class="c"&gt;# Test latency&lt;/span&gt;
oc run &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;curlimages/curl &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  curl &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"@curl-format.txt"&lt;/span&gt; https://OPENAI_NAME.openai.azure.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Issue: Authentication Failures
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;AWS IRSA Troubleshooting&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify OIDC provider&lt;/span&gt;
rosa describe cluster &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="nv"&gt;$CLUSTER_NAME&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; json | jq .aws.sts.oidc_endpoint_url

&lt;span class="c"&gt;# Test token&lt;/span&gt;
kubectl create token bedrock-sa &lt;span class="nt"&gt;-n&lt;/span&gt; rag-application

&lt;span class="c"&gt;# Verify IAM trust policy&lt;/span&gt;
aws iam get-role &lt;span class="nt"&gt;--role-name&lt;/span&gt; rosa-bedrock-access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
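

&lt;p&gt;If the commands above look correct but pods still receive access-denied errors, a quick in-cluster check can confirm that the projected token is actually exchanged for the expected role. This is a minimal sketch, assuming boto3 is available inside the pod image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal in-pod check (sketch): confirm the projected service-account token
# is exchanged for the expected IAM role. Assumes boto3 is installed in the
# pod image and that IRSA injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE.
import boto3

sts = boto3.client("sts", region_name="us-east-1")  # example region
identity = sts.get_caller_identity()
print("Caller ARN:", identity["Arn"])  # expect the rosa-bedrock-access role here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;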



&lt;p&gt;&lt;strong&gt;Azure Workload Identity Troubleshooting&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify federated credential&lt;/span&gt;
az identity federated-credential show &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; rag-app-federated &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--identity-name&lt;/span&gt; rag-app-identity &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--resource-group&lt;/span&gt; &lt;span class="nv"&gt;$RESOURCE_GROUP&lt;/span&gt;

&lt;span class="c"&gt;# Test managed identity&lt;/span&gt;
az account get-access-token &lt;span class="nt"&gt;--resource&lt;/span&gt; https://cognitiveservices.azure.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
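

&lt;p&gt;The az CLI test above runs under your own login rather than the pod's identity. To verify the federated credential from inside the cluster, a short check with the azure-identity library (a sketch, assuming the package is installed in the pod image and the service account is annotated for workload identity) can confirm a token is actually issued:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Minimal in-pod check (sketch): ask the workload identity for an Azure OpenAI
# token. Assumes azure-identity is installed and the pod is configured for
# Azure Workload Identity.
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")
print("Token acquired, expires at:", token.expires_on)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;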



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Platform Selection Recommendations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Choose AWS if you&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prioritize AI/ML model diversity (Bedrock marketplace)&lt;/li&gt;
&lt;li&gt;Have variable, unpredictable workloads (serverless pricing)&lt;/li&gt;
&lt;li&gt;Value open-source ecosystem compatibility&lt;/li&gt;
&lt;li&gt;Need global multi-region deployments&lt;/li&gt;
&lt;li&gt;Want lower LLM API costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Azure if you&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have existing Microsoft enterprise agreements&lt;/li&gt;
&lt;li&gt;Need Windows container support&lt;/li&gt;
&lt;li&gt;Require hybrid cloud with on-premises&lt;/li&gt;
&lt;li&gt;Have Microsoft 365 / Teams integration requirements&lt;/li&gt;
&lt;li&gt;Want slightly lower infrastructure costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Multi-Cloud if you&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Need disaster recovery across providers&lt;/li&gt;
&lt;li&gt;Want to avoid vendor lock-in&lt;/li&gt;
&lt;li&gt;Have regulatory requirements for redundancy&lt;/li&gt;
&lt;li&gt;Can manage operational complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Final Cost Summary
&lt;/h3&gt;

&lt;p&gt;For the three projects combined:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Total&lt;/strong&gt;: $3,200/month ($38,400/year)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure Total&lt;/strong&gt;: $3,166/month ($37,992/year)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Difference&lt;/strong&gt;: 1% ($408/year favoring Azure)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: &lt;strong&gt;Costs are effectively equivalent&lt;/strong&gt;. Choose based on ecosystem fit, not cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Technical Takeaways
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;OpenShift provides platform portability&lt;/strong&gt; - same APIs on both clouds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud-specific services&lt;/strong&gt; (Bedrock, Azure OpenAI) require different code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage abstractions&lt;/strong&gt; (S3 vs Blob) are the main migration challenge (see the sketch below)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM patterns&lt;/strong&gt; (IRSA vs Workload Identity) are conceptually similar&lt;/li&gt;
&lt;/ol&gt;
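
&lt;p&gt;To illustrate the storage point above, a thin upload wrapper is usually enough to keep application code portable between S3 and Blob Storage. The following is a minimal sketch; the function names and parameters are illustrative, not taken from the projects above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch of a thin storage abstraction so application code does not care
# whether artifacts land in S3 or Blob Storage. All names are illustrative.
import boto3
from azure.storage.blob import BlobServiceClient


def upload_to_s3(local_path, bucket, key):
    # boto3 picks up credentials from the environment (e.g. IRSA in-cluster)
    boto3.client("s3").upload_file(local_path, bucket, key)


def upload_to_blob(local_path, account_url, container, blob_name, credential):
    # credential can be a DefaultAzureCredential when running with workload identity
    service = BlobServiceClient(account_url=account_url, credential=credential)
    blob = service.get_blob_client(container=container, blob=blob_name)
    with open(local_path, "rb") as data:
        blob.upload_blob(data, overwrite=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;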

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;To Expand This Implementation&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add GitOps with ArgoCD for both platforms&lt;/li&gt;
&lt;li&gt;Implement cross-cloud disaster recovery&lt;/li&gt;
&lt;li&gt;Add comprehensive monitoring with Grafana&lt;/li&gt;
&lt;li&gt;Automate deployments with Terraform/Bicep&lt;/li&gt;
&lt;li&gt;Implement cost governance and FinOps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Thank you for reading this comprehensive multi-cloud implementation guide!&lt;/p&gt;

</description>
      <category>coding</category>
      <category>aws</category>
      <category>machinelearning</category>
      <category>azure</category>
    </item>
    <item>
      <title>Open APIs in Telecom: Your Ticket to the Developer’s Playground</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Thu, 27 Feb 2025 16:35:50 +0000</pubDate>
      <link>https://dev.to/aws-builders/open-apis-in-telecom-your-ticket-to-the-developers-playground-4153</link>
      <guid>https://dev.to/aws-builders/open-apis-in-telecom-your-ticket-to-the-developers-playground-4153</guid>
      <description>&lt;p&gt;Let’s get straight to the point and turn you into a real Open API developer by breaking down the core concepts of how a Network API actually works.  &lt;/p&gt;

&lt;p&gt;First, let’s clarify what an Open API is and how it differs from a standard API.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"The main difference is that an Open API is publicly available, whereas a regular API might be restricted to specific users or partners"&lt;/em&gt; by Gemini&lt;/p&gt;

&lt;p&gt;What this means is that as long as you keep your API publicly available—including documentation and specifications—you can call yourself an Open API developer.  &lt;/p&gt;

&lt;p&gt;For a Proof of Concept (PoC), this sounds like a lot of fun. But whenever a new trend or technology emerges, there’s always an opportunity to make it profitable.&lt;/p&gt;

&lt;p&gt;In this lab, I will show you how to interact with the Vonage Network API for &lt;strong&gt;FREE&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Index
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
Introduction
&lt;/li&gt;
&lt;li&gt;
Scope
&lt;/li&gt;
&lt;li&gt;
State of the Art

&lt;ul&gt;
&lt;li&gt;
Communication APIs
&lt;/li&gt;
&lt;li&gt;
Network APIs
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
Implementation

&lt;ul&gt;
&lt;li&gt;
Pre-requisites
&lt;/li&gt;
&lt;li&gt;
Create Vonage Account
&lt;/li&gt;
&lt;li&gt;
Verify Your Account and Log In Securely
&lt;/li&gt;
&lt;li&gt;
Vonage API Dashboard
&lt;/li&gt;
&lt;li&gt;
Vonage SMS API
&lt;/li&gt;
&lt;li&gt;
Vonage Number Verification API
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
What is next?
&lt;/li&gt;
&lt;li&gt;
Final words
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Scope:
&lt;/h3&gt;

&lt;p&gt;I will show you how to interact with the Vonage SMS API and the Number Verification API, two of the &lt;code&gt;insert complexity here&lt;/code&gt; use cases from the Communication and Network API catalogue.&lt;/p&gt;

&lt;h3&gt;
  
  
  State of the Art:
&lt;/h3&gt;

&lt;p&gt;Vonage CPaaS (Communications Platform as a Service) is a cloud-based platform that provides various real-time communication features, including voice, messaging, and video. Instead of building a communication infrastructure from scratch—which requires significant time, resources, and maintenance—businesses can integrate specific communication functions into their applications with ease. Vonage handles most of the maintenance, allowing companies to focus on their core services while leveraging powerful communication capabilities.&lt;/p&gt;

&lt;p&gt;CPaaS includes the following two categories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Communication APIs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voice&lt;/li&gt;
&lt;li&gt;SMS&lt;/li&gt;
&lt;li&gt;Video&lt;/li&gt;
&lt;li&gt;Authentication&lt;/li&gt;
&lt;li&gt;IP Chat&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Network APIs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Silent Auth&lt;/li&gt;
&lt;li&gt;QoD&lt;/li&gt;
&lt;li&gt;Location&lt;/li&gt;
&lt;li&gt;Device Data&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementation
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Pre-requisites
&lt;/h4&gt;

&lt;p&gt;For this demo you only need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Valid phone number&lt;/li&gt;
&lt;li&gt;Vonage Free Account&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Create Vonage Account
&lt;/h4&gt;

&lt;p&gt;Create your account here: &lt;a href="https://dashboard.nexmo.com/sign-up" rel="noopener noreferrer"&gt;https://dashboard.nexmo.com/sign-up&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8fy7jfgzbbtzyofnqkfl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8fy7jfgzbbtzyofnqkfl.png" alt="Create Account" width="577" height="584"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Verify Your Account and Log In Securely
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi1wwp3fc73vf1m7eaf94.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi1wwp3fc73vf1m7eaf94.png" alt="Verify your account" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Click "Verify Email Address"&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After clicking, you will be redirected to a web page.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enter Your Phone Number&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On the web page, you will see a phone number input screen.
&lt;/li&gt;
&lt;li&gt;Enter your phone number, and you will receive an SMS with a verification code.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enter the Verification Code&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input the received verification code on the web page to complete your account creation.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enable Extra Security (Optional)&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you select &lt;em&gt;"Repeat this step whenever I log in from an unusual device or location,"&lt;/em&gt;
SMS verification will be required each time you log in from a new device or IP address.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Vonage API Dashboard
&lt;/h4&gt;

&lt;p&gt;Once you are registered and authenticated, you will access the Vonage API Dashboard, which highlights all the capabilities you, as a brand-new API developer, can use. Let's take a quick look at the Dashboard details:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48nhl66prdw8keu3s3n2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F48nhl66prdw8keu3s3n2.png" alt="Vonage API Dashboard" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Vonage Credit:&lt;/strong&gt; Did I say Vonage is free? It is, &lt;strong&gt;BUT&lt;/strong&gt; only up to a certain point. Think of it like AWS Free Tier credits for API Gateway, based of course on the number of API calls. Once you create your account, you will be granted a $10 credit* to start playing with the existing APIs, create applications that use the Vonage APIs, and even set up integrations with other vendors, e.g. AWS, Azure, and OpenAI, among others.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API Key and API Secret:&lt;/strong&gt; These are the unique credentials that authenticate your account while testing the Vonage APIs, so do not disclose them to anyone unless you want to give away your free credit, or your whole wallet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Troubleshoot &amp;amp; Learn:&lt;/strong&gt; As its name implies, this is the section we will dive into in this lab, as it includes the two APIs we are going to work with: "Send a SMS" and "Verify User".&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Vonage SMS API
&lt;/h4&gt;

&lt;p&gt;By utilizing SMS, you can reduce the volume of incoming calls.  &lt;/p&gt;

&lt;p&gt;Benefits:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Various System Integrations&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 API call = 1 SMS sent → Easily integrates with other systems
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Domestic Delivery Routes&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Redundancy with multiple suppliers for domestic delivery routes
&lt;/li&gt;
&lt;li&gt;You can set your existing phone number as the Sender ID
&lt;/li&gt;
&lt;li&gt;To avoid delivery blocks, you can register the message with the supplier
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Delivery Feedback (Delivery Receipt "DLR")&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can instantly check delivery results
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Global Support&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compliance with regulations in each country and application support
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reporting&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Usage and delivery reports, log search, and management screen available
&lt;/li&gt;
&lt;li&gt;Custom report development is possible
&lt;/li&gt;
&lt;/ul&gt;

&lt;h6&gt;
  
  
  SMS API - GUI Implementation:
&lt;/h6&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kjl73qmy1a9fzx9yyko.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8kjl73qmy1a9fzx9yyko.png" alt="SMS API" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The SMS API call is quite straightforward; your only concerns will be how many characters you can use and whether any special characters are disallowed. Below is an example of a successful SMS API call:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpx60biekqc9ec3lzyl2f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpx60biekqc9ec3lzyl2f.png" alt="SMS API 200" width="425" height="636"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h6&gt;
  
  
  SMS API - Code Implementation:
&lt;/h6&gt;

&lt;p&gt;To implement this using your favorite programming language, refer to the following code. There are quite a few options to choose from, but I will use Python for the sake of simplicity:&lt;/p&gt;

&lt;p&gt;Install the library&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install vonage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialize the library&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client = vonage.Client(key="XXXXX", secret="YYYYYYY")
sms = vonage.Sms(client)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write the code&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;responseData = sms.send_message(
    {
        "from": "Vonage APIs",
        "to": "817014166666",
        "text": "Hello from https://dev.to/mgonzalezo",
    }
)

if responseData["messages"][0]["status"] == "0":
    print("Message sent successfully.")
else:
    print(f"Message failed with error: {responseData['messages'][0]['error-text']}")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Vonage Number verification API
&lt;/h4&gt;

&lt;p&gt;The Verify API allows you to send a PIN to a user's phone and confirm its receipt. It can be used for authentication and fraud prevention, including two-factor authentication, passwordless login, and phone number verification.&lt;/p&gt;

&lt;h6&gt;
  
  
  Number verification API - GUI implementation:
&lt;/h6&gt;

&lt;p&gt;You start by selecting the PIN length and sending the verification SMS:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpvbjd6ce4xujos0pprp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpvbjd6ce4xujos0pprp.png" alt="Verify Number API" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Notice there are two channels for delivering the PIN code: SMS and phone calls (up to two calls). This is possible because the API design and specification include these alternatives.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2t00kil3xqd748zf1qeb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2t00kil3xqd748zf1qeb.png" alt="Verify Number API 2" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the PIN code is entered and verification succeeds, you will get a report of the credits consumed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf3rzwmvaia0dvv2x6dv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flf3rzwmvaia0dvv2x6dv.png" alt="Verify Number API 3" width="800" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h6&gt;
  
  
  Number verification API - Code Implementation:
&lt;/h6&gt;

&lt;p&gt;Install the library&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install vonage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initialize the Library&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;client = vonage.Client(key="xxxxx", secret="yyyy")
verify = vonage.Verify(client)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make a verification request&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = verify.start_verification(number="817014166666", brand="AcmeInc")

if response["status"] == "0":
    print("Started verification request_id is %s" % (response["request_id"]))
else:
    print("Error: %s" % response["error_text"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the request with a code&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = verify.check(REQUEST_ID, code=CODE)

if response["status"] == "0":
    print("Verification successful, event_id is %s" % (response["event_id"]))
else:
    print("Error: %s" % response["error_text"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cancel The Request&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response = verify.cancel(REQUEST_ID)

if response["status"] == "0":
    print("Cancellation successful")
else:
    print("Error: %s" % response["error_text"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What is next?
&lt;/h3&gt;

&lt;p&gt;For you, eager developer, the next step is to review the documentation for each of these two APIs and start thinking about new integrations or variations for your own use case!&lt;/p&gt;

&lt;p&gt;SMS API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developer.vonage.com/en/messaging/sms/overview" rel="noopener noreferrer"&gt;https://developer.vonage.com/en/messaging/sms/overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.vonage.com/en/api/sms" rel="noopener noreferrer"&gt;https://developer.vonage.com/en/api/sms&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Number Verification API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developer.vonage.com/en/verify/overview" rel="noopener noreferrer"&gt;https://developer.vonage.com/en/verify/overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.vonage.com/en/api/camara/number-verification#verifyNumberVerification" rel="noopener noreferrer"&gt;https://developer.vonage.com/en/api/camara/number-verification#verifyNumberVerification&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Want to create your own API using AWS?&lt;/p&gt;

&lt;p&gt;I don't want to reinvent the wheel, so please check this interesting blog entry by Raktim Midya about REST API implementation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/geekculture/provision-resources-in-aws-using-your-own-rest-api-cc54b390a71f" rel="noopener noreferrer"&gt;https://medium.com/geekculture/provision-resources-in-aws-using-your-own-rest-api-cc54b390a71f&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Final words
&lt;/h3&gt;

&lt;p&gt;That's a wrap! I hope you enjoy implementing these use cases and exploring more about Open API technologies.&lt;/p&gt;

&lt;p&gt;Happy Learning!&lt;/p&gt;

</description>
      <category>api</category>
      <category>aws</category>
      <category>telecom</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Troubleshoot your OpenAI integration - 101</title>
      <dc:creator>Marco Gonzalez</dc:creator>
      <pubDate>Wed, 11 Sep 2024 07:38:14 +0000</pubDate>
      <link>https://dev.to/aws-builders/troubleshoot-your-openai-integration-101-2ljj</link>
      <guid>https://dev.to/aws-builders/troubleshoot-your-openai-integration-101-2ljj</guid>
      <description>&lt;p&gt;Hey everyone!&lt;/p&gt;

&lt;p&gt;In this tutorial, I'm going to walk you through how to troubleshoot various scenarios when integrating your backend application with OpenAI's Large Language Model (LLM) solution.&lt;/p&gt;

&lt;h4&gt;
  
  
  Important Note:
&lt;/h4&gt;

&lt;p&gt;For this guide, I'll be using Cloud AI services as an example. However, the steps and tips I'll share are applicable to any cloud provider you might be using. So, let's dive in!&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
Tools to use

&lt;ol&gt;
&lt;li&gt;Visual Studio Code&lt;/li&gt;
&lt;li&gt;Postman&lt;/li&gt;
&lt;li&gt;
Postman Installation

&lt;ol&gt;
&lt;li&gt;Step 1: Download the Postman App&lt;/li&gt;
&lt;li&gt;Step 2: Install Postman&lt;/li&gt;
&lt;li&gt;Step 3: Launch Postman&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;/li&gt;

&lt;li&gt;

Troubleshooting

&lt;ol&gt;
&lt;li&gt;
Troubleshooting API Integration - Multimodal Model

&lt;ol&gt;
&lt;li&gt;Step 0: Collect OpenAI related information&lt;/li&gt;
&lt;li&gt;Step 1: Verify Correct Endpoint&lt;/li&gt;
&lt;li&gt;Step 2: Understand Body Configuration&lt;/li&gt;
&lt;li&gt;Step 3: Test OpenAI Endpoint&lt;/li&gt;
&lt;li&gt;Step 4: Test OpenAI Endpoint - VSC&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;Troubleshooting API Integration - Embedding Model&lt;/li&gt;

&lt;/ol&gt;

&lt;/li&gt;

&lt;li&gt;Useful Links&lt;/li&gt;

&lt;/ol&gt;

&lt;h2&gt;
  
  
  Tools to use
&lt;/h2&gt;

&lt;p&gt;For this tutorial, I will use the following tools and information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual Studio Code&lt;/li&gt;
&lt;li&gt;Postman&lt;/li&gt;
&lt;li&gt;Azure AI Service

&lt;ul&gt;
&lt;li&gt;Azure OpenAI

&lt;ul&gt;
&lt;li&gt;Endpoint&lt;/li&gt;
&lt;li&gt;API Key&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Visual Studio Code
&lt;/h4&gt;

&lt;p&gt;Visual Studio Code (VS Code) is a powerful and versatile code editor developed by Microsoft. 🖥️ It supports various programming languages and comes equipped with features like debugging, intelligent code completion, and extensions for enhanced functionality. 🛠️ VS Code's lightweight design and customization options make it popular among developers worldwide. 🌍&lt;/p&gt;

&lt;h4&gt;
  
  
  Postman
&lt;/h4&gt;

&lt;p&gt;Postman is a popular software tool that allows developers to build, test, and modify APIs. It provides a user-friendly interface for sending requests to web servers and viewing responses, making it easier to understand and debug the interactions between client applications and backend APIs. Postman supports various HTTP methods and functionalities, which helps in creating more efficient and effective API solutions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Postman Installation
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Step 1: Download the Postman App
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Visit the Postman Website&lt;/strong&gt;: Open your web browser and go to the &lt;a href="https://www.postman.com/" rel="noopener noreferrer"&gt;Postman website&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Navigate to Downloads&lt;/strong&gt;: Click on the "Download" option from the main menu, or scroll to the "Downloads" section on the Postman homepage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Select the Windows Version&lt;/strong&gt;: Choose the appropriate version for your Windows architecture (32-bit or 64-bit). If you are unsure, 64-bit is the most common for modern computers.&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;
  
  
  Step 2: Install Postman
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run the Installer&lt;/strong&gt;: Once the download is complete, open the executable file (&lt;code&gt;Postman-win64-&amp;lt;version&amp;gt;-Setup.exe&lt;/code&gt; for 64-bit) to start the installation process.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Follow the Installation Wizard&lt;/strong&gt;: The installer will guide you through the necessary steps. You can choose the default settings, which are suitable for most users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Finish Installation&lt;/strong&gt;: After the installation is complete, Postman will be installed on your machine. You might find a shortcut on your desktop or in your start menu.&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;
  
  
  Step 3: Launch Postman
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Open Postman&lt;/strong&gt;: Click on the Postman icon from your desktop or search for Postman in your start menu and open it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sign In or Create an Account&lt;/strong&gt;: When you first open Postman, you’ll be prompted to sign in or create a new Postman account. This step is optional but recommended for syncing your data across devices and with the Postman cloud.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztufsqksa3j3rvbnyjwu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fztufsqksa3j3rvbnyjwu.png" alt="Postman" width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Troubleshooting API Integration - Multimodal Model
&lt;/h3&gt;

&lt;p&gt;To start troubleshooting the API integration, I will work through the following common error messages you may encounter while verifying the integration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;Resource Not Found&lt;/code&gt; Error&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Timeout&lt;/code&gt; Error&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Incorrect API key provided&lt;/code&gt; Error&lt;/li&gt;
&lt;/ol&gt;

&lt;h5&gt;
  
  
  Step 0: Collect OpenAI related information
&lt;/h5&gt;

&lt;p&gt;Let's retrieve the following information before starting our troubleshooting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI Endpoint = &lt;code&gt;https://[endpoint_url]/openai/deployments/[deployment_name]/chat/completions?api-version=[OpenAI_version]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;OpenAI API Key = &lt;code&gt;API_KEY&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;OpenAI version = &lt;code&gt;[OpenAI_version]&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Step 1: Verify Correct Endpoint
&lt;/h5&gt;

&lt;p&gt;Let's review the OpenAI Endpoint we will use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://[endpoint_url]/openai/deployments/[deployment_name]/chat/completions?api-version=[OpenAI_version]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h6&gt;
  
  
  URL Breakdown
&lt;/h6&gt;

&lt;h6&gt;
  
  
  # 1. Protocol: &lt;code&gt;https&lt;/code&gt;
&lt;/h6&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Description&lt;/strong&gt;: This protocol (&lt;code&gt;https&lt;/code&gt;) stands for HyperText Transfer Protocol Secure, representing a secure version of HTTP. It uses encryption to protect the communication between the client and server.&lt;/li&gt;
&lt;/ul&gt;

&lt;h6&gt;
  
  
  # 2. Host: &lt;code&gt;[endpoint_url]&lt;/code&gt;
&lt;/h6&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Description&lt;/strong&gt;: This part indicates the domain or endpoint where the service is hosted, serving as the base address for the API server. The &lt;code&gt;[endpoint_url]&lt;/code&gt; is a placeholder, replaceable by the actual server domain or IP address.&lt;/li&gt;
&lt;/ul&gt;

&lt;h6&gt;
  
  
  # 3. Path: &lt;code&gt;/openai/deployments/[deployment_name]/chat/completions&lt;/code&gt;
&lt;/h6&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Description&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/openai&lt;/code&gt;: This segment signifies the root directory or base path for the API, related specifically to OpenAI services.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/deployments&lt;/code&gt;: This indicates that the request targets specific deployment features of the services.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/[deployment_name]&lt;/code&gt;: A placeholder for the name of the deployment you're interacting with, replaceable with the actual deployment name.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/chat/completions&lt;/code&gt;: Suggests that the API call is for obtaining text completions within a chat or conversational context.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h6&gt;
  
  
  # 4. Query: &lt;code&gt;?api-version=[OpenAI_version]&lt;/code&gt;
&lt;/h6&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Description&lt;/strong&gt;: This is the query string, beginning with &lt;code&gt;?&lt;/code&gt;, and it includes parameters that affect the request:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;api-version&lt;/code&gt;: Specifies the version of the API in use, with &lt;code&gt;[OpenAI_version]&lt;/code&gt; serving as a placeholder for the actual version number, ensuring compatibility with your application.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
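
&lt;p&gt;Putting the four pieces together, a quick sanity check before opening Postman is to assemble the endpoint from its parts. The values below are placeholders only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Assemble the endpoint from its four parts. All values below are placeholders,
# not real deployment details; replace them with your own resource values.
endpoint_url = "YOUR-RESOURCE.openai.azure.com"
deployment_name = "YOUR-DEPLOYMENT"
api_version = "YOUR-API-VERSION"   # use the version your resource supports

url = (
    f"https://{endpoint_url}/openai/deployments/{deployment_name}"
    f"/chat/completions?api-version={api_version}"
)
print(url)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;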

&lt;p&gt;We will go to "Collections" and open the API tests/POST Functional folder. Then we need to verify the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;REST API operation must be set to "POST"&lt;/li&gt;
&lt;li&gt;Endpoint should have all required values, including Endpoint_URL, Deployment_Name and API-version.&lt;/li&gt;
&lt;li&gt;API-key must be added in the "Headers" section&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;See the image below for reference:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9glkgild28x9ty6gx58g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9glkgild28x9ty6gx58g.png" alt="Postman Setup1" width="800" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Step 2: Understand Body Configuration
&lt;/h5&gt;

&lt;p&gt;For this example, I will use the following sample Body data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "messages": [
    {
      "role": "system",
      "content": "You are a mechanic who loves to help customers and responds in a very friendly manner to a car related questions"
    },
    {
        "role": "user",
        "content": "Please explain the role of the radiators in a car."
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h6&gt;
  
  
  Explanation of the &lt;code&gt;messages&lt;/code&gt; Array
&lt;/h6&gt;

&lt;p&gt;The &lt;code&gt;messages&lt;/code&gt; array in the provided JSON object is structured to facilitate a sequence of interactions within a chat or conversational API environment. Each entry in the array represents a distinct message, defined by its &lt;code&gt;role&lt;/code&gt; and &lt;code&gt;content&lt;/code&gt;. Here's a detailed breakdown:&lt;/p&gt;

&lt;p&gt;Message 1 🛠️&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Role&lt;/strong&gt;: &lt;code&gt;"system"&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Description&lt;/strong&gt;: This role typically signifies the application or service's backend logic. It sets the scenario or context for the conversation, directing how the interaction should proceed.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Content&lt;/strong&gt;: &lt;code&gt;"You are a mechanic who loves to help customers and responds in a very friendly manner to car related questions"&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Description&lt;/strong&gt;: The content here acts as a directive or script, informing the recipient of the message about the character they should portray — in this case, a friendly and helpful mechanic, expert in automotive issues.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Message 2 🗣️&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Role&lt;/strong&gt;: &lt;code&gt;"user"&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Description&lt;/strong&gt;: This designates a participant in the dialogue, generally a real human user or an external entity engaging with the system.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Content&lt;/strong&gt;: &lt;code&gt;"Please explain the role of the radiators in a car."&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Description&lt;/strong&gt;: This message poses a direct question intended for the character established previously (the mechanic). It seeks detailed information about the function of radiators in cars, initiating a topic-specific discussion within the established role-play scenario.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Each message in the array is crafted to foster an engaging dialogue by defining roles and providing content cues, which guide responses and interaction dynamics. This methodology is widespread in systems designed to simulate realistic conversations or provide role-based interactive experiences.&lt;/p&gt;

&lt;p&gt;See the image below for reference. Note that I also set the format to "raw" and the content type to "JSON":&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftes83vshojvchpotd4fn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftes83vshojvchpotd4fn.png" alt="POSTMAN Setup 2" width="800" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Step 3: Test OpenAI Endpoint
&lt;/h5&gt;

&lt;p&gt;If you have followed all the above steps, you're ready to start testing your OpenAI endpoint! Refer to the image below for the final steps and a sample result you should see.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9g6sqczqxr4wmypwewg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9g6sqczqxr4wmypwewg.png" alt="Postman_final" width="800" height="515"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Step 4: Test OpenAI Endpoint - VSC
&lt;/h5&gt;

&lt;p&gt;The following Python code replicates the above steps. Feel free to use it after your Postman tests are successful.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
import json

# Define the URL of the API endpoint
url = "https://[endpoint_url]/openai/deployments/[deployment_name]/chat/completions?api-version=[OpenAI_version]"

# Define the API token
headers = {
    "api-key": "API_KEY",
    "Content-Type": "application/json"
}

# Define the JSON body of the request
data = {
    "messages": [
        {
            "role": "system",
            "content": "You are a mechanic who loves to help customers and responds in a very friendly manner to car related questions"
        },
        {
            "role": "user",
            "content": "Please explain the role of the radiators in a car."
        }
    ]
}

# Make the POST request to the API
response = requests.post(url, headers=headers, json=data)

# Check if the request was successful
if response.status_code == 200:
    # Print the response content if successful
    print("Response received:")
    print(json.dumps(response.json(), indent=4))
else:
    # Print the error message if the request was not successful
    print("Failed to get response, status code:", response.status_code)
    print("Response:", response.text)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
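

&lt;p&gt;As a final check, the three error messages listed at the start of this section usually surface as specific HTTP statuses or exceptions. The following sketch extends the script above (reusing its &lt;code&gt;url&lt;/code&gt;, &lt;code&gt;headers&lt;/code&gt;, and &lt;code&gt;data&lt;/code&gt;); the status mapping reflects common Azure OpenAI behaviour rather than a guarantee:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Rough mapping (sketch) of the three common errors to what the request
# typically returns. Confirm against your own responses.
import requests

try:
    response = requests.post(url, headers=headers, json=data, timeout=30)
    if response.status_code == 200:
        print("Success")
    elif response.status_code == 404:
        print("Resource Not Found: check the endpoint URL and deployment name")
    elif response.status_code == 401:
        print("Incorrect API key provided: check the api-key header value")
    else:
        print("Unexpected status:", response.status_code, response.text)
except requests.exceptions.Timeout:
    print("Timeout: check the network path or increase the timeout value")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;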



&lt;h3&gt;
  
  
  Troubleshooting API Integration - Embedding Model
&lt;/h3&gt;

&lt;p&gt;Under preparation 🛠️🔧🚧&lt;/p&gt;

&lt;h3&gt;
  
  
  Useful Links:
&lt;/h3&gt;

&lt;p&gt;If you are using Azure AI and OpenAI LLM solutions, the following links will help you understand how API integration is done:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models" rel="noopener noreferrer"&gt;OpenAI models&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions" rel="noopener noreferrer"&gt;REST API Reference&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>openai</category>
      <category>postman</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
