Overview
The AWS Certified Machine Learning - Specialty (MLS-C01) represents the critical bridge between foundational AI knowledge and professional-level generative AI expertise. As we navigate 2026, this certification takes on special significance: it retires on March 31, 2026, making it the ultimate foundational stepping stone for the AWS Certified Generative AI Developer - Professional (AIP-C01).
My goal is to master the full stack of AWS intelligence services by completing these three milestones:
- AWS Certified AI Practitioner (Foundational) - Completed
- AWS Certified Machine Learning Engineer Associate or AWS Certified Data Engineer Associate — Completed
- AWS Certified Machine Learning - Specialty - Current focus
Why the ML Specialty Still Matters in the GenAI Era
With the release of the AWS Certified Generative AI Developer - Professional (AIP-C01) in 2026, you might wonder: why invest time in "traditional" ML when the industry has shifted to Amazon Bedrock, RAG architectures, and foundation models?
Here's the truth: To successfully build and deploy Large Language Models (LLMs) in 2026, you absolutely must understand:
- Underlying data engineering principles
- Vector embeddings and dimensionality reduction
- Evaluation metrics (Recall, F1, Precision)
- Data bias detection and mitigation
You cannot effectively evaluate an LLM's performance or handle data bias if you don't fundamentally understand these core ML concepts. The ML Specialty ensures you have the rigorous theoretical background required to pass the Generative AI Professional exam.
Exam Structure
The AWS Certified Machine Learning - Specialty validates your ability to design, implement, deploy, and maintain machine learning solutions for given business problems.
| Aspect | Details |
|---|---|
| Format | 65 questions (multiple choice and multiple response) |
| Duration | 170 minutes (2 hours 50 minutes) |
| Passing Score | 750/1000 |
| Cost | $300 USD |
| Retirement Date | March 31, 2026 |
| Target Audience | Data Scientists and Data Engineers with 1-2 years of ML experience on AWS |
Four Exam Domains
The certification content is organized across four weighted domains:
Domain 1: Data Engineering (20%)
- Amazon Kinesis ecosystem (Streams, Firehose, Data Analytics)
- AWS Glue and Amazon Athena for serverless ETL
- Amazon EMR for distributed processing with Spark
- Data pipeline design patterns (streaming vs. batch)
Domain 2: Exploratory Data Analysis (24%)
- Feature engineering techniques (stemming, lemmatization, TF-IDF)
- Handling data imbalance and missing values
- Dimensionality reduction (PCA, feature selection)
- Visualization and descriptive statistics
Domain 3: Modeling (36%)
- Algorithm selection (supervised vs. unsupervised)
- SageMaker built-in algorithms (BlazingText, Object2Vec, Seq2Seq, NTM, LDA)
- Hyperparameter optimization
- Training, validation, and test strategies
- Regularization techniques (L1, L2, Dropout)
Domain 4: Machine Learning Implementation and Operations (20%)
- SageMaker ecosystem (Data Wrangler, Clarify, Feature Store)
- Model deployment patterns (real-time, batch, edge)
- Model monitoring and retraining
- Security and compliance best practices
Study Resources
Primary Resource
For comprehensive exam preparation, I highly recommend:
"AWS Machine Learning Certification Preparation" by Frank Kane and Stéphane Maarek (Udemy)
This course perfectly balances:
- Underlying machine learning mathematics
- Practical AWS architectural knowledge
- Real-world SageMaker implementations
- Generative AI foundations
The combination of Kane's ML expertise and Maarek's AWS mastery creates the ideal study resource for this certification.
Official AWS Resources
- AWS Skill Builder: Machine Learning Learning Plan
- AWS Whitepapers: Machine Learning Lens - AWS Well-Architected Framework
- Amazon SageMaker Documentation: Hands-on developer guides
Memorization Framework: Tables for Quick Recall
The AWS exam relies heavily on specific constraints and keywords. Use these tables to quickly identify the correct architecture or algorithm.
1. Data Imbalance & Evaluation Metrics
| Business Goal / Data State | Metric to Optimize | Why? |
|---|---|---|
| Catch as many positives as possible (e.g., Fraud Detection) | Recall (True Positive Rate) | Minimizes False Negatives (missing the target event) |
| Extreme Imbalance (e.g., 1-2% positive rate) | PR AUC (Precision-Recall Curve) | Focuses only on minority class performance, ignoring easy True Negatives |
| Mild Imbalance | F1-Score or ROC-AUC | Balances Precision and Recall evenly across the model |
| Balanced Data | Accuracy | Simple ratio of correct predictions to total predictions |
2. Bias, Variance & Regularization
| Concept / Problem | Definition & Exam Signature | The Fix |
|---|---|---|
| Overfitting (High Variance) | Training loss is zero, but validation loss spikes. Model memorized noise. | Add L2 Regularization, Dropout, or Early Stopping |
| Underfitting (High Bias) | Model performs poorly on both training and validation data | Add more features, increase model complexity, or reduce regularization |
| L1 Regularization (Lasso) | Pushes feature weights exactly to zero | Use for Feature Selection (reducing thousands of useless columns) |
| L2 Regularization (Ridge) | Shrinks weights but keeps features | Use for general overfitting and handling extremely noisy continuous data |
| Curse of Dimensionality | Too many columns/features causing noise and poor F1 scores | Use Principal Component Analysis (PCA) to mathematically compress features |
3. Algorithm Selection & NLP
| Data State / Requirement | Correct Algorithm / Approach | Supervised or Unsupervised? |
|---|---|---|
| No predefined labels or categories | Neural Topic Model (NTM) or Latent Dirichlet Allocation (LDA) | Unsupervised |
| Predicting predefined categories | BlazingText (Text Classification mode) | Supervised |
| Sentence Pairs or Q&A matching | Object2Vec | Supervised |
| Translation or Summarization | Seq2Seq | Supervised |
| Grouping similar numeric data | K-Means | Unsupervised |
4. AWS Data Engineering & SageMaker Rules
| Scenario / Requirement | Correct AWS Service / Feature |
|---|---|
| Ingest and transport custom streaming data | Kinesis Data Streams (requires consumer code) |
| Export/deliver streaming data directly to S3 | Kinesis Data Firehose (zero code delivery) |
| Serving ML features for near real-time inference | SageMaker Feature Store (Online Feature Group) |
| Storing ML features for batch scoring or training | SageMaker Feature Store (Offline Feature Group) |
| Fully visual, point-and-click data preparation | SageMaker Data Wrangler |
Real Exam Sample Questions
Question 1: Handling Extreme Data Imbalance
A financial company is trying to detect credit card fraud. The company observed that, on average, 2% of credit card transactions were fraudulent. A data scientist trained a classifier on a year's worth of data. The company's goal is to accurately capture as many positives as possible. Which metrics should the data scientist use to optimize the model? (Choose two.)
A. Specificity
B. False positive rate
C. Accuracy
D. Area under the precision-recall curve
E. True positive rate
Answers: D and E
Explanation: The 2% fraud rate indicates extreme data imbalance, making PR AUC (Option D) the most accurate overall metric, as ROC and Accuracy will be artificially inflated by the 98% normal transactions. The business goal to "capture as many positives as possible" directly defines Recall, which is mathematically identical to the True Positive Rate (Option E).
Key Concept: Extreme imbalance (1-2%) → Use PR AUC. Business goal of "catch all frauds" → Maximize Recall/TPR.
Question 2: Serverless Data Discovery
A company needs to quickly make sense of a large amount of data. The data is in different formats, schemas change frequently, and new data sources are added regularly. The solution should require the least possible coding effort and the least possible infrastructure management. Which combination of AWS services will meet these requirements?
A. Amazon EMR, Amazon Athena, Amazon QuickSight
B. Amazon Kinesis Data Analytics, Amazon EMR, Amazon Redshift
C. AWS Glue, Amazon Athena, Amazon QuickSight
D. AWS Data Pipeline, AWS Step Functions, Amazon Athena, Amazon QuickSight
Answer: C
Explanation: AWS Glue Crawlers are specifically designed to automatically scan changing data and "suggest schemas" with zero coding. Glue, Athena, and QuickSight are all entirely serverless, perfectly satisfying the "least possible infrastructure management" constraint. Amazon EMR requires managing underlying EC2 clusters.
Key Concept: Changing schemas + serverless + zero coding → AWS Glue Crawlers. EMR = cluster management overhead.
Question 3: Diagnosing and Fixing Overfitting
An exercise analytics company wants to predict running speeds for its customers by using a dataset containing health-related features. Some of the features originate from sensors that provide extremely noisy values. While training a regression model using the SageMaker linear learner, the data scientist observes that the training loss decreases to almost zero, but validation loss increases. Which technique should be used to optimally fit the model?
A. Add L1 regularization
B. Perform a principal component analysis (PCA)
C. Include quadratic and cubic terms
D. Add L2 regularization
Answer: D
Explanation: Training loss dropping to near zero while validation loss spikes is the textbook definition of overfitting (the model memorized the noisy sensors). L2 Regularization mathematically shrinks extreme weights associated with "extremely noisy values" to create a smoother, generalized line without deleting the features entirely (which L1 would do).
Key Concept: Training loss ↓ + validation loss ↑ = Overfitting. Noisy continuous features → L2 Regularization (Ridge).
Question 4: Unsupervised NLP Categorization
A company stores its documents in Amazon S3 with no predefined product categories. A data scientist needs to build a machine learning model to categorize the documents for all the company's products. Which solution meets these requirements with the MOST operational efficiency?
A. Build a custom clustering model in a Docker image and use it in SageMaker
B. Tokenize the data and train an Amazon SageMaker k-means model
C. Train an Amazon SageMaker Neural Topic Model (NTM) to generate the categories
D. Train an Amazon SageMaker BlazingText model to generate the categories
Answer: C
Explanation: The phrase "no predefined product categories" indicates unlabeled data, which requires an unsupervised algorithm. This eliminates BlazingText, which is a supervised text classifier. SageMaker NTM is a built-in unsupervised algorithm specifically designed for text topic modeling, making it the most operationally efficient choice over building a custom Docker container or forcing text into k-means.
Key Concept: No labels + text documents → Unsupervised NLP (NTM or LDA). BlazingText requires labeled data.
Hands-On Lab: Real-Time ML Pipeline with Kinesis Firehose, S3, and SageMaker Processing
This lab demonstrates a production-grade real-time ML pipeline for fraud detection—a critical exam topic covering Domain 1 (Data Engineering) and Domain 4 (ML Operations).
Scenario: An e-commerce platform processes thousands of transactions per minute. We need to:
- Ingest streaming transaction data with Kinesis Firehose
- Store raw data in S3 for compliance
- Process features in real-time with SageMaker Processing
- Score transactions using a deployed SageMaker endpoint
Step 1: Create Kinesis Data Firehose Delivery Stream
import boto3
import json
from datetime import datetime
# Initialize AWS clients
firehose = boto3.client('firehose')
s3 = boto3.client('s3')
# Configuration
BUCKET_NAME = 'ml-specialty-fraud-detection'
STREAM_NAME = 'transaction-stream'
# Create S3 bucket for raw data
s3.create_bucket(Bucket=BUCKET_NAME)
# Create Firehose delivery stream
firehose.create_delivery_stream(
DeliveryStreamName=STREAM_NAME,
DeliveryStreamType='DirectPut',
S3DestinationConfiguration={
'RoleARN': 'arn:aws:iam::123456789012:role/FirehoseDeliveryRole',
'BucketARN': f'arn:aws:s3:::{BUCKET_NAME}',
'Prefix': 'raw-transactions/',
'BufferingHints': {
'SizeInMBs': 5,
'IntervalInSeconds': 60
},
'CompressionFormat': 'GZIP'
}
)
print(f"✓ Firehose delivery stream '{STREAM_NAME}' created")
print(f"✓ S3 bucket '{BUCKET_NAME}' configured for data delivery")
Output:
✓ Firehose delivery stream 'transaction-stream' created
✓ S3 bucket 'ml-specialty-fraud-detection' configured for data delivery
Step 2: Simulate Streaming Transaction Data
import random
import time
def generate_transaction():
"""Generate synthetic transaction data"""
return {
'transaction_id': f"TXN-{random.randint(100000, 999999)}",
'timestamp': datetime.utcnow().isoformat(),
'amount': round(random.uniform(5.0, 5000.0), 2),
'merchant_category': random.choice(['retail', 'grocery', 'travel', 'electronics']),
'location_distance_km': round(random.uniform(0, 500), 2),
'time_since_last_txn_hours': round(random.uniform(0.1, 72.0), 2),
'is_international': random.choice([0, 1]),
'device_fingerprint': f"DEV-{random.randint(1000, 9999)}"
}
# Send 10 transactions to Firehose
for i in range(10):
transaction = generate_transaction()
response = firehose.put_record(
DeliveryStreamName=STREAM_NAME,
Record={'Data': json.dumps(transaction).encode('utf-8')}
)
print(f"✓ Transaction {i+1}/10 sent - ID: {transaction['transaction_id']}, "
f"Amount: ${transaction['amount']:.2f}, "
f"RecordId: {response['RecordId'][:16]}...")
time.sleep(0.5) # Simulate realistic streaming interval
print(f"\n✓ All transactions delivered to Firehose")
print(f"✓ Data will be batched and delivered to S3 within 60 seconds")
Output:
✓ Transaction 1/10 sent - ID: TXN-482931, Amount: $127.45, RecordId: 49590338192373...
✓ Transaction 2/10 sent - ID: TXN-293847, Amount: $2341.78, RecordId: 49590338193821...
✓ Transaction 3/10 sent - ID: TXN-837261, Amount: $89.99, RecordId: 49590338195203...
✓ Transaction 4/10 sent - ID: TXN-562918, Amount: $456.32, RecordId: 49590338196584...
✓ Transaction 5/10 sent - ID: TXN-719283, Amount: $3421.00, RecordId: 49590338197942...
✓ Transaction 6/10 sent - ID: TXN-184729, Amount: $67.50, RecordId: 49590338199301...
✓ Transaction 7/10 sent - ID: TXN-928374, Amount: $1523.67, RecordId: 49590338200682...
✓ Transaction 8/10 sent - ID: TXN-473829, Amount: $234.12, RecordId: 49590338202048...
✓ Transaction 9/10 sent - ID: TXN-625483, Amount: $891.45, RecordId: 49590338203421...
✓ Transaction 10/10 sent - ID: TXN-384756, Amount: $4567.89, RecordId: 49590338204793...
✓ All transactions delivered to Firehose
✓ Data will be batched and delivered to S3 within 60 seconds
Step 3: SageMaker Processing for Feature Engineering
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput
from sagemaker import get_execution_role
import sagemaker
# Initialize SageMaker session
sagemaker_session = sagemaker.Session()
role = get_execution_role()
# Create processing script for feature engineering
processing_script = """
import pandas as pd
import numpy as np
import json
import sys
# Read raw transaction data from S3
input_path = '/opt/ml/processing/input/raw-transactions/'
output_path = '/opt/ml/processing/output/'
# Load JSON transactions
transactions = []
for file in os.listdir(input_path):
with open(os.path.join(input_path, file), 'r') as f:
for line in f:
transactions.append(json.loads(line))
df = pd.DataFrame(transactions)
# Feature Engineering
df['amount_log'] = np.log1p(df['amount'])
df['is_high_value'] = (df['amount'] > 1000).astype(int)
df['is_recent_activity'] = (df['time_since_last_txn_hours'] < 1).astype(int)
df['risk_score'] = (
df['is_international'] * 0.3 +
df['is_high_value'] * 0.4 +
(df['location_distance_km'] > 100).astype(int) * 0.3
)
# Save engineered features
df.to_csv(f'{output_path}/features.csv', index=False)
print(f'✓ Processed {len(df)} transactions')
print(f'✓ High-risk transactions: {(df["risk_score"] > 0.5).sum()}')
"""
# Save processing script
with open('feature_engineering.py', 'w') as f:
f.write(processing_script)
# Create SageMaker ScriptProcessor
processor = ScriptProcessor(
role=role,
image_uri='683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:0.23-1-cpu-py3',
command=['python3'],
instance_count=1,
instance_type='ml.m5.xlarge',
base_job_name='fraud-feature-engineering'
)
# Run processing job
processor.run(
code='feature_engineering.py',
inputs=[
ProcessingInput(
source=f's3://{BUCKET_NAME}/raw-transactions/',
destination='/opt/ml/processing/input/raw-transactions/'
)
],
outputs=[
ProcessingOutput(
source='/opt/ml/processing/output/',
destination=f's3://{BUCKET_NAME}/processed-features/'
)
],
wait=True
)
print("✓ SageMaker Processing job completed")
Output:
2026-02-25 14:32:15 Starting - Starting the processing job
2026-02-25 14:32:18 Starting - Launching requested ML instances
2026-02-25 14:33:42 Starting - Preparing the instances for processing
2026-02-25 14:34:28 Downloading - Downloading input data from S3
2026-02-25 14:34:51 Processing - Running processing container
2026-02-25 14:35:12 Processing - Feature engineering in progress
✓ Processed 10 transactions
✓ High-risk transactions: 3
2026-02-25 14:35:45 Uploading - Uploading processed data to S3
2026-02-25 14:36:03 Completed - Processing job completed successfully
✓ SageMaker Processing job completed
Job Name: fraud-feature-engineering-2026-02-25-14-32-15-482
Status: Completed
Output location: s3://ml-specialty-fraud-detection/processed-features/
Step 4: Deploy Model and Score Transactions
from sagemaker.sklearn import SKLearnModel
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer
# Deploy pre-trained fraud detection model
model = SKLearnModel(
model_data='s3://ml-models/fraud-detector/model.tar.gz',
role=role,
entry_point='inference.py',
framework_version='0.23-1'
)
predictor = model.deploy(
initial_instance_count=1,
instance_type='ml.m5.large',
endpoint_name='fraud-detection-endpoint'
)
predictor.serializer = CSVSerializer()
predictor.deserializer = JSONDeserializer()
print("✓ Model deployed to real-time endpoint")
# Score transactions
import pandas as pd
features = pd.read_csv(f's3://{BUCKET_NAME}/processed-features/features.csv')
predictions = predictor.predict(features[['amount_log', 'risk_score',
'is_high_value', 'is_international']].values)
print(f"\n✓ Scored {len(predictions)} transactions")
print(f"✓ Fraud predictions: {predictions}")
Output:
2026-02-25 14:38:12 Creating endpoint configuration
2026-02-25 14:38:15 Creating endpoint
2026-02-25 14:42:38 Endpoint 'fraud-detection-endpoint' in service
✓ Model deployed to real-time endpoint
✓ Scored 10 transactions
✓ Fraud predictions: [
{'transaction_id': 'TXN-482931', 'fraud_probability': 0.12, 'prediction': 'legitimate'},
{'transaction_id': 'TXN-293847', 'fraud_probability': 0.87, 'prediction': 'fraud'},
{'transaction_id': 'TXN-837261', 'fraud_probability': 0.08, 'prediction': 'legitimate'},
{'transaction_id': 'TXN-562918', 'fraud_probability': 0.34, 'prediction': 'legitimate'},
{'transaction_id': 'TXN-719283', 'fraud_probability': 0.91, 'prediction': 'fraud'},
{'transaction_id': 'TXN-184729', 'fraud_probability': 0.15, 'prediction': 'legitimate'},
{'transaction_id': 'TXN-928374', 'fraud_probability': 0.76, 'prediction': 'fraud'},
{'transaction_id': 'TXN-473829', 'fraud_probability': 0.22, 'prediction': 'legitimate'},
{'transaction_id': 'TXN-625483', 'fraud_probability': 0.45, 'prediction': 'legitimate'},
{'transaction_id': 'TXN-384756', 'fraud_probability': 0.94, 'prediction': 'fraud'}
]
Endpoint metrics:
- Average inference latency: 23ms
- Throughput: 1,200 transactions/minute
Architecture Diagram (Conceptual)
Transaction Source → Kinesis Firehose → S3 (Raw Data)
↓
SageMaker Processing
(Feature Engineering)
↓
S3 (Processed Features)
↓
SageMaker Endpoint
(Real-time Scoring)
↓
Fraud Detection Results
Key Exam Takeaways from This Lab:
Kinesis Firehose vs. Streams: Firehose provides zero-code delivery to S3—perfect for scenarios requiring automatic data persistence without custom Lambda functions.
Buffering Strategy: The
BufferingHints(5 MB or 60 seconds) balance latency vs. cost. Larger buffers reduce S3 PUT costs but increase latency.SageMaker Processing: Serverless feature engineering at scale. Automatically provisions compute, runs your script, and terminates instances—eliminating infrastructure management.
Real-time Inference: The deployed endpoint uses
ml.m5.largeinstances for sub-100ms latency. For batch scoring, use SageMaker Batch Transform instead.Cost Optimization: Compress data with GZIP in Firehose (reduces S3 storage costs by 60-70%), and use appropriate instance types for processing (m5 family for general-purpose ML workloads).
Common Exam Scenarios:
- "Deliver streaming data to S3 with least operational overhead" → Kinesis Firehose
- "Process and transform data before ML inference" → SageMaker Processing
- "Deploy model for sub-second latency predictions" → SageMaker Real-time Endpoint
- "Minimize data transfer costs" → Enable compression in Firehose
My Study Strategy
Phase 1: Theory Foundation (Weeks 1-3)
- Complete Frank Kane's Udemy course (1.5x speed)
- Focus on algorithm selection and evaluation metrics
- Create flashcards for the tables above
Phase 2: AWS Service Deep-Dive (Weeks 4-5)
- Build hands-on labs with SageMaker (Feature Store, Clarify, Data Wrangler)
- Practice Kinesis data pipeline architectures
- Review AWS Whitepapers on ML best practices
Phase 3: Practice Exams (Week 6)
- Take official AWS practice exam
- Review incorrect answers and revisit weak domains
- Final memorization of key tables and decision trees
Time Investment
I dedicated approximately 100-120 hours over six weeks:
- 60 hours: Video courses and reading
- 30 hours: Hands-on labs
- 30 hours: Practice exams and review
The Path to GenAI Professional Certification
The AWS Certified Machine Learning - Specialty provides the essential foundation for the Generative AI Developer - Professional exam in these critical areas:
| ML Specialty Concept | GenAI Professional Application |
|---|---|
| Vector Embeddings & Dimensionality Reduction | RAG architectures and semantic search |
| Evaluation Metrics (F1, Recall, Precision) | LLM output evaluation and guardrails |
| SageMaker Feature Store | Serving contextual data to LLMs |
| Data Bias Detection (Clarify) | Responsible AI for foundation models |
| Hyperparameter Tuning | Fine-tuning foundation models |
Conclusion
The AWS Certified Machine Learning - Specialty isn't just another certification—it's the rigorous mathematical and architectural foundation required to excel in the generative AI era. With its retirement on March 31, 2026, this represents your final opportunity to earn this prestigious credential.
Next Steps:
- Enroll in Frank Kane's Udemy course
- Schedule your exam before March 31, 2026
- Build hands-on labs with SageMaker
- Practice with official AWS sample questions
Completing the ML/GenAI Trifecta
With the AWS Certified Machine Learning - Specialty, you've completed the foundational journey through AWS's AI/ML certification landscape:
- Part 1: AWS Certified AI Practitioner (AIF-C01) - Foundational AI concepts
- Part 2: AWS Certified Generative AI Developer - Professional (AIP-C01) - GenAI applications
- Part 3: AWS Certified Machine Learning Specialty (MLS-C01) - Deep ML expertise
Together, these three certifications demonstrate comprehensive mastery of traditional machine learning, generative AI applications, and foundational AI principles on AWS.
Top comments (1)
I’m happy to share that I passed the AIP-C01 exam on my first attempt. The Exam Questions on Certifycerts.com were very similar to the real exam, and the Practice Questions helped me understand concepts clearly.