Describe the ML Development Cycle

🤖 Exam Guide: AI Practitioner
Domain 1: Fundamentals of AI and ML
📘Task Statement 1.3

🎯 Objectives

This task tests whether you understand how ML goes from Idea → Production → Ongoing Operations, what each pipeline stage does, and which Amazon SageMaker features commonly map to each stage. It also checks that you can distinguish model metrics from business metrics.


1) Components of an ML Pipeline (end-to-end)

A typical ML lifecycle is iterative because you often loop back after evaluation/monitoring:

1.1 Data Collection

Gather raw data from sources such as applications, databases, logs, sensors, and third-party datasets.
Goal: Ensure the data represents the real production environment.

1.2 Exploratory Data Analysis (EDA)

Inspect data shape/quality and discover patterns.
Common EDA Activities:

  1. missing values
  2. outliers
  3. class imbalance
  4. correlations
  5. label distribution
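
A minimal pandas sketch of these checks, assuming a hypothetical transactions.csv file with a binary label column:

```python
import pandas as pd

# Hypothetical dataset: a CSV with numeric/categorical features and a "label" column.
df = pd.read_csv("transactions.csv")

print(df.shape)                                   # rows x columns
print(df.isna().sum())                            # missing values per column
print(df.describe())                              # ranges/quantiles help spot outliers
print(df["label"].value_counts(normalize=True))   # class imbalance / label distribution
print(df.corr(numeric_only=True))                 # correlations between numeric columns
```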

1.3 Data Pre-processing

Clean and prepare data so it can be used by algorithms.
Examples:

  1. handling missing values
  2. normalization/standardization
  3. encoding categories
  4. train/val/test split
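
A scikit-learn sketch of these steps; the file and column names (transactions.csv, amount, age, country, label) are hypothetical placeholders:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("transactions.csv")                    # hypothetical dataset
X, y = df.drop(columns=["label"]), df["label"]

# Split before fitting any transformers to avoid leaking test data into preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

numeric_cols = ["amount", "age"]                        # hypothetical column names
categorical_cols = ["country"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

X_train_ready = preprocess.fit_transform(X_train)       # fit on training data only
X_test_ready = preprocess.transform(X_test)             # reuse the same fitted transforms
```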

1.4 Feature Engineering

Create or transform input variables (“features”) that improve model performance.
Examples:

  1. aggregations (spend last 30 days)
  2. time-based features (day of week)
  3. text vectorization/embeddings
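
A small pandas sketch of the first two examples; the tx transaction log and its columns are invented for illustration:

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase.
tx = pd.DataFrame({
    "user_id":   [1, 1, 2],
    "amount":    [20.0, 35.5, 12.0],
    "timestamp": pd.to_datetime(["2024-05-01", "2024-05-20", "2024-05-03"]),
})

# Time-based feature: day of week (0 = Monday).
tx["day_of_week"] = tx["timestamp"].dt.dayofweek

# Aggregation feature: spend per user over the last 30 days of the log.
cutoff = tx["timestamp"].max() - pd.Timedelta(days=30)
spend_30d = (tx[tx["timestamp"] >= cutoff]
             .groupby("user_id")["amount"].sum()
             .rename("spend_last_30_days"))

features = tx.merge(spend_30d, on="user_id", how="left")
```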

1.5 Model Training

Fit a model to training data to learn parameters.
Output: a trained model artifact.
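
A minimal training sketch with scikit-learn, using synthetic data as a stand-in for your prepared training set:

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the preprocessed training data.
X_train, y_train = make_classification(n_samples=1_000, n_features=10, random_state=42)

model = LogisticRegression(max_iter=1_000)
model.fit(X_train, y_train)              # learn the parameters (here, the coefficients)

joblib.dump(model, "model.joblib")       # the trained model artifact
```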

1.6 Hyperparameter Tuning

Search over hyperparameters (such as learning rate or tree depth) to improve performance.
Note: Model parameters are learned from data; hyperparameters are chosen before training.
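
A small grid-search sketch with scikit-learn; the parameter grid and synthetic data are just for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)

# Hyperparameters are chosen before training; the search tries each combination
# and keeps the one with the best cross-validated score.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 200], "max_depth": [3, 10]},   # e.g. tree depth
    cv=3,
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```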

1.7 Evaluation

Measure performance on validation/test data.
Compare candidates and check generalization: look for bias, drift risk, failure modes.
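
A minimal evaluation sketch: train on one split, then score only the held-out data (synthetic, imbalanced data is used here for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced dataset (about 10% positives).
X, y = make_classification(n_samples=2_000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

# Evaluate only on data the model has never seen.
print(classification_report(y_test, model.predict(X_test)))
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```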

1.8 Deployment

Make the model available to applications (real-time endpoint or batch scoring job).
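
One way to sketch a self-hosted real-time endpoint is with Flask (just one of many serving options; model.joblib is assumed to be the artifact from the training step above):

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")     # load the artifact once at startup, not per request

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[0.1, 0.2, ...]]} with one row per prediction.
    features = request.get_json()["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(port=8080)
```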

1.9 Monitoring

Track model and data behavior in production (performance, drift, latency, errors) and trigger alerts or retraining when needed.
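
A toy drift check using a two-sample Kolmogorov-Smirnov test on one feature; the baseline and production samples are simulated here, and managed tools like SageMaker Model Monitor automate this kind of baseline comparison:

```python
import numpy as np
from scipy.stats import ks_2samp

# Simulated example: a feature's values at training time vs. recent production inputs.
baseline = np.random.normal(loc=0.0, scale=1.0, size=5_000)     # captured when training
production = np.random.normal(loc=0.4, scale=1.0, size=5_000)   # recent inference inputs

stat, p_value = ks_2samp(baseline, production)
if p_value < 0.01:
    print(f"Possible data drift (KS statistic = {stat:.3f}) -> alert / consider retraining")
```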


2) Sources of ML models

You generally get models from two paths:

2.1 Open-source or Pre-trained Models

Start from a model already trained on broad data and adapt it or use it directly.
Pros: faster, cheaper, strong baseline.
Cons: may not match your domain; requires governance/licensing review.

2.2 Custom-trained models

Train using your organization’s labeled data and problem definition.
Pros: tailored to your use case.
Cons: needs data, time, ML expertise, and ongoing maintenance.

Know when "use an existing model" vs. "train a custom model" is appropriate based on data availability, uniqueness, and cost.


3) Methods to use a model in production

Common production serving approaches:

3.1 Managed API service

Cloud provider hosts the endpoint and handles scaling/ops.
Pros: faster to deploy, less operational burden, built-in monitoring integrations.
Cons: less low-level control, cost tradeoffs.

3.2 Self-hosted API

You run the model server yourself (e.g., on containers/VMs).
Pros: maximum control over runtime, networking, custom dependencies.
Cons: you manage scaling, reliability, patching, monitoring.

Also remember the inference mode choice:
Real-time inference → low latency per request
Batch inference → high throughput for large jobs
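
A tiny batch-scoring sketch; the file names are hypothetical, and it assumes the input columns match what the model was trained on:

```python
import joblib
import pandas as pd

model = joblib.load("model.joblib")            # trained model artifact
batch = pd.read_csv("new_records.csv")         # hypothetical large input file

# Score everything offline in one pass; no endpoint or per-request latency budget.
batch["prediction"] = model.predict(batch.values)
batch.to_csv("predictions.csv", index=False)   # downstream systems pick up the results
```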


4) AWS Services/Features Mapped to Pipeline Stages

| Pipeline stage | AWS service / feature | What it helps with |
| --- | --- | --- |
| Data prep / EDA | Amazon SageMaker Data Wrangler | Visual/automated data cleaning, transformations, basic analysis workflows |
| Feature management | Amazon SageMaker Feature Store | Central place to store, reuse, and serve features consistently for training and inference |
| Training / tuning / deployment | Amazon SageMaker (SageMaker AI) | Managed environment to train models, run tuning jobs, and deploy endpoints |
| Monitoring | Amazon SageMaker Model Monitor | Detect data quality issues and model drift; monitor inference data vs. baseline |

(Just FYI: other AWS services exist, but these are the ones explicitly called out in the task objectives. You should always head over to the AWS Console to peruse, observe, experiment, and build!)


5) Fundamental Concepts of MLOps

MLOps is all about applying engineering practices to ML so it can run reliably in production.

Key themes you should be able to explain:

5.1 Experimentation

Track runs:

  1. datasets
  2. features
  3. model versions
  4. hyperparameters
  5. metrics

Goal: reproducibility and learning faster.
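
A bare-bones sketch of the idea: log each run's parameters and metrics somewhere durable so runs can be compared and reproduced (dedicated experiment-tracking tools do this far better; this only shows the concept):

```python
import json
import time

def log_run(params: dict, metrics: dict, path: str = "experiments.jsonl") -> None:
    """Append one experiment record so runs stay comparable and reproducible."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Hypothetical usage after a training run:
log_run(
    params={"model": "random_forest", "n_estimators": 200, "dataset_version": "v3"},
    metrics={"auc": 0.91, "f1": 0.62},
)
```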

5.2 Repeatable Processes

Automated pipelines reduce manual errors (consistent preprocessing, training, evaluation).

5.3 Scalable Systems

Ability to train/serve as data and traffic grow.

5.4 Managing Technical Debt

ML systems can become fragile due to hidden dependencies on data, features, and environment.

Debt Examples:

  1. undocumented preprocessing steps
  2. inconsistent features between training/inference

5.5 Production Readiness

  1. reliability
  2. security
  3. testing
  4. rollback strategy
  5. monitoring
  6. documentation

5.6 Model Monitoring

Monitor input data distribution changes, prediction distribution changes, latency, errors.

5.7 Model Retraining

Refresh models when performance degrades due to drift, seasonality, or business changes.


6) Model Performance Metrics vs Business Metrics

ML / Model metrics (Technical)

Used to measure predictive performance:

1. Accuracy: % of correct predictions (can be misleading with class imbalance).
2. AUC (Area Under the ROC Curve): Measures the ability to rank positives higher than negatives across thresholds; useful for imbalanced classification.
3. F1 score: Harmonic mean of precision and recall; useful when false positives and false negatives both matter.
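
A quick illustration of the accuracy trap under class imbalance (this also answers Quick Question 2 below):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# 1,000 examples, only 2% positive (e.g. fraud). A "model" that always predicts 0:
y_true = np.array([1] * 20 + [0] * 980)
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))              # 0.98 -- looks great, but useless
print(f1_score(y_true, y_pred, zero_division=0))   # 0.0  -- exposes the problem
```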

Business Metrics (Outcome)

Used to measure whether the model is worth it:

1. Cost per user / cost per prediction
2. Development and operational costs
3. Customer feedback / satisfaction
4. ROI (Return on Investment)

A model can have strong ML metrics but fail in business terms (too expensive, too slow, harms user experience, increases risk).


💡 Quick Questions

  1. What’s the difference between training and deployment?
  2. Why can accuracy be misleading in fraud detection?
  3. What problem does Feature Store help prevent?
  4. What’s a sign you might need retraining?
  5. Name one model metric and one business metric.

Additional Resources

  1. What is MLOps?
  2. Creating production-ready ML pipelines on AWS
  3. Machine Learning Lens - AWS Well-Architected Framework
  4. Business perspective: The AI strategy in the age of AI

Answers to Quick Questions

1. Training vs deployment

Training: learning model parameters from historical data to create a trained model artifact.

Deployment: making the trained model available for use in production (e.g., as an endpoint/API or batch job) so it can serve inference.

2. Why accuracy can be misleading in fraud detection

Fraud is often rare (class imbalance). A model can predict “not fraud” for everything and still achieve very high accuracy, while being useless. Metrics like AUC, precision/recall, and F1 are often more informative.

3. What problem Feature Store helps prevent

Training/serving skew and inconsistent features (e.g., using one definition of a feature during training and a different definition in production), plus duplicated feature logic across teams.

4. One sign you might need retraining

Model performance degrades in production or monitoring shows data drift/concept drift (input distributions or real-world relationships change over time).

5. One model metric and one business metric

Model metric: F1 score (or accuracy/AUC).
Business metric: ROI (or cost per user, development/operating cost, customer satisfaction).
