Francisco Escobar

Posted on Sep 18

AWS Certified AI Practitioner (AIF-C01) Study Guide

Francisco Escobar · 2025-09-18T20:50:50Z

Recommended Prerequisites: Familiarity with key AWS services like Amazon EC2, Amazon S3, AWS Lambda, and Amazon SageMaker Understanding of the AWS shared responsibility model Familiarity with AWS Identity and Access Management (IAM) for security Knowledge of AWS global infrastructure (Regions, Availability Zones) Familiarity with AWS service pricing models Exam Details Code: AIF-C01 Duration: 90 minutes Cost: $100 USD Number of Questions: 65 total questions (50 scored, 15 unscored for future evaluation) Scoring: Scale of 100-1000, minimum passing score: 700 Result: Pass/Fail Question Types Multiple choice: One correct answer and three incorrect distractors Multiple response: Two or more correct answers from five or more options Ordering: Place 3-5 answers in the correct order to complete a task Matching: Match answers to a list of 3-7 requests Case study: A scenario with two or more related questions Exam Domains Domain 1: AI and ML Fundamentals (20% of exam) 1.1 Explain basic AI concepts and terminology Key Definitions: Artificial Intelligence (AI): A field of computer science focused on creating systems capable of performing tasks that mimic human intelligence Machine Learning (ML): A subset of AI that teaches computers to learn from data to improve performance without explicit programming Deep Learning: A subset of ML that uses artificial neural networks (ANNs) with multiple layers Generative AI: A subset of deep learning that produces new data (text, images, audio, synthetic data) Large Language Model (LLM): Deep learning models trained on massive volumes of text data Types of Inference: Batch inference: Processing large volumes of data at once, prioritizing efficiency over speed Real-time inference: Fast processing for applications requiring immediate responses Data Types: Labeled and unlabeled: With or without assigned categories Structured: Tabular data Semi-structured: With metadata like JSON or XML Unstructured: Without defined model (text, images, videos) Machine Learning Paradigms: Supervised Learning: Model trained with labeled data (classification and regression) Unsupervised Learning: Finds patterns in unlabeled data (clustering, dimensionality reduction) Reinforcement Learning: Agent learns through trial and error with rewards/penalties 1.2 Identify practical AI use cases When to use AI/ML: Assist human decision-making Scale solutions Automate repetitive tasks When NOT to use AI/ML: Unfavorable cost-benefit analysis Deterministic solutions required Highly regulated areas requiring strict explainability AWS Managed AI/ML Services: Amazon SageMaker: Comprehensive platform for building, training, and deploying ML models Amazon Transcribe: Speech-to-text (ASR) Amazon Translate: Neural machine translation Amazon Comprehend: NLP for extracting insights from text Amazon Lex: Conversational interfaces (chatbots) Amazon Polly: Text-to-speech (TTS) 1.3 Describe the ML development lifecycle ML Pipeline Components: Data collection Exploratory data analysis (EDA) Data preprocessing Feature engineering Model training Hyperparameter tuning Model evaluation Deployment Monitoring Deployment Methods: Managed API service: Amazon SageMaker endpoints Self-hosted API: Deploy on own servers (Amazon EC2) AWS Services by Stage: Data preparation: Amazon SageMaker Data Wrangler, Feature Store Build and train: Amazon SageMaker Notebooks, Model Training Deploy and monitor: Amazon SageMaker Pipelines, Model Monitor Domain 2: Generative AI Fundamentals (24% of exam) 2.1 Explain basic generative AI concepts Fundamental Concepts: Tokens: Smallest units of text a model can process Embeddings: Numerical representations that capture semantic meaning Prompt Engineering: Designing inputs to guide the model Foundation Models: Large-scale ML models pre-trained on massive datasets Multimodal Models: Process multiple data types (text, images, audio) Diffusion Models: Create realistic data by reversing a noise-adding process Use Cases: Image, video, and audio generation Text summarization Chatbots Translation Code generation Search and recommendation engines 2.2 Understand capabilities and limitations of generative AI Advantages: Adaptability to diverse tasks Real-time response capability Simplicity in automating content generation Disadvantages and Challenges: Hallucinations: Incorrect information that appears plausible Interpretability: "Black box" models Inaccuracy and lack of determinism Toxicity: Potential for offensive content 2.3 Describe AWS infrastructure and technologies for generative AI AWS Services: Amazon Bedrock: Access to foundation models through API Amazon SageMaker JumpStart: ML hub with pre-trained models Amazon Q: Generative AI assistant for businesses PartyRock: Amazon Bedrock playground Domain 3: Foundation Model Applications (28% of exam) 3.1 Design considerations for foundation model applications Model Selection Criteria: Cost: Subscription prices, compute resources Modality: Supported data types Latency: Response speed Model size and complexity Input/output length Inference Parameters: Temperature: Controls randomness (high = creative, low = deterministic) Retrieval-Augmented Generation (RAG): Optimizes LLM output by referencing external knowledge base Combines information retrieval with text generation Knowledge Bases for Amazon Bedrock is a RAG implementation Vector Databases: Store embeddings for fast similarity searches AWS services: OpenSearch Service, Aurora, Neptune, DocumentDB, RDS PostgreSQL Foundation Model Customization: Pre-training: Extremely expensive, from scratch Fine-tuning: Adapt pre-trained model, less expensive In-context learning: Guide with examples in prompt, cost-effective RAG: Balanced approach, updates knowledge without retraining 3.2 Choose effective prompt engineering techniques Prompting Techniques: Zero-shot: No prior examples One-shot / Few-shot: With one or several examples Chain-of-thought: Step-by-step reasoning Risks: Prompt injection: Malicious prompt manipulation Jailbreak: Bypassing security filters Poisoning: Malicious data degrading performance 3.3 Describe foundation model training and fine-tuning process Fine-tuning Methods: Instruction tuning: With instruction-response examples Domain-specific adaptation: Data from particular field RLHF (Reinforcement Learning from Human Feedback): Align with human preferences 3.4 Describe methods for evaluating foundation model performance Evaluation Approaches: Human evaluation: Gold standard but expensive Benchmark datasets: GLUE, SuperGLUE Metrics: ROUGE: For text summarization BLEU: For machine translation BERTScore: Semantic similarity with BERT embeddings Domain 4: Guidelines for Responsible AI (14% of exam) 4.1 Explain responsible AI system development Responsible AI Characteristics: Fairness Inclusivity Robustness Security Veracity Transparency Governance Bias and Fairness: Types: algorithmic, data, sampling, prejudice bias Effects: unfair results, overfitting/underfitting AWS Tools: Amazon SageMaker Clarify: Detects bias and explains predictions Amazon SageMaker Model Monitor: Monitors models in production Amazon Augmented AI (A2I): Human reviews Guardrails for Amazon Bedrock: Security policies for generative AI 4.2 Recognize the importance of transparent and explainable models Transparency vs. "Black Box" Models: Transparent: easy to interpret (decision trees) Opaque: difficult to understand (deep neural networks) Tools: Amazon SageMaker Model Cards: Document model information AWS AI Service Cards: Information about responsible use Domain 5: Security, Compliance, and Governance for AI Solutions (14% of exam) 5.1 Explain methods to secure AI systems AWS Security Services: AWS IAM: Access management with principle of least privilege Encryption: At rest and in transit with AWS KMS Amazon Macie: Discovers and protects sensitive data (PII) AWS PrivateLink: Private connectivity between VPCs Shared responsibility model: AWS secures the cloud, customer secures in the cloud Data Citation and Lineage: Data lineage: Track origin and transformations Data cataloging: AWS Glue Data Catalog for organizing metadata 5.2 Recognize governance and compliance regulations for AI systems Standards: ISO: ISO/IEC 27001 (information security), ISO/IEC 42001 (AI management) SOC: Service Organization Controls AWS Compliance Services: AWS Artifact: Access to compliance reports AWS Config: Evaluates and monitors configurations AWS Audit Manager: Helps with continuous audits AWS CloudTrail: Logs account activity and API calls AWS Trusted Advisor: Optimization recommendations Governance Strategies: Data lifecycle policies Logging Data residency Monitoring Retention Exam Tips Practice with real use cases for each mentioned AWS service Understand the differences between machine learning types Get familiar with evaluation metrics and when to use each Study ethical aspects and responsible AI Know AWS security and compliance features Practice prompt engineering and understand RAG Review pricing models and model selection factors Good luck with your certification!

#aws #ai #certification #learning

Recommended Prerequisites:

Familiarity with key AWS services like Amazon EC2, Amazon S3, AWS Lambda, and Amazon SageMaker
Understanding of the AWS shared responsibility model
Familiarity with AWS Identity and Access Management (IAM) for security
Knowledge of AWS global infrastructure (Regions, Availability Zones)
Familiarity with AWS service pricing models

Exam Details

Code: AIF-C01
Duration: 90 minutes
Cost: $100 USD
Number of Questions: 65 total questions (50 scored, 15 unscored for future evaluation)
Scoring: Scale of 100-1000, minimum passing score: 700
Result: Pass/Fail

Question Types

Multiple choice: One correct answer and three incorrect distractors
Multiple response: Two or more correct answers from five or more options
Ordering: Place 3-5 answers in the correct order to complete a task
Matching: Match answers to a list of 3-7 requests
Case study: A scenario with two or more related questions

Exam Domains

Domain 1: AI and ML Fundamentals (20% of exam)

1.1 Explain basic AI concepts and terminology

Key Definitions:

Artificial Intelligence (AI): A field of computer science focused on creating systems capable of performing tasks that mimic human intelligence
Machine Learning (ML): A subset of AI that teaches computers to learn from data to improve performance without explicit programming
Deep Learning: A subset of ML that uses artificial neural networks (ANNs) with multiple layers
Generative AI: A subset of deep learning that produces new data (text, images, audio, synthetic data)
Large Language Model (LLM): Deep learning models trained on massive volumes of text data

Types of Inference:

Batch inference: Processing large volumes of data at once, prioritizing efficiency over speed
Real-time inference: Fast processing for applications requiring immediate responses

Data Types:

Labeled and unlabeled: With or without assigned categories
Structured: Tabular data
Semi-structured: With metadata like JSON or XML
Unstructured: Without defined model (text, images, videos)

Machine Learning Paradigms:

Supervised Learning: Model trained with labeled data (classification and regression)
Unsupervised Learning: Finds patterns in unlabeled data (clustering, dimensionality reduction)
Reinforcement Learning: Agent learns through trial and error with rewards/penalties

1.2 Identify practical AI use cases

When to use AI/ML:

Assist human decision-making
Scale solutions
Automate repetitive tasks

When NOT to use AI/ML:

Unfavorable cost-benefit analysis
Deterministic solutions required
Highly regulated areas requiring strict explainability

AWS Managed AI/ML Services:

Amazon SageMaker: Comprehensive platform for building, training, and deploying ML models
Amazon Transcribe: Speech-to-text (ASR)
Amazon Translate: Neural machine translation
Amazon Comprehend: NLP for extracting insights from text
Amazon Lex: Conversational interfaces (chatbots)
Amazon Polly: Text-to-speech (TTS)

1.3 Describe the ML development lifecycle

ML Pipeline Components:

Data collection
Exploratory data analysis (EDA)
Data preprocessing
Feature engineering
Model training
Hyperparameter tuning
Model evaluation
Deployment
Monitoring

Deployment Methods:

Managed API service: Amazon SageMaker endpoints
Self-hosted API: Deploy on own servers (Amazon EC2)

AWS Services by Stage:

Data preparation: Amazon SageMaker Data Wrangler, Feature Store
Build and train: Amazon SageMaker Notebooks, Model Training
Deploy and monitor: Amazon SageMaker Pipelines, Model Monitor

Domain 2: Generative AI Fundamentals (24% of exam)

2.1 Explain basic generative AI concepts

Fundamental Concepts:

Tokens: Smallest units of text a model can process
Embeddings: Numerical representations that capture semantic meaning
Prompt Engineering: Designing inputs to guide the model
Foundation Models: Large-scale ML models pre-trained on massive datasets
Multimodal Models: Process multiple data types (text, images, audio)
Diffusion Models: Create realistic data by reversing a noise-adding process

Use Cases:

Image, video, and audio generation
Text summarization
Chatbots
Translation
Code generation
Search and recommendation engines

2.2 Understand capabilities and limitations of generative AI

Advantages:

Adaptability to diverse tasks
Real-time response capability
Simplicity in automating content generation

Disadvantages and Challenges:

Hallucinations: Incorrect information that appears plausible
Interpretability: "Black box" models
Inaccuracy and lack of determinism
Toxicity: Potential for offensive content

2.3 Describe AWS infrastructure and technologies for generative AI

AWS Services:

Amazon Bedrock: Access to foundation models through API
Amazon SageMaker JumpStart: ML hub with pre-trained models
Amazon Q: Generative AI assistant for businesses
PartyRock: Amazon Bedrock playground

Domain 3: Foundation Model Applications (28% of exam)

3.1 Design considerations for foundation model applications

Model Selection Criteria:

Cost: Subscription prices, compute resources
Modality: Supported data types
Latency: Response speed
Model size and complexity
Input/output length

Inference Parameters:

Temperature: Controls randomness (high = creative, low = deterministic)

Retrieval-Augmented Generation (RAG):

Optimizes LLM output by referencing external knowledge base
Combines information retrieval with text generation
Knowledge Bases for Amazon Bedrock is a RAG implementation

Vector Databases:

Store embeddings for fast similarity searches
AWS services: OpenSearch Service, Aurora, Neptune, DocumentDB, RDS PostgreSQL

Foundation Model Customization:

Pre-training: Extremely expensive, from scratch
Fine-tuning: Adapt pre-trained model, less expensive
In-context learning: Guide with examples in prompt, cost-effective
RAG: Balanced approach, updates knowledge without retraining

3.2 Choose effective prompt engineering techniques

Prompting Techniques:

Zero-shot: No prior examples
One-shot / Few-shot: With one or several examples
Chain-of-thought: Step-by-step reasoning

Risks:

Prompt injection: Malicious prompt manipulation
Jailbreak: Bypassing security filters
Poisoning: Malicious data degrading performance

3.3 Describe foundation model training and fine-tuning process

Fine-tuning Methods:

Instruction tuning: With instruction-response examples
Domain-specific adaptation: Data from particular field
RLHF (Reinforcement Learning from Human Feedback): Align with human preferences

3.4 Describe methods for evaluating foundation model performance

Evaluation Approaches:

Human evaluation: Gold standard but expensive
Benchmark datasets: GLUE, SuperGLUE

Metrics:

ROUGE: For text summarization
BLEU: For machine translation
BERTScore: Semantic similarity with BERT embeddings

Domain 4: Guidelines for Responsible AI (14% of exam)

4.1 Explain responsible AI system development

Responsible AI Characteristics:

Fairness
Inclusivity
Robustness
Security
Veracity
Transparency
Governance

Bias and Fairness:

Types: algorithmic, data, sampling, prejudice bias
Effects: unfair results, overfitting/underfitting

AWS Tools:

Amazon SageMaker Clarify: Detects bias and explains predictions
Amazon SageMaker Model Monitor: Monitors models in production
Amazon Augmented AI (A2I): Human reviews
Guardrails for Amazon Bedrock: Security policies for generative AI

4.2 Recognize the importance of transparent and explainable models

Transparency vs. "Black Box" Models:

Transparent: easy to interpret (decision trees)
Opaque: difficult to understand (deep neural networks)

Tools:

Amazon SageMaker Model Cards: Document model information
AWS AI Service Cards: Information about responsible use

Domain 5: Security, Compliance, and Governance for AI Solutions (14% of exam)

5.1 Explain methods to secure AI systems

AWS Security Services:

AWS IAM: Access management with principle of least privilege
Encryption: At rest and in transit with AWS KMS
Amazon Macie: Discovers and protects sensitive data (PII)
AWS PrivateLink: Private connectivity between VPCs
Shared responsibility model: AWS secures the cloud, customer secures in the cloud

Data Citation and Lineage: