🅷🅰🆁🅳🅸🅺 🅹🅾🆂🅷🅸 for AWS Community Builders

Posted on Sep 7, 2024

AWS AI Practitioner - Last Min Revision Tips

#aws #awsaipractitioner

Are you ready to take the AWS AI Practitioner Certification Exam?

Below is the list of all the material and prep notes that helped me pass the exam.

Hope it will be helpful to you.

KEY NOTES: PLEASE READ FIRST

Below are my personal notes from the course I took for this exam.

The list is not exhaustive and is not in any specific sequence.

My approach was to write everything down while learning and then go through the notes again; whenever I could not recall a topic, I went back and studied it in more depth.

I hope it helps your prep in whatever way it can, and most importantly, I WISH YOU GOOD LUCK.

In case the indentation is not showing correctly, you can go to this link - or message me and I can send it to you.

What are Transformers in Artificial Intelligence? -> aws.amazon.com/what-is/transformers-in-artificial-intelligence/

What are Foundation Models? -> aws.amazon.com/what-is/foundation-models/

What is Artificial Intelligence (AI)? -> aws.amazon.com/what-is/artificial-intelligence/

What is Machine Learning? -> aws.amazon.com/what-is/machine-learning/

What is Deep Learning? -> aws.amazon.com/what-is/deep-learning/

What is Generative AI? -> aws.amazon.com/what-is/generative-ai/

What’s the Difference Between Supervised and Unsupervised Learning? -> aws.amazon.com/compare/the-difference-between-machine-learning-supervised-and-unsupervised/

Machine Learning Concepts -> docs.aws.amazon.com/machine-learning/latest/dg/machine-learning-concepts.html

AWS AI Use Case Explorer -> aws.amazon.com/machine-learning/ai-use-cases/?use-cases

What is Amazon SageMaker? -> docs.aws.amazon.com/sagemaker/latest/dg/whatis.html

AWS Services - Machine Learning (ML) and Artificial Intelligence (AI) -> docs.aws.amazon.com/whitepapers/latest/aws-overview/machine-learning.html

AWS Deploy Serverless ML -> aws.amazon.com/blogs/machine-learning/deploy-a-serverless-ml-inference-endpoint-of-large-language-models-using-fastapi-aws-lambda-and-aws-cdk/

AWS Sagemaker - API Gateway - AWS Lambda -> aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/

Inference parameters -> docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html

Amazon Bedrock or Amazon SageMaker? -> docs.aws.amazon.com/decision-guides/latest/bedrock-or-sagemaker/bedrock-or-sagemaker.html

Choosing a generative AI service -> docs.aws.amazon.com/decision-guides/latest/generative-ai-on-aws-how-to-choose/guide.html

AWS Bedrock Agents -> aws.amazon.com/bedrock/agents/

What is RAG? - Retrieval-Augmented Generation AI Explained - AWS (amazon.com)

docs.aws.amazon.com/awscloudtrail/latest/userguide/how-cloudtrail-works.html

docs.aws.amazon.com/bedrock/latest/userguide/usingVPC.html

aws.amazon.com/blogs/machine-learning/use-aws-privatelink-to-set-up-private-access-to-amazon-bedrock/

Known Data -> Features -> Algorithm -> Output

Adjustments

Inference

ML models can be trained on various types of data.

Structured data on RDS, S3 or Redshift

S3 is primary source of training data

Semi-structured data = DynamoDB & DocumentDB

Unstructured data - tokenization

Timeseries - sequential data

Model Training - Algorithm

Inference 2 options

Real time

Low Latency

High throughput

persistent endpoint

-- Batch Transform

   Offline

   Large datasets

   Infrequent use

ML Types

Supervised Learning

  Amazon SageMaker Ground Truth -> Amazon Mechanical Turk

Unsupervised Learning

Reinforcement Learning

  Reward - AWS DeepRacer

Overfitting

Model does well on training data but not outside it

Underfitting

Model cannot determine meaningful results. It gives inaccurate results for both training data and new inputs

Bias and fairness

Diversity of training data

Feature importance

Fairness constraints

Deep Learning

Neural Networks

Input Layer -> Hidden Layers -> Output Layer
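
To make the Input Layer -> Hidden Layers -> Output Layer flow concrete, here is a minimal sketch of a forward pass in plain Python. The weights, biases, layer sizes and the ReLU activation are all made-up assumptions for illustration; a real network learns its weights during training.

```python
from typing import List

# Hypothetical, hand-picked parameters: 2 inputs -> 2 hidden neurons -> 1 output.
HIDDEN_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]
HIDDEN_BIASES = [0.0, 0.0]
OUTPUT_WEIGHTS = [[1.0, 2.0]]
OUTPUT_BIASES = [0.5]


def relu(values: List[float]) -> List[float]:
    """Activation function: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs: List[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One fully connected layer: weighted sum of all inputs plus a bias per neuron."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]


def forward(inputs: List[float]) -> List[float]:
    """Input layer -> hidden layer (with ReLU) -> output layer."""
    hidden = relu(dense(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES))
    return dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)


print(forward([2.0, 1.0]))
```

Training would adjust the weights and biases; this sketch only shows how data moves through the layers at inference time.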

Machine Learning vs Deep Learning

Consider alternatives when

Costs outweigh the benefits

Models cannot meet the interpretability requirements

Systems must be deterministic rather than probabilistic

ML Models are probabilistic

Supervised learning -

Classification

 Binary          - Diabetic or not diabetic

 MultiClass

Regression

  Simple Linear regression

  Multiple Linear regression

  Logistic regression (despite the name, it is typically used for classification)

Unsupervised Learning

Clustering

 Define features

 Similarity function

 Number of clusters

Anomaly detection

  Data points that diverge
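
As a rough illustration of "data points that diverge", here is a small z-score based sketch. The threshold of 2 standard deviations and the sample values are assumptions; managed services use more sophisticated algorithms than this.

```python
from statistics import mean, pstdev
from typing import List


def find_anomalies(values: List[float], threshold: float = 2.0) -> List[float]:
    """Return values whose distance from the mean exceeds `threshold` standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing diverges
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_anomalies(readings))
```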

Amazon Rekognition

Facial comparison and analysis

Text detection

Object detection and labelling

Content moderation

Can detect explicit content in images and videos

Amazon Textract

Extract text from scanned documents

Amazon Comprehend

Extract key phrases, entities and sentiment.

A key use is detecting PII data

Amazon Lex

Conversational voice and text

Amazon Transcribe

Converts speech to text

Amazon Polly

Converts Text to speech

Amazon Kendra

Intelligent document search

Amazon Personalize

Personalized product recommendations

Amazon Translate

Translates between 75 languages

Amazon Forecast

Predicts future points in time-series data

Amazon Fraud Detector

Detects fraud and fraudulent activities

Amazon Bedrock

Amazon Sagemaker

ML Pipeline

Identify Business Goal -> Frame ML Problem -> Collect Data -> Pre-process Data -> Engineer Features -> Train, Tune, Evaluate -> Deploy -> Monitor

Collect Data

AWS Glue -

  Cloud optimized ETL service

  Contains its own data catalog

  Built in transformations

AWS Glue DataBrew

  Point and click data transformation

  200+ transformations

Amazon SageMaker Ground Truth

 Uses ML to label your training data

 Can automatically label

Amazon SageMaker Canvas

 Import, Prepare, Transform, Visualize and analyze

Amazon SageMaker Feature Store

 Processes raw data into features by using a processing workflow

Amazon SageMaker Experiments

 Visual interface

Amazon SageMaker automatic model tuning

Deploy

 Batch inference

 Real-time inference

 Self-managed

 Hosted

Amazon Sagemaker inference

Batch Transform

           Offline inference

           Large datasets

Asynchronous

           Long processing times

           Large payloads

Serverless

           Intermittent traffic

           Periods of no traffic

Real-time

           Live predictions

           Sustained traffic

           Low latency

           Consistent
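
One way to memorize the four SageMaker inference options is to turn the notes above into a tiny decision helper. This is only a study aid under simplified assumptions: the function name, its three boolean parameters and the order of the checks are my own, not an AWS API or official decision tree.

```python
def choose_inference_option(offline: bool, large_payload: bool, intermittent_traffic: bool) -> str:
    """Map workload traits from the notes to a SageMaker inference option (simplified)."""
    if offline:
        return "Batch Transform"   # offline inference over large datasets
    if large_payload:
        return "Asynchronous"      # long processing times, large payloads
    if intermittent_traffic:
        return "Serverless"        # intermittent traffic, periods of no traffic
    return "Real-time"             # live predictions, sustained traffic, low latency


print(choose_inference_option(offline=False, large_payload=False, intermittent_traffic=True))
```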

Monitor the model

         Configure alerts to notify and initiate actions if any drift

         data drift / concept drift

Amazon Sagemaker Model Monitor

MLOps

  Amazon SageMaker Model Building Pipelines

  Repository Options

         AWS CodeCommit

         Amazon SageMaker Feature Store

         Amazon SageMaker Model Registry

        3rd party repository

  Orchestration options

         Amazon SageMaker Pipelines

         Amazon Managed Workflows for Apache Airflow

         AWS Step Functions

Accuracy = (True Positives + True Negatives) / Total

Precision = True Positives / (True Positives + False Positives)

Recall = True Positives / (True Positives + False Negatives)

F1 = 2 * Precision * Recall / (Precision + Recall)

False Positive Rate FPR = False Positives / (True Negatives + False Positives)

True Negative Rate = True Negatives / (True Negatives + False Positives)

Area Under Curve - AUC
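
A quick way to check your understanding of these formulas is to compute them from a confusion matrix. The sketch below uses made-up counts and the standard F1 definition (harmonic mean of precision and recall); it assumes none of the denominators is zero.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute common classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (tn + fp),
        "true_negative_rate": tn / (tn + fp),
    }


# Hypothetical counts: 40 true positives, 10 false positives, 45 true negatives, 5 false negatives.
for name, value in classification_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.3f}")
```

Note that False Positive Rate and True Negative Rate always add up to 1, since they share the same denominator.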

Regression Model Errors

  Mean Squared Error

   Root mean squared error

   Mean absolute error
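
The three regression errors above can be computed in a few lines. This is a minimal sketch with made-up actual and predicted values.

```python
from math import sqrt
from typing import Dict, List


def regression_errors(actual: List[float], predicted: List[float]) -> Dict[str, float]:
    """Compute MSE, RMSE and MAE for paired actual/predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    return {
        "mse": mse,                                       # mean of squared errors
        "rmse": sqrt(mse),                                # same units as the target
        "mae": sum(abs(e) for e in errors) / len(errors), # mean of absolute errors
    }


print(regression_errors(actual=[3, 5, 2, 7], predicted=[2, 5, 4, 7]))
```

Because MSE squares each error, it penalizes large misses more heavily than MAE does.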

1 A Framework to Mitigate Bias and Improve Outcomes in the New Age of AI

2 What Are Transformers in Artificial Intelligence?

3 What Is Overfitting?

4 What Are Large Language Models (LLMs)?

5 Responsible Use of Machine Learning

6 Easily Add Intelligence to Your Applications

7 What Is MLOps?

8 Amazon SageMaker MLOps: From Idea to Production in Six Steps

9 Machine Learning Lens

Domain 2:

AI - ML - DL - GAI

Model

In-context learning

Prompts, prompt tuning, prompt engineering

Every NLP model has a tokenizer that converts text into token IDs.

Vector - ordered list of numbers.

     Ability to encode related relationships and collect associations

Embeddings

      Numerical vectorized representations of tokens that capture their semantic meaning
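
Here is a toy sketch of how tokens, vectors and embeddings fit together. The three-word vocabulary and the 3-dimensional embedding values are hand-made assumptions; real models use subword tokenizers and learned embeddings with hundreds or thousands of dimensions.

```python
from math import sqrt
from typing import List

# Hypothetical vocabulary: each word maps to a token ID.
VOCAB = {"cat": 0, "dog": 1, "car": 2}

# Hypothetical embedding table: row i is the vector for token ID i.
EMBEDDINGS = [
    [0.9, 0.8, 0.1],  # cat
    [0.8, 0.9, 0.2],  # dog
    [0.1, 0.2, 0.9],  # car
]


def tokenize(text: str) -> List[int]:
    """Convert whitespace-separated words into token IDs."""
    return [VOCAB[word] for word in text.lower().split()]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Similarity of two vectors by angle: close to 1.0 means very similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


cat_id, dog_id, car_id = tokenize("cat dog car")
print(cosine_similarity(EMBEDDINGS[cat_id], EMBEDDINGS[dog_id]))  # related words: high
print(cosine_similarity(EMBEDDINGS[cat_id], EMBEDDINGS[car_id]))  # unrelated words: lower
```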

Self-attention

LLMs

Deep learning foundation models

Transformers

Unimodal or multimodal

Multimodal use cases

Multimodal tasks

Diffusion Models

   Forward Diffusion

   Reverse Diffusion

  Stable Diffusion

          Does not use pixel space of the image, uses a reduced-definition latent space

SageMaker + Amazon Q Developer

Amazon Nimble Studio and Amazon Sumerian

Gen AI Architectures

    Generative Adversarial Networks GANs

    Variational autoencoders VAE

    Transformers

AI Project lifecycle

   Identify use case

   Experiment and select

   Adapt, align and augment

   Evaluate

   Deploy and integrate

  Monitor

Interpretability

   Intrinsic analysis

   Post hoc analysis

Traditional ML outputs are deterministic (the same input gives the same output)

Gen AI outputs are non-deterministic

Gen AI Performance metrics

    Recall-Oriented Understudy for Gisting Evaluation (ROUGE)

    Bilingual Evaluation Understudy (BLEU)

Transfer learning

SageMaker JumpStart

1 Selecting the Right Foundation Model for Your Startup

2 Generative Adversarial Networks Applications and its Benefits

3 The Complete Guide to Generative AI Architecture

4 PartyRock.aws

5 Monitoring Generative AI Applications Using Amazon Bedrock and Amazon CloudWatch Integration

6 What Is a GAN?

7 AWS Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI

Considerations

   Architecture

   Complexity

   Availability

  Compatibility

  Explainability

  Interpretability

Inference

   It is the process of generating an output from an input that you provided to the model.

   Input = Prompt and inference parameters

  Randomness and Diversity

         Temperature (lower value = favors high-probability outputs; higher value = lets in more low-probability outputs)

         Top K (lower value = smaller pool of candidate tokens to choose from)

         Top P

   Length

         Response Length

        Penalties

        Stop sequences
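
To see how Temperature, Top K and Top P shape the next-token choice, here is a simplified sketch that returns the filtered probability distribution instead of actually sampling. The logits are made-up numbers, and real model providers may apply these parameters in a different order or with different details.

```python
from math import exp
from typing import Dict, Optional


def next_token_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> Dict[str, float]:
    """Apply temperature, then Top K, then Top P, and renormalize what is left."""
    # Temperature (must be > 0): dividing logits by a small value sharpens the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    weights = {tok: exp(score - peak) for tok, score in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda pair: pair[1], reverse=True)

    # Top K: keep only the K most likely candidates.
    if top_k is not None:
        ranked = ranked[:top_k]

    # Top P: keep the smallest set of top candidates whose cumulative probability reaches P.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}


# Hypothetical scores for the word after "I hear the hoof beats of".
logits = {"horses": 2.0, "zebras": 1.0, "unicorns": 0.0}
print(next_token_distribution(logits, temperature=0.5))
print(next_token_distribution(logits, temperature=2.0))
print(next_token_distribution(logits, top_k=2))
```

With a lower temperature the most likely token takes a bigger share; with a higher temperature the unlikely tokens get more of a chance.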

  Prompt

        A specific set of inputs to guide LLMs to generate an appropriate output or completion

  RAG - Retrieval Augmented Generation

         Prompt enrichment and appending external data to your prompt

         Vector Database

               Collection of data stored as mathematical representations



  AWS Services for Vector search databases

          Amazon OpenSearch Service

         Amazon OpenSearch Serverless

         Amazon Aurora PostgreSQL

         Amazon RDS PostgreSQL

        Amazon Aurora

       Amazon Neptune

      Amazon DocumentDB [with MongoDB compatibility]
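
To illustrate what a vector search does inside a RAG flow, here is a tiny in-memory sketch: find the stored text closest to the query vector and append it to the prompt. The documents, their 3-dimensional vectors and the query vector are hand-written assumptions; in a real setup an embedding model produces the vectors and one of the services above stores and searches them.

```python
from math import sqrt
from typing import Dict, List

# Hypothetical "vector database": text mapped to a made-up embedding.
DOCUMENTS: Dict[str, List[float]] = {
    "Amazon Polly converts text to speech.": [0.9, 0.1, 0.0],
    "Amazon Transcribe converts speech to text.": [0.1, 0.9, 0.0],
    "Amazon Kendra provides intelligent document search.": [0.0, 0.1, 0.9],
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def retrieve(query_vector: List[float], top_n: int = 1) -> List[str]:
    """Return the top_n documents most similar to the query vector."""
    ranked = sorted(DOCUMENTS,
                    key=lambda doc: cosine_similarity(query_vector, DOCUMENTS[doc]),
                    reverse=True)
    return ranked[:top_n]


def build_rag_prompt(question: str, query_vector: List[float]) -> str:
    """Prompt enrichment: put the retrieved context in front of the question."""
    context = "\n".join(retrieve(query_vector))
    return f"Answer using only this context.\nContext:\n{context}\nQuestion: {question}"


# Assumed embedding of the question below.
print(build_rag_prompt("Which service reads text aloud?", [0.8, 0.2, 0.1]))
```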



 Amazon Bedrock AGENTS

     Orchestrate prompt completion workflows



Prompt

    Zero shot prompting

    Few shot prompting

    Prompt Template

   Chain-of-thought prompting

   Prompt tuning
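
The difference between zero shot and few shot prompting is easiest to see side by side. This is a small sketch using a made-up sentiment task and a made-up prompt template; the exact wording that works best depends on the model.

```python
from typing import List, Tuple


def zero_shot_prompt(task: str, text: str) -> str:
    """Zero shot: only the instruction and the input, no examples."""
    return f"{task}\nText: {text}\nSentiment:"


def few_shot_prompt(task: str, examples: List[Tuple[str, str]], text: str) -> str:
    """Few shot: the same template, with labeled examples placed before the input."""
    shots = "\n".join(f"Text: {sample}\nSentiment: {label}" for sample, label in examples)
    return f"{task}\n{shots}\nText: {text}\nSentiment:"


task = "Classify the sentiment of the text as Positive or Negative."
examples = [("I loved the course.", "Positive"), ("The exam was confusing.", "Negative")]

print(zero_shot_prompt(task, "The labs were great."))
print()
print(few_shot_prompt(task, examples, "The labs were great."))
```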



Latent space

      The encoded knowledge of language in LLMs or the stored patterns of data that capture relationships and reconstruct the language from the patterns when prompted

      Statistical database

Prompt Engineering risks and limitations

     Exposure

     Prompt Injection

     Jailbreaking

    Hijacking

   Poisoning

Training process for foundation models

     Pretraining         - Self supervised learning

    Fine-tuning        - Supervised learning            :: Catastrophic forgetting

    Continuous pre-training



Fine-tuning techniques

     Parameter-efficient fine-tuning (PEFT)

     Low-Rank Adaptation (LoRA)

     Representation fine-tuning (ReFT)

    Multitask fine-tuning

    Domain adaptation fine-tuning

    Reinforcement learning from human feedback (RLHF)



Data preparation for fine-tuning

     Prepare your training data

    Select prompts

   Calculate loss

   Update weights

  Define evaluation steps

Data preparation AWS Services

   Amazon SageMaker Canvas

  Open-source frameworks

  Amazon SageMaker Studio - integrates with Amazon EMR and supports JupyterLab

 AWS Glue

 Amazon SageMaker Feature Store

 Amazon SageMaker Clarify  -- if you have bias in your data

Amazon SageMaker Ground Truth  -- manage data labelling

Model performance

    One option to reduce inference latency is to decrease the size of the LLM, but this might also decrease its performance

Gen AI Performance Metrics

    Recall-Oriented Understudy for Gisting Evaluation (ROUGE)

            Automatic summarization tasks

           Machine translation software

  Bilingual Evaluation Understudy (BLEU)

           Used for translation tasks

  General Language Understanding Evaluation (GLUE)

         Compare against benchmarks set by the experts

        Assess model generalization across multiple tasks

Holistic Evaluation of Language Models (HELM)

      Help improve model transparency

Massive Multitask Language Understanding (MMLU)

     Evaluates knowledge and problem solving capabilities of the model

    Tested against history, mathematics, law, computer science and more

Beyond the Imitation Game Benchmark (BIG-bench)

    Focuses on tasks that are beyond the capabilities of the current language models
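
To get a feel for ROUGE and BLEU, here is a heavily simplified sketch based on single-word overlap only. ROUGE-1 recall asks how much of the reference appears in the candidate, while the BLEU-style score here is just clipped unigram precision. Full BLEU also combines longer n-grams and a brevity penalty, which this sketch leaves out; the sentences are made up.

```python
from collections import Counter
from typing import Tuple


def _overlap(candidate: str, reference: str) -> Tuple[int, int, int]:
    """Return (overlapping word count, candidate length, reference length)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection clips each word to the smaller of its two counts.
    return sum((cand & ref).values()), sum(cand.values()), sum(ref.values())


def rouge1_recall(candidate: str, reference: str) -> float:
    """Share of reference words that the candidate reproduced."""
    overlap, _, ref_total = _overlap(candidate, reference)
    return overlap / ref_total


def unigram_precision(candidate: str, reference: str) -> float:
    """Share of candidate words that appear in the reference (BLEU-style, unigrams only)."""
    overlap, cand_total, _ = _overlap(candidate, reference)
    return overlap / cand_total


reference = "the cat sat on the mat"
candidate = "the cat sat"
print(rouge1_recall(candidate, reference))      # candidate misses half of the reference
print(unigram_precision(candidate, reference))  # but every word it produced is correct
```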

AWS Services for model evaluation

  Amazon SageMaker JumpStart

 Amazon SageMaker Clarify

Review these materials to learn more about the topics covered in this exam domain:

1 What Are Foundation Models?

2 Inference Parameters

3 Knowledge Bases for Amazon Bedrock

4 Agents for Amazon Bedrock

5 Amazon OpenSearch Service’s Vector Database Capabilities Explained

6 The Role of Vector Datastores in Generative AI Applications

7 Vector Engine for Amazon OpenSearch Serverless

8 What Is Prompt Engineering?

9 Domain-Adaptation Fine-Tuning of Foundation Models in Amazon SageMaker JumpStart on Financial Data

10 Metric: bleu

11 Metric: rouge

12 ReFT: Representation Fine-Tuning for Language Models

Responsible AI

    Fairness

   Explainability

  Robustness

  Privacy and security

 Governance

Transparency

Effects of bias and variance

  Demographic disparities

  Inaccuracy

 Overfitting

 Underfitting

User Trust

Responsible datasets

 Inclusivity

 Diversity

 Balanced datasets

 Privacy protection

 Consent and transparency

 Regular audits

Responsible practices

  Environmental considerations

  Sustainability

  Transparency

  Accountability

 Stakeholder engagement

AWS service for this

Amazon SageMaker Clarify

   Detect bias

  Explainability

 SageMaker Processing jobs

SageMaker pre-training bias analysis

 Class imbalance

 Label imbalance

 Demographic disparity

Difference in positive proportions

Specificity difference

Recall difference

Accuracy difference

Treatment equality
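
Two of the pre-training bias metrics can be sketched in a few lines. The formulas below follow my understanding of how SageMaker Clarify defines Class Imbalance (CI) and Difference in Proportions of Labels (DPL), where facet "a" is the favored group and facet "d" the disfavored group; treat the function names and sample counts as assumptions and check the Clarify documentation for the authoritative definitions.

```python
def class_imbalance(n_a: int, n_d: int) -> float:
    """CI = (n_a - n_d) / (n_a + n_d); 0 means both facets are equally represented."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(pos_a: int, n_a: int, pos_d: int, n_d: int) -> float:
    """DPL = share of positive labels in facet a minus share of positive labels in facet d."""
    return pos_a / n_a - pos_d / n_d


# Hypothetical dataset: 800 rows in facet a (400 positive), 200 rows in facet d (50 positive).
print(class_imbalance(n_a=800, n_d=200))
print(difference_in_proportions_of_labels(pos_a=400, n_a=800, pos_d=50, n_d=200))
```

Values close to 0 suggest balance; values far from 0 suggest that one facet is over-represented or receives positive labels more often.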

Gen AI Risks

Hallucinations

Intellectual Property

Bias

Toxicity

Data privacy

Guardrails for Amazon Bedrock

Hate

Insults

Sexual

Violence

Denied topics

Model transparency

Interpretability   - Deep analysis

Explainability - black box analysis

AI Service Card

Amazon SageMaker Model Cards

Sagemaker provides

    Feature attributions - SHAP Values

   Partial dependence plots

Amazon Augmented AI (A2I) - send data to human reviewers to review random predictions.

   Use your own reviewers or use Amazon Mechanical Turk

1 Responsible AI in the Generative Era

2 Transform Responsible AI from Theory into Practice

3 Tools and Resources to Build AI Responsibly

4 What Is RLHF?

5 Responsible AI Best Practices: Promoting Responsible and Trustworthy AI Systems

IAM Identity Center

      Workforce users, Workforce identities

Logging with CloudTrail

     Captures API calls and related events

     Integrated with SageMaker

Amazon SageMaker Role Manager

   Preconfigured permissions for 12 activities

Encryption at rest

  Amazon SageMaker

       Data is encrypted by default on ML storage volumes

      Notebook instances, SageMaker jobs, and endpoints

AWS Key Management Service - KMS

Amazon Macie

    Identifies and alerts you to sensitive data

   Remove PII during ingestion

AI System Vulnerabilities

   Training Data

   Input Data

  Output Data

  Models

      Inversion

     Theft

LLMs

     Prompt Injection

Amazon SageMaker Model Monitor

  Capture data

  Create a baseline

  Define data quality monitoring jobs

  Evaluate statistics

Amazon SageMaker Model Registry

Amazon SageMaker Model Cards

Amazon SageMaker ML Lineage Tracking

Amazon SageMaker Feature Store

Amazon SageMaker Model Dashboard

Emerging AI compliance standards

   ISO 42001 and ISO 23894

  EU Artificial Intelligence Act

 NIST AI Risk Management Framework (RMF)

AI Risk Management

 Probability of occurrence

 Severity of occurrence

Algorithmic Accountability Act

 Transparency and explainability

 Monitor for Bias

AWS Audit Manager

  Audits AWS usage to assess compliance

 Choose a framework

       Gen AI

      Customer frameworks

Collect evidence and add to audit report

Guardrails for Amazon Bedrock

  Apply guardrails to any foundation model and agents for Amazon Bedrock

 Configure harmful content filtering

 Define and disallow denied topics

 PII data

AWS Config

 Continuously monitors and records configurations

 AWS Config rules

Conformance packs

        Operational best practices for AI and ML

       Security best practices for Amazon SageMaker

Amazon Inspector

   Works at application level

  Performs automated security assessments on your applications

AWS Trusted Advisor

   Provides guidance to help you

      Reduce cost

      Increase performance

      Improve security

Data Governance

  Curation

  Discovery and understanding

 Protection

Define roles

    Data steward

   Data owner

   IT Roles

AWS Glue DataBrew for data governance

 Data profiling

 Data Lineage

AWS Glue Data Catalog

AWS Glue Data Quality

Curation

    Data Quality Management

   Data Integration

   Data Management

Protection

 Data Security

 Data Compliance

 Data Lifecycle management

Review these materials to learn more about the topics covered in this exam domain:

1 Shared Responsibility Model

2 Securing Generative AI: Applying Relevant Security Controls

3 AWS Cloud Adoption Framework for Artificial Intelligence, Machine Learning, and Generative AI

4 AWS Compliance

5 Customer Compliance Center

6 NIST Artificial Intelligence Risk Management Framework

7 ISO 42001: A New Foundational Global Standard to Advance Responsible AI

8 The EU Artificial Intelligence Act

9 Learn How to Assess the Risk of AI Systems

10 What Is Data Governance?

11 Data Governance in the Age of Generative AI

How to Choose a Machine Learning Algorithm? (serokell.io)
