DEV Community

Cover image for AWS Certified Generative AI Developer Professional AIP-C01: Study Reference

AWS Certified Generative AI Developer Professional AIP-C01: Study Reference

I put this together while preparing for AIP-C01. Daily work with Bedrock, Agents, and Knowledge Bases kept the prep short.

This is a concept-level study reference: service distinctions, decision trees, and common gotchas drawn from the official exam guide and AWS documentation. It contains no exam questions and no reproduced exam content.

Exam: AWS Certified Generative AI Developer – Professional (AIP-C01)
Format: 65 questions, 180 minutes. Scenario-based, long questions. Passing: 750/1000.
Level: Professional (assumes ~2+ years of AWS experience and 1+ year hands-on generative AI).


Study Approach

About the Exam

The AIP-C01 tests whether you can architect, implement, and secure generative AI applications on AWS. Questions present business scenarios with a specific constraint (cost, latency, compliance, scale, minimal effort) and ask you to select the right service or pattern. The skill is recognizing that constraint word and mapping it to the right decision, not memorizing service lists.

Second-best answers are designed to look right. The difference is usually one word in the scenario ("managed," "minimal code," "real-time," "non-real-time"). When two options seem equally correct, one works but is overkill; prefer the simpler or more managed choice.

Recommended Study Order

Work through the five domains in the order listed below. Domain 1 is the heaviest (31%) and provides foundational concepts that everything else builds on.

Domain 1: FM Integration, Data & Compliance (31%). Cover this first. The most frequently tested distinction is RAG vs fine-tuning. Focus on: Knowledge Bases sync behavior, vector store scale patterns (pgvector vs OpenSearch Service), and prompt engineering techniques.

Domain 2: Implementation & Integration (26%). Agents and deployment patterns. Focus on: Bedrock Agents vs AgentCore vs Step Functions, Converse API vs InvokeModel, Return of Control, and streaming architectures.

Domain 3: AI Safety, Security & Governance (20%). Guardrails mechanics (all four filter types and their modes), IAM access control patterns for Bedrock, VPC endpoint vs NAT gateway, Q Business vs Knowledge Bases.

Domains 4 + 5: Optimization & Testing (23% combined). More approachable once the first three domains are solid. Cost traps (Provisioned vs On-demand), evaluation metrics (ROUGE/BLEU/BERTScore), and throttling recovery patterns.

Final Review

Before sitting the exam, read through "Exam Traps: Deep Dive" in full, then drill "Quick Pattern Recognition" until each row is instant recall. Review "Wrong Answer Patterns" once; they flag the reliable trap answers.

Tips for Exam Day

  • Read the last sentence of each scenario first; it states the actual question.
  • Identify the specific constraint word: "minimize cost," "minimize development effort," "real-time," "compliance," "no internet access."
  • Flag and skip questions taking more than ~3 minutes; return after completing the rest.
  • 180 minutes / 65 questions is roughly 2.5–3 minutes per question; there's time to revisit.

Domain 1: FM Integration, Data & Compliance (31%)

1.1 Foundation Model Selection

Core: Match model capabilities to use case while balancing cost, latency, accuracy.

Services:

  • Amazon Bedrock: managed access to Claude, Titan, Llama, Mistral, Cohere
  • Amazon Nova: Pro (complex reasoning), Lite (high-volume/cheap), Micro (text-only), Premier (most capable), Sonic (voice), Canvas (images), Reel (video)
  • Amazon SageMaker JumpStart: deploy open-source models with full control
  • Amazon Bedrock Cross-Region Inference: route to regions with capacity

Decision Tree:

  • Managed + pay-per-token → Bedrock
  • Custom/open-source model → SageMaker
  • Cost-effective high volume → Nova Lite
  • Complex multi-step reasoning → Nova Pro / Claude
  • Multimodal (text+image) → Claude 3, Nova Pro
  • Real-time voice → Nova Sonic

Traps:

  • Amazon Bedrock Intelligent Prompt Routing automatically picks the cheapest model meeting a quality threshold.
  • Amazon Bedrock Custom Model Import brings fine-tuned models INTO Bedrock (not just SageMaker).
  • Provisioned Throughput ≠ Reserved Instances; it's dedicated model capacity.
  • Cross-Region Inference = availability, NOT cost optimization.

1.2 RAG (Retrieval-Augmented Generation)

Core: Augment FM responses with external knowledge at query time. Avoids hallucinations, keeps answers current without retraining.

Services:

  • Amazon Bedrock Knowledge Bases: managed RAG: auto-chunks, embeds, stores, retrieves
  • Amazon OpenSearch Service: vector search with HNSW, hybrid (keyword+semantic)
  • Amazon Aurora PostgreSQL + pgvector: vector store in relational DB
  • Amazon S3 Vectors: billions of vectors, cost-effective
  • Amazon Titan Text Embeddings V2: 1024-dim, normalized
  • Amazon Kendra: enterprise search with semantic + keyword hybrid

Decision Tree:

  • Managed RAG, minimal code → Bedrock Knowledge Bases
  • Hybrid search (keyword + vector) → OpenSearch Service or Kendra
  • Already have PostgreSQL → Aurora + pgvector
  • Billions of vectors, cost-sensitive → S3 Vectors
  • Re-ranking for precision → Bedrock Knowledge Bases with Cohere Rerank

Traps:

  • Chunking strategy matters: fixed-size (simple), semantic (better relevance), hierarchical (parent-child for context).
  • RAG = dynamic knowledge; Fine-tuning = style/format/domain adaptation.
  • Bedrock Knowledge Bases support metadata filtering; narrow search BEFORE vector similarity.
  • Hybrid search = BM25 (keyword) + kNN (vector) scores combined.
  • Scale: pgvector suits moderate scale (millions); OpenSearch Service suits massive scale (hundreds of millions) under strict latency.
  • Data freshness: Bedrock Knowledge Bases need a sync step; for near-immediate updates, prefer OpenSearch Service + a real-time indexing pipeline.
  • Scale + latency pattern: very large corpora (hundreds of millions of records/vectors) under a strict sub-second latency SLA → OpenSearch Service; moderate scale or an existing PostgreSQL footprint → pgvector.

1.3 Prompt Engineering

Core: Design inputs to FMs to get desired outputs.

Techniques:

  • Zero-shot: simple task, clear instruction
  • Few-shot: need specific output format (provide examples)
  • Chain-of-Thought: complex reasoning (step-by-step)
  • ReAct: reason + act (agents)

Services:

  • Amazon Bedrock Prompt Management: version, store, manage prompt templates
  • Amazon Bedrock Flows (formerly Prompt Flows): chain prompts into workflows with branching
  • Amazon Bedrock Converse API: unified multi-model API with system prompts, tool use

Traps:

  • System prompts set behavior/persona; user prompts are the actual query.
  • Temperature: 0 = deterministic, 1 = creative.
  • Bedrock Flows can include conditions, parallel branches, iterators.
  • Converse API normalizes tool_use across all models.

1.4 Vector Stores & Embeddings

Core: Embeddings convert text/images into dense vectors. Vector stores enable similarity search.

Services:

  • Titan Text Embeddings V2: text, 1024-dim, normalized
  • Amazon Titan Multimodal Embeddings: text + image in same vector space
  • Cohere Embed: multilingual (100+ languages)
  • OpenSearch Service k-NN: HNSW algorithm
  • pgvector: PostgreSQL extension, IVFFlat or HNSW

Traps:

  • HNSW = approximate nearest neighbor, faster but more memory than IVFFlat.
  • Cosine = direction; L2 = distance; inner product = magnitude+direction.
  • Dimension mismatch between embedding model and vector store = errors.
  • Re-indexing required when changing embedding model.
  • Titan V2 produces normalized vectors; V1 does not. CANNOT mix in same index.

1.5 Data Pipelines for GenAI

Services:

  • AWS Glue: ETL, crawlers, data catalog
  • Amazon Bedrock Data Automation: extract structured data from unstructured docs
  • Amazon Textract: OCR for documents
  • AWS Step Functions: orchestrate multi-step pipelines
  • Amazon EventBridge: trigger pipelines on new data

Traps:

  • Bedrock Knowledge Bases can sync from Amazon S3 automatically; no custom pipeline needed for basic RAG.
  • For custom chunking logic, you need an AWS Lambda-based pipeline before Knowledge Bases ingestion.
  • Glue is for structured/semi-structured ETL, not directly for vector embedding.

Domain 2: Implementation & Integration (26%)

2.1 Agentic AI & Bedrock Agents

Core: Agents reason, plan, and take actions autonomously using tools.

Services:

  • Amazon Bedrock Agents: managed agents with action groups (Lambda as tools)
  • Amazon Bedrock AgentCore: composable building blocks (Runtime, Memory, Identity, Gateway, Observability, built-in tools)
  • Strands Agents SDK: open-source Python SDK for custom agents
  • Agent Squad: open-source multi-agent orchestration, formerly Multi-Agent Orchestrator (supervisor/specialist routing)
  • Model Context Protocol (MCP): standardized tool interface
  • AWS Step Functions: deterministic workflow orchestration

Decision Tree:

  • Managed agent, minimal code → Bedrock Agents
  • Full control over agent logic → Strands Agents SDK
  • Multiple specialized agents collaborating → Agent Squad
  • Deterministic multi-step workflow → Step Functions
  • Agent needs external tool access → Action Groups (Lambda) or MCP servers
  • Custom agent with memory + identity + events → AgentCore

Traps:

  • Action Groups = AWS Lambda functions defined by OpenAPI schema.
  • Return of Control = agent pauses, returns the action to the client, client executes and returns the result.
  • Bedrock Agents use the ReAct pattern internally.
  • AgentCore vs Agents: AgentCore = composable infrastructure; Agents = fully managed turnkey.
  • Step Functions guarantee execution order, not AI decision-making.

2.2 Deployment Patterns

Decision Tree:

  • Simple Bedrock calls, spiky traffic → AWS Lambda + Amazon API Gateway
  • Long-running agent sessions → Amazon Elastic Container Service (Amazon ECS) / AWS Fargate
  • Custom model hosting → Amazon SageMaker Real-time Endpoint
  • Batch inference (non-real-time) → SageMaker Async or Bedrock Batch
  • Predictable high throughput → Provisioned Throughput
  • Streaming responses → WebSocket API or Lambda Response Streaming

Traps:

  • Lambda 15-min timeout is a problem for complex agent chains.
  • SageMaker Serverless = cold starts, NOT for latency-sensitive workloads.
  • Multi-model endpoints share an instance, reducing cost for many models.
  • Inference Components = fine-grained resource allocation on SageMaker.
  • Step Functions Standard vs Express: Standard = long-lived, exactly-once, Wait for Callback. Express = short, at-least-once, NO Wait states.
  • Clarification workflows + human-in-the-loop = Step Functions Standard with Wait for Callback.
  • Amazon DynamoDB for conversation state: on-demand + server-side encryption + session ID as key.
  • Amazon Augmented AI (Amazon A2I): route low-confidence results to human reviewers.

2.3 Enterprise Integration

Decision Tree:

  • Enterprise search/Q&A over internal docs → Amazon Q Business
  • Developer productivity → Amazon Q Developer
  • Sync REST API → API Gateway + Lambda + Bedrock
  • Real-time streaming → WebSocket or AWS AppSync subscriptions
  • Async processing → Amazon Simple Queue Service (Amazon SQS) + Lambda + Bedrock

Traps:

  • Q Business respects existing IAM/SSO permissions for document access.
  • API Gateway can cache responses for repeated identical prompts.
  • Use SQS for decoupling when Bedrock throttles (queue and retry).
  • Converse API supports streaming via InvokeModelWithResponseStream.

2.4 Amazon Bedrock APIs

Decision Tree:

  • Simple single call → InvokeModel
  • Multi-model support, tool use → Converse API (RECOMMENDED)
  • Need streaming → InvokeModelWithResponseStream
  • RAG with generation → RetrieveAndGenerate
  • Custom RAG logic → Retrieve + your own generation call

Traps:

  • Converse API is the recommended approach; works across all Bedrock models.
  • InvokeModel requires model-specific JSON format.
  • tool_use in Converse = function calling.
  • RetrieveAndGenerate handles the full RAG pipeline in one call but is less customizable.

2.5 AgentCore & Streaming Architectures

Decision Tree:

  • Custom agent with memory + identity + events → AgentCore
  • Managed agent, less control → Bedrock Agents
  • Real-time voice → text → FM → UI → Amazon Transcribe streaming + InvokeModelWithResponseStream + WebSocket
  • React app with streaming → AWS Amplify AI Kit
  • Native voice conversation → Nova Sonic

Traps:

  • AgentCore ≠ Bedrock Agents.
  • Transcribe partial results = text fragments BEFORE the speaker finishes.
  • One synchronous component in a streaming chain kills real-time latency.
  • WebSocket API (not REST) for bidirectional streaming.

2.6 Canary Deployments & Traffic Management

Pattern: EventBridge trigger → Step Functions → staged shift → Lambda metric check → rollback.

Traps:

  • API Gateway canary alone doesn't check Bedrock-specific metrics or auto-rollback.
  • Step Functions Standard (not Express) for long-running deployment workflows.
  • Cross-Region inference profiles solve throughput bottlenecks, not just DR.
  • Token batching reduces API overhead during high-traffic periods.

Domain 3: AI Safety, Security & Governance (20%)

3.1 Document Processing Pipelines

Pattern: Extract → Redact PII → FM Inference → Human Review (low confidence).

Decision Tree:

  • Scanned PDFs → structured data → Textract or Bedrock Data Automation
  • Low-confidence results → human review → Amazon A2I
  • PII redaction before FM → Lambda + Amazon Comprehend or Amazon Bedrock Guardrails PII filter
  • Regional data residency → Amazon S3 bucket per region + AWS Identity and Access Management (IAM) region conditions + service control policies (SCPs)

Traps:

  • A2I routes to reviewers IN THE SAME REGION as the data.
  • Lambda PII redaction happens BEFORE Bedrock inference, not after.
  • Guardrails PII = runtime on model I/O. Lambda redaction = pre-processing on source docs.
  • Pattern: high daily document throughput plus a high-availability SLA → fully managed extraction + review (Textract + A2I) over self-managed infrastructure.

3.2 Amazon Q Business & Q Developer

Decision Tree:

  • Non-technical employees need doc Q&A with access control → Q Business
  • Developer productivity + org-specific code patterns → Q Developer with customizations
  • Enforce approved libraries/resources → Q Developer customizations
  • Custom RAG app with full control → Bedrock Knowledge Bases (not Q Business)

Traps:

  • Q Business vs Bedrock Knowledge Bases: Q Business = end-user product with connectors + SSO. Bedrock Knowledge Bases = developer API.
  • Q Business respects SOURCE permissions; if a user can't access a doc, Q won't show its content.
  • Q Developer customizations connect to your repos; suggestions match your org's patterns.

3.3 Conversation State & Multi-turn Apps

Correct Pattern: DynamoDB on-demand + AWS Key Management Service (AWS KMS) + Step Functions Standard + Wait for Callback.

Traps:

  • Express workflows CANNOT use Wait states; instant disqualifier for clarification flows.
  • DynamoDB on-demand auto-scales for thousands of concurrent users.
  • Amazon S3 for conversation history is too slow for real-time lookups (WRONG).
  • Amazon ElastiCache alone is not durable enough for compliance.
  • Amazon RDS is overkill for session data.

3.4 Bedrock Guardrails

Features:

  • Content Filters: hate, violence, sexual, misconduct, prompt attacks (configurable thresholds)
  • Denied Topics: block specific subjects (e.g., competitor discussion)
  • Word Filters: profanity or custom word lists
  • PII Filters: detect and redact/block PII (ANONYMIZE vs BLOCK)
  • Contextual Grounding: check if a response is grounded in source
  • ApplyGuardrail API: apply independently of model invocation

Traps:

  • Guardrails apply to ANY model in Bedrock.
  • ApplyGuardrail API works with SageMaker or self-hosted models too.
  • Contextual Grounding NEEDS a source reference to check against.
  • PII ANONYMIZE = replace with a placeholder & continue. BLOCK = reject the entire response.
  • Guardrails are evaluated BEFORE and AFTER model invocation.
  • Content filters ≠ Denied Topics: Content filters = hate/violence categories. Denied Topics = custom business rules.
  • Grounding threshold: HIGH = strict (blocks more hallucinations but may over-block).
  • DETECT vs BLOCK mode: DETECT = flag/notify but allow through. BLOCK = reject entirely.

3.5 IAM & Access Control for GenAI

Decision Tree:

  • Restrict model access per team → IAM policies with bedrock:InvokeModel + condition on bedrock:ModelId
  • No internet access → Amazon Virtual Private Cloud (Amazon VPC) endpoint for Bedrock (AWS PrivateLink)
  • Encrypt Knowledge Bases data → AWS KMS customer managed key
  • Audit who called what model → AWS CloudTrail
  • Block certain models org-wide → SCP

Traps:

  • bedrock:ModelId condition key restricts which models a role can invoke.
  • Model invocation logging captures input/output; encrypt with AWS KMS.
  • Cross-region inference still respects IAM in the calling region.
  • Bedrock Agents need their own IAM role with permissions to call action group Lambda functions.
  • A VPC endpoint ≠ NAT gateway (NAT still routes through the internet).

3.6 Responsible AI & Compliance

Decision Tree:

  • Detect bias in model outputs → Amazon SageMaker Clarify
  • Document a model for governance → Model Cards
  • No PII in training data → Amazon Macie scan of Amazon S3
  • Runtime content safety → Guardrails
  • Compliance audit trail → AWS Audit Manager + CloudTrail

Traps:

  • Clarify = bias measurement for traditional ML. GenAI fairness needs custom evaluation.
  • Model Cards are documentation, not enforcement.
  • Bedrock model evaluation jobs can assess toxicity, accuracy, robustness.
  • Human-in-the-loop = Amazon A2I.

Domain 4: Operational Efficiency & Optimization (12%)

4.1 Cost Optimization

Decision Tree:

  • Variable quality needs → Intelligent Prompt Routing
  • Predictable high volume → Provisioned Throughput
  • Non-real-time bulk processing → Batch Inference (~50% cheaper)
  • Long system prompts reused → Prompt Caching
  • Simple classification/extraction → Nova Lite

Traps:

  • Input tokens are cheaper than output tokens; keep outputs concise.
  • Prompt caching saves cost on repeated long contexts.
  • Intelligent Prompt Routing needs a quality threshold defined.
  • Batch inference has NO SLA on completion time.
  • Spiky traffic + "optimize cost" → on-demand is already optimal (common trap).
  • Semantic caching (vector-based) for near-identical queries, not DynamoDB/ElastiCache.

4.2 Performance & Monitoring

Decision Tree:

  • Track token usage/cost → Amazon CloudWatch metrics (InputTokenCount, OutputTokenCount)
  • Debug slow responses → AWS X-Ray traces
  • Alert on throttling → CloudWatch alarm on ThrottledCount
  • Improve UX → Response Streaming (TTFT is the primary metric)
  • Audit inputs/outputs → Model Invocation Logging (opt-in!)

Traps:

  • Model invocation logging must be explicitly enabled, NOT on by default.
  • Logging captures full prompts/responses; encrypt with AWS KMS, restrict access.
  • Time-to-first-token (TTFT) is the primary UX metric for streaming.
  • Throttling → request a limit increase or use Provisioned Throughput.
  • CloudTrail = API metadata. Invocation logging = actual prompts/responses.

Domain 5: Testing, Validation & Troubleshooting (11%)

5.1 Model Evaluation

Decision Tree:

  • Compare two models on the same task → Bedrock Model Evaluation job
  • Need human reviewers → Bedrock Human Evaluation (uses Amazon SageMaker Ground Truth)
  • Track experiments over time → Amazon SageMaker Experiments
  • Automated quality gate in CI/CD → Lambda + custom metrics
  • Scale evaluation cheaply → LLM-as-judge pattern

Traps:

  • Bedrock Model Evaluation is a BATCH job, not real-time monitoring.
  • Human evaluation uses the SageMaker Ground Truth workforce under the hood.
  • LLM-as-judge: use a stronger model to evaluate a weaker one.
  • RAGAS metrics for RAG: faithfulness, answer relevancy, context precision.

5.2 Troubleshooting & Debugging

Common Errors:

  • ThrottlingException → exponential backoff + jitter, request limit increase
  • ValidationException → malformed request (wrong model ID, bad JSON)
  • AccessDeniedException → check bedrock:InvokeModel permission
  • ModelTimeoutException → increase timeout or use async
  • Context window exceeded → truncate input or summarize

Quality Issues:

  • Hallucinations → improve RAG (better chunking, grounding-check guardrail)
  • Context overflow → summarize history, sliding window
  • Poor retrieval → check embedding model, chunking strategy, metadata filters
  • High latency → enable streaming, smaller model, check cold starts
  • Wrong source cited → context-precision issue; improve retrieval with metadata filtering

5.3 Evaluation Metrics

When to use which metric:

  • ROUGE (Recall-Oriented Understudy for Gisting Evaluation) → summarization. Measures overlap of n-grams between generated summary and reference. ROUGE-1 (unigrams), ROUGE-2 (bigrams), ROUGE-L (longest common subsequence).
  • BLEU (Bilingual Evaluation Understudy) → translation. Measures precision of n-grams in generated text against a reference. Higher = better translation.
  • BERTScore → semantic similarity. Uses BERT embeddings to compare meaning rather than exact word overlap. Good when paraphrasing is acceptable.
  • Perplexity → language-model quality. Lower = the model is more confident in predicting next tokens. Not directly useful for task evaluation.
  • RAGAS metrics for RAG specifically:
    • Faithfulness: is the answer supported by the retrieved context?
    • Answer relevancy: does the answer address the question?
    • Context precision: are the retrieved chunks from the right documents?
    • Context recall: did we retrieve all relevant information?

Traps:

  • ROUGE measures recall (did we capture the key info?). BLEU measures precision (is the output clean?).
  • BERTScore handles paraphrasing; ROUGE/BLEU don't (exact word match only).
  • Perplexity is a model-level metric, not a task-level metric; wrong answer for "evaluate output quality."

5.4 Testing Patterns for Production GenAI

Prompt Regression Testing:

  • Maintain a test suite of input/expected-output pairs.
  • Run after every prompt change to catch regressions.
  • Automate with Lambda + Bedrock + assertions in CI/CD.
  • Track scores over time (SageMaker Experiments or a custom DynamoDB table).

Load Testing GenAI APIs:

  • GenAI has unique load characteristics: variable response times, token-based throughput.
  • Test with realistic prompt lengths and expected concurrency.
  • Monitor: TTFT, total latency, throttling rate, error rate under load.
  • Use this to determine whether you need Provisioned Throughput.

A/B Testing Models/Prompts:

  • Route a percentage of traffic to variant B.
  • Measure quality metrics (not just latency/errors).
  • Bedrock Model Evaluation for offline comparison; production A/B for real-user validation.

5.5 Additional Topics

Structured Output & JSON Schema Enforcement:

  • Use system prompts with explicit JSON schema instructions.
  • Converse API tool_use can enforce structured responses.
  • Bedrock Flows can validate output format between steps.
  • For strict enforcement: parse output in Lambda, retry if malformed.

Watermarking & Provenance:

  • Track AI-generated content origin for compliance.
  • Amazon Nova Canvas and the Amazon Titan Image Generator include invisible watermarks.
  • For text: log model invocations with full input/output (invocation logging).
  • Provenance = audit trail of which model, which prompt, which version generated content.

LangChain / LlamaIndex with Bedrock:

  • Both frameworks integrate with Bedrock as an LLM provider.
  • LangChain: chains, agents, memory abstractions on top of Bedrock.
  • LlamaIndex: data framework for RAG pipelines with Bedrock.
  • When "minimize operational overhead" is the constraint, Bedrock-native features (Knowledge Bases, Agents, Flows) are the preferred answers.

Amazon Bedrock Flows:

  • Visual/no-code workflow builder for GenAI pipelines.
  • Chain prompts with conditions, parallel branches, iterators.
  • Different from Step Functions: Flows = prompt-centric. Step Functions = service orchestration.
  • Use when: a multi-step prompt pipeline without custom code.

Exam Traps: Deep Dive

Scan the bold title for quick review. Read the explanation to build the mental model.


Guardrails & Safety

1. Guardrails ≠ Fairness/Bias Measurement

Guardrails are a runtime safety gate; they sit between the user and the model and filter content in real time. Think of them as a bouncer at a club door. They check: "Is this toxic? Is there PII? Is this an off-limits topic?" But they don't measure statistical fairness across demographic groups. That's a different job: measuring whether your model treats Group A differently from Group B requires running evaluation datasets through the model and computing metrics like disparate impact. That's what SageMaker Clarify does. Mental model: Guardrails = real-time filter. Clarify = offline measurement.

2. Guardrails Evaluate BOTH Input AND Output

This is counterintuitive; most people think "filter the response." But Guardrails have two checkpoints. The input filter catches prompt injection attacks and inappropriate requests BEFORE they reach the model (saving tokens and preventing the model from even seeing bad content). The output filter catches cases where the model generates something harmful despite a clean input. If either checkpoint triggers, the request is blocked. Mental model: Two gates, one before the model and one after.

3. PII Modes: ANONYMIZE vs BLOCK: completely different UX

ANONYMIZE replaces "John Smith, SSN 123-45-6789" with "[NAME], [SSN]" and continues processing. The user gets a response, just with PII scrubbed. BLOCK rejects the ENTIRE request; the user gets an error, no response at all. In a customer-communication app, BLOCK is too aggressive (users can't even ask about their own account). In a public-facing chatbot, BLOCK might be appropriate to prevent any PII leakage. Mental model: ANONYMIZE = surgeon (removes the problem, patient lives). BLOCK = bouncer (you're not coming in at all).

4. Contextual Grounding Needs a Source Document

This is NOT a magic hallucination detector. It works by comparing the model's response against a specific source document you provide. It asks: "Is claim X in the response supported by evidence in document Y?" Without a source document, it has nothing to compare against, so it only works in RAG scenarios where you've retrieved documents. Open-ended generation with no retrieval gets no help from it. Mental model: It's a fact-checker that needs the reference material. No reference = can't check.

5. ApplyGuardrail API: works with any model

Most people assume Guardrails are locked to Bedrock. But the ApplyGuardrail API is a standalone text-in/text-out safety filter. You can send it text from SageMaker endpoints, self-hosted models on Amazon EC2, or even third-party APIs; pass the text and get back whether it passes or fails. This lets you standardize safety across your entire AI stack, not just Bedrock. Mental model: Guardrails = independent safety service, not a Bedrock-only feature.

6. Content Filters vs Denied Topics: different mechanisms

Content Filters are pre-built categories: hate speech, violence, sexual content, misconduct, prompt attacks. They use AWS's built-in classifiers with configurable thresholds (NONE/LOW/MEDIUM/HIGH). Denied Topics are YOUR custom business rules described in natural language: "never provide specific investment recommendations" or "never discuss competitor products." The model understands the intent, not just keywords. Mental model: Content Filters = AWS's safety categories. Denied Topics = your company's rules.

7. InvocationsIntervened ≠ Errors or Throttling

This CloudWatch metric specifically counts how many times Guardrails stepped in and modified or blocked a response. It's a safety metric, not an error metric. A high value means users are frequently hitting safety boundaries; maybe the guardrails are too strict, or users are testing limits. ThrottledCount is the separate metric for rate limiting. Mental model: Intervened = safety triggered. Throttled = rate limit hit. Errors = something broke.


RAG & Retrieval

8. RAG vs fine-tuning: the fundamental distinction

RAG retrieves external knowledge at query time; the model's weights don't change. Fine-tuning changes the model's weights to alter its behavior. Use RAG when knowledge changes frequently, you need citations, or you want updates without retraining. Use fine-tuning when you need a specific style, a specific format, or deep domain jargon. "Company has internal docs" scenarios almost always point to RAG, not fine-tuning. Mental model: RAG = giving the model a reference book. Fine-tuning = teaching the model a new skill.

9. Bedrock Knowledge Bases Sync is NOT Automatic

You upload a new PDF to Amazon S3. It sits there. The Knowledge Base doesn't know about it until you call StartIngestionJob (or it runs on a schedule you configured). This is critical for "data freshness" questions. If documents update frequently and must be searchable immediately, Bedrock Knowledge Bases may not be the answer; you'd want OpenSearch Service with a real-time indexing pipeline (EventBridge → Lambda → embed → index). Mental model: S3 upload ≠ indexed. There's a "sync" step between them.

10. Amazon Q Business vs Bedrock Knowledge Bases

Q Business is a finished product, essentially deploying an enterprise ChatGPT. It has a UI, 40+ data connectors (SharePoint, Confluence, Salesforce, Amazon S3), SSO integration, and respects existing document permissions. Non-technical employees use it directly. Bedrock Knowledge Bases is a developer building block: an API that returns relevant chunks; you build your own UI, auth, and everything else on top. Use Q Business when employees need to ask questions over internal docs under existing access controls; use Bedrock Knowledge Bases when a development team is building a custom RAG application. Mental model: Q Business = product for end users. Bedrock Knowledge Bases = API for developers.

11. pgvector vs OpenSearch Service: scale matters

pgvector is a PostgreSQL extension. It's great if you already run PostgreSQL and need vector search for millions of vectors. But PostgreSQL wasn't designed for vector search at massive scale; at hundreds of millions of vectors with sub-second latency requirements, it struggles. OpenSearch Service with HNSW was purpose-built for this: distributed, horizontally scalable, optimized for approximate nearest neighbor at massive scale. Rule of thumb: hundreds of millions of vectors + a tight latency SLA → OpenSearch Service; moderate scale or an existing PostgreSQL footprint → pgvector. Mental model: pgvector = good enough for moderate scale. OpenSearch Service = purpose-built for massive scale.

12. Chunking Strategy: fixed vs semantic vs hierarchical

Fixed-size chunking splits every N tokens regardless of content; it can split a legal argument mid-sentence or separate a function from its docstring. Semantic chunking splits on natural boundaries (paragraphs, sections, topic shifts), keeping related content together. Hierarchical chunking creates parent-child relationships: small specific chunks for precise retrieval, linked to larger parent chunks for context. Apply it when reports describe missing surrounding context → hierarchical; long technical documents with weak relevance scores → semantic. Mental model: Fixed = dumb scissors. Semantic = smart scissors. Hierarchical = scissors + table of contents.

13. Graph RAG for Multi-hop Relationships

Standard vector RAG finds documents SIMILAR to your query. But "which suppliers are connected to Company X through shared board members?" is a relationship traversal, not a similarity search. Graph RAG uses Amazon Neptune Analytics to store entities and relationships as a graph, then traverses connections. Vector search would just find documents mentioning Company X; it can't traverse relationships. Mental model: Vector RAG = "find similar things." Graph RAG = "follow the connections between things."

14. Knowledge Bases Source Attribution vs Extended Thinking

Source attribution in Bedrock Knowledge Bases returns citations: "this claim comes from document X, page Y." It's about provenance: where did the answer come from? Extended Thinking (Claude) shows the model's internal reasoning, its chain-of-thought. Completely different features; you can have both, neither, or either. Mental model: Source attribution = footnotes/citations. Extended Thinking = showing your work.


Agents & Orchestration

15. Step Functions vs Bedrock Agents: deterministic vs AI-driven

Step Functions execute a pre-defined workflow: "first do A, then if condition B do C, else D." The flow is set at design time. Bedrock Agents use AI reasoning to decide what to do next: "given the request, should I look up the order, check inventory, or process a return?" The agent decides at runtime. Known exact sequence → Step Functions. AI figures out what to do → Bedrock Agent. Mental model: Step Functions = flowchart you drew. Agent = employee who figures it out.

16. AgentCore vs Bedrock Agents: infrastructure vs product

Bedrock Agents = fully managed, turnkey. You define action groups and instructions; AWS handles the ReAct loop, memory, everything. AgentCore = composable infrastructure building blocks: managed memory, session identity, event handling, observability, but YOU write the agent logic. Need custom agent logic with managed memory and identity → AgentCore. Need a working agent with minimal code → Bedrock Agents. Mental model: Agents = turnkey product. AgentCore = managed infrastructure, custom logic.

17. Action Groups Need an OpenAPI Schema

A Bedrock Agent can't just "call a Lambda function." It needs to know what the tool does, what parameters it accepts, and what it returns. The OpenAPI schema provides this contract. Without it, the agent has no way to reason about when to use the tool or what arguments to pass; like giving someone a phone number without saying who's on the other end. Mental model: OpenAPI schema = the tool's instruction manual for the agent.

18. Step Functions Standard vs Express: wait states are the deciding factor

Express Workflows are fast, cheap, and short-lived (5 min max), but they CANNOT pause and wait. Standard Workflows can run up to a year and support "Wait for Callback": the workflow pauses, sends a token to an external system, and resumes when that system calls back with the token. Essential for human-in-the-loop: "pause until the human approves" or "wait for the user to clarify." Anything mentioning clarification, human review, or waiting for external input → Standard. Mental model: Express = fire and forget. Standard = can pause and wait (durable).

19. Amazon A2I vs SageMaker Ground Truth

Both involve humans reviewing AI outputs, but at different stages. Ground Truth = humans label training data BEFORE you train a model. A2I = humans review production predictions AFTER deployment, triggered by low confidence: "Textract is only 60% sure about this field → route to a human reviewer." Ground Truth is for building datasets; A2I is quality control in production. Mental model: Ground Truth = building the training set. A2I = quality control in production.

20. Step Functions 256 KB Payload Limit

Each state can only pass 256 KB of data to the next state. GenAI outputs (reasoning traces, multi-agent conversations) can easily exceed this. The pattern: store large data in Amazon S3, pass the S3 URI between states, and have the next state read from S3. A common "why is my workflow failing?" debugging scenario. Mental model: States pass references (S3 URIs), not the actual large data.


Cost & Performance

21. Cross-Region Inference = Availability, NOT Cost

Pricing is the same regardless of which region serves your request. Cross-Region Inference automatically routes to regions with available capacity when your primary region is saturated; it's a scaling/availability mechanism. The cost levers are Intelligent Prompt Routing (cheaper model) and Batch Inference (~50% off). Mental model: Cross-Region = "find me a region that's not busy." Intelligent Routing = "find me a cheaper model."

22. Provisioned Throughput: only for steady, predictable load

You pay for dedicated capacity whether you use it or not. If traffic is high during the day and minimal at night, you're paying for peak capacity 24/7. On-demand charges per token; at night you pay almost nothing. Provisioned makes sense only with consistent high volume where the per-token discount outweighs idle cost. Common trap: "variable traffic" + "optimize costs" → on-demand is already optimal. Mental model: Provisioned = gym membership (pay monthly regardless). On-demand = pay-per-class.

23. Prompt Caching vs Prompt Management: money vs organization

Bedrock Prompt Management is a filing cabinet; it stores, versions, and organizes prompt templates. It doesn't save you any money on inference. Prompt Caching is a computational optimization: when a long system prompt is identical across requests, caching means the model doesn't re-process those tokens each time; you pay for the cached prefix once and reuse it. Mental model: Management = organizing recipes in a binder. Caching = pre-heating the oven so every dish cooks faster.

24. Intelligent Prompt Routing Needs a Quality Threshold

It doesn't blindly pick the cheapest model. You define a quality bar ("responses must score at least 0.8 on my metric"), then it routes to the cheapest model meeting that bar; simple queries go to a cheap model, complex ones to an expensive one. Without a threshold, it can't make the tradeoff. Mental model: A smart dispatcher: "what's the cheapest taxi that still gets there on time?"

25. Semantic Caching ≠ Traditional Caching

Amazon DynamoDB or Amazon ElastiCache cache exact key matches. "What is AWS Lambda?" and "Tell me about AWS Lambda" are different keys = cache miss. Semantic caching embeds the query into a vector, searches against cached query vectors, and returns the cached response if similarity is above a threshold; it handles paraphrasing. This needs a vector store (OpenSearch Service k-NN, Amazon MemoryDB), not a key-value store. Mental model: Traditional cache = exact match. Semantic cache = similar meaning (same intent, different words).

26. Provisioned Throughput Requires the ARN

After you purchase Provisioned Throughput, you get back a provisioned model ARN. You MUST use this ARN in your InvokeModel calls. If you keep using the base model ID, your requests still go to on-demand; you're paying for provisioned capacity you're not using. Mental model: Buying a reserved parking spot doesn't help if you keep parking in the general lot.

27. PerformanceConfigLatency vs Provisioned Throughput

These solve different problems. PerformanceConfigLatency: optimized tells Bedrock to prioritize speed for this request (potentially faster hardware paths). Provisioned Throughput guarantees dedicated capacity so you don't get throttled. You can be throttled but fast (need Provisioned) or have capacity but slow (need PerformanceConfig). Mental model: PerformanceConfig = "drive faster." Provisioned = "guarantee there's a lane open for you."


Security & Access

28. VPC endpoint vs NAT gateway: the internet question

A NAT gateway lets private-subnet resources reach the internet: traffic goes out to the public internet and back. Even for AWS services, packets traverse the public internet. A VPC endpoint (AWS PrivateLink) creates a private connection directly to the AWS service; traffic never leaves the AWS private network. When the requirement is "no data can leave the VPC" or "no internet access," the answer is a VPC endpoint. A NAT gateway is a trap because it sounds private (it's in your VPC) but still uses the internet. Mental model: NAT = private door to the public street. VPC endpoint = private tunnel directly to the destination.

29. Lake Formation for Column-Level Access

Amazon S3 bucket policies work at the object level; grant access to a file, but not to specific columns within a Parquet file. IAM policies can't do column-level filtering either. AWS Lake Formation provides LF-tag-based access control at table AND column level, even across accounts. When the requirement is "cross-account" + "column-level" + "data lake" → Lake Formation. Mental model: S3 policies = "you can read this file." Lake Formation = "you can read columns A and B but not C."

30. Cross-Region Inference Uses Inference Profile ARNs

You don't just "enable" Cross-Region Inference. You create an inference profile (e.g., eu.amazon.nova-pro-v1:0) that defines which regions can serve requests. Your IAM policies and SCPs must allow this profile ARN, not the base model ID. If your SCP allows only the base model ID but you're calling the regional inference profile, it will be denied. Mental model: The inference profile is a new "address" for the model that includes the routing logic.


APIs & Integration

31. Converse API is the standard: InvokeModel is legacy

InvokeModel requires you to format the request body differently for each model provider (Claude one way, Titan another, Llama another). Converse API provides ONE format across all models, including standardized tool_use (function calling). When the requirement is multi-model support or unified integration → Converse. Mental model: InvokeModel = speaking each model's native language. Converse = universal translator.

32. RetrieveAndGenerate vs Retrieve: convenience vs control

RetrieveAndGenerate does everything in one call: retrieves chunks from the Knowledge Base, builds the prompt with context, calls the model, returns the answer; convenient but inflexible (no re-ranking, filtering, different generation model, or custom post-processing). The Retrieve API just returns chunks; you build the prompt and call InvokeModel separately: more code, full control. Mental model: RetrieveAndGenerate = microwave meal. Retrieve + InvokeModel = cooking from scratch.

33. Q Developer Customizations: org-specific code

Out of the box, Q Developer suggests code from its general training. With customizations, you connect it to your internal repositories and define approved resource lists, so it suggests code matching YOUR patterns, libraries, and conventions. When the requirement is "developers must only use approved libraries" or "suggestions should match internal patterns" → Q Developer customizations. Mental model: Default Q Developer = generic cookbook. Customized = your company's internal cookbook.


Data & Embeddings

34. Titan Embeddings V1 vs V2: cannot mix

V2 produces normalized vectors (unit length, always magnitude 1) and supports configurable dimensions; V1 doesn't normalize. Search a V2 index with V1 embeddings (or vice versa) and similarity scores are meaningless because the vector spaces are incompatible. Switching embedding models means re-embedding your ENTIRE corpus and rebuilding the index; expensive and slow. Mental model: V1 and V2 speak different "vector languages." You can't mix languages in one conversation.

35. Nova Forge vs SageMaker for Fine-tuning

The Amazon Nova Forge SDK is a Python SDK for customizing Amazon Nova models across both SageMaker AI and Amazon Bedrock, useful for advanced workflows (continued pre-training, SFT, DPO, RFT). You can also fine-tune Nova directly in Bedrock for simpler supervised/reinforcement fine-tuning. SageMaker handles open-source models (Llama, Mistral, Falcon) where you need full control over training infrastructure. Mental model: Nova Forge = full-lifecycle customization toolkit for Nova; SageMaker = bring-any-open-model workshop.

36. HNSW vs Flat Index: scale determines choice

HNSW (Hierarchical Navigable Small World) is an approximate algorithm: fast but may miss the true nearest neighbor; optimized for millions/billions of vectors where exact search is impossible. Flat index does brute-force exact search, checking every vector; slow at scale but 100% accurate. For small proprietary datasets (thousands to low millions), Flat gives perfect results with acceptable latency. Mental model: HNSW = GPS navigation (fast, usually right). Flat = checking every possible route (slow, always finds the best one).


Monitoring & Ops

37. Model Invocation Logging is Opt-In

By default, Bedrock only logs API metadata to CloudTrail: who called InvokeModel, when, which model. The actual prompt and response text are NOT logged anywhere. You must explicitly enable it to capture full content; AWS defaults this to off because prompts often contain sensitive data. Once enabled, encrypt the logs with AWS KMS and restrict access tightly. Mental model: CloudTrail = security camera showing who entered. Invocation logging = recording what they said inside.

38. Model Evaluation Jobs ≠ Production Monitoring

Bedrock Model Evaluation is a batch job you run offline: "here are 1000 test inputs, compare Model A vs Model B on accuracy and toxicity." It produces a report; it doesn't run continuously in production. For production monitoring, use CloudWatch metrics (latency, token counts, throttling), custom quality metrics, and alarms. Mental model: Model Evaluation = lab test before launch. CloudWatch = dashboard after launch.

39. Canary Deployments Need the Full Pattern

API Gateway has a "canary" feature that splits traffic by percentage, but it doesn't know about Bedrock-specific metrics (hallucination rate, response quality). A proper canary for GenAI needs: (1) EventBridge triggers on a new model version, (2) Step Functions orchestrates a staged traffic shift (e.g., 10% → 25% → 50% → 100%), (3) Lambda checks CloudWatch metrics at each stage, (4) automatic rollback if metrics degrade. The full pattern matters, not just "use API Gateway canary." Mental model: API Gateway canary = splitting traffic. Full canary = splitting traffic + watching metrics + auto-rollback.

40. Guardrails Don't Manage Token Quotas

Guardrails filter content (safety). They have nothing to do with token counting, cost management, or quota enforcement. For proactive token management: deploy a tokenizer in Lambda to estimate token count BEFORE sending to Bedrock, publish custom metrics to CloudWatch, set alarms on thresholds, and track per-team usage in DynamoDB. Mental model: Guardrails = content police. Token management = accounting department. Different departments.


Quick Pattern Recognition

Scenario Keywords → Answer
"minimize development effort" + RAG Bedrock Knowledge Bases
"multiple models, one integration" Converse API
"long-running API call" + agent Return of Control
"multi-agent, supervisor" Agent Squad
"non-real-time, reduce cost" Batch Inference
"same system prompt, many requests" Prompt Caching
"human review, low confidence" Amazon A2I
"clarification workflow, wait for user" Step Functions Standard + Wait for Callback
"conversation history + scale + encrypt" DynamoDB on-demand + AWS KMS
"block topics + reduce hallucination" Denied Topics + Contextual Grounding
"text + image search" Titan Multimodal Embeddings
"enterprise employees, internal docs, SSO" Amazon Q Business
"custom agent, memory, identity, events" AgentCore
"near-identical queries, reduce cost" Semantic caching (vector-based)
"real-time voice AI" Transcribe streaming + InvokeModelWithResponseStream + WebSocket
"React + streaming" Amplify AI Kit
"approved libraries for developers" Q Developer customizations
"dynamic config, feature flags" AWS AppConfig
"multi-hop entity relationships" Graph RAG + Neptune Analytics
"cross-account column-level access" Lake Formation
"data lineage, traceability" AWS Glue Data Catalog + CloudTrail
"parallel analysis tasks" Step Functions Parallel state
"unpredictable/spiky traffic" On-demand (already optimal)
"evaluate summarization quality" ROUGE
"evaluate translation quality" BLEU
"evaluate semantic similarity" BERTScore
"RAG answer grounded in source?" Faithfulness (RAGAS)
"enforce JSON output format" System prompt + tool_use / Lambda validation
"track AI content origin" Invocation logging + provenance metadata
"no-code prompt pipeline" Bedrock Flows
"minimize operational overhead" + RAG Bedrock-native (Knowledge Bases, Agents) over LangChain

Wrong Answer Patterns (Reliable Anti-Patterns)

  • Amazon S3 for real-time conversation lookups
  • Amazon ElastiCache alone for compliance-grade storage
  • Amazon RDS for session data at scale
  • Express Workflows for human-in-the-loop
  • API Gateway canary alone (without metric checks + rollback)
  • NAT gateway for "no internet" requirements
  • Fine-tuning for frequently-changing knowledge
  • Separate accounts per team for model access control
  • Guardrails for bias measurement
  • CloudTrail alone for prompt/response auditing

From the actual exam

Three things I didn't expect to be as heavily tested:

AWS AppConfig came up in feature-flag and dynamic configuration scenarios: controlling which model variant or guardrail profile an application uses without redeployment. It's easy to skip in a GenAI study pass because it reads like a general ops topic, but it appeared repeatedly in agent and deployment questions.

PII redaction had more coverage than the domain breakdown suggests. The ANONYMIZE vs BLOCK distinction came up in multiple contexts, and the exam specifically tests the difference between Guardrails PII (applied at inference time, on model I/O) and Lambda-based pre-processing (applied before ingestion, on source documents). They're not interchangeable, and the scenario usually makes clear which layer is the right one.

Model Evaluation was the heaviest single topic in the actual exam. Domain 5 is weighted at 11%, but evaluation scenarios appear in Domain 1 questions about choosing between models and validating RAG pipelines, and in Domain 4 questions about proving cost-quality tradeoffs. Don't de-prioritize it based on the domain percentage alone.

Top comments (0)